April 7, 2025
There is increasing interest in using LLMs as decision-making "agents." Doing so includes many degrees of freedom: which model should be used; how should it be prompted; should it be asked to introspect, conduct chain-of-thought reasoning, etc? Settling these questions -- and more broadly, determining whether an LLM agent is reliable enough to be trusted -- requires a methodology for assessing such an agent's economic rationality. This talk describes one. We survey the economic literature on both strategic and non-strategic decision making, taxonomizing 124 fine-grained "elements" that an agent should exhibit, each of which can be tested in up to 3 distinct ways, grounded in up to 10 distinct domains, and phrased according to 5 perspectives (first-person, second-person, etc).
The generation of benchmark data across this combinatorial space is powered by a novel LLM-assisted data generation protocol that we dub auto-STEER, which generates questions by adapting handcrafted templates to new domains and perspectives. Because it offers an automated way of generating fresh questions, auto-STEER mitigates the risk that LLMs will be trained to overfit evaluation benchmarks; we thus hope that it will serve as a useful tool both for evaluating and fine-tuning models for years to come. Finally, we describe the results of a large-scale empirical experiment with 28 different LLMs, ranging from small open-source models to the current state of the art.
We examined each model's ability to solve problems across our whole taxonomy and present the results across a range of prompting strategies and scoring metrics.
Kevin Leyton-Brown is a Distinguished University Scholar and Professor of Computer Science at the University of British Columbia. He holds a Canada CIFAR AI Chair at the Alberta Machine Intelligence Institute and is an associate member of the Vancouver School of Economics. He is a Fellow of the Royal Society of Canada (RSC; awarded in 2023), the Association for Computing Machinery (ACM; awarded in 2020), and the Association for the Advancement of Artificial Intelligence (AAAI; awarded in 2018).
He was a member of a team that won the 2018 INFORMS Franz Edelman Award for Achievement in Advanced Analytics, Operations Research and Management Science, described as "the leading O.R. and analytics award in the industry." He holds a PhD and M.Sc. from Stanford University (2003; 2001) and a B.Sc. from McMaster University (1998). He studies artificial intelligence and machine learning with a focus on connections both to microeconomic theory and to the design of algorithms for hard combinatorial problems.
He is the Director of UBC's Center for AI Decision-making and Action and has been a visiting professor at Harvard, Berkeley, Stanford, and Microsoft Research New York, and in several countries including Israel. He has received multiple research awards, including from Amazon, Facebook and Google.