The large language models (LLMs) that have increasingly taken over the tech world are far from “cheap.” The most prominent LLMs, such as GPT-4, took some $100 million to build, a figure that reflects the legal costs of accessing training data; the computational power needed for what could be billions or trillions of parameters; the energy and water needed to fuel computation; and the many coders needed to develop the training algorithms that must run cycle after cycle so the machine will “learn.”
What if a researcher needs to do a specialized task that a machine could do more efficiently, but doesn’t have access to a large institution like Washington University in St. Louis that offers access to generative AI tools? Or what if a parent wants to prep a child for a difficult test and needs to show many examples of how to solve complicated math problems?
Building one’s own LLM is an onerous prospect because of the aforementioned costs. Using big models like GPT-4 and Llama 3.1 directly is not a sure fix either, because they might not be immediately suited to the complex reasoning in logic and math that the task requires.
It would help if there were a more cost-effective version of an LLM thinker available to the masses, a generic brand for generative AI.
Researchers at WashU decided to tackle this challenge by building an autonomous agent to instruct the reasoning process of LLMs. This agent generates a single set of instructions for each task and those instructions turn out to be extremely effective at improving the reasoning process of different LLMs across all task instances, according to research from the lab of Chenguang Wang, an assistant professor of computer science and engineering at WashU’s McKelvey School of Engineering. The research was done in collaboration with Dawn Song, a professor at the University of California, Berkeley.
Researchers included WashU PhD students Nicholas Crispino and Kyle Montgomery and research analyst Fankun Zeng, who presented their work at a recent conference for machine learning.
This “agent” is a large LLM that serves as a tool to think over the instructions from the web, said Crispino. Given basic task information such as the dataset name, and a few input-only examples, the agent then produces high-quality, step-by-step instructions for tasks.
Those instructions guide the reasoning of the smaller LLMs on certain tasks. It’s a more affordable way to do generative AI because researchers only have to use the large LLM once per dataset, then they hand instructions over to a smaller LLM that can take over.
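The division of labor described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the researchers’ implementation: the names `InstructionCache`, `build_agent_prompt` and `answer`, the prompt wording, and the stub models are all assumptions made for this example, and the web lookup the agent performs is left out. The point it shows is that the expensive model is called once per dataset, while the cheaper model handles every individual task.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


def build_agent_prompt(dataset_name: str, input_examples: List[str]) -> str:
    """Assemble the task information handed to the large 'agent' model.

    The wording of this prompt is an assumption, not the paper's template.
    """
    examples = "\n".join(f"- {ex}" for ex in input_examples)
    return (
        f"Dataset: {dataset_name}\n"
        f"Example inputs (no answers):\n{examples}\n"
        "Write step-by-step instructions for solving tasks like these."
    )


class InstructionCache:
    """Calls the expensive model at most once per dataset name."""

    def __init__(self, agent: LLM) -> None:
        self.agent = agent
        self.agent_calls = 0
        self._cache: Dict[str, str] = {}

    def get(self, dataset_name: str, input_examples: List[str]) -> str:
        if dataset_name not in self._cache:
            self.agent_calls += 1
            self._cache[dataset_name] = self.agent(
                build_agent_prompt(dataset_name, input_examples)
            )
        return self._cache[dataset_name]


def answer(small_llm: LLM, instructions: str, task_input: str) -> str:
    """Prepend the shared instructions to one task for the cheaper model."""
    prompt = f"Instructions:\n{instructions}\n\nInput: {task_input}\nAnswer:"
    return small_llm(prompt)


# Stub models stand in for real LLM calls so the sketch runs offline.
def stub_agent(prompt: str) -> str:
    return "1. Read the problem. 2. Work through it step by step. 3. State the answer."


def stub_small_llm(prompt: str) -> str:
    return f"[small model saw {len(prompt)} characters]"


cache = InstructionCache(stub_agent)
questions = ["What is 12 * 7?", "What is 9 + 15?", "What is 81 / 9?"]
outputs = [
    answer(stub_small_llm, cache.get("toy-arithmetic", questions[:2]), q)
    for q in questions
]
```

In this toy run the stub agent is called a single time even though three questions are answered, which mirrors the cost argument: the price of the large model is paid per dataset rather than per question.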
“We can use the expensive model once and make these nice instructions to guide the reasoning or thinking process of a cheaper model,” Crispino said.
“Our method boosts the performance of state-of-the-art large language models by a large margin,” Montgomery added.
They tested their cost-effective method, called Zero-Shot AgentInstruct, on language processing tasks and compared its performance with zero-shot prompting methods using the LLMs Vicuna-13b, Llama-2-70b-chat and GPT-3.5 Turbo.
Compared with “zero-shot chain of thought” prompting, which works by adding the prompt “let’s think step by step,” Zero-Shot AgentInstruct showed better performance across a variety of tasks evaluated on 29 datasets (including 53 subsets).
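To make the difference between the two prompting styles concrete, here is a minimal sketch in Python. The `Q:`/`A:` layout and both function names are assumptions chosen for illustration and may not match the templates used in the study. The baseline appends the same generic phrase to every question, while the second style places task-specific instructions, written ahead of time by the agent, in front of the question.

```python
# The generic trigger phrase used by zero-shot chain-of-thought prompting.
ZERO_SHOT_COT_TRIGGER = "Let's think step by step."


def zero_shot_cot_prompt(question: str) -> str:
    """Baseline: the same trigger phrase is appended for every task."""
    return f"Q: {question}\nA: {ZERO_SHOT_COT_TRIGGER}"


def agent_instruct_prompt(question: str, instructions: str) -> str:
    """Agent-style prompt: task-specific instructions come before the question.

    The template here is hypothetical; only the idea of prepending
    agent-written instructions comes from the article.
    """
    return f"Instructions:\n{instructions}\n\nQ: {question}\nA:"


question = "A train travels 60 miles in 1.5 hours. What is its average speed?"
instructions = (
    "1. Identify the quantities given.\n"
    "2. Choose the formula that relates them.\n"
    "3. Compute the result and state it with units."
)

baseline = zero_shot_cot_prompt(question)
guided = agent_instruct_prompt(question, instructions)
```

Either string would then be sent to the same smaller model, so any difference in the answers comes from the prompt alone.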
“Our improvement in thinking and reasoning is striking, particularly in math and logic,” Wang said.
Essentially, they are using a powerful LLM to distill tasks into step-by-step reasoning paths for the other model, like an experienced teacher sharing her or his knowledge with students.
“We’re seeing how far we can push the reasoning capabilities of smaller models using larger models without training,” Crispino said.
Nicholas Crispino, Kyle Montgomery, Fankun Zeng, Dawn Song and Chenguang Wang. Agent Instructs Large Language Models to be General Zero-Shot Reasoners. In The Forty-first International Conference on Machine Learning (ICML 2024).
This research was funded in part by a Summer Undergraduate Research Award from the Office of Undergraduate Research at Washington University.