In machine learning, human supervision is often the most expensive and time-consuming part of developing powerful models, such as the large language models (LLMs) many people now use. Before ChatGPT could have a conversation with you or help plan your next vacation, the model had to be trained on high-quality annotated datasets and informed feedback from real humans.
This type of human input can be considered strong supervision: the humans providing it are more capable than the models they are training. As machine learning grows more powerful and surpasses human capabilities, however, human supervision may become weak relative to the models' performance.
Chien-Ju Ho, an assistant professor of computer science and engineering in the McKelvey School of Engineering at Washington University in St. Louis, received a $133,000 award from OpenAI to explore how and when weak human supervision might be used to improve strong machine learning models.
As an analogy, lower-rated chess players might learn more from players who are slightly better than they are than from world-class players, since they are unlikely to face the board positions a world-class player encounters. Ho's project investigates whether weak human supervision can be used in a similar way to improve the training of machine learning models.
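The setup behind this line of work, often called weak-to-strong training, can be pictured with a toy experiment: a small "weak" model is fit on a handful of ground-truth labels, and a higher-capacity "strong" model is then trained only on the weak model's imperfect labels. The sketch below is a hypothetical illustration using scikit-learn, not Ho's actual method; the dataset, model choices, and split sizes are all assumptions for demonstration.

```python
# Toy weak-to-strong training loop (illustrative assumptions throughout;
# this is not the project's method). A small "weak supervisor" learns from
# limited ground truth, then a larger "strong" model learns from its labels.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Synthetic classification task standing in for a real annotation problem.
X, y = make_classification(n_samples=6000, n_features=20,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# The weak supervisor sees only a small slice of ground-truth labels.
weak = LogisticRegression(max_iter=1000).fit(X_train[:300], y_train[:300])

# The strong model never sees ground truth; it trains on the weak
# supervisor's (noisy) predictions over the full training pool.
weak_labels = weak.predict(X_train)
strong = GradientBoostingClassifier(random_state=0).fit(X_train, weak_labels)

print(f"weak supervisor accuracy: {accuracy_score(y_test, weak.predict(X_test)):.3f}")
print(f"strong student accuracy:  {accuracy_score(y_test, strong.predict(X_test)):.3f}")
```

If the strong student ends up more accurate than its noisy teacher, weak supervision has improved a model more capable than the supervisor, which is the kind of effect the project aims to understand.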
Read more on the McKelvey Engineering website.