Exploring OpenAI’s Mission to Enable AI to Fulfill Any Task

Shortly after Hunter Lightman joined OpenAI as a researcher in 2022, he watched his colleagues launch ChatGPT, one of the fastest-growing products ever. Meanwhile, Lightman worked quietly on a team teaching OpenAI’s models to solve high school math competition problems.

Today, that team, known as MathGen, is central to OpenAI’s industry-leading effort to create AI reasoning models: the core technology behind AI agents that can perform tasks on a computer the way a human would.

“We aimed to enhance the models’ abilities in mathematical reasoning, which was a weakness at the time,” Lightman stated in an interview with TechCrunch, explaining MathGen’s early goals.

OpenAI’s models are far from perfect, of course: its latest AI systems still produce inaccuracies, and its agents struggle with complex tasks.

Even so, OpenAI’s state-of-the-art models have become significantly better at mathematical reasoning. One of OpenAI’s models recently earned a gold medal at the International Math Olympiad, an elite high school math competition. OpenAI believes these reasoning capabilities will carry over to other domains and ultimately power the general-purpose agents the company has long envisioned.

Whereas ChatGPT emerged almost by accident as a low-profile research preview that turned into a viral consumer business, OpenAI’s agents are the product of a deliberate, years-long effort.

“Eventually, you’ll just request something from the computer, and it’ll execute all these tasks for you,” said OpenAI CEO Sam Altman at OpenAI’s first developer conference in 2023. “The potential benefits are immense.”

Whether these agents will fulfill Altman’s vision is still unknown, but OpenAI amazed the world with the release of its first AI reasoning model, o1, in late 2024. Less than a year later, the 21 foundational researchers behind this advancement became highly sought-after talents in Silicon Valley.

Mark Zuckerberg recruited five researchers from the o1 team for Meta’s superintelligence unit, with some compensation offers exceeding $100 million. One, Shengjia Zhao, was appointed chief scientist of Meta Superintelligence Labs.

The rise of OpenAI’s reasoning models and agents traces back to a machine learning training technique known as reinforcement learning (RL), which gives AI models feedback on whether their choices were correct or not in simulated environments.
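In rough terms, that feedback loop can be sketched in a few lines of Python. The “policy” below is a toy stand-in for a language model, and the whole setup is illustrative rather than a description of OpenAI’s actual training code: the model samples an answer, a checker scores it, and rewarded choices become more likely.

```python
# Toy sketch of the RL feedback loop: propose an answer, score it against a
# checker, and reinforce whatever earned a reward. The "policy" here is just
# a table of preferences standing in for a full language model.
import random

question, correct_answer = "7 * 8", "56"
candidates = ["54", "56", "63"]
weights = {c: 1.0 for c in candidates}  # unnormalized preferences

def sample_answer() -> str:
    """Sample a candidate answer in proportion to its current weight."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    for candidate, weight in weights.items():
        r -= weight
        if r <= 0:
            return candidate
    return candidates[-1]

for _ in range(200):
    answer = sample_answer()
    reward = 1.0 if answer == correct_answer else 0.0  # feedback on correctness
    weights[answer] += 0.1 * reward                    # reinforce rewarded choices

print(max(weights, key=weights.get))  # almost always prints "56"
```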

RL has been employed for years. In 2016, a Google DeepMind AI system using RL, AlphaGo, gained global recognition after defeating a world champion in the board game Go.

Around this time, an early OpenAI employee, Andrej Karpathy, contemplated utilizing RL to develop an AI agent capable of operating a computer. Yet, it took OpenAI several years to create the necessary models and training techniques.

By 2018, OpenAI had developed its first large language model in the GPT series, trained on vast amounts of internet data using large clusters of GPUs. While GPT models excelled at processing text, which eventually led to ChatGPT, they struggled with basic math.

It wasn’t until 2023 that OpenAI achieved a breakthrough known initially as “Q*” and later “Strawberry,” by integrating LLMs, RL, and test-time computation. The latter allowed models additional time and computing power to plan, verify their work, and reason through problems before responding.

This allowed OpenAI to introduce a technique called “chain-of-thought” (CoT) reasoning, which improved AI’s performance on math questions the models hadn’t seen before.
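The combination of chain-of-thought and test-time computation can be illustrated with a short, hypothetical sketch: sample several reasoning attempts, check each one, and only answer once an attempt passes verification. The functions below are invented stand-ins, not OpenAI APIs; a real system would use a trained model and a learned verifier.

```python
# Hedged sketch of "think longer before answering": sample multiple
# chain-of-thought attempts and return the first one that verifies.
import random

def generate_chain_of_thought(question: str) -> tuple[str, str]:
    """Pretend model: returns (reasoning trace, final answer)."""
    answer = random.choice(["12", "14"])  # sometimes right, sometimes wrong
    return f"Working through '{question}' step by step...", answer

def verify(question: str, answer: str) -> bool:
    """Pretend checker; in practice this could re-derive or score the answer."""
    return answer == "14"

def answer_with_more_compute(question: str, attempts: int = 8) -> str:
    for _ in range(attempts):  # more attempts = more test-time compute
        reasoning, answer = generate_chain_of_thought(question)
        if verify(question, answer):
            return answer
    return answer  # fall back to the last attempt

print(answer_with_more_compute("What is 9 + 5?"))
```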

“I noticed the model beginning to reason,” remarked Ahmed El-Kishky, an OpenAI research lead. “It recognized errors and retraced its steps, or it got frustrated; it resembled human thought processes.”

While none of these techniques was new on its own, OpenAI’s unique combination of them produced Strawberry, which paved the way for the development of o1. OpenAI quickly realized that the planning and fact-checking abilities of AI reasoning models could be valuable for powering AI agents.

“We resolved an issue I had been tackling for years,” recounted Lightman. “It was one of the highlights of my research career.”

With AI reasoning models, OpenAI identified two new axes along which to improve AI models: applying more computational power during post-training, and giving models more time and processing power while they work through a question.
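The second axis, spending more compute while the model works on a question, can be seen in a toy experiment: a noisy “model” that is right 60% of the time becomes far more reliable when its answers are majority-voted across more samples. The numbers below are made up purely for illustration.

```python
# Toy demonstration that more samples per question (more test-time compute)
# raises final accuracy, using simple majority voting over a noisy model.
import random
from collections import Counter

def noisy_model() -> str:
    return "correct" if random.random() < 0.6 else "wrong"

def accuracy(samples_per_question: int, trials: int = 2000) -> float:
    wins = 0
    for _ in range(trials):
        votes = Counter(noisy_model() for _ in range(samples_per_question))
        wins += votes.most_common(1)[0][0] == "correct"
    return wins / trials

for n in (1, 5, 25):  # odd sample counts avoid ties in the vote
    print(f"{n:>2} samples per question -> ~{accuracy(n):.0%} accuracy")
```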

“OpenAI focuses a lot on future scalability,” said Lightman.

Following the 2023 Strawberry breakthrough, OpenAI formed an “Agents” team led by Daniel Selsam to advance this new approach, sources revealed to TechCrunch. While the team was named “Agents,” OpenAI initially didn’t distinguish much between reasoning models and agents. The aim was to develop AI systems capable of executing complex tasks.

Eventually, Selsam’s team was integrated into a broader project to develop the o1 reasoning model, involving leaders like OpenAI’s co-founder Ilya Sutskever, chief research officer Mark Chen, and chief scientist Jakub Pachocki.

OpenAI needed to allocate critical resources — particularly talent and GPUs — to create o1. Throughout OpenAI’s history, researchers negotiated with leaders for resources; demonstrating breakthroughs was an effective way to secure them.

“OpenAI’s research is very much
