
OpenAI o1: A New Series of OpenAI Models for AI Reasoning

Find out about the newly launched OpenAI o1 models and what makes them special. We'll also take a look at how they work and their impact on the future of AI.

The AI community has been buzzing with speculation about the next step for OpenAI’s GPT models, with many referring to it as “Project Strawberry.” The name stuck because if you ask GPT-4o how many R's are in the word "strawberry," it will confidently tell you there are two. That may seem strange given how powerful GPT-4o is, but the model reads text as tokens (chunks of characters) rather than individual letters, so letter-by-letter questions like this can trip it up. It was rumored that the next model would aim to solve this, and Sam Altman further fueled those rumors by posting pictures of strawberries on his X (formerly known as Twitter) account.
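
To see why this trips up a language model, remember that these models read text as tokens rather than individual letters. The short sketch below counts the letters directly in Python and, assuming a recent version of the tiktoken package (one that includes the o200k_base encoding used by GPT-4o), shows that "strawberry" is split into a few multi-character chunks, so the model never "sees" the three R's one at a time.

```python
# Plain string processing has no trouble counting the R's.
word = "strawberry"
print(word.count("r"))  # 3

# Assumption: the tiktoken package is installed and provides the
# o200k_base encoding (the tokenizer used by GPT-4o).
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
tokens = enc.encode(word)
# Prints a few multi-letter chunks rather than single letters,
# which is why letter-counting is awkward for the model.
print([enc.decode([t]) for t in tokens])
```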

With OpenAI’s latest announcement on Thursday, September 12th, we finally have an answer to the speculation! OpenAI o1, a new series of AI models designed to slow down and think before responding, has been released. Interestingly, OpenAI o1 can reason better and answer the question about strawberries correctly! In this article, we’ll discuss what OpenAI o1 is, how it works, where it can be used, and what it means for the future of AI. Let’s get started!

Fig 1. An example of prompting OpenAI o1 about strawberries.

New Advancements in AI by OpenAI

In July 2024, OpenAI executives shared that the company's research was nearing human-level problem-solving, a stage it refers to as Level 2 of AI. That level centers on reasoning, and OpenAI introduces its new model series, OpenAI o1, as one that thinks before it answers. OpenAI o1 is a new LLM (large language model), an AI model that understands and generates human-like text by learning patterns from massive amounts of language data, and it has been designed to handle complex problems that require in-depth reasoning.

Fig 2. OpenAI’s Perspective on the Stages of AI.

The model has been trained using reinforcement learning, a technique where the model learns to make better decisions through trial and error by receiving rewards or penalties for its actions. The reinforcement learning algorithm helps the model think more effectively by following a chain of thought. OpenAI also shared that o1’s performance keeps improving with more reinforcement learning during training and with more time spent "thinking" during problem-solving, showing that both extended training and thoughtful processing help boost the model's abilities.

While OpenAI o1 is a significant advancement for complex reasoning, it is still an early model and lacks some features that make ChatGPT useful, such as browsing the web or uploading files and images. For many common tasks, GPT-4o might still be more capable for now. However, OpenAI o1 marks a big step forward in AI's ability to handle complex reasoning, which is why OpenAI is starting a new series and calling it OpenAI o1.

How the New OpenAI Models Enhance AI Reasoning

OpenAI o1 can be used for tasks like decoding ciphers, solving programming challenges, answering math problems, tackling crosswords, and even handling complex topics in science, safety, and healthcare. In an amusing nod to the project’s code name, OpenAI showed the model’s reasoning skills by cracking a cipher that revealed the message "THERE ARE THREE R’S IN STRAWBERRY." 

Beyond solving ciphers, OpenAI o1 is also skilled in coding. It performs well in competitive programming challenges like those on Codeforces, a platform where programmers solve complex coding problems under timed conditions. In these challenges, the model achieves high Elo ratings (a scoring system that measures skill levels based on performance against other competitors) and outperforms previous models. It also excels in math and performs well on exams like the American Invitational Mathematics Examination (AIME). 

Fig 3. Benchmarking o1’s Coding Abilities.
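
For readers unfamiliar with Elo ratings, the update rule behind them is simple arithmetic. The sketch below is a generic Elo update (not Codeforces' exact variant): a player's expected score is computed from the rating gap, and the rating then moves in proportion to how much the actual result beat or missed that expectation.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_rating(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """Move the rating toward the actual result; k controls the step size."""
    return rating + k * (actual - expected)

# Example: a 1500-rated player upsets an 1800-rated opponent (actual = 1 for a win).
exp = expected_score(1500, 1800)             # ~0.15 expected score
print(round(update_rating(1500, exp, 1.0)))  # rating rises by roughly 27 points
```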

These advancements position OpenAI o1 as a significant upgrade from earlier models like GPT-4o. It opens up new possibilities for AI in areas such as business, development, research, and healthcare. For example, in genetics research, OpenAI o1 can quickly go through a large number of research papers, picking out key findings and connections between genetic markers and diseases. It understands complex scientific language and can summarize important points, helping researchers focus on the most relevant information. 

A Closer Look at the Chain of Thought

We saw earlier that OpenAI o1 introduces a "Chain of Thought" reasoning process. It enables the model to tackle complex problems in a manner similar to human cognitive strategies. The model can break down challenges into smaller, manageable steps and iteratively refine its approach. Unlike earlier models that relied on immediate pattern recognition, o1 optimizes its decision-making by exploring multiple reasoning paths, learning from both successes and mistakes through reinforcement learning.
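
o1 produces these reasoning chains internally, but the underlying idea of working in explicit steps is easy to illustrate. Here is a toy worked example (not taken from OpenAI's materials): the problem is solved by writing out each intermediate quantity, rather than jumping straight to an answer with a shortcut that only sometimes works.

```python
# A hand-worked "chain of thought" for a small word problem, written as explicit steps.
# Problem: a train travels 150 km in 2 hours, then 90 km in 1.5 hours.
# What is its average speed over the whole trip?

step_1_total_distance = 150 + 90           # 240 km
step_2_total_time = 2 + 1.5                # 3.5 hours
step_3_average_speed = step_1_total_distance / step_2_total_time
print(round(step_3_average_speed, 1))      # ~68.6 km/h

# A one-step shortcut might average the two speeds instead, which is not the same thing:
wrong_answer = (150 / 2 + 90 / 1.5) / 2    # 67.5 km/h - close here, but the shortcut fails in general
```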

OpenAI has decided to keep these raw chains of thought hidden from users, instead offering summaries that provide insight into the model's reasoning without exposing every step. This decision helps prevent the misuse of the model's thought process while still allowing developers to monitor and refine AI safety and alignment. By observing the hidden chains internally, developers can ensure that o1 adheres to ethical guidelines and avoids harmful behavior.

Benchmarking OpenAI o1

OpenAI o1 shows major improvements over GPT-4o in several benchmarks that test reasoning and problem-solving abilities. On the American Invitational Mathematics Examination (AIME) 2024, a challenging math exam for top high school students, o1 achieved a 74% accuracy rate with just one sample per problem, compared to GPT-4o's 12%. With consensus across 64 samples, its accuracy increased to 83%, and by using a refined re-ranking method with 1,000 samples, it reached 93%, placing it among the top 500 students nationally. 
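
The "consensus among 64 samples" figure refers to majority voting, sometimes called self-consistency: the model is sampled many times on the same problem and the most common final answer is taken. A minimal sketch of that idea, using made-up answers, looks like this.

```python
from collections import Counter

def consensus_answer(answers: list[str]) -> str:
    """Return the most frequent final answer among many sampled attempts."""
    return Counter(answers).most_common(1)[0][0]

# Hypothetical final answers from 8 independent samples of the same AIME problem.
samples = ["204", "204", "113", "204", "204", "96", "204", "113"]
print(consensus_answer(samples))  # "204"
```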

Beyond math, o1 also performed exceptionally well on benchmarks testing scientific knowledge, like the GPQA Diamond, which covers PhD-level questions in chemistry, physics, and biology. Remarkably, o1 outperformed human experts with PhDs on this test, making it the first AI model to do so. It also outdid GPT-4o on 54 out of 57 categories in the MMLU benchmark, which tests understanding across a diverse set of subjects, including history, law, and science.

Fig 4. Benchmarking OpenAI o1.

Get Hands-on With OpenAI o1

OpenAI has introduced two new AI models in the o1 series: o1-preview and o1-mini. The o1-preview model is designed to think more deeply before responding, excelling at complex reasoning tasks in science, coding, and math. It offers advanced problem-solving capabilities for users tackling challenging projects. In contrast, o1-mini is a smaller, faster, and more cost-effective model optimized specifically for STEM reasoning, particularly math and coding. While it may have less broad world knowledge, o1-mini nearly matches o1-preview's performance on key evaluations like the AIME math competition and Codeforces coding challenges, all at 80% less cost.

Fig 5. Comparing OpenAI Models.

You can try out these models through various OpenAI platforms. ChatGPT Plus and Team users can access both o1-preview and o1-mini via the model picker, experiencing enhanced reasoning capabilities directly in ChatGPT. Developers with API usage tier 5 access can start prototyping with these models, though some advanced features are still in development. OpenAI also plans to make o1-mini available to all ChatGPT Free users soon. By exploring these models, you can experience firsthand the advancements in AI reasoning and choose the one that best fits your needs.
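
If you have API access, a minimal request looks like the sketch below. It assumes the official openai Python package is installed and an OPENAI_API_KEY environment variable is set; at the time of writing, the o1 models do not accept system messages or most sampling parameters, so the request is kept to a bare user message.

```python
# Minimal sketch: calling o1-preview through the Chat Completions API.
# Assumes `pip install openai` and that OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user",
            "content": "A bat and a ball cost $1.10 in total. The bat costs $1.00 "
                       "more than the ball. How much does the ball cost?",
        }
    ],
)

print(response.choices[0].message.content)
```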

Ethical AI Considerations Made By OpenAI

OpenAI has focused on ethics and safety while developing the o1 model series. Before releasing the o1-preview and o1-mini models, they conducted thorough evaluations, including external tests and internal checks for risks such as disallowed content, hallucinations, and bias. The models are designed with advanced reasoning abilities to better understand and follow safety rules. 

OpenAI has also implemented safeguards like blocklists and safety classifiers to manage risks. The o1 model has a medium overall risk rating. It has low risks in areas like cybersecurity and model autonomy and medium risks in areas such as CBRN (Chemical, Biological, Radiological, and Nuclear) content and persuasion. OpenAI's Safety Advisory Group and Board have reviewed these safety measures to ensure the model is safe and ethical to use.

Fig 6. OpenAI o1 Scorecard.

From Rumors to Reality: OpenAI o1 Takes the Stage

OpenAI o1 is a big step forward in AI reasoning, turning some of the early rumors into reality. Unlike GPT-4o, the o1 series thinks more deeply by using a "Chain of Thought" approach, breaking down complex problems into smaller steps for better responses. The o1 series is currently available as an early preview in ChatGPT and the API, and OpenAI plans to add features like web browsing and file and image uploads. OpenAI also shared that it plans to keep developing and releasing models in the GPT series alongside the new OpenAI o1 series. As AI continues to evolve, advancements like these are paving the way for more powerful, intuitive, and versatile AI systems that can better assist and understand human needs.

Keep up with the latest in AI by joining our community! Head over to our GitHub repository to see how we’re pioneering AI solutions in sectors such as manufacturing and healthcare. 🚀
