OpenAI has announced its latest advance in artificial intelligence: a pair of next-generation models, o3 and o3-mini, which the company says are designed to simulate reasoning at unprecedented levels.
These models, successors to o1, were revealed by CEO Sam Altman last week during the final event of OpenAI's "12 Days of OpenAI" series. According to Altman, OpenAI named the new model family "o3" instead of "o2" to avoid a trademark conflict with the British telecom provider O2.
The o3 models incorporate a feature OpenAI calls "private chain of thought," which lets the AI deliberate internally and plan its responses before answering. This is what the company refers to as "simulated reasoning" (SR), an approach that goes beyond the capabilities of traditional large language models (LLMs).
The o3 model achieved record scores on the ARC-AGI benchmark, a visual reasoning test that no AI system had come close to mastering since its introduction in 2019. It scored 75.7% in a low-compute setting and 87.5% in a high-compute setting, the latter exceeding the 85% threshold regarded as human-level performance on the benchmark.
Other achievements include:
• 96.7% on the 2024 American Invitational Mathematics Exam, missing only one question.
• 87.7% on the GPQA Diamond benchmark, covering advanced biology, physics, and chemistry.
• 25.2% on Epoch AI's FrontierMath benchmark, far ahead of all previous models, none of which scored above 2%.
OpenAI also debuted o3-mini, a scaled-down variant with an adaptive "thinking time" option that lets users choose between low, medium, and high compute settings. Higher settings deliver better results, and o3-mini has already outperformed the o1 range on Codeforces, a popular competitive programming benchmark.
OpenAI is not alone in its pursuit of AGI and the development of SR models, Ars Technica notes. Google announced Gemini 2.0 Flash Thinking Experimental just one day earlier, while DeepSeek and Alibaba have introduced reasoning models of their own, DeepSeek-R1 and QwQ, respectively.
These SR models represent a shift in AI development. Rather than relying solely on improvements in training, which have shown diminishing returns, SR models rely on an iterative "chain of thought" process at runtime. This approach lets developers scale reasoning ability by spending more computation, in a brute-force fashion, during inference.
OpenAI plans to release o3-mini to the public in late January, with the full o3 model following shortly after. For now, the models are available exclusively to researchers for safety testing and evaluation.
A recording of OpenAI's event is available on YouTube.
***
NewsCafe is an independent outlet that cares about big issues. Our sources of income amount to ads and donations from readers. You can support us via PayPal: office[at]rudeana.com or paypal.me/newscafeeu, or https://buymeacoffee.com/newscafe . Any amount is welcome.