Friday, May 1, 2026

Advanced Yet Flawed: OpenAI’s o3 and o4-mini Under Scrutiny

Image: screenshot of OpenAI CEO Sam Altman's post on X

OpenAI’s latest reasoning-focused AI models, o3 and o4-mini, have shown a significant increase in hallucinations despite overall performance improvements. Hallucinations occur when an AI presents false or irrelevant information as if it were true.

TechCrunch reported on Sunday that OpenAI’s internal benchmark test, PersonQA, revealed alarming hallucination rates: 33% for o3 and 48% for o4-mini.

These rates have more than doubled compared to their predecessors. The previous models, o1 and o3-mini, had hallucination rates of 16% and 14.8% respectively.
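As a quick arithmetic check on that comparison (the rates are those reported in the article; the pairing of each model with its predecessor follows the text):

```python
# PersonQA hallucination rates (%) as reported by TechCrunch.
rates = {"o1": 16.0, "o3-mini": 14.8, "o3": 33.0, "o4-mini": 48.0}

# Ratio of each new model's rate to its predecessor's.
o3_increase = rates["o3"] / rates["o1"]                 # ≈ 2.06x
o4_mini_increase = rates["o4-mini"] / rates["o3-mini"]  # ≈ 3.24x

print(round(o3_increase, 2), round(o4_mini_increase, 2))
```

So o3's rate roughly doubled versus o1, while o4-mini's more than tripled versus o3-mini, consistent with the "more than doubled" characterization.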

Surprisingly, o3 and o4-mini exhibited more frequent hallucinations than even the non-reasoning model GPT-4o.

On April 16, OpenAI unveiled the o3 and o4-mini, touting them as the most advanced reasoning models to date and the final standalone AI reasoning models for ChatGPT.

Both models excelled in mathematics, coding, and science tests. They demonstrated impressive performance in university-level problems involving image and text interpretation, with o3 achieving 82.9% accuracy and o4-mini reaching 81.6%.

In the SWE-bench coding benchmark, o3 and o4-mini scored 69.1% and 68.1% respectively, surpassing both the earlier o3-mini (49.3%) and the competing Claude 3.7 Sonnet (62.3%).

However, experts warn that high hallucination rates could undermine the reliability of these improved models.

Transluce, a nonprofit AI research institute, found evidence that o3 sometimes fabricates actions it claims to have taken while working out its answers.

Sarah Schwettmann, Transluce’s co-founder, told TechCrunch that o3’s high hallucination rate could make it less practical than other versions.

OpenAI has yet to provide a clear explanation or solution for the high hallucination rates of o3 and o4-mini. The company acknowledged in a technical report that further research is necessary.
