AI Math Benchmark - Search News

ORCA Benchmark Shows That AI Frequently Fumbles Everyday Math

KRAKÓW, MAŁOPOLSKA, POLAND, November 7, 2025 /EINPresswire.com/ -- Omni Calculator has introduced the ORCA (Omni Research on Calculation in AI) Benchmark - a new ...

Hosted on MSN

AI is actually bad at math, ORCA shows

ORCA benchmark trips up ChatGPT-5, Gemini 2.5 Flash, Claude Sonnet 4.5, Grok 4, and DeepSeek V3.2. In the world of George Orwell's 1984, two and two make five. And large language models are not much ...

Morningstar

ORCA Benchmark Reveals How AI's Core Design Makes It Unreliable for Everyday Math

KRAKÓW, Poland, Nov. 5, 2025 /PRNewswire/ -- Omni Calculator today released the findings of the ORCA (Omni Research on Calculation in AI) Benchmark, a comprehensive study evaluating leading AI ...

3d on MSN

Nvidia backs AI math startup Harmonic in latest funding round: report

Nvidia (NVDA) continues to invest in artificial intelligence startups, with the latest being Harmonic, which develops models ...

19d on MSN

Which AI chatbot is the best at simple math? Gemini, ChatGPT, Grok put to the test

Researchers tested the accuracy of five AI models using 500 everyday math prompts. The results show that there is roughly a 40 per cent chance an AI will get the answer wrong.

12d

Artificial Analysis overhauls its AI Intelligence Index, replacing popular benchmarks with 'real-world' tests

Artificial Analysis overhauls its AI Intelligence Index, replacing saturated benchmarks with real-world tests measuring ...

Nvidia’s NVentures backs Harmonic AI in Series C for mathematical superintelligence

Mathematical superintelligence startup Harmonic AI Inc. revealed today that NVentures, the venture capital arm of Nvidia Corp., was among the investors in its $120 million Series C round that was ...

TechSpot

Move over math and reasoning, it's time to benchmark AI using Super Mario Bros.

The big picture: Benchmarking AI remains a thorny issue, with companies often accused of cherry-picking flattering results while burying less favorable ones. Instead of fixating on math and logic ...

Decrypt

Baidu's ERNIE 5 AI Model Rises Up the Rankings—A Math Wiz That Beats OpenAI's GPT 5.1

Baidu's ERNIE-5.0-0110 ranks #8 globally on LMArena, becoming the only Chinese model in the top 10 while outperforming ...

VentureBeat

New open-source math model Light-R1-32B surpasses equivalent DeepSeek performance with only $1000 in training costs

Researchers have introduced Light-R1-32B, a new open-source AI model optimized to solve advanced math problems. It is now available on Hugging Face under a permissive Apache 2.0 license — free for ...
