theAIcatchup

From 70% to 86% on MMLU: AI's Reasoning Leap—or Illusion?

OpenAI's GPT-4 hit 86.4% on MMLU, about 16 percentage points above GPT-3.5, sparking claims of emergent reasoning. But dig into the data, and Theory of Mind tests reveal the cracks.

5 min read 4 weeks, 1 day ago

#ai-reasoning-systems
