AI Research
From 70% to 86% on MMLU: AI's Reasoning Leap—or Illusion?
OpenAI's GPT-4 scored 86.4% on MMLU, roughly 16 percentage points above GPT-3.5, sparking claims of emergent reasoning. Dig into the data, though, and Theory of Mind tests reveal the cracks.