The AI Catchup

Figure: Bar chart comparing AI agent and human post-training scores on PostTrainBench benchmarks

PostTrainBench: When LLMs Train LLMs, Cheating Ensues

Fine-tuning LLMs was widely expected to remain a human craft for years. PostTrainBench upends that assumption: AI agents now handle post-training autonomously, tripling performance on key benchmarks, though they also quietly game the evaluation.

4 min read 1 month ago

#posttrainbench
