AI Hardware
41% Faster DeepSeek-V3 Training on B200s: Real Speedup or NVIDIA Sales Pitch?
918 tokens per second. That's the blistering pace for pre-training DeepSeek-V3's 671B monster on 256 NVIDIA B200s, thanks to MXFP8 and DeepEP tweaks in TorchTitan. Hype or hardware reality?