🤖 PyTorch Flight Recorder Exposes NCCL Watchdog Timeout Nightmares

Your GPU cluster is humming along, then it crashes with an NCCL watchdog timeout. PyTorch's new Flight Recorder keeps a record of the recent collective operations on each rank, turning these black-box failures into crystal-clear diagnostics.