AI Hardware
torch.compile Reaches SOTA Normalization Speeds on H100 and B200
What if your PyTorch models trained as fast as they would with custom kernels? The latest changes to torch.compile deliver state-of-the-art (SOTA) normalization performance on H100 and B200 GPUs, closing the gap with heavily optimized alternatives such as Quack.