AI Hardware
Meta's GDPA Kernels Deliver 2x RecSys Training Speedups
Meta engineers just unveiled GDPA kernels that slash training times for massive RecSys models. Up to 3.5x forward speedups on production traffic—real numbers from B200 clusters.