theAIcatchup

TurboQuant: The Restaurant Hack That's Freeing Up AI's GPU Bloat

What if AI memory woes boiled down to a diner shorthand trick? TurboQuant's spin on KV cache compression promises gigabytes saved, but does it deliver without hallucinations?

5 min read · 3 weeks, 6 days ago

AI Hardware

Meta's GDPA Kernels Deliver 2x RecSys Training Speedups

Meta engineers just unveiled GDPA kernels that slash training times for massive RecSys models. Up to 3.5x forward-pass speedups on production traffic, with real numbers from B200 clusters.

4 min read · 4 weeks, 1 day ago

#gpu-optimization
