theAIcatchup

#transformer-optimization

🔧
AI Hardware

ThunderKittens 2.0 Unleashes Blazing GPU Kernels

ThunderKittens 2.0 isn't just faster: it's a blueprint for squeezing every FLOP out of your GPU. Stanford's Hazy Research just rewrote the rules for transformer kernels.

5 min read 4 weeks ago
[Image: Illustration of KV cache reusing key and value vectors during LLM text generation]
Large Language Models

KV Caches: The Hidden Speed Boost Powering Your Daily AI Chats

Next time your AI assistant spits out a response in seconds, thank the KV cache. It's quietly revolutionizing how we run massive language models without breaking the bank on compute.

5 min read 4 weeks, 1 day ago
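The idea behind the KV cache can be sketched in a few lines of NumPy. This is a toy single-head decoder loop, not any particular framework's API; all names, shapes, and weights here are illustrative assumptions:

```python
import numpy as np

# Toy single-head attention decoding loop illustrating the KV-cache idea.
# Hypothetical weights and sizes, chosen only for demonstration.
d = 4  # head dimension (toy size)
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(q, K, V):
    # q: (d,); K, V: (t, d). Attention over all cached positions.
    return softmax(K @ q / np.sqrt(d)) @ V

xs = []                    # token embeddings seen so far (kept for clarity)
k_cache, v_cache = [], []  # the KV cache: one entry per generated token
outputs = []

for step in range(3):
    x = rng.standard_normal(d)   # embedding of the newest token
    xs.append(x)
    # Project K and V for the NEW token only, then append to the cache.
    # Earlier tokens' projections are reused, never recomputed.
    k_cache.append(x @ Wk)
    v_cache.append(x @ Wv)
    q = x @ Wq                   # only the new token needs a query
    outputs.append(attend(q, np.stack(k_cache), np.stack(v_cache)))
```

Without the cache, each decoding step would re-project keys and values for every previous token, so the per-step projection cost grows with sequence length instead of staying constant. That is the trade the teaser alludes to: extra memory for the cached tensors in exchange for far less recomputation per generated token.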


© 2026 theAIcatchup. All rights reserved.
