Large Language Models
LLM Black Box Cracked: Prefill, Decode, KV Cache Exposed
Most people assume LLMs just 'think' their way to an answer. They don't. Prefill chews through your whole prompt in one parallel pass, decode then drips out tokens one at a time, and the KV cache is the glue that keeps decode fast while quietly eating your memory.
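Here's a minimal numpy sketch of the idea, not any real model's internals: one toy attention head with random weights, where prefill builds keys and values for the whole prompt at once and decode reuses that growing cache for each new token. Every name and shape here is illustrative.

```python
# Toy sketch of prefill vs. decode with a KV cache (single attention head,
# random weights). Illustrative only, not a real model's API.
import numpy as np

d = 8                                      # toy head dimension
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    """Attention for one new query against all cached keys/values."""
    scores = q @ K.T / np.sqrt(d)          # similarity to each cached position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax
    return weights @ V                     # weighted sum of cached values

# --- Prefill: the whole prompt is processed in one parallel pass. ---
prompt = rng.standard_normal((5, d))       # 5 prompt-token embeddings (toy)
K_cache = prompt @ Wk                      # keys for every prompt token at once
V_cache = prompt @ Wv                      # values likewise: this IS the KV cache

# --- Decode: one token at a time, reusing (and growing) the cache. ---
x = prompt[-1]                             # stand-in for the latest token
for step in range(3):
    q = x @ Wq                             # only the NEW token needs a query
    x = attend(q, K_cache, V_cache)        # attend over everything cached so far
    # Append this step's key/value: the cache grows linearly with sequence
    # length, which is exactly why it becomes a memory hog on long contexts.
    K_cache = np.vstack([K_cache, x @ Wk])
    V_cache = np.vstack([V_cache, x @ Wv])
    print(f"decode step {step}: cache now holds {len(K_cache)} positions")
```

The payoff of the cache is in the decode loop: without it, every new token would recompute keys and values for the entire history; with it, each step only computes one query and one key/value pair, trading memory for compute.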