Large Language Models

Gemma 4 Launch: Google's Open AI Win

Everyone figured Google's next 'open' model would be another tease. Gemma 4? Apache licensed, and it runs like hell on your laptop. The game actually changed.

[Image: Gemma 4 benchmark run on an RTX 4090, showing high tokens per second]

Key Takeaways

  • Gemma 4's Apache 2.0 license enables true open development, with blazing local inference on consumer hardware.
  • Hermes Agent's harness innovations outperform models alone, shifting focus to engineering loops.
  • Ecosystem day-zero support signals Google's serious open pivot, but watch for subtle lock-ins.

Google’s Gemma 4 launch. That’s the bombshell everyone’s buzzing about – and no, it’s not your typical half-baked open-wash from Mountain View.

Folks expected the usual: flashy benchmarks on proprietary charts, weights that come with a leash. You know, like Gemma 2’s restrictive custom terms that screamed ‘free, but don’t touch.’ But this? Apache 2.0. Full stop. Day-zero ecosystem frenzy. Changes everything for devs sick of begging Sam Altman for scraps.

Look, @fchollet nailed it:

It’s Google’s strongest open model yet, and he recommends the JAX backend in KerasHub for running it.

François doesn’t mince words. And Demis Hassabis? He’s crowing about efficiency – claims it smokes models 10x bigger on their leaderboard. Smug? Sure. But early numbers back it.
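If you want to try the KerasHub route François is pointing at, here's a minimal sketch of what it looks like on the JAX backend. The preset id below is my placeholder, not a confirmed KerasHub name; check the KerasHub registry for the real Gemma 4 presets.

```python
# Minimal sketch: a Gemma checkpoint via KerasHub on the JAX backend.
import os
os.environ["KERAS_BACKEND"] = "jax"  # must be set before Keras is imported

import keras_hub

# Hypothetical preset id -- swap in the actual Gemma 4 preset from the KerasHub registry.
lm = keras_hub.models.GemmaCausalLM.from_preset("gemma4_26b_a4b_en")
print(lm.generate("Summarize mixture-of-experts routing in two sentences.", max_length=128))
```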

Gemma 4: Runs on What Now?

RTX 4090 owners, rejoice. @basecampbernie clocked 162 tokens per second decode, 262K context on a single card – 19.5 GB VRAM. That’s not cloud fairy dust; that’s your rig turning into a beast.

Weaker metal? Mac Mini M4 with 16GB pulls 34 tok/s on the 26B A4B MoE. Phones, laptops – @kimmonismus says E4B tier shoves ‘useful AI’ right onto your pocket rocket. iPhone via Swift MLX? Done. TurboQuant slashes KV cache memory from 13GB to 5GB at 128K context. Speed hit? Minor. Worth it.
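To see why that KV-cache squeeze matters, here's a back-of-the-envelope sizing sketch. The layer and head counts are placeholders I picked to land near the quoted figures, not Gemma 4's published architecture, and real quantizers rarely hit the naive bit ratio exactly.

```python
# Rough KV-cache sizing: 2 tensors (K and V) per layer, one slot per token per KV head.
def kv_cache_gb(context_len, num_layers, num_kv_heads, head_dim, bytes_per_elem):
    total = 2 * context_len * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return total / 1e9

# Placeholder architecture numbers, chosen only to land near the figures quoted above.
cfg = dict(num_layers=24, num_kv_heads=4, head_dim=256)
print(kv_cache_gb(128_000, bytes_per_elem=2, **cfg))    # ~12.6 GB at fp16
print(kv_cache_gb(128_000, bytes_per_elem=0.5, **cfg))  # ~3.1 GB at 4-bit, before overhead
```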

But here’s my unique dig: this echoes IBM’s 2000 Linux pivot. Back then, they threw a billion dollars behind Linux against their own proprietary Unix business to outflank Microsoft, and the open ecosystem exploded. Google pulling this now? Desperation against xAI’s Grok openness, or a real bet-the-farm move? Watch for TPU lock-in creeping back; their ‘efficiency’ charts scream self-serving.

Arena bumps it to the Pareto frontier. @arena spotted gains over Gemma 3 beyond mere scale. Critics gripe – normalize by FLOP or active params, damn it. Elo worship? Tired trope.

Ecosystem? vLLM, llama.cpp, Ollama, Intel Xe, Unsloth, Hugging Face – all lit up hour one. MoE visuals from @osanseviero? Chef’s kiss for architecture nerds.
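If you just want to poke at it, the usual Hugging Face path is a few lines. The model id below is my guess at naming, not a confirmed repo; whatever the real Gemma 4 checkpoints end up being called, the pattern is the same.

```python
# Minimal local test via Hugging Face transformers (assumes you've accepted the model's terms).
from transformers import pipeline

# Hypothetical repo id -- swap in the actual Gemma 4 checkpoint name.
pipe = pipeline("text-generation", model="google/gemma-4-26b-a4b-it", device_map="auto")
out = pipe("Write a haiku about mixture-of-experts.", max_new_tokens=64)
print(out[0]["generated_text"])
```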

Why Hermes Agent Is Stealing the Show

Agents. Everyone’s chasing the holy grail. OpenClaw? Cute, but flaky on marathons.

Enter Hermes Agent. Devs are flipping to it in droves: @Zeneca and @Everlier swear it’s stabler and hungrier for long hauls. A Korean deep-dive from @supernovajunn breaks down how the harness-plus-memory loop crushes it: autonomous skills, procedural recall, a real reliability spike.

Nous didn’t phone it in. Pluggable memory: Honcho, mem0, the works. Inline TUI diffs. Credential pools. Clean core. Plug your own backend – easy mode.
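I haven't seen Hermes' internals, so take this as a hand-wavy sketch of what 'pluggable memory' usually means in practice: a tiny contract the agent loop calls, with Honcho, mem0, or your own store behind it. The names here are mine, not Nous's API.

```python
# Hand-wavy sketch of a pluggable memory contract -- not Hermes Agent's real API.
from typing import Protocol

class MemoryBackend(Protocol):
    def add(self, user_id: str, text: str) -> None: ...
    def search(self, user_id: str, query: str, k: int = 5) -> list[str]: ...

class InMemoryBackend:
    """Trivial reference backend; a Honcho- or mem0-backed class just needs the same methods."""
    def __init__(self) -> None:
        self._notes: dict[str, list[str]] = {}

    def add(self, user_id: str, text: str) -> None:
        self._notes.setdefault(user_id, []).append(text)

    def search(self, user_id: str, query: str, k: int = 5) -> list[str]:
        # Naive keyword match; real backends do embedding or graph retrieval.
        hits = [n for n in self._notes.get(user_id, []) if query.lower() in n.lower()]
        return hits[:k]

def recall_context(memory: MemoryBackend, user_id: str, task: str) -> str:
    # The agent loop only ever talks to the protocol, so backends stay swappable.
    return "\n".join(memory.search(user_id, task))
```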

Big shift: harnesses rule. @Vtrivedy10’s model-harness loop? Trace data turns failures into fine-tunes and tool tweaks. The raw fuel is agent traces. Open models are ‘good enough’ now; the engineering around them is what laps them toward the frontier.
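Here's the shape of that loop as I read it, in deliberately concrete Python. Every name here, and the failure heuristic, is my own stand-in, not @Vtrivedy10's or Nous's actual tooling.

```python
# Sketch of a model-harness improvement loop: run tasks, keep the traces,
# mine the failures, and feed them back as fine-tuning data and tool fixes.
# All names here are stand-ins, not anyone's actual pipeline.
import json

def run_agent(task: str) -> dict:
    # Placeholder: call your real harness here; this stub just fails every task.
    return {"task": task, "steps": ["tried", "gave up"], "success": False}

def mine_traces(tasks: list[str], out_path: str = "failures.jsonl") -> None:
    with open(out_path, "w") as f:
        for task in tasks:
            trace = run_agent(task)
            if trace["success"]:
                continue
            # Failed traces become fine-tuning candidates and tool-bug reports.
            f.write(json.dumps({"task": task, "steps": trace["steps"]}) + "\n")

mine_traces(["refactor the parser", "fix the flaky login test"])
```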

Demand for open harnesses? Exploding. Closed products feel like 90s Windows bloat.

Is the License Shift a Trap?

Apache 2.0. @ClementDelangue, @QuixiAI cheer ‘real’ open-weights. Downstream bliss.

Skeptic hat on. Google’s past: ‘open’ with gotchas. Multimodality, agents, on-device – sweet, but the Google AI Studio collateral reeks of a funnel to Vertex. Day-zero support shines, yet the TPU bias lingers.

The benchmarks look positive, but that’s not a call for blind faith. 26B A4B MoE? Killer. 31B Pareto king. But the presentation? @stochasticchasm demands fair comparisons. Fair.

Prediction: this sparks a European AI boom. Pods with the OpenClaw and Pi creators next week? Latent Space metrics are already top-tier. AIE Europe livestream: hit that bell, algorithm fiends.

Harness matters more than ever. Gemma 4 + Hermes? Your local agentic workflow just woke up.

Corporate hype? Google spins ‘outperforms models 10x larger.’ Cute chart. In the real world, the consumer-hardware proofs steal the show.

And the pod love? The Marc Andreessen episode is crushing it. Gemma reviews are flooding in. Latent Space’s AINews has searchable past issues, and you can opt out of the emails if you must.

Why Does This Matter for Open AI Devs?

Short answer: freedom. No more Llama-style license hoops. Apache means fork, fine-tune, monetize – no non-commercial handcuffs.

Local inference? Phones to 4090s. Edge computing explodes.

Agents? Hermes proves scaffolding > raw params. Trace loops close the gap.

Google’s play pressures OpenAI, Anthropic. Open season.

But call out the spin: a ‘defining release’? The ecosystem was ready because Google primed it. Still, kudos. Better than vaporware.

Wrapping the week: subreddits and Twitter scanned, no Discord recaps this round. Gemma 4 dominates.



Frequently Asked Questions

What is Gemma 4 and why the hype?

Google’s latest open model, released under Apache 2.0 – the MoE design crushes reasoning, agent, and multimodal tasks on laptops and phones.

Is Gemma 4 truly open source?

Yes, Apache 2.0 weights. Fork away – unlike prior restrictive licenses.

Can Hermes Agent run Gemma 4 locally?

Absolutely. Pluggable memory, TUI tweaks make it a beast for long tasks on consumer gear.

Written by James Kowalski

Investigative tech reporter focused on AI ethics, regulation, and societal impact.

Originally reported by Latent Space
