
use-local-llm: React Hooks for Local LLMs

You've got Ollama humming on localhost, but React integrations demand needless servers. Enter use-local-llm: a featherweight hook that bypasses the middleman for instant, private AI chats.

use-local-llm: The 2.8KB Hook Unlocking Local AI Straight in React Browsers — theAIcatchup

Key Takeaways

  • use-local-llm enables direct browser-to-local-LLM streaming with React hooks, skipping backends entirely.
  • At 2.8KB with zero deps, it gets prototypes running faster than heavier SDKs like Vercel AI.
  • Prioritizes privacy and speed for Ollama/LM Studio users, signaling a client-first AI shift.

Your terminal spits back tokens from Gemma on localhost:11434, flawless via curl. But swap to your React app, and bam—every SDK screams for a backend you don’t need.

use-local-llm changes that. This tiny library—2.8 KB gzipped, zero dependencies—delivers React hooks that stream local LLMs right in the browser. No API routes. No Next.js cruft. Just fetch() to your Ollama, LM Studio, or llama.cpp server, tokens flowing back like they should.

Here’s the thing. We’ve been here before. Remember when SPAs exploded, and devs ditched server-rendered pages for client-side routing? Tools like React Router slashed the backend tax on prototypes. use-local-llm does the same for AI—local inference, unencumbered. My unique take: this isn’t just a hook; it’s the spark for an architectural pivot where browsers become AI inference engines, slashing cloud bills before they even hit your wallet.

Why Does Every AI SDK Force a Backend?

Vercel AI SDK rules production—adapters galore, scales like a dream. But it’s built for OpenAI’s moat: auth keys tucked server-side, usage tracked, traffic firewalled.

Vercel AI SDK requires an API layer. Your React app POSTs to your Next.js server, which then calls the LLM and streams back. This makes sense for production apps using OpenAI or Anthropic, because you need the backend for authentication, cost tracking, and security.

Spot on—for clouds. But your local Gemma? It’s already firewalled on your machine. That “layer” adds latency (browser → server → localhost), code (deploy a route), and mental overhead. Why prototype like it’s 2022?

And look—Vercel’s hype machine glosses over this. They pitch it as the React+AI standard, but that’s ecosystem lock-in dressed as best practice. Local devs get left prototyping in the stone age.

How use-local-llm Streams Without the Bloat

One hook. That’s the pitch, and it delivers.

import { useOllama } from "use-local-llm";

function Chat() {
  const { messages, send, isStreaming } = useOllama("gemma3:1b");
  return (
    <div>
      {messages.map((m, i) => (
        <p key={i}>
          <strong>{m.role}:</strong> {m.content}
        </p>
      ))}
      <button onClick={() => send("Hello!")} disabled={isStreaming}>
        {isStreaming ? "Generating..." : "Send"}
      </button>
    </div>
  );
}

Boom. Messages persist. Streaming UI updates. Abort mid-token? Handled. All in ~3KB.

It auto-sniffs backends by port—11434 for Ollama, 1234 for LM Studio, 8080 for llama.cpp. Native protocols mean no translation tax; tokens hit optimal speed.
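That detection logic boils down to a port lookup. Here's a minimal sketch of the idea (illustrative only; the function name and fallback behavior are assumptions, not the library's actual source):

```typescript
// Map a local server URL to its likely backend by default port.
// 11434 = Ollama, 1234 = LM Studio, 8080 = llama.cpp server.
type Backend = "ollama" | "lmstudio" | "llamacpp" | "unknown";

function detectBackend(baseUrl: string): Backend {
  const port = new URL(baseUrl).port;
  switch (port) {
    case "11434":
      return "ollama";
    case "1234":
      return "lmstudio";
    case "8080":
      return "llamacpp";
    default:
      return "unknown"; // unrecognized port: caller picks the protocol
  }
}
```

Once the backend is known, the library can speak that server's native streaming protocol instead of forcing everything through one translation layer.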

Want token-by-token? Plug in onToken. Model picker? useModelList() fetches your local roster, no config hell.
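The token-by-token pattern is just a per-chunk callback plus an accumulator that grows the visible message. A minimal sketch of that pattern (the helper below is hypothetical, not part of the library's API):

```typescript
// Illustrative: forward each streamed token to an onToken-style callback
// while accumulating the full message text.
function makeAccumulator(onToken: (t: string) => void) {
  let text = "";
  return {
    push(token: string) {
      text += token; // grow the message as tokens arrive
      onToken(token); // fire the per-token callback
    },
    get value() {
      return text;
    },
  };
}
```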

Feature | Vercel AI SDK | use-local-llm
--- | --- | ---
Backend required | Yes | No
Bundle size | 50KB+ | 2.8KB
Privacy | Data sent to cloud | Stays local
Setup time | 10+ min | ~2 min

Privacy wins huge here. No tokens phoning home—your prototype stays yours.

But dig deeper. The guts? Async generators (streamChat, streamGenerate) that play nice anywhere: React hooks, Vue, vanilla JS, even Node. Browser fetch() hits the endpoint, parses SSE or native streams, yields tokens. React’s useEffect + AbortController handles lifecycle. Simple. Elegant. No magic.
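A rough sketch of that core loop, assuming Ollama-style newline-delimited JSON chunks (illustrative, not the library's actual streamChat source):

```typescript
// Each line from Ollama's /api/chat stream is a JSON object carrying a
// content fragment in message.content, with done: true on the final chunk.
async function* streamTokens(
  lines: AsyncIterable<string>
): AsyncGenerator<string> {
  for await (const line of lines) {
    if (!line.trim()) continue; // skip blank keep-alive lines
    const chunk = JSON.parse(line);
    const token = chunk.message?.content ?? "";
    if (token) yield token; // hand one token to the UI
    if (chunk.done) return; // final chunk ends the stream
  }
}
```

In the real thing, `lines` comes from reading the `fetch()` response body, and an AbortController wired into `useEffect` cancels the request when the component unmounts.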

Is use-local-llm the Local AI Prototype Killer?

Yes—if you’re knee-deep in local models. Installs clean, no peer dep drama beyond React. Prototypes spin up in minutes: pick Gemma 1B for speed, Llama 3 for heft, all browser-direct.

Why now? Ollama’s explosion—easy pulls, GPU accel—makes local viable. But UIs lagged. This bridges it.

Prediction: expect forks for Svelte, Solid. Or browser-native inference via WebGPU. use-local-llm proves the blueprint—client-first AI, backend-optional.

Corporate spin check: Vercel won’t touch this; it undercuts their serverless pitch. Good. Competition sharpens everyone.

Short version? Ditch the proxy for local joy. Your React app deserves it.

Why Does This Matter for Local AI Devs?

Architectural shift underway. Cloud LLMs gatekeep with APIs; local ones beg for direct pipes. This hook exposes the why: the browser is fully capable of being both the streaming UI and the direct client to a local inference server.

Friction killed early React-AI hacks—CORS woes, stalled streams. Solved.

One caveat: localhost CORS. Ollama can allow browser origins via the OLLAMA_ORIGINS environment variable, but anything beyond your own machine calls for a VPN or tunnel. For prototypes, though? Perfection.
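For Ollama specifically, the CORS allowlist is set when the server starts; the origin below is just an example for a typical Vite dev server:

```shell
# Allow the browser app at this origin to call Ollama's HTTP API.
# OLLAMA_ORIGINS="*" also works for quick local prototyping (less strict).
OLLAMA_ORIGINS="http://localhost:5173" ollama serve
```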

And the insight original misses: this revives the hacker ethos. 2010s Node scripts chatting to APIs? Now browsers to local LLMs. Full-stack in one tab.



Frequently Asked Questions

What is use-local-llm and how do I install it?

It’s a zero-dep React library for streaming local LLMs in-browser. npm i use-local-llm—done.

Does use-local-llm work with Ollama and LM Studio?

Yep, auto-detects ports and protocols for Ollama (11434), LM Studio (1234), llama.cpp (8080).

Can I use use-local-llm in production apps?

Great for prototypes/privacy; pair with Vercel SDK for cloud scale.

Priya Sundaram
Written by

Hardware and infrastructure reporter. Tracks GPU wars, chip design, and the compute economy.



Originally reported by Dev.to
