AI Classifies User Feedback with Embeddings

Feedback piles up fast. Keywords fail. But embeddings? They grasp that 'dark mode hurts my eyes' and 'please add dark mode' are the same plea. A game-changer for solo devs.

Embeddings: The AI Trick That Tamed My Feedback Avalanche — theAIcatchup

Key Takeaways

  • Embeddings auto-group feedback by semantic meaning, crushing keyword limits in milliseconds.
  • Pgvector on Postgres + Voyage AI = cheap, scalable clustering without new DBs.
  • Threshold 0.85 hits sweet spot; regen names sparingly to slash costs.

Everyone figured sorting user feedback meant slogging through spreadsheets or hacking keyword matches—tedious, error-prone grunt work that stole hours from actual building. But this? AI classifying user feedback with embeddings flips the script. Suddenly, ‘Please add dark mode’ snuggles up next to ‘It hurts my eyes at night’ in the same smart cluster, zero keywords shared. It’s like giving your database a sixth sense for human intent.

Boom.

Picture it: You’re a solo dev, like the Lazy Developer behind FeedMission. You’ve shipped in 7 days (EP.04 magic), users love it, feedback floods in. At 10 items? Easy. At 50? Soul-crushing. Three pleas for dark mode, worded worlds apart. Manual grouping? Nope. Enter Claude’s whisper: embeddings.

And here’s the wonder—it’s not hype. This 188-line clustering.ts beast runs in the background, auto-groups via cosine similarity (threshold 0.85), even regenerates cluster names with Claude when they swell. Sentiment scores tag along (-1.0 to 1.0). All in your existing Postgres with pgvector. No fancy vector DB needed.

“‘Please add dark mode’ and ‘It hurts my eyes at night’ share zero words. But they’re the same request. How do you tell a computer that? You convert the sentence into 1024 numbers. Similar meanings produce similar numbers.”

That quote nails it. Embeddings from Voyage AI’s voyage-3-lite model spit out a 1024-float vector per feedback blurb. Store ‘em in a Prisma Unsupported("vector(1024)") field (raw SQL for queries, since Prisma doesn’t speak pgvector natively). Then bam—cosine distance math in the DB finds kin in milliseconds.
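A minimal sketch of what that schema might look like (model and field names here are illustrative, not lifted from the original project). The field is optional so the Prisma client can still create rows; the vector itself gets written and queried via raw SQL:

```prisma
// schema.prisma — pgvector column via Prisma's escape hatch.
// Prisma can't read or filter this field natively; all vector
// queries go through $queryRaw / $executeRaw.
model Feedback {
  id        String                       @id @default(cuid())
  projectId String
  title     String
  embedding Unsupported("vector(1024)")?
}
```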

Why Not Just Spam Claude with Pairwise Comparisons?

Sure, you could lob every duo at Claude: “Are these similar?” Works for 10 feedbacks. But 100? That’s 4,950 API calls. Wallet-draining nightmare, plus latency hell. Embeddings? Generate once, query forever. Parallelize with Promise.all—sentiment from Haiku runs alongside the embedding call, dropping total latency from 500ms to 300ms. Pure efficiency.
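The arithmetic behind that 4,950 figure: pairwise comparison grows quadratically, embeddings linearly. A quick sketch:

```typescript
// Pairwise LLM comparisons: every unordered pair of n items
// needs its own call -> n * (n - 1) / 2 total.
function pairwiseCalls(n: number): number {
  return (n * (n - 1)) / 2;
}

// Embedding approach: one embedding per item, generated once.
// Similarity afterwards is just DB math, no further API calls.
function embeddingCalls(n: number): number {
  return n;
}

console.log(pairwiseCalls(10));   // 45 — tolerable
console.log(pairwiseCalls(100));  // 4950 — wallet-draining
console.log(embeddingCalls(100)); // 100 — and it never grows faster than n
```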

Food delivery analogy crushes it here. Punch “late night cravings,” snag fried chicken or pork feet. Meaning over matches. That’s embeddings—semantic sorcery baked into numbers.

But wait, pitfalls lurk. Numbers trip it up: “3 buttons” vs. “30 buttons” vectors hug too close. Negations flop too—“Dark mode great” clusters with “terrible.” Feedback’s mostly requests, though, so 80% golden. Fix the 20% later. Pragmatic genius.
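The similarity math itself is tiny. A self-contained sketch of cosine similarity (pgvector's `<=>` operator returns cosine distance, which is 1 minus this value):

```typescript
// Cosine similarity: dot product divided by the product of magnitudes.
// 1.0 = same direction (same meaning), 0 = unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A vector compared against itself scores 1.0 — the root cause of the
// self-match trap the raw SQL query has to guard against.
console.log(cosineSimilarity([1, 2, 3], [1, 2, 3])); // ~1.0
console.log(cosineSimilarity([1, 0], [0, 1]));       // 0
```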

Code peek seals the deal:

// lib/ai/embeddings.ts
const response = await fetch('https://api.voyageai.com/v1/embeddings', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.VOYAGE_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'voyage-3-lite',
    input: ['Please add dark mode'],
  }),
})
const { data } = await response.json()
// data[0].embedding → [0.0234, -0.0891, 0.0412, ...] (1024 numbers)

Raw SQL query’s the muscle:

SELECT id, title,
1 - (embedding <=> $1::vector) as similarity
FROM "Feedback"
WHERE "projectId" = $2
AND id != $3  -- exclude self (important!)
ORDER BY embedding <=> $1::vector
LIMIT 5

Forgot that id != $3? Every feedback orphans into its own perfect-match cluster (similarity 1.0 to itself). Brutal trap—hours lost, fixed in one line. Classic vector search gotcha.

Threshold tinkering? Gold table from tests:

Threshold   Result
0.70        Too loose—dark mode lumps in with generic UI tweaks
0.80        Borderline—CSV export hugs data download
0.85        Sweet spot—eye pain joins the dark mode plea
0.90        Too tight—only near-twins match

0.85 it is. Best match above threshold with a clusterId? Join it. Else? Claude coins a fresh name/summary for a new cluster. Regenerate names only when a cluster’s size hits a multiple of 3—cost-smart.
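That join-or-create step can be sketched as a pure function (names here are illustrative; the real clustering.ts may differ):

```typescript
interface Match {
  clusterId: string;
  similarity: number; // 1 - cosine distance, from the raw SQL query
}

const THRESHOLD = 0.85;

// Given the top DB match, either join its cluster or signal that a
// new cluster (name/summary via Claude) should be created.
function resolveCluster(best: Match | null): { action: "join" | "create"; clusterId?: string } {
  if (best && best.similarity >= THRESHOLD) {
    return { action: "join", clusterId: best.clusterId };
  }
  return { action: "create" };
}

// Regenerate the cluster name only at sizes 3, 6, 9, ... to cap Claude costs.
function shouldRegenerateName(clusterSize: number): boolean {
  return clusterSize % 3 === 0;
}
```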

How Does This Reshape Solo Dev Life?

This isn’t just a hack. It’s AI as your tireless product manager. Feedback auto-clusters, surfaces priorities (dark mode screams loudest), frees you to code. Supabase Postgres handles scale—no Pinecone tax. Hybrid Prisma + raw SQL? Battle-tested ugly that wins.

My unique spin: Remember Google’s leap from keyword regex to BERT embeddings? Same vibe. Early 2000s search was keyword hell—“jaguar speed” got cars, not cats. Embeddings unlocked intent. Here, feedback sorting echoes that shift. Bold prediction: In two years, every indie tool embeds feedback like this. AI agents will auto-prioritize roadmaps, A/B test summaries, even draft changelogs. Product management? Obsolete for solos. Platform shift, incoming.

Energy surges thinking about it—your app evolves itself from user whispers. Wonderstruck yet?

Traps aside, it’s deploy-ready. The after() pattern runs the pipeline silently in the background. 20 test feedbacks proved it. Real-world? FeedMission’s thriving.

Scale worries? Pgvector laughs at thousands of rows—indexing cranks speed. Costs? Pennies per batch. Beats manual sorting forever.
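What that indexing might look like, as a hedged sketch: pgvector's IVFFlat index does approximate nearest-neighbor search over cosine distance (the index name and `lists` value here are assumptions to tune per table size, not from the original project; pgvector 0.5+ also offers HNSW):

```sql
-- Approximate nearest-neighbor index for cosine distance.
-- Sequential scans are fine at hundreds of rows; index at thousands.
CREATE INDEX feedback_embedding_idx
  ON "Feedback"
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);
```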

But corporate spin check: None here—Lazy Developer’s transparent, warts-out. No vaporware.

What Are Embeddings, Really, for Feedback?

Vectors capturing essence. Voyage AI’s lightweight model nails accuracy. Claude Haiku sentiment scores add flavor—neutral pleas vs. frustrated rants. Promise.all parallelism? Chef’s kiss.
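That Promise.all pattern, sketched with stand-in async functions (the real calls hit Voyage and Claude Haiku; these parameters are placeholders so the shape is clear):

```typescript
// Run embedding and sentiment scoring in parallel rather than sequentially.
// With real APIs, total latency becomes roughly max(latencies), not their sum.
async function analyzeFeedback(
  text: string,
  embed: (t: string) => Promise<number[]>,    // e.g. Voyage voyage-3-lite
  sentiment: (t: string) => Promise<number>,  // e.g. Claude Haiku, -1.0 to 1.0
): Promise<{ embedding: number[]; sentiment: number }> {
  const [embedding, score] = await Promise.all([embed(text), sentiment(text)]);
  return { embedding, sentiment: score };
}
```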

Wander a sec: Why pgvector over Pinecone? You’re Supabase’d already. No migration pain. Hybrid Prisma? Annoying, but functional. Future Prisma support? Fingers crossed.

This blueprint? Fork it. Tweak threshold. Ship faster.



Frequently Asked Questions

What does AI classifying user feedback with embeddings mean?

It turns text into number clouds (vectors), groups similar meanings via DB math—no keywords needed. Handles paraphrases like a human.

How to set up pgvector for feedback clustering?

Enable extension in Postgres/Supabase, add Unsupported vector field in Prisma, query with cosine distance SQL. Exclude self-matches!

Will embeddings replace manual feedback review?

Covers 80-90%, flags outliers for you. Perfect for scaling indies—AI your first PM.

Written by Aisha Patel

Former ML engineer turned writer. Covers computer vision and robotics with a practitioner perspective.



Originally reported by dev.to
