Observability in Go: Logs First

Dozens of panics hit Grafana's Go services every week—until logs transformed them into alertable metrics. Here's the no-BS path to observability that actually scales.

[Image: Grafana Labs engineers discussing observability strategies for Go applications on the Big Tent podcast]

Key Takeaways

  • Start observability in Go with stdlib logs—derive metrics like panic counts from them.
  • Use tracing for distributed systems; context propagation ties it together.
  • eBPF unlocks kernel visibility; pair with pprof for full-stack debugging.

Grafana Labs’ Go services log dozens of panics weekly. Left unchecked, those stack traces would scroll past unnoticed.

But flip that: those same logs, parsed smartly, birth metrics you can graph and alarm on. No magic. Just logs doing the heavy lifting.

That’s the opener from episode 8 of Grafana’s Big Tent podcast. Mat Ryer chats with Donia Chaiehloudj from Isovalent (Cisco), and Charles Korn and Bryan Boreham from Grafana Labs. They’re knee-deep in Go observability—where to kick off, pitfalls to dodge, and why observability in Go demands starting stupid-simple.

Look, Go’s no-frills ethos shines here. Forget bloated frameworks day one. These folks push stdlib logging. Why? Stability. No vendor lock, no rug-pull deprecations.

Donia nails it:

I would go simple to start. We know that we are always refactoring along the way and that priorities change, like real life. But I would try to go for the Go standard library as much as possible, because we know that it’s stable and not going to be archived tomorrow.

Spot on. Refactor pressure? Off the table when you’re not wedded to some hip library that ghosts contributors (open source’s dirty secret these days).

Why Logs Beat Everything Else in Go Observability

Charles Korn doubles down: logs everywhere, first.

Dump ‘em to console, file, Loki—whatever. Easy entry. Then? Derive metrics. Grafana does this with panics: regex the stack traces in logs, count ‘em, metric-ify. Boom—dashboard showing panic rates. Alerts fire if they climb.

We’ve got a bunch of Go services at Grafana Labs, and unfortunately, occasionally they panic, and they dump the trace to their logs and they get stuck to standard error, and they get picked up by our logging system. And it’s really useful to be able to show that on a graph—how often a thing’s panicking.
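To make that concrete, here is a minimal sketch of the idea, not Grafana's actual pipeline (they do the counting inside their logging stack): tail a log stream, regex for the panic header, bump a Prometheus counter. The metric name and port are invented for illustration.

```go
package main

import (
	"bufio"
	"net/http"
	"os"
	"regexp"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// panicTotal is a hypothetical metric derived purely from log lines.
var panicTotal = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "app_panics_total",
	Help: "Panic stack traces observed in the log stream.",
})

// Go runtime panics start their stderr dump with "panic: ".
var panicRe = regexp.MustCompile(`^panic: `)

func main() {
	prometheus.MustRegister(panicTotal)

	// Expose the derived metric for Prometheus to scrape.
	go http.ListenAndServe(":9091", promhttp.Handler())

	// Tail the service's log stream on stdin, e.g. `./svc 2>&1 | ./panicwatch`.
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		if panicRe.Match(scanner.Bytes()) {
			panicTotal.Inc()
		}
	}
}
```

Grafana presumably does this inside Loki rather than with a sidecar like the above, but the shape is the same either way: log line in, metric out.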

Here’s my take—and it’s one the pod skips: this mirrors Unix’s golden age. Pipes turned raw logs into grep-able gold. Go’s log/slog package? It’s the modern pipe. Predict this: by Go 2, expect native log-to-metric hooks in stdlib. Why fight it when simplicity wins?

But logs ain’t perfect. Go errors are interface values, yet in practice most collapse to bare strings. “File not found” could come from anywhere. Parsing hell for aggregation. Still, start here. Context next.

Shove request IDs and span IDs into logs early. Stdlib’s context package—free, battle-tested. It ties the chaos together without tracing overhead.
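A minimal sketch of that pattern, assuming Go 1.21+ for log/slog; the context key and helper names are made up for illustration:

```go
package main

import (
	"context"
	"log/slog"
	"os"
)

// ctxKey keeps our context keys collision-free (illustrative, not a standard).
type ctxKey string

const requestIDKey ctxKey = "request_id"

// WithRequestID stashes a request ID in the context at the edge of the system.
func WithRequestID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, requestIDKey, id)
}

// loggerFrom returns a logger pre-tagged with the request ID, if one is set.
func loggerFrom(ctx context.Context) *slog.Logger {
	l := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	if id, ok := ctx.Value(requestIDKey).(string); ok {
		l = l.With("request_id", id)
	}
	return l
}

func handle(ctx context.Context) {
	// Every log line in this request now carries the same request_id.
	loggerFrom(ctx).Info("handling request", "step", "validate")
}

func main() {
	ctx := WithRequestID(context.Background(), "req-1234")
	handle(ctx)
}
```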

When Should You Actually Bother with Tracing in Go?

Bryan Boreham calls tracing the superpower. Parent-child links. Every span has a beginning and an end. It pinpoints hot paths even in 20-30 line scripts.

His bar? Low. But scaling’s the trigger. Distributed systems—frontends pinging backends, microservice handoffs. Pass trace context everywhere. Suddenly, one request’s journey lights up across services.

Tracing adds that explicit parent-child relationship, and everything’s always got a beginning and an end.

Fair. Logs scale linearly; traces explode in value with services. Problem? Setup friction. Logs: print, done. Traces: middleware, exporters, OTLP. People balk.
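For a taste of what that friction buys, here is a hedged sketch using OpenTelemetry's Go SDK with a stdout exporter; a real deployment would swap in OTLP. Span names are invented:

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// Print spans to stdout; production would use an OTLP exporter instead.
	exp, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
	if err != nil {
		log.Fatal(err)
	}
	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
	defer tp.Shutdown(context.Background())
	otel.SetTracerProvider(tp)

	tracer := otel.Tracer("demo")

	// Parent span: explicit beginning and end.
	ctx, parent := tracer.Start(context.Background(), "handle-request")

	// Child span: passing ctx is what creates the parent-child link.
	_, child := tracer.Start(ctx, "query-db")
	child.End()

	parent.End()
}
```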

One freebie Charles flags: Go panics dump stack traces to stderr anyway. Use that—no extra code.

Profiling slots in too. pprof—stdlib gem. CPU, mem hotspots, no agents. But pair it with traces to see why a function hogs cycles.
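The friction here really is near zero; a sketch, with an arbitrary port:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // side effect: registers /debug/pprof/* on DefaultServeMux
)

func main() {
	// Then, for a 30-second CPU profile:
	//   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
```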

eBPF: The Nuclear Option for Go Visibility

Systems grow hairy. App logs blind you to kernel guts, network stacks. Enter eBPF. Donia and Isovalent’s turf (they built Cilium on it).

eBPF probes without recompiles. Socket delays, syscall stalls—Go apps gain x-ray vision. But it’s advanced. Know your distro supports it (kernel 5.3+). Tools like Pixie or Tetragon lower the bar.

Why now? Go’s rise in cloud-native (Kubernetes, etcd) demands it. Logs/metrics/traces cover app-layer; eBPF fills the moat.

Corporate spin check: Grafana hypes Loki/Prometheus. Solid tools, but don’t swallow the pitch whole. OpenTelemetry’s span explosion? Trim ruthlessly—Go hates bloat.

Standardize early: agree on keys for traceparent and baggage. Slog’s structured logs shine here. JSON? Loki loves it.
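A sketch of what that looks like with slog's JSON handler; the key names below are illustrative picks that mirror the W3C traceparent and baggage headers, not an established convention:

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	// One JSON object per line: trivial for Loki to ingest and query.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	// Pick key names once, team-wide, and never bikeshed them again.
	logger.Info("checkout started",
		"trace_id", "4bf92f3577b34da6a3ce929d0e0e4736", // example value
		"baggage", "tenant=acme",
	)
}
```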

Scaling tip: sample traces. Head-based sampling at 1% keeps costs sane; tail-based sampling still catches the outliers.
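With OpenTelemetry's Go SDK, head-based sampling is one sampler option on the tracer provider; a sketch with the 1% ratio as an example (tail-based sampling happens in a collector, not in-process):

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	exp, err := stdouttrace.New()
	if err != nil {
		log.Fatal(err)
	}

	// Head-based: the root span decides, ParentBased makes children follow,
	// so sampled traces stay complete instead of arriving in fragments.
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.01))),
		sdktrace.WithBatcher(exp),
	)
	defer tp.Shutdown(context.Background())
	otel.SetTracerProvider(tp)
}
```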

pprof war stories: Bryan profiles everything. Low bar pays off—spot goroutine leaks before prod melts.

The error-strings gripe again. Wrappers like pkg/errors, or stdlib %w wrapping plus slog attrs, help. Future: generics might type errors natively.
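Stdlib wrapping already blunts the gripe; a sketch using %w and errors.Is, so callers match a typed sentinel instead of grepping message text:

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

func loadConfig(path string) error {
	if _, err := os.Open(path); err != nil {
		// %w wraps the cause so callers can match it structurally.
		return fmt.Errorf("load config %q: %w", path, err)
	}
	return nil
}

func main() {
	err := loadConfig("/no/such/file")
	// No string parsing: errors.Is walks the wrap chain to the sentinel.
	if errors.Is(err, fs.ErrNotExist) {
		fmt.Println("config missing:", err)
	}
}
```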

Real-world refactor: Start logs+context. Add metrics (Prometheus client_golang—stdlib-adjacent). Traces when latency puzzles emerge. eBPF for outages.

This stack? Architectural shift. Go favors composition over inheritance—observability too. No monolith collector. Pipe logs to Loki, metrics to Prometheus, traces to Tempo. Grafana unifies.

Bold call: Go devs ignoring observability? Dinosaurs by 2026. Kubernetes mandates it. Simplicity scales.

Why Does Observability in Go Matter for Your Next Project?

Solo script? Logs suffice. Team app? Full stack. Cost? Negligible—stdlib costs nothing. ROI? Panics-to-metrics alone saves weekends.

Donia’s library picks: OpenTelemetry (otel), but vetted. Avoid zombie projects.

Mat’s take is chill: an imperfect start is OK. Iterate.

Wrapping threads: observability in Go thrives on stdlib restraint. Logs bootstrap. Traces connect. eBPF conquers. Echoes Go’s manifesto—clear, efficient, boring-reliable.



Frequently Asked Questions

What is observability in Go?

It’s logs, metrics, traces, profiling fused to debug live systems. Starts with stdlib log/slog, scales to distributed tracing.

Should I start Go observability with logs or traces?

Logs. Dead simple, derive metrics from them. Traces shine in multi-service setups.

When to use eBPF for Go apps?

Kernel and network blind spots, and production outages where app-level signals fail—as Cilium users do.

Written by Marcus Rivera

Tech journalist covering AI business and enterprise adoption. 10 years in B2B media.



Originally reported by Grafana Blog
