Everyone figured running a Linux binary was straightforward: execve syscall, load the code, jump. Simple, right? Dead wrong. This deep dive into how Linux executes binaries, drawn from the notes of someone with 25 years in Linux internals, flips that script. It exposes ELF’s guts, dynamic linking’s tricks, and the runtime machinery that keeps your apps humming without bloating memory.
It’s not just trivia. Understanding this shifts how you debug crashes, optimize loaders, or even ponder alternatives like WebAssembly modules sneaking into Linux land.
“After 25 years working with Linux internals I wrote this article. It’s a deep dive into how Linux executes binaries, focusing on ELF internals and dynamic linking. Covers GOT/PLT, relocations, and what actually happens at runtime (memory mappings, syscalls, dynamic loader).”
That quote—from the source itself—nails it. No fluff.
What Happens When You Type ‘./myapp’?
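Execve kicks off the party. But forget the fairy tale. The kernel opens the file, reads its first bytes into a buffer, and offers them to each registered binary-format handler in turn; the ELF handler checks for the magic bytes (0x7f ‘E’ ‘L’ ‘F’). No handler matches? execve fails with ENOEXEC (“Exec format error”) and you’re done. A missing execute bit fails even earlier, with EACCES (“Permission denied”).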
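To see that first check from user space, here is a minimal C sketch. The function name is_elf_file is ours, not the kernel’s, and the kernel does the comparison on a buffer it has already read inside its ELF handler rather than through stdio; the sketch only mirrors the idea of comparing the first four bytes against ELFMAG from <elf.h>.

```c
#include <elf.h>     /* ELFMAG ("\177ELF") and SELFMAG (4) */
#include <stdio.h>
#include <string.h>

/* Returns 1 if the file starts with the ELF magic, 0 if it does not,
 * -1 if the file cannot be opened. */
int is_elf_file(const char *path)
{
    unsigned char magic[SELFMAG];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(magic, 1, sizeof magic, f);
    fclose(f);
    if (got != sizeof magic)
        return 0;                      /* too short to be ELF */
    return memcmp(magic, ELFMAG, SELFMAG) == 0;
}
```

Calling is_elf_file("/proc/self/exe") from any compiled C program on Linux should return 1, since the running executable is itself an ELF file.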
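ELF header next: 64 bytes on 64-bit ELF (52 on 32-bit), spelling out the architecture (x86-64? ARM?), the entry point, and where the program headers live and how many there are. The kernel walks those phdrs and maps the PT_LOAD segments with the permissions in their flags: the segment holding .text gets R-X (read-execute), the data segment RW-. Stack? The kernel builds the initial one during execve. Heap? It grows later, when malloc calls brk or mmap.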
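A sketch of that header walk, assuming a 64-bit ELF file in the host’s byte order (true when you inspect binaries built for the machine you are on). print_load_segments is a hypothetical helper: it reads the Elf64_Ehdr, seeks to each program header at e_phoff + i * e_phentsize, and prints the PT_LOAD segments with their R/W/X flags, roughly the view readelf -l gives.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Prints each PT_LOAD segment of a 64-bit ELF file with its R/W/X flags.
 * Returns the number of PT_LOAD segments, or -1 on error / unsupported file. */
int print_load_segments(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64) {
        fclose(f);
        return -1;
    }
    printf("entry point %#lx, %u program headers\n",
           (unsigned long)eh.e_entry, (unsigned)eh.e_phnum);

    int loads = 0;
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        /* program header i sits at e_phoff + i * e_phentsize */
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            fclose(f);
            return -1;
        }
        if (ph.p_type != PT_LOAD)
            continue;
        loads++;
        printf("LOAD vaddr=%#lx memsz=%#lx %c%c%c\n",
               (unsigned long)ph.p_vaddr, (unsigned long)ph.p_memsz,
               (ph.p_flags & PF_R) ? 'R' : '-',
               (ph.p_flags & PF_W) ? 'W' : '-',
               (ph.p_flags & PF_X) ? 'X' : '-');
    }
    fclose(f);
    return loads;
}
```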
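But here’s the twist no newbie groks: dynamic binaries (most of ’em) aren’t self-contained. They defer to ld.so, the dynamic loader. The kernel spots the PT_INTERP program header, which names the interpreter (say, /lib64/ld-linux-x86-64.so.2), maps it alongside your binary, and starts execution at ld.so’s entry point, not yours. Your binary? It’s described to ld.so through the auxiliary vector (auxv) on the new stack: AT_PHDR and AT_PHNUM locate its program headers, AT_ENTRY its entry point, AT_BASE the interpreter’s own load address.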
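The interpreter path is just a NUL-terminated string stored in the file and pointed to by the PT_INTERP header. A minimal sketch under the same assumptions as a plain header walk (64-bit ELF, host byte order); read_interp is a made-up name.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Copies the PT_INTERP path of a 64-bit ELF file into buf (always NUL-terminated).
 * Returns 1 if found, 0 if there is no PT_INTERP (e.g. a static binary), -1 on error. */
int read_interp(const char *path, char *buf, size_t buflen)
{
    if (buflen == 0)
        return -1;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int result = -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto out;

    result = 0;                        /* valid ELF; no PT_INTERP seen yet */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            result = -1;
            goto out;
        }
        if (ph.p_type != PT_INTERP)
            continue;
        /* p_filesz counts the terminating NUL; clamp to the caller's buffer */
        size_t n = ph.p_filesz < buflen ? (size_t)ph.p_filesz : buflen - 1;
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0 || fread(buf, 1, n, f) != n) {
            result = -1;
            goto out;
        }
        buf[n] = '\0';
        result = 1;
        break;
    }
out:
    fclose(f);
    return result;
}
```

On a typical glibc x86-64 system, read_interp on a dynamically linked binary should yield /lib64/ld-linux-x86-64.so.2; a fully static binary has no PT_INTERP, so the function returns 0.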
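Ld.so takes the baton. It reads your program headers in memory via AT_PHDR, finds the dynamic section, loads every DT_NEEDED library, applies relocations, and resolves symbols with its own lookup code (dlopen/dlsym are the public API onto that same machinery). Memory mappings multiply: the kernel has already mapped the vDSO, which lets calls like clock_gettime skip a real syscall, and libc’s code pages come from the page cache, so one physical copy is shared by every process that maps it while each process keeps its own copy-on-write data.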
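You can ask the loader for its list of loaded objects from inside a process. This sketch uses dl_iterate_phdr, which glibc and musl both provide in <link.h> (glibc wants _GNU_SOURCE for it), to print each object’s name and load base; list_loaded_objects is a hypothetical wrapper.

```c
#define _GNU_SOURCE
#include <link.h>    /* dl_iterate_phdr, struct dl_phdr_info */
#include <stdio.h>

/* Callback: print one loaded object and bump the counter passed in data. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    int *count = data;
    /* The main program is usually reported with an empty name. */
    printf("%-40s base=%#lx\n",
           info->dlpi_name[0] ? info->dlpi_name : "(main program)",
           (unsigned long)info->dlpi_addr);
    (*count)++;
    return 0;        /* 0 = keep iterating */
}

/* Prints every object currently mapped by the loader; returns how many. */
int list_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```

On a dynamically linked glibc program the output typically includes the main program, linux-vdso.so.1, libc.so.6, and the loader itself.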
Short version? It’s a handoff relay race.
Why Dynamic Linking? The Lazy Genius of GOT and PLT
Static linking bloats everything—each app drags its libc copy. Dynamic? Share one libc across processes. Elegant. But runtime symbol resolution? Costly if naive.
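Enter the GOT (Global Offset Table) and PLT (Procedure Linkage Table). The PLT holds tiny trampolines in your binary’s .plt section, and each one jumps indirectly through its own GOT slot. First call to printf? That slot still points back into the PLT, which pushes a relocation index and jumps into ld.so’s resolver. Ld.so looks the name up through each library’s hash table (.gnu.hash, or the older .hash) and .dynsym, then writes printf’s real address into the GOT slot. Next call? The same stub’s indirect jump lands straight on printf. That’s lazy binding; link with -z now or set LD_BIND_NOW=1 and ld.so fills every slot up front instead.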
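The mechanism is easier to see in a toy. The sketch below is not glibc’s code: real PLT stubs are a few machine instructions and the resolver lives inside ld.so. It models the idea in portable C with a hypothetical got[] array of function pointers whose slots start out pointing at resolver trampolines, a tiny stand-in symbol table, and “PLT” functions that always call indirectly through their slot.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of lazy binding; all names here are illustrative. */
typedef int (*fn_t)(int);

/* The "library" functions the program wants to call. */
static int lib_double_it(int x) { return 2 * x; }
static int lib_square_it(int x) { return x * x; }

/* Stand-in for a library's dynamic symbol table. */
struct symbol { const char *name; fn_t addr; };
static const struct symbol dynsym[] = {
    { "double_it", lib_double_it },
    { "square_it", lib_square_it },
};

static int resolver_calls = 0;
static int resolve_slot0(int x);
static int resolve_slot1(int x);

/* Toy GOT: every slot starts out pointing at its resolver trampoline. */
static fn_t got[2] = { resolve_slot0, resolve_slot1 };
static const char *const got_names[2] = { "double_it", "square_it" };

static fn_t lookup(const char *name)
{
    for (size_t i = 0; i < sizeof dynsym / sizeof dynsym[0]; i++)
        if (strcmp(dynsym[i].name, name) == 0)
            return dynsym[i].addr;
    return NULL;
}

/* First call through a slot lands here: look up, patch the slot, forward. */
static int resolve_and_call(int slot, int x)
{
    resolver_calls++;
    fn_t target = lookup(got_names[slot]);
    if (target == NULL) {
        fprintf(stderr, "undefined symbol: %s\n", got_names[slot]);
        exit(EXIT_FAILURE);
    }
    got[slot] = target;              /* later calls skip the resolver */
    return target(x);
}
static int resolve_slot0(int x) { return resolve_and_call(0, x); }
static int resolve_slot1(int x) { return resolve_and_call(1, x); }

/* Toy PLT stubs: always an indirect call through the GOT slot. */
int plt_double_it(int x) { return got[0](x); }
int plt_square_it(int x) { return got[1](x); }
```

The resolver runs once per slot. After the patch, the stub’s indirect call goes straight to the target, which is the whole payoff of deferring resolution to first use.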
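Relocations seal it. On x86-64 they live in RELA tables, whose entries carry an explicit addend: R_X86_64_RELATIVE adds the load base to a stored offset, R_X86_64_GLOB_DAT writes a symbol’s address into a GOT slot, and R_X86_64_JUMP_SLOT does the same for PLT slots. Ld.so crunches everything except lazily bound jump slots after loading and before your main(). Miss a symbol? ld.so prints “symbol lookup error: … undefined symbol: foo” and exits, at startup or, for a lazily bound call, on first use. Segfault city averted, just a loud whine.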
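The arithmetic behind those relocation types is small. Following the x86-64 psABI formulas (B = load base, S = symbol address, A = addend), this hedged sketch applies RELATIVE, GLOB_DAT, JUMP_SLOT (eager binding), and R_X86_64_64 entries to an in-memory image. apply_relocs and its symvals array of already-resolved addresses are hypothetical; a real loader obtains S through symbol lookup and handles many more types.

```c
#include <elf.h>      /* Elf64_Rela, ELF64_R_SYM, ELF64_R_TYPE, R_X86_64_* */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Applies x86-64 RELA relocations to `image`, whose offset 0 corresponds to
 * load address `load_base`. symvals[i] is the resolved address of dynamic
 * symbol i. Returns 0 on success, -1 on an unsupported relocation type. */
int apply_relocs(uint8_t *image, uint64_t load_base,
                 const Elf64_Rela *rela, size_t count,
                 const uint64_t *symvals)
{
    for (size_t i = 0; i < count; i++) {
        uint64_t value;
        uint64_t sym = ELF64_R_SYM(rela[i].r_info);
        switch (ELF64_R_TYPE(rela[i].r_info)) {
        case R_X86_64_RELATIVE:    /* B + A: no symbol lookup needed */
            value = load_base + (uint64_t)rela[i].r_addend;
            break;
        case R_X86_64_GLOB_DAT:    /* S: fills a GOT slot */
        case R_X86_64_JUMP_SLOT:   /* S: fills a PLT's GOT slot eagerly */
            value = symvals[sym];
            break;
        case R_X86_64_64:          /* S + A */
            value = symvals[sym] + (uint64_t)rela[i].r_addend;
            break;
        default:
            return -1;
        }
        /* r_offset is relative to the load base, so it indexes the image */
        memcpy(image + rela[i].r_offset, &value, sizeof value);
    }
    return 0;
}
```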
And the unique insight nobody’s yelling about? This mirrors container orchestration avant la lettre. Shared libs = kernel namespaces; PLT indirection = service mesh proxies. Linux was microservices-ready decades ago—Docker just repackaged the plumbing.
Runtime Realities: Syscalls, Mappings, and Gotchas
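Post-load, the address space is set: ELF segments sit at their linked addresses (or, for PIE, at a base that ASLR randomizes), argc, argv, and envp sit on the stack, and just past envp comes the auxiliary vector: AT_PHDR, AT_ENTRY, AT_PAGESZ, AT_RANDOM, and AT_SYSINFO_EHDR pointing at the vDSO.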
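glibc and musl expose the auxiliary vector through getauxval from <sys/auxv.h>. A minimal sketch that dumps the entries named above; dump_auxv is our name, and AT_BASE reads 0 when there is no interpreter, as in a static binary.

```c
#include <elf.h>        /* AT_* constants */
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval */

/* Prints a few auxiliary-vector entries the kernel placed past envp. */
void dump_auxv(void)
{
    printf("AT_PHDR         %#lx (executable's program headers)\n", getauxval(AT_PHDR));
    printf("AT_PHNUM        %lu\n", getauxval(AT_PHNUM));
    printf("AT_ENTRY        %#lx (executable's entry point)\n", getauxval(AT_ENTRY));
    printf("AT_BASE         %#lx (interpreter base, 0 if none)\n", getauxval(AT_BASE));
    printf("AT_PAGESZ       %lu\n", getauxval(AT_PAGESZ));
    printf("AT_RANDOM       %#lx (address of 16 random bytes)\n", getauxval(AT_RANDOM));
    printf("AT_SYSINFO_EHDR %#lx (vDSO ELF header)\n", getauxval(AT_SYSINFO_EHDR));
}
```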
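First syscalls? They come from ld.so, not your code. strace typically shows brk(NULL) to find where the heap starts, an openat of /etc/ld.so.cache, mmaps for each library, and on x86-64 an arch_prctl(ARCH_SET_FS, …) that installs the thread pointer for TLS (thread-local storage). Ld.so then runs each library’s initializers (DT_INIT and .init_array) before jumping to your entry point, _start. Your executable’s own .init_array constructors (C++ static constructors, __attribute__((constructor)) functions) also fire before main, and the .fini_array destructors run at exit.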
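Those init and fini arrays are easy to populate from C. With GCC or Clang, __attribute__((constructor)) and __attribute__((destructor)) typically place function pointers in .init_array and .fini_array; this sketch (variable and function names are illustrative) sets a flag before main runs and prints a line during exit.

```c
#include <stdio.h>

/* Set by a constructor; code in main() should observe 1. */
int init_ran = 0;

/* Typically emitted into .init_array, so it runs before main. */
__attribute__((constructor))
static void early_init(void)
{
    init_ran = 1;
}

/* Typically emitted into .fini_array, so it runs during exit(). */
__attribute__((destructor))
static void late_fini(void)
{
    fputs("fini_array entry running\n", stderr);
}
```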
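Debug this mess? strace reveals syscalls galore: mmap, mprotect, munmap (mmap2 on 32-bit targets). perf? Sample PLT hits during cold starts. GDB? starti stops at the very first user-space instruction, which for a dynamic binary sits inside ld.so, not your code, and info sharedlibrary lists what got mapped. LD_DEBUG=bindings makes ld.so narrate every symbol it binds.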
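Alongside strace and GDB, a process can inspect its own mappings. This sketch reads /proc/self/maps, the same data pmap reports, and notes whether a [vdso] mapping is present; dump_maps is a hypothetical helper and the 512-byte line buffer is an arbitrary choice (longer lines are read in pieces but still counted once).

```c
#include <stdio.h>
#include <string.h>

/* Prints /proc/self/maps and returns the number of mappings, or -1.
 * Sets *saw_vdso to 1 if a [vdso] mapping appears, else 0. */
int dump_maps(int *saw_vdso)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;
    char line[512];
    int count = 0;
    *saw_vdso = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        fputs(line, stdout);
        if (strstr(line, "[vdso]") != NULL)
            *saw_vdso = 1;
        if (strchr(line, '\n') != NULL)   /* one mapping per complete line */
            count++;
    }
    fclose(f);
    return count;
}
```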
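But pitfalls lurk. Position-Independent Executables (PIE) shuffle everything: great for security, hell for hardcoded addresses. Lazy binding defers failure: a missing symbol on a rarely taken path surfaces on first call, maybe hours into a run, unless you link with -z now or set LD_BIND_NOW=1 (eager binding also lets full RELRO make the GOT read-only). And symbol versioning? Glibc’s gold standard. Symbols carry versions (memcpy@GLIBC_2.14 on x86-64, say), so old binaries keep binding to the behavior they were built against, and a binary built against a newer glibc refuses to start on an older one with a “version … not found” error instead of breaking silently.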
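Whether lazy binding is even in play for a given program is recorded in its dynamic section. This sketch walks the main program’s PT_DYNAMIC entries via dl_iterate_phdr and looks for DT_BIND_NOW, DF_BIND_NOW in DT_FLAGS, or DF_1_NOW in DT_FLAGS_1, the markers -z now leaves behind. main_program_binds_now is a hypothetical name, and the sketch relies on dl_iterate_phdr reporting the main program first, as its documentation states. LD_BIND_NOW=1 forces eager binding at run time regardless of what these flags say.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <link.h>    /* dl_iterate_phdr, ElfW */

/* Callback for the first (main) object: scan its dynamic section. */
static int check_main(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    int *result = data;
    *result = -1;                               /* no PT_DYNAMIC found */
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type != PT_DYNAMIC)
            continue;
        const ElfW(Dyn) *dyn =
            (const ElfW(Dyn) *)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
        *result = 0;                            /* lazy binding allowed */
        for (; dyn->d_tag != DT_NULL; dyn++) {
            if (dyn->d_tag == DT_BIND_NOW ||
                (dyn->d_tag == DT_FLAGS && (dyn->d_un.d_val & DF_BIND_NOW)) ||
                (dyn->d_tag == DT_FLAGS_1 && (dyn->d_un.d_val & DF_1_NOW)))
                *result = 1;                    /* linked for eager binding */
        }
        break;
    }
    return 1;    /* nonzero stops iteration after the main program */
}

/* 1 = bind-now, 0 = lazy binding allowed, -1 = no dynamic section. */
int main_program_binds_now(void)
{
    int result = -1;
    dl_iterate_phdr(check_main, &result);
    return result;
}
```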
Is ELF Still King, or Facing Challengers?
Sure, WebAssembly creeps in via WASI, promising portable binaries sans OS quirks. But ELF? Baked into every distro, battle-tested across architectures. Bold prediction: even WASM-on-Linux will lean on ELF loaders for hybrid runtimes—think wasmtime embedding .so relics.
Critique time. Corporate PR (Red Hat, Canonical) spins “modern” as Rust crates or eBPF, glossing ELF’s endurance. It’s no hype—it’s infrastructure. Ignore at your peril.
Look, if you’re building Go bins or Rust exes, grok this. It explains why strip --strip-unneeded shrinks ’em, why ldd lists deps, and why LD_PRELOAD, an environment variable telling ld.so to load your library first, can swap in your own malloc.
Why Does Mastering ELF Change How You Code?
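Devs chase frameworks; this grounds you in metal. Next link fails with “relocation truncated to fit”? You’ll smirk: a 32-bit displacement couldn’t reach its target, so you reach for a bigger code model (-mcmodel) or tweak your linker script. Optimize a daemon? Go static-pie, or compile with -fno-plt to call through the GOT directly. Embed in IoT? musl libc slims the dance.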
And yeah, it’s architectural: dynamic linking’s indirection scales like cloud autoscaling—pay only for first resolution.
Shift happens when you see binaries not as black boxes, but layered cakes of headers, stubs, shared glory.
Frequently Asked Questions
What is ELF format in Linux?
ELF (Executable and Linkable Format) structures binaries with headers describing loadable segments, symbols, and dynamic-linking data: the kernel’s blueprint for execution.
How does dynamic linking work at runtime?
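Ld.so binds symbols lazily via the PLT and GOT: the first call detours through the loader, which fills in the GOT slot, and later calls jump straight to the target through that slot. The library code itself stays shared across processes.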
Why use PLT and GOT in Linux binaries?
They enable runtime binding without upfront cost, supporting shared libraries and position independence for secure, efficient apps.