Ever wondered why your AI-generated patients emerge from Synthea with pristine records that crumble under real-world scrutiny?
Synthea modules. That’s the battleground. These JSON state machines—85 of ‘em stock—pump out encounters, conditions, labs, meds for diseases like diabetes or hypertension. Need celiac? Migraine? Write your own. Simple enough, until the codes hit.
SNOMED for conditions. LOINC for labs. RxNorm for meds. Prompt 'Claude, craft a celiac module,' and boom: 396331005. Looks legit. Duodenal biopsy? 12866006. Validates as a real code? Sure. Plot twist: it resolves to pneumococcal vaccine, not biopsy. Hallucinated hell.
You can’t eyeball these seven-digit fakes. Only a terminology server knows. And that’s where this workflow flips the script.
Why Do Synthea Modules Fail So Spectacularly?
Synthea’s vanilla run? 10,000 patients, zero coronary heart disease, zero Alzheimer’s. Modules don’t talk to each other: no comorbidities, no real-life cascades. An 80-year-old racks up 74 ‘conditions,’ mostly administrative fluff, while the sickest real Medicare patients hover around eight.
“You can’t tell a valid code from a hallucinated one by looking at it. The only way to know is to check it against a terminology server.”
That’s the raw truth from the source. LLMs pattern-match training scraps; they don’t query truth.
But here’s the fix: tx.fhir.org. Free FHIR server, no keys. Curl it.
curl -s "https://tx.fhir.org/r4/CodeSystem/\$validate-code?system=http://snomed.info/sct&code=396331005" | jq '.parameter[] | select(.name=="result" or .name=="display")'
Spits back: result true, display “Coeliac disease.” Boom. Wrong code? False.
Hunt codes too: hit ValueSet/$expand with a filter of ‘celiac disease’ and snag the right ones. Five lines, validated.
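If you’re scripting this instead of eyeballing jq output, the Parameters resource that $validate-code returns is easy to unpack. A minimal sketch; the sample response below is hand-built to match the shape tx.fhir.org sends back, not a live capture:

```python
import json

# A FHIR Parameters resource, shaped like a $validate-code response.
# (Hand-built sample; a live call to tx.fhir.org returns this structure.)
response = json.loads("""
{
  "resourceType": "Parameters",
  "parameter": [
    {"name": "result", "valueBoolean": true},
    {"name": "display", "valueString": "Coeliac disease"},
    {"name": "code", "valueCode": "396331005"}
  ]
}
""")

def unpack(params: dict) -> dict:
    """Flatten a Parameters resource into {name: value}."""
    out = {}
    for p in params.get("parameter", []):
        # FHIR value[x] fields: grab whichever value* key is present.
        value = next((v for k, v in p.items() if k.startswith("value")), None)
        out[p["name"]] = value
    return out

result = unpack(response)
print(result["result"], result["display"])  # True Coeliac disease
```

Reject anything where result is false, and compare the returned display against what the LLM claimed the code means: a valid-but-wrong code slips past the first check and dies on the second.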
Can Claude Actually Build Valid Modules Now?
Enter the Claude Code skill. claude install github:mock-health/samples/synthea-module-skill. Then: claude "/synthea create a celiac disease module".
Seven steps, automated:
1. Check Synthea’s 85 stock modules; don’t duplicate.
2. Research prevalence, diagnostics, treatments.
3. Validate every code against tx.fhir.org.
4. Emit JSON per the module schema.
5. Build: ./gradlew build.
6. Run: ./run_synthea -m celiac -p 1.
7. Inspect the output FHIR bundle.
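Step 3 is the one worth automating if you fork the skill. A minimal sketch that walks a module and collects every system/code pair for validation; the recursion over "codes" arrays is my own generic approach, not the skill's actual implementation:

```python
import json

# A stripped-down Synthea module (same shape as the skeleton later in the post).
module = json.loads("""
{
  "name": "Celiac Disease",
  "states": {
    "Onset": {
      "type": "ConditionOnset",
      "codes": [{"system": "SNOMED-CT", "code": "396331005", "display": "Coeliac disease"}]
    },
    "Terminal": {"type": "Terminal"}
  },
  "gmf_version": 2
}
""")

def collect_codes(node):
    """Recursively gather every entry of every 'codes' array in the module."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "codes" and isinstance(value, list):
                found.extend(value)
            else:
                found.extend(collect_codes(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_codes(item))
    return found

for c in collect_codes(module):
    # Each pair feeds straight into the tx.fhir.org curl shown earlier.
    print(c["system"], c["code"], c["display"])
```

Pipe the output into the $validate-code curl and fail the build on any false result; that turns code validation into a CI gate instead of a ritual.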
Look at the table they validated:
| Concept | System | Code | Display |
|---|---|---|---|
| Celiac disease | SNOMED-CT | 396331005 | Coeliac disease |
| … | … | … | … |
Ironclad.
And the module skeleton? Dead simple.
{
  "name": "Celiac Disease",
  "states": {
    "Initial": {
      "type": "Initial",
      "distributed_transition": [
        { "distribution": 0.01, "transition": "Onset" },
        { "distribution": 0.99, "transition": "Terminal" }
      ]
    },
    "Onset": {
      "type": "ConditionOnset",
      "codes": [
        { "system": "SNOMED-CT", "code": "396331005", "display": "Coeliac disease" }
      ],
      "direct_transition": "Terminal"
    },
    "Terminal": { "type": "Terminal" }
  },
  "gmf_version": 2
}
Scale it: an Encounter for EGD (76009000), biopsy (235261009), a gluten-free diet (160671006), labs like tTG IgA (LOINC 31017-7), ferrous sulfate via RxNorm.
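A hedged sketch of what the EGD encounter plus the tTG IgA lab might look like as added states. Field names follow Synthea's Generic Module Framework docs; the state names, displays, and the lab range here are illustrative placeholders, not clinically tuned values, and every code and display should be confirmed against tx.fhir.org before shipping:

```json
"EGD_Encounter": {
  "type": "Encounter",
  "encounter_class": "ambulatory",
  "codes": [{ "system": "SNOMED-CT", "code": "76009000", "display": "EGD" }],
  "direct_transition": "tTG_IgA_Lab"
},
"tTG_IgA_Lab": {
  "type": "Observation",
  "category": "laboratory",
  "unit": "U/mL",
  "codes": [{ "system": "LOINC", "code": "31017-7", "display": "tTG IgA" }],
  "range": { "low": 15, "high": 100 },
  "direct_transition": "Terminal"
}
```

These drop into the skeleton's "states" object alongside Initial, Onset, and Terminal, with the transitions rewired to route through them.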
The Architectural Shift Hiding in Plain Sight
This isn’t just a hack—it’s the future of domain-specific code gen. Remember early JavaScript? No linters, devs shipping regex hallucinations that nuked prod. Then ESLint: validate before commit. Same here. Medical sims demand grounded generation—query servers inline, or bust.
Synthea’s isolationist modules? Fine for demos. But health AI—drug trials, privacy-safe training data—craves interaction. My bold call: within two years, every FHIR-adjacent LLM tool bundles terminology oracles like tx.fhir.org. No more ‘plausible poison.’ Mock Health’s skill? It’s the canary.
Corporate spin check: Synthea’s open-source purity shines, but those CDC benchmark fails scream ‘use with skepticism.’ This Claude bridge doesn’t fix core limits—it arms you to extend wisely.
Deeper why: Synthetic data’s exploding for LLMs fine-tuned on HIPAA walls. Valid modules mean realistic bundles—no corrupt FHIR crashing your validator. We’ve seen pilots where hallucinated SNOMEDs inflated prevalence 10x, skewing ML models.
Workflow scales. Fork the skill, tweak for your gap—GERD, long COVID. Or chain to Bedrock, GPTs. The curl’s universal.
Pitfalls? The skill’s SKILL.md nails ‘em: schema rigidity, transition gotchas (direct vs. distributed), state types (Guard for logic, Delay for timelines).
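For the Guard and Delay gotchas specifically, a hedged sketch of the two state types. Field names follow the Generic Module Framework docs; state names and numbers are illustrative:

```json
"Adult_Gate": {
  "type": "Guard",
  "allow": { "condition_type": "Age", "operator": ">=", "quantity": 18, "unit": "years" },
  "direct_transition": "Wait_For_Followup"
},
"Wait_For_Followup": {
  "type": "Delay",
  "exact": { "quantity": 6, "unit": "months" },
  "direct_transition": "Followup_Encounter"
}
```

A Guard blocks the patient until its condition holds; a Delay parks them for simulated time. Confusing one for the other, or hanging a distributed_transition where you meant a direct one, is exactly the class of bug the skill's checklist exists to catch.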
Why Developers Should Care About This Now
Health tech has a devtools vacuum. There’s no Copilot for FHIR. This fills it. Skeptical? Run their 10k patients yourself and spot the voids. Then build one module. Feel the rush of validated output.
Unique angle: Parallels early compilers mandating symbol tables. LLMs were interpreters spewing syntax; now, they’re compilers with runtime checks. Health data’s next.
🧬 Related Insights
- Read more: Ditching Claude’s LLM Roulette: Why Duckflux Delivers Deterministic AI Pipelines
- Read more: Claude Code’s Dirty Secret: .claudeignore Stops the Node_Modules Madness
Frequently Asked Questions
What is a Synthea module and how do I make one with Claude?
Synthea modules are JSON state machines for generating synthetic patient FHIR data. Install the Claude skill from github:mock-health/samples/synthea-module-skill, then prompt “/synthea create [disease] module”—it validates all codes automatically.
How do I validate SNOMED or LOINC codes for Synthea?
Use tx.fhir.org: curl "https://tx.fhir.org/r4/CodeSystem/\$validate-code?system=[system]&code=[code]" and check for "result": true. Free, no auth.
Does Synthea generate realistic comorbidities?
No—modules run independently, so no interactions. 10k patients often miss major diseases like heart disease; extend with custom modules for better fidelity.