Claude Writes Valid Synthea Modules Guide

LLMs spit out plausible medical codes that secretly wreck your Synthea simulations. Here's the dead-simple workflow—and Claude skill—that validates every one before it poisons your data.

Claude's Secret Weapon: Forging Bulletproof Synthea Modules Without Hallucinated Codes — theAIcatchup

Key Takeaways

  • LLMs hallucinate medical codes that look real but corrupt FHIR—fix with tx.fhir.org validation curls.
  • Claude Code skill automates full Synthea module creation: research, code lookup, build, test.
  • Synthea's limits (no comorbidities, over-diagnosed elders) demand custom modules for serious use.

Ever wondered why your AI-generated patients emerge from Synthea with pristine records that crumble under real-world scrutiny?

Synthea modules. That’s the battleground. These JSON state machines—85 of ‘em stock—pump out encounters, conditions, labs, meds for diseases like diabetes or hypertension. Need celiac? Migraine? Write your own. Simple enough, until the codes hit.

SNOMED CT for conditions. LOINC for labs. RxNorm for pills. Feed it ‘Claude, craft a celiac module,’ and boom: 396331005 for the condition. Looks legit. Duodenal biopsy? 12866006. Does that code exist? Sure. But plot twist: it’s the code for pneumococcal vaccination, not a biopsy. Hallucinated hell.

You can’t eyeball an eight- or nine-digit code and tell whether it’s the right one. Only a terminology server knows. And that’s where this workflow flips the script.

Why Do Synthea Modules Fail So Spectacularly?

Synthea’s vanilla run? 10,000 patients, zero coronary heart disease, zero Alzheimer’s. Modules don’t talk to each other: no comorbidities, no real-life cascades. An 80-year-old racks up 74 ‘conditions,’ mostly admin fluff, while top Medicare real-worlders hover at eight.

“You can’t tell a valid code from a hallucinated one by looking at it. The only way to know is to check it against a terminology server.”

That’s the raw truth from the source. LLMs pattern-match training scraps; they don’t query truth.

But here’s the fix: tx.fhir.org. Free FHIR server, no keys. Curl it.

curl -s "https://tx.fhir.org/r4/CodeSystem/\$validate-code?system=http://snomed.info/sct&code=396331005" | jq '.parameter[] | select(.name=="result" or .name=="display")'

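Spits back: result true, display “Coeliac disease.” Boom. A code that doesn’t exist? Result false. A real code for the wrong concept still comes back true, so read the display too: if it doesn’t name what you asked for, the code is wrong.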
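The same check can be scripted. The sketch below is a minimal Python version, assuming the server answers with a standard FHIR Parameters resource carrying `result` and `display` entries, as the jq filter above implies. The function names and the `expected_terms` idea are illustrative, not part of the skill; `fetch_validation` is a hypothetical helper that needs network access.

```python
import json
import urllib.parse
import urllib.request

TX_BASE = "https://tx.fhir.org/r4"  # terminology server named in the article


def validate_url(system: str, code: str) -> str:
    """Build the CodeSystem/$validate-code URL for a single code."""
    query = urllib.parse.urlencode({"system": system, "code": code})
    return f"{TX_BASE}/CodeSystem/$validate-code?{query}"


def check_code(params: dict, expected_terms: list[str]) -> tuple[bool, str]:
    """Interpret a FHIR Parameters response.

    Returns (ok, display). ok is True only when the server reports the code
    as valid AND the display mentions at least one expected term.
    """
    exists = False
    display = ""
    for p in params.get("parameter", []):
        if p.get("name") == "result":
            exists = bool(p.get("valueBoolean", False))
        elif p.get("name") == "display":
            display = p.get("valueString", "")
    matches = any(term.lower() in display.lower() for term in expected_terms)
    return (exists and matches, display)


def fetch_validation(system: str, code: str) -> dict:
    """Hypothetical helper: query the server and return the parsed JSON."""
    request = urllib.request.Request(
        validate_url(system, code),
        headers={"Accept": "application/fhir+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Comparing the display against expected terms is what catches the biopsy-versus-vaccination case: the code is valid, so `result` alone would let it through.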

Hunt for codes the same way: search the server with a text filter like ‘celiac disease’ and snag the right ones from the matches. Five lines, validated.
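The article does not show the search request itself. One plausible way to do a text search is the standard FHIR `ValueSet/$expand` operation with a `filter` parameter against SNOMED CT's implicit "all concepts" value set. The sketch below assumes that approach; the function names are illustrative.

```python
import urllib.parse

TX_BASE = "https://tx.fhir.org/r4"  # terminology server named in the article

# Implicit value set covering all of SNOMED CT (standard FHIR convention).
SNOMED_ALL = "http://snomed.info/sct?fhir_vs"


def search_url(text: str, value_set: str = SNOMED_ALL, count: int = 5) -> str:
    """Build a ValueSet/$expand URL that filters concepts by free text."""
    query = urllib.parse.urlencode(
        {"url": value_set, "filter": text, "count": count}
    )
    return f"{TX_BASE}/ValueSet/$expand?{query}"


def candidates(value_set: dict) -> list[tuple[str, str]]:
    """Pull (code, display) pairs out of an expanded ValueSet resource."""
    contains = value_set.get("expansion", {}).get("contains", [])
    return [(c.get("code", ""), c.get("display", "")) for c in contains]
```

Every candidate still deserves a `$validate-code` round trip before it lands in a module, since a text match only shows that the words overlap.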

Can Claude Actually Build Valid Modules Now?

Enter the Claude Code skill. Install it: claude install github:mock-health/samples/synthea-module-skill. Then run: claude "/synthea create a celiac disease module".

Six steps, automated:

  1. Check Synthea’s 85 stock modules so you don’t duplicate one.
  2. Research prevalence, diagnostics, treatments.
  3. Validate every code against tx.fhir.org.
  4. Spit out JSON per the schema.
  5. Build and run: ./gradlew build, then ./run_synthea -m celiac -p 1.
  6. Peek at the FHIR bundle.
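For the last step, peeking at the bundle by hand gets tedious fast. The sketch below tallies the Condition codes in a FHIR Bundle so an unexpected display stands out. It assumes the standard Bundle layout (`entry[].resource`) and that Synthea wrote its JSON under `output/fhir/`, its usual default. The function names are illustrative.

```python
import json
from collections import Counter
from pathlib import Path


def condition_codes(bundle: dict) -> Counter:
    """Count (system, code, display) triples across Condition resources."""
    counts: Counter = Counter()
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Condition":
            continue
        for coding in resource.get("code", {}).get("coding", []):
            key = (coding.get("system"), coding.get("code"), coding.get("display"))
            counts[key] += 1
    return counts


def load_bundle(path: str) -> dict:
    """Read one patient bundle from disk (path is an assumption)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Typical use would be `condition_codes(load_bundle("output/fhir/<patient>.json"))`, with the file name taken from whatever the run produced.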

Look at the table they validated:

Concept         System     Code       Display
Celiac disease  SNOMED-CT  396331005  Coeliac disease

Ironclad.

And the module skeleton? Dead simple.

{
  "name": "Celiac Disease",
  "states": {
    "Initial": { "type": "Initial", "distributed_transition": [{"distribution": 0.01, "transition": "Onset"}, {"distribution": 0.99, "transition": "Terminal"}] },
    "Onset": { "type": "ConditionOnset", "codes": [{ "system": "SNOMED-CT", "code": "396331005", "display": "Coeliac disease" }], "direct_transition": "Terminal" },
    "Terminal": { "type": "Terminal" }
  },
  "gmf_version": 2
}
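To validate every code in a module like this, you first have to collect them. The sketch below walks the `states` map and gathers each entry under a state's top-level `codes` list, translating Synthea's short system names into the URIs a terminology server expects. It is a simplified pass that ignores codes nested deeper inside a state, and the function name is illustrative.

```python
# Canonical FHIR URIs for the three code systems the article mentions.
SYSTEM_URIS = {
    "SNOMED-CT": "http://snomed.info/sct",
    "LOINC": "http://loinc.org",
    "RxNorm": "http://www.nlm.nih.gov/research/umls/rxnorm",
}


def module_codes(module: dict) -> list[tuple[str, str, str, str]]:
    """Return (state name, system URI, code, display) for each coded state.

    Only the top-level "codes" list of each state is inspected; unknown
    system names are passed through unchanged so they are easy to spot.
    """
    found = []
    for state_name, state in module.get("states", {}).items():
        for coding in state.get("codes", []):
            system = coding.get("system", "")
            found.append(
                (
                    state_name,
                    SYSTEM_URIS.get(system, system),
                    coding.get("code", ""),
                    coding.get("display", ""),
                )
            )
    return found
```

Each tuple carries what a `$validate-code` request needs, plus the display to compare against the server's answer.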

Scale it: Encounter for EGD (76009000), biopsy (235261009), gluten-free (160671006), labs like tTG IgA (LOINC 31017-7), ferrous sulfate RxNorm.

The Architectural Shift Hiding in Plain Sight

This isn’t just a hack—it’s the future of domain-specific code gen. Remember early JavaScript? No linters, devs shipping regex hallucinations that nuked prod. Then ESLint: validate before commit. Same here. Medical sims demand grounded generation—query servers inline, or bust.

Synthea’s isolationist modules? Fine for demos. But health AI—drug trials, privacy-safe training data—craves interaction. My bold call: within two years, every FHIR-adjacent LLM tool bundles terminology oracles like tx.fhir.org. No more ‘plausible poison.’ Mock Health’s skill? It’s the canary.

Corporate spin check: Synthea’s open-source purity shines, but those CDC benchmark fails scream ‘use with skepticism.’ This Claude bridge doesn’t fix core limits—it arms you to extend wisely.

Deeper why: Synthetic data’s exploding because LLMs fine-tuned on health records run into HIPAA walls. Valid modules mean realistic bundles, with no corrupt FHIR crashing your validator. We’ve seen pilots where hallucinated SNOMEDs inflated prevalence 10x, skewing ML models.

Workflow scales. Fork the skill, tweak for your gap—GERD, long COVID. Or chain to Bedrock, GPTs. The curl’s universal.

Pitfalls? Skills.md nails ‘em: schema rigidity, transition gotchas (direct vs. distributed), state types (Guard for logic, Delay for timelines).
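Some of those transition gotchas can be caught before a build. The sketch below is a minimal structural lint, not the skill's own checker. It assumes numeric `distributed_transition` weights should sum to 1.0, that every transition target must name an existing state, and that every non-Terminal state needs some `*_transition` key. The function name is illustrative.

```python
import math


def lint_module(module: dict) -> list[str]:
    """Return human-readable problems found in a module's transitions."""
    problems = []
    states = module.get("states", {})
    for name, state in states.items():
        targets = []
        if "direct_transition" in state:
            targets.append(state["direct_transition"])
        options = state.get("distributed_transition", [])
        targets.extend(o.get("transition") for o in options)

        # Only sum plain numbers; attribute-based distributions are skipped.
        weights = [o.get("distribution") for o in options]
        if weights and all(isinstance(w, (int, float)) for w in weights):
            total = sum(weights)
            if not math.isclose(total, 1.0, abs_tol=1e-9):
                problems.append(f"{name}: distributions sum to {total}, expected 1.0")

        has_transition = any(key.endswith("_transition") for key in state)
        if not has_transition and state.get("type") != "Terminal":
            problems.append(f"{name}: non-Terminal state has no transition")

        for target in targets:
            if target not in states:
                problems.append(f"{name}: transition to unknown state {target!r}")
    return problems
```

An empty list means the skeleton hangs together structurally. It says nothing about whether the codes are right, which is still the terminology server's job.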

Why Developers Should Care About This Now

Health tech’s devtools vacuum. No Copilot for FHIR. This fills it. Skeptical? Run their 10k patients yourself—spot the voids. Then build one module. Feel the rush of validated output.

There’s a parallel with early compilers mandating symbol tables. LLMs were interpreters spewing syntax; now, they’re compilers with runtime checks. Health data’s next.


Frequently Asked Questions

What is a Synthea module and how do I make one with Claude?

Synthea modules are JSON state machines for generating synthetic patient FHIR data. Install the Claude skill from github:mock-health/samples/synthea-module-skill, then prompt “/synthea create [disease] module”—it validates all codes automatically.

How do I validate SNOMED or LOINC codes for Synthea?

Use tx.fhir.org: curl "https://tx.fhir.org/r4/CodeSystem/\$validate-code?system=[system]&code=[code]" and check that result is true and that display names the concept you meant. Free, no auth.

Does Synthea generate realistic comorbidities?

No—modules run independently, so no interactions. 10k patients often miss major diseases like heart disease; extend with custom modules for better fidelity.

Written by Aisha Patel

Former ML engineer turned writer. Covers computer vision and robotics with a practitioner perspective.



Originally reported by dev.to
