Zero dollars spent. That’s the lifetime cloud bill for a system scraping Brazilian cattle prices every morning — arroba de boi, novilhas — preserving a perfect temporal trail in Git commits.
And here’s the kicker: this isn’t some weekend hack that crumbled. It’s been humming along, daily, turning a GitHub repo into a zero-cost database that any broke dev (or cash-strapped startup) could steal tomorrow.
Look, we’ve all been there — staring at AWS pricing tiers, knowing a cron job plus a tiny RDS instance would do the trick, but your credit card’s gasping. This guy’s fix? Ditch it all. Use GitHub Actions to run Python scrapers at 9 AM UTC sharp, overwrite two JSON files with fresh quotes, then git commit and push. Boom — each day’s snapshot locked in git log, queryable forever via git bisect or log greps.
What the Hell is Git-as-a-Database?
Git wasn’t built for this. Linus Torvalds dreamed up a content-addressable filesystem for kernel patches, not price histories. But damn, does it fit. Immutability? Check. Chronological? Every commit timestamps itself. Cheap queries? git log --oneline spits the full history in seconds.
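To make "cheap queries" concrete, here's a minimal sketch of treating the commit history as a date index. It shells out to git log (assumes git is installed and you're inside a clone); the function names and the one-file-per-dataset layout are my own framing, not from the original post.

```python
import subprocess

def commit_index(path, repo="."):
    """Map each date (YYYY-MM-DD) to a commit hash that touched `path`.

    Shells out to `git log`; assumes `repo` is a local clone.
    """
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%ad %H", "--date=short", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_log(out)

def parse_log(log_text):
    """Parse `git log --format='%ad %H' --date=short` output into {date: hash}.

    git log lists newest first, so the first hash seen per date is
    that day's final snapshot.
    """
    index = {}
    for line in log_text.splitlines():
        if not line.strip():
            continue
        date, sha = line.split()
        index.setdefault(date, sha)
    return index
```

One dict lookup per day — that's the whole "query engine."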
“The entire livestock price history is perfectly preserved in the git log, where each daily commit acts as an exact snapshot of that day, ready to be consumed by time-series analyses.”
That’s from the original post — spot-on. Traditional DBs? Overkill for one JSON update a day. Why INSERT into Dynamo when a git push does the same, plus versioning baked in?
The pipeline’s a single YAML file, lean as a haiku. Triggers: cron at market open (0 9 * * *), push for deploys, workflow_dispatch for manual kicks if the site’s flaky.
Runner’s ubuntu-latest. Checkout code. Setup Python 3.9. Pip requirements. Fire scrapers: scraper_boi.py, scraper_novilha.py. Then the genius bit — git config a bot identity, add the JSONs, commit only if changed (git diff --quiet guards against empty pushes), push.
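The scrapers themselves aren't shown in the post, so here's a hedged sketch of what one might look like — assuming a Cepea-style page that prints prices as "R$ 310,50" (comma decimal). The URL, regex, and JSON field names are guesses for illustration, not the author's actual code.

```python
import json
import re
from datetime import date

# Assumption: the quote page renders prices like "R$ 1.234,56".
PRICE_RE = re.compile(r"R\$\s*([\d.]+,\d{2})")

def parse_price(html):
    """Extract the first BRL price from the page and return it as a float."""
    m = PRICE_RE.search(html)
    if not m:
        raise ValueError("no price found — site layout may have changed")
    # Brazilian format: "." is a thousands separator, "," the decimal point.
    return float(m.group(1).replace(".", "").replace(",", "."))

def write_snapshot(price, path="cotacoes_boi_hoje.json"):
    """Overwrite today's snapshot; git history keeps every old version."""
    with open(path, "w") as f:
        json.dump({"data": date.today().isoformat(), "preco": price}, f)

if __name__ == "__main__":
    import urllib.request
    # Placeholder URL — swap in the real quote page.
    html = urllib.request.urlopen("https://example.com/boi-gordo").read().decode()
    write_snapshot(parse_price(html))
```

Note the overwrite-in-place design: the file name never changes, so the diff (and the commit guard) stays trivial.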
But wait — the workflow’s default GITHUB_TOKEN can’t push to the repo out of the box. Can’t write without perms. Solution? Slap permissions: contents: write on the job. No PATs, no secrets. Just that.
It’s elegant. Ephemeral infrastructure — spins up, scrapes, commits, vanishes. No 24/7 server tax.
Why Does GitHub Actions Beat Serverless for This?
Serverless sounds free-ish, and Lambda’s free tier (1M requests and 400k GB-seconds a month) is fine — but add Timestream or whatever for time-series, and poof: egress fees, provisioned junk. This? Pure free-tier Actions (2,000 minutes/month on private repos, free on public ones — your daily minute-long job laughs at that quota).
Here’s my unique angle, one the original skips: this echoes the 1970s Unix philosophy reborn. Back then, Ken Thompson et al. built tools that did one thing, chained via pipes — no bloat. Git-as-DB? Same vibe. Scrapers pipe to Git, Git pipes to your analysis scripts. No ETL cathedral; just porcelain commands. Predict this: in a world of $10B infra unicorns, we’ll see “GitDB” primitives explode for IoT sensors, personal finance trackers. Why? Constraints breed invention, and zero-budget kids like this one are the canaries.
But let’s poke holes — because skepticism’s my job. Scalability? One repo per dataset, sure, but 10k daily commits? GitHub chokes (rate limits, repo bloat). Queries? git log on 365 days is fine; 5 years? Fork it to BigQuery if needed. Security? Public repo means public prices — fine for cattle quotes, dicey for SSNs.
Still, for indie scraping, newsletters, side hustles — it’s gold.
The YAML guts, for copy-pasters:
```yaml
name: Atualização Diária (Boi e Novilha)

on:
  push:
    branches: [ main ]
  schedule:
    - cron: '0 9 * * *'
  workflow_dispatch:

jobs:
  raspagem-completa:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - run: pip install -r requirements.txt
      - run: python scraper_boi.py
      - run: python scraper_novilha.py
      - run: |
          git config --global user.name 'Bot do Scraper'
          git config --global user.email '[email protected]'
          git add cotacoes_boi_hoje.json cotacoes_novilha_hoje.json
          git diff --quiet && git diff --staged --quiet || (git commit -m "Dados atualizados: $(date +'%Y-%m-%d')" && git push)
```
Tweak the scrapers to hit livestock sites (Cepea, whatever), parse the tables — done.
Can You Hack This for Your Own Data?
Absolutely. Weather station? Commit CSVs daily. Crypto prices? JSON snapshots. Stock tweets? Why not.
Extend it: post-commit, trigger a Pages site with Plotly charts from git log. Or webhooks to Slack. The repo’s your graph DB now — edges in commit messages.
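The "charts from git log" idea boils down to replaying the file through history. Here's a sketch, assuming each snapshot is JSON with a "preco" field (my assumption, mirroring the file names above); the reader function is injectable so the logic works without a live repo.

```python
import json
import subprocess

def git_show(sha, path, repo="."):
    """Return the file's content as of a given commit (`git show sha:path`)."""
    return subprocess.run(
        ["git", "-C", repo, "show", f"{sha}:{path}"],
        capture_output=True, text=True, check=True,
    ).stdout

def price_series(commits, path="cotacoes_boi_hoje.json", read=git_show):
    """Rebuild a {date: price} series from (date, sha) pairs.

    `read` is injectable for testing; assumes each snapshot is JSON
    with a "preco" field (hypothetical schema).
    """
    return {d: json.loads(read(sha, path))["preco"] for d, sha in commits}
```

Feed the resulting dict to Plotly in a post-commit Pages build and the repo doubles as its own dashboard backend.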
Corporate spin? Nah, this is raw ingenuity, no VC fluff. But GitHub won’t push it — they want you on Codespaces, Copilot billing. Ignore ‘em.
The real win’s replication. Fork, run, profit.
Frequently Asked Questions
How do I query historical data from this Git DB?
Use git log --follow cotacoes_boi_hoje.json to list the commits that touched the file, then git show <hash>:cotacoes_boi_hoje.json (piped through jq) to pull each day’s version.
Does GitHub Actions free tier handle daily runs forever?
Yes — 2,000 minutes/month covers thousands of one-minute jobs. That cap applies to private repos; public repos run Actions free.
What if my scraper fails?
Manual dispatch via GitHub UI, or alerts via Actions notifications. Add error steps to Slack/email.