Bedrock AgentCore NAT Gateway Costs Warning

An idle voice agent on Bedrock AgentCore Runtime racked up 659 GB of inbound NAT Gateway traffic in six days — costing $29. Turns out, it's not WebRTC; it's relentless S3 pulls for warm pool VMs.

659 GB Through a NAT Gateway: The Idle Bedrock Agent That Ate $29

Key Takeaways

  • Bedrock AgentCore's warm pool triggers massive S3 image pulls via NAT Gateway, even when idle — 659 GB in 6 days cost $29.
  • Fix instantly with a free S3 Gateway VPC Endpoint; mandate it for all private VPCs.
  • Architectural insight: AI agents’ low-latency demands create hidden cloud cost traps. Trim images and add endpoints proactively.

659 gigabytes. That’s the torrent of data that slammed through a NAT Gateway in just six days, all pinned to an innocent-looking Bedrock AgentCore Runtime agent tucked into a VPC.

A cost anomaly alert hit last week — $29 unexpectedly tagged to Elastic Block Store, but the real villain? NatGateway-Bytes. The setup? A fresh VPC for a voice agent needing outbound internet via NAT for WebRTC TURN relays. Suspect locked in. But proof first.

CloudWatch metrics painted the picture. BytesOutToDestination? A measly 2.1 GB total. Laughable. But BytesInFromDestination exploded:

Date      Inbound through NAT
Mar 26      6.3 GB
Mar 27    240.3 GB
Mar 28    149.1 GB
Mar 29    149.8 GB
Mar 30    102.3 GB
Mar 31     15.0 GB
Apr 01      5.4 GB (partial)

Unbalanced flows. Not WebRTC. ActiveConnectionCount hummed at ~90 around the clock, and inbound traffic spiked every hour in bursts of roughly 430 MB to 850 MB. No users, either: CloudTrail showed zero InvokeAgentRuntime calls during peak days.
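Want to pull the same daily totals for your own NAT Gateway? A query along these lines should do it. Treat it as a sketch rather than the script behind the numbers above: it assumes boto3 is installed with credentials configured, and the NAT Gateway ID and date range are yours to supply. The namespace, metric names, and `NatGatewayId` dimension are the standard NAT Gateway CloudWatch metrics.

```python
def nat_metric_query(nat_gateway_id, metric_name, start, end):
    """Build kwargs for CloudWatch get_metric_statistics: one Sum per day."""
    return {
        "Namespace": "AWS/NATGateway",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": 86400,  # seconds in a day
        "Statistics": ["Sum"],
    }


def print_daily_gb(nat_gateway_id, start, end):
    """Print daily inbound and outbound GB for one NAT Gateway."""
    import boto3  # imported here so the query builder works without boto3

    cloudwatch = boto3.client("cloudwatch")
    for metric in ("BytesInFromDestination", "BytesOutToDestination"):
        query = nat_metric_query(nat_gateway_id, metric, start, end)
        points = cloudwatch.get_metric_statistics(**query)["Datapoints"]
        for point in sorted(points, key=lambda p: p["Timestamp"]):
            gigabytes = round(point["Sum"] / 1e9, 1)
            print(metric, point["Timestamp"].date(), gigabytes, "GB")
```

Call `print_daily_gb` with your NAT Gateway ID and two timezone-aware `datetime` values. A large gap between the two metrics is the signal to look for.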

VPC Flow Logs clinched it. Top talkers? IPs like 52.216.58.42 hammering the NAT’s private IP at 10.0.0.144. Every one traced back to S3 in us-east-1. All inbound pulls, funneled through NAT.
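The top-talker hunt can be approximated with a few lines of Python over exported flow log records. This is a minimal sketch that assumes the default (version 2) flow log format; the sample records are made up, and only the two IP addresses come from the investigation above.

```python
from collections import Counter


def top_talkers(flow_log_lines, nat_private_ip, limit=5):
    """Sum accepted bytes per source address for traffic sent to the NAT's private IP.

    Expects the default (version 2) VPC Flow Logs format:
    version account-id interface-id srcaddr dstaddr srcport dstport
    protocol packets bytes start end action log-status
    """
    bytes_by_source = Counter()
    for line in flow_log_lines:
        fields = line.split()
        if len(fields) != 14 or fields[0] != "2":
            continue  # header line, custom format, or malformed record
        srcaddr, dstaddr = fields[3], fields[4]
        byte_count, action = fields[9], fields[12]
        if dstaddr == nat_private_ip and action == "ACCEPT" and byte_count.isdigit():
            bytes_by_source[srcaddr] += int(byte_count)
    return bytes_by_source.most_common(limit)


# Made-up records for illustration; account ID, ENI, ports, and sizes are placeholders.
sample = [
    "2 123456789012 eni-0a1b2c3d 52.216.58.42 10.0.0.144 443 50000 6 900 1200000 1 2 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d 52.216.58.42 10.0.0.144 443 50001 6 600 800000 1 2 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d 10.0.0.144 52.216.58.42 50000 443 6 40 3000 1 2 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d 203.0.113.9 10.0.0.144 3478 50002 17 10 5000 1 2 ACCEPT OK",
]
print(top_talkers(sample, "10.0.0.144"))
```

Once the heaviest sources are known, checking them against the published AWS IP ranges for the S3 service in your region confirms where the bytes come from.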

Why Did an Idle Agent Pull 659 GB from S3?

Bedrock’s AgentCore Runtime keeps a warm pool of VMs for snappy starts. Each VM yanks the container image from ECR — and ECR layers live in S3. That image? 435 MB compressed.

AgentCore Runtime maintains a warm pool of VMs to ensure low-latency invocations. Each VM in the pool pulls the container image from ECR — and ECR stores image layers in S3.

21 UpdateAgentRuntime calls on March 27 (debug hell) kicked off re-provisioning waves — 10 VMs per round, multiple rounds. Boom, 240 GB spike. Then steady recycling kept ~150 GB/day flowing. After 72 idle hours, downscale to 1 VM — traffic drops to 15 GB.
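A quick back-of-envelope check shows how many image pulls those daily volumes imply. It is a rough sketch: it assumes every pull transfers the full 435 MB compressed image and ignores protocol overhead and any layer reuse.

```python
IMAGE_GB = 0.435  # 435 MB compressed image, from the article

# Daily inbound NAT traffic in GB, from the CloudWatch table above.
daily_inbound_gb = {"Mar 27": 240.3, "Mar 29": 149.8, "Mar 31": 15.0}


def implied_pulls(inbound_gb, image_gb=IMAGE_GB):
    """Full-image pulls needed to account for a given traffic volume."""
    return inbound_gb / image_gb


for day, gb in daily_inbound_gb.items():
    print(f"{day}: {gb:6.1f} GB is roughly {implied_pulls(gb):.0f} image pulls")
```

Under those assumptions the redeploy day works out to several hundred pulls, far more than 21 updates times 10 VMs, which fits the "multiple rounds" explanation.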

Expected behavior, says the service team. But here’s my take: this echoes the wild west of early EC2, when EBS snapshots silently drained wallets before everyone learned VPC endpoints. Bedrock’s pitching low-latency AI agents — yet the architecture bets your bill on forgetting S3 basics.

Punchy fix? S3 Gateway VPC Endpoint. Free. No hourly fees, no data processing. Routes S3 traffic internally.

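resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.aws_region}.s3"
  vpc_endpoint_type = "Gateway" # the default, stated explicitly

  # A gateway endpoint works by adding an S3 route to each listed route table.
  # The private table is the one that otherwise sends S3 traffic through NAT.
  route_table_ids = [
    aws_route_table.private.id,
    aws_route_table.public.id,
  ]
}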

Terraform apply. Costs vanish.

Is Every VPC-Only Bedrock Setup a Cost Trap?

Look, if you’re spinning VPCs with private subnets and NAT for Bedrock agents — or any workload hitting S3/ECR — you’re rolling dice without this endpoint. It’s not just Bedrock; it’s any container runtime oblivious to your network.

But dig deeper. AgentCore’s warm pool recycles aggressively: fresh VMs mean fresh pulls. Scale to production? 50 VMs? With a 435 MB image that’s about 22 GB per cycle, times your redeploys. S3 won’t bill the transfer (same-region S3 to your VPC is free), but the NAT Gateway will: its $0.045/GB data processing charge applies to every byte it handles, inbound included. Scales nasty.
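The arithmetic is simple enough to script before you deploy. The sketch below uses the $0.045/GB rate and 435 MB image size quoted in this article; the 50 VM, 30 cycle scenario is purely hypothetical, and it assumes each VM pulls the full image once per cycle.

```python
NAT_PRICE_PER_GB = 0.045  # USD per GB processed, the rate quoted in the article


def nat_processing_cost(gb_processed, price_per_gb=NAT_PRICE_PER_GB):
    """NAT Gateway data processing charge for a traffic volume."""
    return gb_processed * price_per_gb


def warm_pool_pull_gb(vm_count, image_gb, cycles):
    """Traffic if every VM pulls the full image once per recycle or redeploy."""
    return vm_count * image_gb * cycles


print(f"Observed 659 GB: ${nat_processing_cost(659):.2f}")

# Hypothetical production scenario, not a measured one.
hypothetical_gb = warm_pool_pull_gb(vm_count=50, image_gb=0.435, cycles=30)
print(f"50 VMs, 30 cycles: {hypothetical_gb:.1f} GB, ${nat_processing_cost(hypothetical_gb):.2f}")
```

The first line reproduces the bill from the anomaly alert; the second shows how quickly pool size and redeploy count multiply it.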

Corporate spin? Bedrock docs nod to VPC mode for security/isolation, but bury networking gotchas. They assume you’re a VPC wizard. Newcomers? Baited.

My prediction: AWS rolls out auto-endpoints or warm-pool image caching by Q3. Until then, script it into every VPC template. No excuses.

How Does Bedrock’s Warm Pool Architecture Really Work?

Warm pools aren’t new — Lambda warmed functions years ago. But here, it’s VM-scale for agent runtimes handling voice, WebRTC, stateful sessions. Each UpdateAgentRuntime triggers async reprovisioning. Idle? Still recycles to stay ‘fresh.’ Downscales after 72 hours, sure — but that’s three days of burn.

Container images bloat this. Trim yours ruthlessly — multi-stage builds, slim bases. But even 100 MB layers stack up across 10 VMs.

Test it yourself. Deploy a minimal AgentCore in VPC, monitor NAT metrics. Watch the S3 flood.

And that steady 90 connections? S3 keep-alives, likely. Relentless.

Broader shift: AI agents demand always-ready infra, but cloud pricing lags. VPC endpoints — gateway for S3, interface for others — are your shield. Mandate them in IaC. Saves souls, wallets too.

Why Does This Matter for AWS AI Devs?

Bedrock’s VPC mode unlocks enterprise wins — data exfiltration blocks, custom networking. But costs lurk in defaults. We’ve seen it before: NAT bills from misrouted logs, metrics. Now agents amplify.

Unique angle: this isn’t hype backlash; it’s architectural adolescence. Bedrock’s racing OpenAI territory, but infra maturity trails. Devs, bake endpoints day zero. Ops, alert on NAT bytes.
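What might "alert on NAT bytes" look like? A CloudWatch alarm on BytesInFromDestination is the managed route; the sketch below only illustrates the logic on daily totals you have already exported. Both thresholds are illustrative assumptions, not AWS recommendations, and the function name is hypothetical.

```python
def nat_alerts(daily_gb, max_daily_gb=20.0, max_in_out_ratio=10.0):
    """Return (day, reason) pairs for days that look like bulk downloads through NAT.

    daily_gb maps a day label to (inbound_gb, outbound_gb).
    """
    alerts = []
    for day, (inbound, outbound) in daily_gb.items():
        if inbound > max_daily_gb:
            alerts.append((day, f"inbound {inbound:.1f} GB exceeds {max_daily_gb:.1f} GB"))
        elif outbound > 0 and inbound / outbound > max_in_out_ratio:
            alerts.append((day, f"inbound is {inbound / outbound:.0f}x outbound"))
    return alerts


# Inbound figures are from the article; the per-day outbound values are placeholders.
observed = {"Mar 27": (240.3, 0.3), "Mar 31": (15.0, 0.3), "quiet day": (1.0, 0.5)}
for day, reason in nat_alerts(observed):
    print(day, reason)
```

Checking the inbound-to-outbound ratio as well as the absolute volume catches the lopsided pattern seen here even on days when the total stays under the size threshold.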

Production tip: pair it with ECR VPC endpoints too (ecr.api and ecr.dkr). They’re interface endpoints, so they bill hourly plus a per-GB processing fee. Still cheaper than NAT floods.

Word to AWS: document warm-pool S3 dependency boldly. Or auto-provision endpoints on VPC enable.



Frequently Asked Questions

What causes NAT Gateway costs in Bedrock AgentCore VPC?

Idle warm pool VMs pull container images from ECR (backed by S3) through NAT, spiking inbound bytes: roughly 150 GB/day in steady state in this case, with a 240 GB peak on the redeploy-heavy day.

How to fix S3 traffic costs in AWS VPC with NAT Gateway?

Add a free S3 Gateway VPC Endpoint to route tables; traffic bypasses NAT entirely. One Terraform resource.

Does Bedrock AgentCore warm pool always download from S3?

Yes. Each VM that gets reprovisioned pulls the ECR image, whose layers are stored in S3. The pool recycles periodically even when idle, and downscales after 72 hours.

Written by Sarah Chen

AI research editor covering LLMs, benchmarks, and the race between frontier labs. Previously at MIT CSAIL.



Originally reported by dev.to
