The bill landed on a Tuesday: $29 in unexpected AWS charges, and the culprit was listed under Amazon Elastic Block Store. Except it wasn’t. The usage type told the real story — NatGateway-Bytes. Six hundred fifty-nine gigabytes had silently flowed through a NAT Gateway in a week, and the engineer who built it had no idea why.
This is the kind of cost anomaly that haunts cloud architects at 2 AM. It’s not catastrophic—$29 is pocket change—but it’s a symptom of something much larger: a hidden tax on running containerized workloads in AWS VPCs, and one that’s trivially preventable.
The Detective Work
The engineer had recently spun up a voice agent on Bedrock AgentCore Runtime in VPC mode, deploying it in a private subnet with a NAT Gateway handling outbound internet traffic (necessary for WebRTC TURN relay connectivity). The deployment was fresh. The cost spike was immediate. The connection seemed obvious, but the numbers didn’t add up.
CloudWatch metrics on the NAT Gateway narrowed the search. BytesOutToDestination showed only 2.1 GB total over six days; outbound traffic was negligible. But BytesInFromDestination told a completely different story: on March 27 alone, 240.3 GB flowed inbound through the NAT Gateway.
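Pulling those numbers takes a few lines of boto3. A minimal sketch, assuming us-east-1 and a placeholder NAT Gateway ID (substitute your own):

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
now = datetime.now(timezone.utc)

# NAT Gateway metrics live in the AWS/NATGateway namespace, keyed by NatGatewayId.
for metric in ("BytesInFromDestination", "BytesOutToDestination"):
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/NATGateway",
        MetricName=metric,
        Dimensions=[{"Name": "NatGatewayId", "Value": "nat-0123456789abcdef0"}],
        StartTime=now - timedelta(days=7),
        EndTime=now,
        Period=86400,  # one datapoint per day
        Statistics=["Sum"],
    )
    for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        print(f"{metric} {point['Timestamp'].date()}: {point['Sum'] / 1e9:.1f} GB")
```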
Here’s where it gets weird. ActiveConnectionCount hovered around 90 connections, 24/7, even when nobody was using the agent. And the traffic pattern was metronomically regular—alternating between 850 MB and 430 MB per hour, around the clock. This wasn’t user activity. This was something automated. Something relentless.
CloudTrail confirmed it: zero InvokeAgentRuntime events during the heaviest traffic days. The agent was completely idle. Yet the data kept flowing.
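The same check is scriptable. A sketch, assuming InvokeAgentRuntime events appear in the region’s CloudTrail event history (the dates below are illustrative):

```python
import boto3
from datetime import datetime, timezone

cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

# Count InvokeAgentRuntime calls over the suspect window.
paginator = cloudtrail.get_paginator("lookup_events")
count = 0
for page in paginator.paginate(
    LookupAttributes=[
        {"AttributeKey": "EventName", "AttributeValue": "InvokeAgentRuntime"}
    ],
    StartTime=datetime(2025, 3, 26, tzinfo=timezone.utc),
    EndTime=datetime(2025, 3, 31, tzinfo=timezone.utc),
):
    count += len(page["Events"])

print(f"InvokeAgentRuntime events in window: {count}")  # expected: 0
```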
VPC Flow Logs revealed the culprits. Seven IP addresses were responsible for nearly all the traffic, with flows like these:

52.216.58.42 → 10.0.0.144: 270.1 MB
16.15.207.229 → 10.0.0.144: 263.7 MB
16.15.191.63 → 10.0.0.144: 263.6 MB
When those IPs were cross-referenced against AWS’s published IP ranges (the ip-ranges.json file), they all resolved to Amazon S3 in us-east-1. Every gigabyte was S3 traffic. Not WebRTC. Not application logs. Just something inside the VPC pulling from S3, over and over, through a NAT Gateway that shouldn’t have been in the path at all.
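That cross-reference is easy to reproduce against the ranges file AWS publishes at https://ip-ranges.amazonaws.com/ip-ranges.json. A minimal sketch:

```python
import ipaddress
import json
import urllib.request

RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"
suspects = ["52.216.58.42", "16.15.207.229", "16.15.191.63"]

with urllib.request.urlopen(RANGES_URL) as resp:
    prefixes = json.load(resp)["prefixes"]

for ip in suspects:
    addr = ipaddress.ip_address(ip)
    # An address can fall inside several published prefixes (e.g. AMAZON
    # and S3), so collect every match rather than stopping at the first.
    services = {
        (p["service"], p["region"])
        for p in prefixes
        if addr in ipaddress.ip_network(p["ip_prefix"])
    }
    print(ip, "->", sorted(services))
```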
Why This Happened (And Why AgentCore Didn’t Tell You)
A support case finally produced the explanation from AWS’s Bedrock team. AgentCore Runtime maintains a warm pool of virtual machines to ensure low-latency agent invocations. Think of it as a standing army of pre-warmed containers, ready to spring into action the moment you invoke an agent. The problem: those VMs need their container images.
ECR stores container image layers in S3. The engineer’s container image was 435 MB compressed. Every VM in the warm pool—there were 10 of them by default—pulled that image independently. And they didn’t just pull it once.
Three factors combined to produce the 659 GB bill:
The Deployment Spike. On March 27, the engineer made 21 UpdateAgentRuntime API calls during heavy debugging. Each one triggered an asynchronous warm pool re-provisioning cycle. Multiple rounds of 10-VM provisioning, each yanking the 435 MB image from S3, produced the 240 GB spike that day alone.
The Recycling Tax. The warm pool continued cycling VMs over the following days to keep them fresh and ready for invocation. With 10 VMs each pulling the image periodically, the steady 150 GB/day on March 28-30 is consistent with routine image refreshes.
The Downscale. After approximately 72 hours with zero invocations, the warm pool automatically downscaled from 10 VMs to 1 VM. This explains the abrupt drop from 150 GB/day to 15 GB/day on March 31.
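A back-of-envelope check using only the figures above shows the pieces fit together, give or take the warm pool internals we can only infer:

```python
# All inputs come from the investigation described above.
image_gb = 0.435       # 435 MB compressed container image
week_total_gb = 659

# ~1515 image pulls over the week.
print(f"~{week_total_gb / image_gb:.0f} image pulls in a week")

# Spike day: 21 UpdateAgentRuntime calls against a 10-VM pool. One pull
# per VM per update would be ~91 GB; the observed 240 GB implies several
# pull rounds per re-provisioning cycle.
print(f"~{21 * 10 * image_gb:.0f} GB at one pull per VM per update")

# Steady state: 150 GB/day across 10 VMs is roughly 34 pulls per VM per day.
print(f"~{150 / image_gb / 10:.0f} pulls per VM per day")
```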
The Free Fix That Should’ve Been Day-One Infrastructure
The solution is both obvious and embarrassing once you know about it: an S3 Gateway VPC Endpoint.
A Gateway Endpoint routes S3 traffic directly through the AWS internal network, bypassing the NAT Gateway entirely. Unlike Interface Endpoints, Gateway Endpoints charge nothing—no hourly fee, no data processing charge, nothing. It takes one Terraform resource:
```hcl
resource "aws_vpc_endpoint" "s3" {
  vpc_id       = aws_vpc.main.id
  service_name = "com.amazonaws.${var.aws_region}.s3"
  route_table_ids = [
    aws_route_table.private.id,
    aws_route_table.public.id,
  ]
}
```
One terraform apply and the NAT Gateway data transfer cost collapses to near zero.
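If you want to confirm the endpoint landed where it should, a quick boto3 check (assuming us-east-1) lists its state and attached route tables:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# After `terraform apply`, the Gateway Endpoint should be "available"
# and attached to the intended route tables.
resp = ec2.describe_vpc_endpoints(
    Filters=[
        {"Name": "service-name", "Values": ["com.amazonaws.us-east-1.s3"]},
        {"Name": "vpc-endpoint-type", "Values": ["Gateway"]},
    ]
)
for ep in resp["VpcEndpoints"]:
    print(ep["VpcEndpointId"], ep["State"], ep["RouteTableIds"])
```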
The Larger Question
But here’s what really stings: why would you ever create a VPC with private subnets and a NAT Gateway without including an S3 Gateway Endpoint as default infrastructure? It’s free. It takes one resource. It prevents exactly this kind of surprise billing. There’s genuinely no downside.
This isn’t a Bedrock-specific problem, either. Any containerized workload that pulls images from ECR—which stores everything in S3—will face the same silent cost. Lambda functions pulling container images. ECS tasks in private subnets. EC2 instances downloading software from S3. Every one of them is a potential NAT Gateway bill waiting to happen.
AWS could’ve made this more transparent. Bedrock’s documentation could’ve warned you about the warm pool behavior. The platform team could’ve automatically provisioned an S3 endpoint by default. But they didn’t. Instead, you get a surprise bill and a forensic investigation.
The lesson isn’t technical. It’s about visibility. Cloud costs hide in plain sight when they’re split across multiple services—S3 bandwidth here, NAT Gateway data transfer there, ECR image pulls somewhere else. You don’t see the total until the anomaly alert arrives.
If you’re running anything in a VPC right now, go add an S3 Gateway Endpoint. Do it today. It won’t hurt. It will almost certainly save you money. And it’s the kind of defensive infrastructure that should’ve been standard practice since NAT Gateways existed.
Frequently Asked Questions
What is Bedrock AgentCore Runtime? It’s AWS’s managed service for deploying voice and text agents with low-latency responses. It maintains a warm pool of pre-provisioned VMs to ensure quick invocation times. The trade-off: those VMs refresh their container images regularly, which can generate surprising NAT Gateway data transfer costs if the VPC doesn’t route S3 traffic through a Gateway Endpoint.
Do I need an S3 Gateway Endpoint in every VPC? Yes, if your VPC has private subnets and a NAT Gateway. It’s free and eliminates data transfer charges for any S3 traffic. If you’re not using S3, it won’t hurt anything. If you are using S3 (and you probably are), not having one is leaving money on the table every single day.
Will this happen with other AWS services running in VPCs? Absolutely. Any service that pulls container images from ECR—Lambda with container images, ECS, App Runner—will generate the same S3 traffic. Anything that downloads data from S3 within a private subnet will route through your NAT Gateway and cost money unless you have a Gateway Endpoint in place.