# AWS Lambda VPC Configuration: The Complete Guide to Private Networking and Cost Optimization with NAT Instances
Three months ago, I got a $400 AWS bill that made me question everything I knew about serverless architecture. The culprit? NAT Gateways supporting Lambda functions that barely handled 10,000 requests per month. That’s when I learned the hard way that “serverless” doesn’t mean “cheap” when you need proper networking.
After weeks of deep-diving into VPC configurations, NAT alternatives, and cost optimization, I cut our Lambda networking costs by 85% while maintaining security and performance. Here’s everything I learned about running Lambda functions in VPCs the cost-effective way.
## Why Lambda Functions Need VPCs (And Why It Gets Expensive)
Lambda functions run in AWS’s managed infrastructure by default. They can access the internet and public AWS services but can’t reach resources in your private VPC like RDS databases, ElastiCache clusters, or internal services.
The moment you need to access private resources, you have two options:
- Make your resources public (security nightmare)
- Put Lambda in a VPC (networking complexity + costs)
Here’s what happened to our costs:
```text
Before VPC:
- Lambda execution: $12/month
- RDS (public): $45/month
- Total: $57/month

After VPC (naive approach):
- Lambda execution: $12/month
- RDS (private): $45/month
- NAT Gateway: $400/month (!!!)
- Total: $457/month
```
That 8x cost increase was a wake-up call.
## Understanding VPC Networking for Lambda
When Lambda functions run in a VPC, they need to access AWS services and the internet through specific networking paths. Here’s the architecture that costs money:
```text
Lambda (Private Subnet) → NAT Gateway → Internet Gateway → AWS Services
                        → Route Table → Private Resources (RDS, etc.)
```
The expensive part? NAT Gateways charge for data processing AND hourly uptime:
- $45.60/month per NAT Gateway (24/7 uptime)
- $0.045 per GB processed
- Multi-AZ setup = multiple NAT Gateways
For a modest serverless application, this easily becomes your highest cost.
## The Complete VPC Setup (Infrastructure as Code)
Let’s build a proper VPC configuration using Terraform. This setup provides secure networking for Lambda functions with internet access:
```hcl
# vpc.tf
resource "aws_vpc" "lambda_vpc" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "lambda-vpc"
    Environment = "production"
  }
}

# Internet Gateway for public subnets
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.lambda_vpc.id

  tags = {
    Name = "lambda-igw"
  }
}

# Public subnets (for NAT Gateway/Instance)
resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.lambda_vpc.id
  cidr_block              = "10.0.${count.index + 1}.0/24"
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "public-subnet-${count.index + 1}"
    Type = "public"
  }
}

# Private subnets (for Lambda functions)
resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.lambda_vpc.id
  cidr_block        = "10.0.${count.index + 10}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "private-subnet-${count.index + 1}"
    Type = "private"
  }
}

# Database subnets (for RDS)
resource "aws_subnet" "database" {
  count             = 2
  vpc_id            = aws_vpc.lambda_vpc.id
  cidr_block        = "10.0.${count.index + 20}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "database-subnet-${count.index + 1}"
    Type = "database"
  }
}

# Route table for public subnets
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.lambda_vpc.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "public-rt"
  }
}

# Associate public subnets with public route table
resource "aws_route_table_association" "public" {
  count          = length(aws_subnet.public)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

data "aws_availability_zones" "available" {
  state = "available"
}
```
## The Expensive Way: NAT Gateway
The standard approach uses managed NAT Gateways:
```hcl
# nat-gateway.tf (EXPENSIVE!)
resource "aws_eip" "nat" {
  count      = 2
  domain     = "vpc"
  depends_on = [aws_internet_gateway.main]

  tags = {
    Name = "nat-eip-${count.index + 1}"
  }
}

resource "aws_nat_gateway" "main" {
  count         = 2
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "nat-gateway-${count.index + 1}"
  }

  depends_on = [aws_internet_gateway.main]
}

# Private route tables (one per AZ for HA)
resource "aws_route_table" "private" {
  count  = 2
  vpc_id = aws_vpc.lambda_vpc.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }

  tags = {
    Name = "private-rt-${count.index + 1}"
  }
}

# Associate private subnets with private route tables
resource "aws_route_table_association" "private" {
  count          = length(aws_subnet.private)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}
```
Monthly cost for this setup:
- 2 NAT Gateways: $91.20 (24/7 uptime)
- Data processing: ~$50-200 depending on usage
- Total: $140-290/month just for networking
## The Cost-Effective Way: NAT Instance
Here’s how to replace expensive NAT Gateways with a single NAT instance:
```hcl
# nat-instance.tf
# Security group for NAT instance
resource "aws_security_group" "nat_instance" {
  name_prefix = "nat-instance-"
  vpc_id      = aws_vpc.lambda_vpc.id

  # Allow inbound traffic from private subnets
  ingress {
    from_port   = 0
    to_port     = 65535
    protocol    = "tcp"
    cidr_blocks = [for subnet in aws_subnet.private : subnet.cidr_block]
  }

  ingress {
    from_port   = 0
    to_port     = 65535
    protocol    = "udp"
    cidr_blocks = [for subnet in aws_subnet.private : subnet.cidr_block]
  }

  # Allow SSH access (for management)
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # Restrict this in production
  }

  # Allow all outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "nat-instance-sg"
  }
}

# Get latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

# IAM role for NAT instance (for CloudWatch, SSM, etc.)
resource "aws_iam_role" "nat_instance" {
  name = "nat-instance-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "nat_instance_ssm" {
  role       = aws_iam_role.nat_instance.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

resource "aws_iam_instance_profile" "nat_instance" {
  name = "nat-instance-profile"
  role = aws_iam_role.nat_instance.name
}

# NAT Instance
resource "aws_instance" "nat" {
  ami                    = data.aws_ami.amazon_linux.id
  instance_type          = "t3.nano" # $3.80/month!
  key_name               = var.key_pair_name
  vpc_security_group_ids = [aws_security_group.nat_instance.id]
  subnet_id              = aws_subnet.public[0].id
  iam_instance_profile   = aws_iam_instance_profile.nat_instance.name

  # Disable source/destination check (required for NAT)
  source_dest_check = false

  user_data = base64encode(templatefile("${path.module}/nat-instance-setup.sh", {
    vpc_cidr = aws_vpc.lambda_vpc.cidr_block
  }))

  tags = {
    Name    = "nat-instance"
    Purpose = "NAT for Lambda functions"
  }

  lifecycle {
    create_before_destroy = true
  }
}

# Elastic IP for NAT instance
resource "aws_eip" "nat_instance" {
  instance = aws_instance.nat.id
  domain   = "vpc"

  tags = {
    Name = "nat-instance-eip"
  }
}

# Single route table for all private subnets
# (replaces the per-AZ route tables from nat-gateway.tf; use one approach or the other)
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.lambda_vpc.id

  route {
    cidr_block = "0.0.0.0/0"
    # AWS provider v5 dropped the route-level instance_id argument;
    # target the instance's primary ENI instead
    network_interface_id = aws_instance.nat.primary_network_interface_id
  }

  tags = {
    Name = "private-rt"
  }
}

# Associate all private subnets with the single route table
resource "aws_route_table_association" "private" {
  count          = length(aws_subnet.private)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private.id
}
```
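Before moving on: the configuration above references `var.key_pair_name`, and the RDS section below references `var.db_password`, but neither variable is ever declared. A minimal `variables.tf` sketch, assuming you supply values at apply time:

```hcl
# variables.tf (assumed declarations for variables referenced in this post)
variable "key_pair_name" {
  description = "EC2 key pair used for SSH access to the NAT instance"
  type        = string
}

variable "db_password" {
  description = "Master password for the RDS instance"
  type        = string
  sensitive   = true
}
```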
Here’s the NAT instance setup script:
```bash
#!/bin/bash
# nat-instance-setup.sh

# Update system
yum update -y

# Enable IP forwarding
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf
sysctl -p

# Configure iptables for NAT (masquerade only traffic originating in the VPC;
# ${vpc_cidr} is interpolated by Terraform's templatefile)
iptables -t nat -A POSTROUTING -o eth0 -s ${vpc_cidr} -j MASQUERADE
iptables -A FORWARD -s ${vpc_cidr} -j ACCEPT
iptables -A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT

# Install iptables-services to persist rules, then save them
yum install -y iptables-services
iptables-save > /etc/sysconfig/iptables
systemctl enable iptables
systemctl start iptables

# Install CloudWatch agent for monitoring
yum install -y amazon-cloudwatch-agent

# Configure automatic security updates
yum install -y yum-cron
systemctl enable yum-cron
systemctl start yum-cron

# Create a monitoring script
cat << 'EOF' > /usr/local/bin/nat-health-check.sh
#!/bin/bash
# Simple health check script
ping -c 3 8.8.8.8 > /dev/null 2>&1
if [ $? -eq 0 ]; then
  echo "NAT instance healthy: $(date)" >> /var/log/nat-health.log
else
  echo "NAT instance unhealthy: $(date)" >> /var/log/nat-health.log
fi
EOF
chmod +x /usr/local/bin/nat-health-check.sh

# Add to crontab
echo "*/5 * * * * /usr/local/bin/nat-health-check.sh" | crontab -

# Log completion
echo "NAT instance setup completed: $(date)" >> /var/log/nat-setup.log
```
## Lambda VPC Configuration
Now configure your Lambda functions to use the private subnets:
```hcl
# lambda.tf
resource "aws_security_group" "lambda" {
  name_prefix = "lambda-"
  vpc_id      = aws_vpc.lambda_vpc.id

  # Allow outbound internet access (through NAT)
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow access to RDS (redundant given the rule above, but documents intent)
  egress {
    from_port   = 5432
    to_port     = 5432
    protocol    = "tcp"
    cidr_blocks = [for subnet in aws_subnet.database : subnet.cidr_block]
  }

  tags = {
    Name = "lambda-sg"
  }
}

resource "aws_lambda_function" "api" {
  filename      = "lambda.zip"
  function_name = "api-handler"
  role          = aws_iam_role.lambda.arn
  handler       = "index.handler"
  runtime       = "nodejs18.x"
  timeout       = 30

  vpc_config {
    subnet_ids         = aws_subnet.private[*].id
    security_group_ids = [aws_security_group.lambda.id]
  }

  environment {
    variables = {
      DATABASE_URL = aws_db_instance.main.endpoint
      REDIS_URL    = aws_elasticache_cluster.main.cache_nodes[0].address
    }
  }

  depends_on = [
    aws_iam_role_policy_attachment.lambda_vpc,
    aws_cloudwatch_log_group.lambda,
  ]
}

# Lambda execution role with VPC permissions
resource "aws_iam_role" "lambda" {
  name = "lambda-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "lambda.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "lambda_vpc" {
  role       = aws_iam_role.lambda.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
}
```
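The function's `depends_on` references `aws_cloudwatch_log_group.lambda`, which never appears above. A minimal sketch, using Lambda's default log group naming convention (`/aws/lambda/<function name>`):

```hcl
# Pre-create the log group so retention is managed by Terraform
resource "aws_cloudwatch_log_group" "lambda" {
  name              = "/aws/lambda/api-handler"
  retention_in_days = 14
}
```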
## RDS and ElastiCache in Private Subnets
```hcl
# rds.tf
resource "aws_db_subnet_group" "main" {
  name       = "main-db-subnet-group"
  subnet_ids = aws_subnet.database[*].id

  tags = {
    Name = "main-db-subnet-group"
  }
}

resource "aws_security_group" "rds" {
  name_prefix = "rds-"
  vpc_id      = aws_vpc.lambda_vpc.id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.lambda.id]
  }

  tags = {
    Name = "rds-sg"
  }
}

resource "aws_db_instance" "main" {
  identifier        = "main-postgres"
  engine            = "postgres"
  engine_version    = "15.4"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  storage_encrypted = true

  db_name  = "maindb"
  username = "dbadmin"
  password = var.db_password

  vpc_security_group_ids = [aws_security_group.rds.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name

  backup_retention_period = 7
  skip_final_snapshot     = true

  tags = {
    Name = "main-database"
  }
}
```
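The Lambda environment above references `aws_elasticache_cluster.main`, which this section's title promises but the code never defines. A minimal Redis sketch, assuming a single-node cluster in the database subnets with access restricted to the Lambda security group:

```hcl
# elasticache.tf (assumed definition for the cluster referenced in lambda.tf)
resource "aws_elasticache_subnet_group" "main" {
  name       = "main-cache-subnet-group"
  subnet_ids = aws_subnet.database[*].id
}

resource "aws_security_group" "redis" {
  name_prefix = "redis-"
  vpc_id      = aws_vpc.lambda_vpc.id

  ingress {
    from_port       = 6379
    to_port         = 6379
    protocol        = "tcp"
    security_groups = [aws_security_group.lambda.id]
  }
}

resource "aws_elasticache_cluster" "main" {
  cluster_id           = "main-redis"
  engine               = "redis"
  node_type            = "cache.t3.micro"
  num_cache_nodes      = 1
  parameter_group_name = "default.redis7"
  subnet_group_name    = aws_elasticache_subnet_group.main.name
  security_group_ids   = [aws_security_group.redis.id]
}
```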
## Cost Comparison: Real Numbers
Here’s what our actual costs look like:
### NAT Gateway Approach (2 AZs)
```text
Monthly Costs:
- NAT Gateway uptime (2 × $45.60): $91.20
- Data processing (500GB): $22.50
- Elastic IPs (2 × $3.65): $7.30
- Total: $121.00/month
```
### NAT Instance Approach
```text
Monthly Costs:
- t3.nano instance: $3.80
- Data transfer: $0 (same AZ)
- Elastic IP: $3.65 (the public IPv4 charge applies even when attached)
- Total: $7.45/month
```

Savings: $113.55/month (a 94% cost reduction)
## NAT Instance High Availability
The single point of failure concern is real. Here’s how to address it:
```hcl
# nat-ha.tf - Auto-recovering NAT instance
# Note: with this approach, drop the standalone aws_instance.nat and its EIP
# association from nat-instance.tf; the user-data script below attaches the
# Elastic IP and repairs the default route each time the ASG replaces the instance.
resource "aws_launch_template" "nat" {
  name_prefix   = "nat-instance-"
  image_id      = data.aws_ami.amazon_linux.id
  instance_type = "t3.nano"
  key_name      = var.key_pair_name

  vpc_security_group_ids = [aws_security_group.nat_instance.id]

  iam_instance_profile {
    name = aws_iam_instance_profile.nat_instance.name
  }

  user_data = base64encode(templatefile("${path.module}/nat-instance-ha-setup.sh", {
    vpc_cidr       = aws_vpc.lambda_vpc.cidr_block
    route_table_id = aws_route_table.private.id
    elastic_ip_id  = aws_eip.nat_instance.id
  }))

  tag_specifications {
    resource_type = "instance"

    tags = {
      Name = "nat-instance-ha"
    }
  }
}

resource "aws_autoscaling_group" "nat" {
  name                      = "nat-instance-asg"
  vpc_zone_identifier       = [aws_subnet.public[0].id]
  health_check_type         = "EC2"
  health_check_grace_period = 300
  min_size                  = 1
  max_size                  = 1
  desired_capacity          = 1

  launch_template {
    id      = aws_launch_template.nat.id
    version = "$Latest"
  }

  tag {
    key                 = "Name"
    value               = "nat-instance-asg"
    propagate_at_launch = false
  }
}
```
Enhanced setup script with auto-recovery:
```bash
#!/bin/bash
# nat-instance-ha-setup.sh

# Basic NAT setup (same as before)
yum update -y
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf
sysctl -p

# Install AWS CLI
yum install -y awscli

# Get instance metadata
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)

# Disable source/destination check (launch templates can't set this directly)
aws ec2 modify-instance-attribute \
  --instance-id $INSTANCE_ID \
  --no-source-dest-check \
  --region $REGION

# Auto-attach Elastic IP
aws ec2 associate-address \
  --instance-id $INSTANCE_ID \
  --allocation-id ${elastic_ip_id} \
  --region $REGION

# Update route table to point to this instance
aws ec2 replace-route \
  --route-table-id ${route_table_id} \
  --destination-cidr-block 0.0.0.0/0 \
  --instance-id $INSTANCE_ID \
  --region $REGION

# Configure iptables (same as before)
iptables -t nat -A POSTROUTING -o eth0 -s ${vpc_cidr} -j MASQUERADE
iptables -A FORWARD -s ${vpc_cidr} -j ACCEPT
iptables -A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT

yum install -y iptables-services
iptables-save > /etc/sysconfig/iptables
systemctl enable iptables
systemctl start iptables

# Health check with auto-recovery
cat << 'EOF' > /usr/local/bin/nat-health-monitor.sh
#!/bin/bash
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)

# Check internet connectivity
ping -c 3 8.8.8.8 > /dev/null 2>&1
if [ $? -ne 0 ]; then
  echo "Internet connectivity failed, terminating instance for ASG replacement"
  aws ec2 terminate-instances --instance-ids $INSTANCE_ID --region $REGION
fi
EOF
chmod +x /usr/local/bin/nat-health-monitor.sh
echo "*/2 * * * * /usr/local/bin/nat-health-monitor.sh" | crontab -
```
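The SSM-only role attached earlier doesn't allow the EC2 calls a self-managing NAT instance makes (associating its EIP, rewriting the route, disabling the source/dest check, terminating itself). A sketch of an inline policy covering them; `Resource` is left broad for brevity and should be scoped down in production:

```hcl
# Inline policy for the HA NAT instance's self-management calls
resource "aws_iam_role_policy" "nat_instance_ha" {
  name = "nat-instance-ha"
  role = aws_iam_role.nat_instance.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "ec2:AssociateAddress",
          "ec2:ReplaceRoute",
          "ec2:ModifyInstanceAttribute",
          "ec2:TerminateInstances",
        ]
        Resource = "*" # Scope to the specific route table, EIP, and instance tags in production
      }
    ]
  })
}
```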
NAT instances can handle significant traffic. Here’s sizing guidance:
```text
t3.nano (2 vCPU, 0.5GB):
- Good for: <1Gbps, dev/staging
- Lambda functions: <100 concurrent

t3.micro (2 vCPU, 1GB):
- Good for: 1-2Gbps, small production
- Lambda functions: 100-500 concurrent

t3.small (2 vCPU, 2GB):
- Good for: 2-5Gbps, medium production
- Lambda functions: 500-1000 concurrent

t3.medium (2 vCPU, 4GB):
- Good for: 5-10Gbps, large production
- Lambda functions: >1000 concurrent
```
Monitor these CloudWatch metrics:
```hcl
# monitoring.tf
resource "aws_cloudwatch_metric_alarm" "nat_cpu" {
  alarm_name          = "nat-instance-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 300
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "This metric monitors NAT instance CPU utilization"

  dimensions = {
    InstanceId = aws_instance.nat.id
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}

resource "aws_cloudwatch_metric_alarm" "nat_network" {
  alarm_name          = "nat-instance-high-network"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "NetworkPacketsOut"
  namespace           = "AWS/EC2"
  period              = 300
  statistic           = "Average"
  threshold           = 100000
  alarm_description   = "This metric monitors NAT instance network utilization"

  dimensions = {
    InstanceId = aws_instance.nat.id
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}
```
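Both alarms publish to `aws_sns_topic.alerts`, which isn't defined above. A minimal sketch; the email address is a placeholder, and the subscription must be confirmed from the inbox before alerts arrive:

```hcl
resource "aws_sns_topic" "alerts" {
  name = "nat-instance-alerts"
}

resource "aws_sns_topic_subscription" "alerts_email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = "ops@example.com" # placeholder: use your on-call address
}
```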
## Lambda Cold Start Considerations
VPC-attached Lambda functions once paid a steep cold start penalty while an ENI (Elastic Network Interface) was created per execution environment. Since AWS moved to shared Hyperplane ENIs, the ENI is created when the function's VPC configuration is set rather than per invocation, so the overhead is much smaller, but cold starts still add latency. Here's how to minimize them:
```hcl
# Provisioned concurrency for critical functions
# (requires a published version: set publish = true on the function)
resource "aws_lambda_provisioned_concurrency_config" "api" {
  function_name                     = aws_lambda_function.api.function_name
  provisioned_concurrent_executions = 5 # Keep 5 warm
  qualifier                         = aws_lambda_function.api.version
}
```

And in the handler, create connection pools outside the handler so they survive across warm invocations:

```javascript
// Connection pooling for database connections
const { Pool } = require('pg');

// Create pool outside handler (reused across invocations)
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 1, // Important: each Lambda execution environment needs its own connection
  idleTimeoutMillis: 30000,
});

exports.handler = async (event) => {
  // Reuse connection pool
  const client = await pool.connect();
  try {
    const result = await client.query('SELECT * FROM users WHERE id = $1', [event.userId]);
    return {
      statusCode: 200,
      body: JSON.stringify(result.rows[0])
    };
  } finally {
    client.release(); // Return to pool
  }
};
```
## Security Best Practices
```hcl
# security.tf
# VPC Flow Logs for monitoring
resource "aws_flow_log" "vpc" {
  iam_role_arn    = aws_iam_role.flow_log.arn
  log_destination = aws_cloudwatch_log_group.vpc_flow_log.arn
  traffic_type    = "ALL"
  vpc_id          = aws_vpc.lambda_vpc.id
}

resource "aws_cloudwatch_log_group" "vpc_flow_log" {
  name              = "/aws/vpc/flowlogs"
  retention_in_days = 30
}

# Network ACLs for additional security
# (NACLs are stateless: return traffic from the internet arrives with the
# remote host's source IP, so it must be allowed on ephemeral ports)
resource "aws_network_acl" "private" {
  vpc_id     = aws_vpc.lambda_vpc.id
  subnet_ids = aws_subnet.private[*].id

  # Allow all traffic from within the VPC
  ingress {
    protocol   = "-1"
    rule_no    = 100
    action     = "allow"
    cidr_block = aws_vpc.lambda_vpc.cidr_block
    from_port  = 0
    to_port    = 0
  }

  # Allow return traffic on ephemeral ports
  ingress {
    protocol   = "tcp"
    rule_no    = 110
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 1024
    to_port    = 65535
  }

  # Allow outbound to anywhere
  egress {
    protocol   = "-1"
    rule_no    = 100
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 0
    to_port    = 0
  }

  tags = {
    Name = "private-nacl"
  }
}

# WAF for API Gateway (if using)
resource "aws_wafv2_web_acl" "api" {
  name  = "api-waf"
  scope = "REGIONAL"

  default_action {
    allow {}
  }

  # Rate limiting
  rule {
    name     = "RateLimitRule"
    priority = 1

    # Use action here; override_action is only valid for
    # rule-group reference statements
    action {
      block {}
    }

    statement {
      rate_based_statement {
        limit              = 2000
        aggregate_key_type = "IP"
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "RateLimitRule"
      sampled_requests_enabled   = true
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "apiWAF"
    sampled_requests_enabled   = true
  }
}
```
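The flow log references `aws_iam_role.flow_log`, which isn't defined above. A sketch of the role and its CloudWatch Logs permissions, assuming log delivery to the group created earlier:

```hcl
resource "aws_iam_role" "flow_log" {
  name = "vpc-flow-log-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "vpc-flow-logs.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy" "flow_log" {
  name = "vpc-flow-log-policy"
  role = aws_iam_role.flow_log.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams",
      ]
      Resource = "*"
    }]
  })
}
```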
## When NOT to Use NAT Instances
NAT instances aren’t always the right choice:
Avoid NAT instances when:
- You need guaranteed 10Gbps+ throughput
- Your Lambda functions push more than 1TB of data per month through NAT
- You have strict compliance requiring managed services
- Your team lacks AWS networking expertise
- You need multi-region failover
Stick with NAT Gateways when:
- Cost isn’t a primary concern
- You want AWS-managed infrastructure
- You need maximum reliability and performance
- Your workload justifies the cost
## Cost Optimization Strategies
Beyond NAT instances, here are additional cost optimizations:
### 1. VPC Endpoints for AWS Services
```hcl
# Avoid NAT charges for AWS service calls
data "aws_region" "current" {}

resource "aws_vpc_endpoint" "s3" {
  vpc_id          = aws_vpc.lambda_vpc.id
  service_name    = "com.amazonaws.${data.aws_region.current.name}.s3"
  route_table_ids = [aws_route_table.private.id]

  tags = {
    Name = "s3-vpc-endpoint"
  }
}

resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id          = aws_vpc.lambda_vpc.id
  service_name    = "com.amazonaws.${data.aws_region.current.name}.dynamodb"
  route_table_ids = [aws_route_table.private.id]

  tags = {
    Name = "dynamodb-vpc-endpoint"
  }
}
```
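S3 and DynamoDB are gateway endpoints, which cost nothing. Other AWS services require interface endpoints (PrivateLink), which are billed hourly per AZ plus per GB, so add them only where they beat your NAT data-processing charges. A sketch for Secrets Manager, assuming the shared security group below:

```hcl
# Shared security group for interface endpoints: allow HTTPS from inside the VPC
resource "aws_security_group" "vpc_endpoints" {
  name_prefix = "vpc-endpoints-"
  vpc_id      = aws_vpc.lambda_vpc.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.lambda_vpc.cidr_block]
  }
}

# Interface endpoints are NOT free (hourly per AZ + per GB), unlike gateway endpoints
resource "aws_vpc_endpoint" "secretsmanager" {
  vpc_id              = aws_vpc.lambda_vpc.id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.secretsmanager"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}
```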
### 2. Lambda Function Optimization
```javascript
// Reduce network calls by batching writes through the NAT path
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();

const batchSize = 25; // DynamoDB BatchWriteItem limit

async function batchWriteAll(tableName, items) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }

  // Process batches in parallel
  return Promise.all(
    batches.map(batch => dynamodb.batchWrite({
      RequestItems: {
        [tableName]: batch.map(item => ({
          PutRequest: { Item: item }
        }))
      }
    }).promise())
  );
}
```
### 3. Right-sizing Instances
```bash
# Monitor NAT instance usage
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time 2025-07-01T00:00:00Z \
  --end-time 2025-07-27T23:59:59Z \
  --period 3600 \
  --statistics Average,Maximum

# If avg CPU < 10% for a week, downsize to t3.nano
# If avg CPU > 80% for sustained periods, upsize
```
## Deployment and Testing
Here’s a complete deployment script:
```bash
#!/bin/bash
# deploy-lambda-vpc.sh
set -e

echo "Deploying Lambda VPC infrastructure..."

# Validate Terraform
terraform validate

# Plan deployment
terraform plan -out=tfplan

# Apply with approval
terraform apply tfplan

# Test NAT instance
echo "Testing NAT instance connectivity..."
NAT_IP=$(terraform output -raw nat_instance_public_ip)

# SSH to NAT instance and test connectivity
ssh -i ~/.ssh/your-key.pem ec2-user@$NAT_IP << 'EOF'
# Test internet connectivity
ping -c 3 8.8.8.8

# Test AWS service connectivity
curl -s https://s3.amazonaws.com

# Check iptables rules
sudo iptables -t nat -L
EOF

# Deploy test Lambda function
echo "Deploying test Lambda function..."
zip lambda-test.zip test-function.js

aws lambda create-function \
  --function-name vpc-test \
  --runtime nodejs18.x \
  --role $(terraform output -raw lambda_role_arn) \
  --handler test-function.handler \
  --zip-file fileb://lambda-test.zip \
  --vpc-config SubnetIds=$(terraform output -raw private_subnet_ids),SecurityGroupIds=$(terraform output -raw lambda_security_group_id)

# Test Lambda function
echo "Testing Lambda function..."
aws lambda invoke \
  --function-name vpc-test \
  --payload '{"test": "data"}' \
  response.json

cat response.json
echo ""
echo "Deployment completed successfully!"
```
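The deploy script reads several `terraform output` values that the earlier files never declare. A sketch of the matching `outputs.tf`; the subnet ID list is joined into a comma-separated string because `terraform output -raw` only works on strings:

```hcl
# outputs.tf (assumed by deploy-lambda-vpc.sh)
output "nat_instance_public_ip" {
  value = aws_eip.nat_instance.public_ip
}

output "lambda_role_arn" {
  value = aws_iam_role.lambda.arn
}

output "private_subnet_ids" {
  # joined because `terraform output -raw` rejects lists
  value = join(",", aws_subnet.private[*].id)
}

output "lambda_security_group_id" {
  value = aws_security_group.lambda.id
}
```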
Test Lambda function:
```javascript
// test-function.js
const https = require('https');
const { Pool } = require('pg');

exports.handler = async (event) => {
  const results = {
    timestamp: new Date().toISOString(),
    tests: {}
  };

  // Test internet connectivity
  try {
    await new Promise((resolve, reject) => {
      https.get('https://httpbin.org/ip', (res) => {
        let data = '';
        res.on('data', chunk => data += chunk);
        res.on('end', () => {
          results.tests.internet = { success: true, ip: JSON.parse(data).origin };
          resolve();
        });
      }).on('error', reject);
    });
  } catch (error) {
    results.tests.internet = { success: false, error: error.message };
  }

  // Test database connectivity
  try {
    const pool = new Pool({
      connectionString: process.env.DATABASE_URL,
      max: 1,
    });
    const client = await pool.connect();
    const result = await client.query('SELECT version()');
    client.release();
    await pool.end();
    results.tests.database = { success: true, version: result.rows[0].version };
  } catch (error) {
    results.tests.database = { success: false, error: error.message };
  }

  return {
    statusCode: 200,
    body: JSON.stringify(results, null, 2)
  };
};
```
## Final Thoughts
Moving from NAT Gateways to NAT instances saved us over $1,300 annually while maintaining functionality. The key lessons:
- Understand your traffic patterns - Most Lambda workloads don’t need NAT Gateway throughput
- Monitor everything - Set up proper alerting for the NAT instance
- Start small - Begin with t3.nano and scale up if needed
- Use VPC endpoints - Eliminate NAT charges for AWS service calls
- Test thoroughly - Validate connectivity and performance before production
The “serverless” promise of Lambda is powerful, but VPC networking costs can quickly spiral out of control. With proper architecture and cost-conscious choices, you can have secure, private Lambda functions without breaking the bank.
Is the complexity worth it? For most production applications requiring database access, absolutely. The security benefits of private subnets combined with 85%+ cost savings make this approach a no-brainer.
Running Lambda functions in VPCs? I’d love to hear about your networking setup and cost optimizations. Find me on Twitter @TheLogicalDev.
All infrastructure code tested with Terraform 1.5+ and AWS Provider 5.0+. Costs calculated using us-east-1 pricing as of July 2025.