Clicking through the AWS console to set up AI infrastructure does not scale. Here is how to manage your entire AI stack as version-controlled, reproducible code.
AI infrastructure is complex. A typical production setup involves model endpoints, vector databases, caching layers, API gateways, VPCs, IAM roles, secrets, auto-scaling rules, and monitoring. Managing this through console clicks leads to undocumented configurations, environment drift, and impossible-to-reproduce deployments. Terraform turns all of this into declarative code that can be reviewed, versioned, and applied consistently across development, staging, and production environments. This guide covers how to use Terraform with AWS Bedrock to build reproducible AI infrastructure.
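Before any resources, you need a provider block. A minimal setup this guide assumes (the version constraints and region are illustrative, not prescriptive):

```hcl
terraform {
  required_version = ">= 1.5"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}
```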
```hcl
# Invocation logging for audit and debugging
resource "aws_bedrock_model_invocation_logging_configuration" "ai_logging" {
  logging_config {
    embedding_data_delivery_enabled = true
    s3_config {
      bucket_name = aws_s3_bucket.ai_logs.id
    }
  }
}
```
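The logging configuration above references an S3 bucket for delivery. A minimal sketch of that bucket (the bucket name is illustrative and must be globally unique):

```hcl
# Destination bucket for Bedrock invocation logs
resource "aws_s3_bucket" "ai_logs" {
  bucket = "my-org-bedrock-invocation-logs" # illustrative; pick a unique name
}
```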
```hcl
# Provisioned throughput for consistent latency
resource "aws_bedrock_provisioned_model_throughput" "claude" {
  provisioned_model_name = "claude-production"
  model_arn              = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
  model_units            = 1
}
```
```hcl
# Isolated VPC for AI workloads
module "ai_vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "ai-production"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  enable_nat_gateway = true
}
```
```hcl
# VPC endpoint for Bedrock (no internet traversal)
resource "aws_vpc_endpoint" "bedrock" {
  vpc_id            = module.ai_vpc.vpc_id
  service_name      = "com.amazonaws.us-east-1.bedrock-runtime"
  vpc_endpoint_type = "Interface"
  subnet_ids        = module.ai_vpc.private_subnets

  # Resolve the default Bedrock endpoint DNS to the private endpoint
  private_dns_enabled = true
}
```
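An interface endpoint should also be locked down so only in-VPC traffic can reach it. A sketch of a security group that allows HTTPS from the VPC CIDR only (the name is illustrative; `vpc_cidr_block` is an output of the community VPC module):

```hcl
# Restrict the Bedrock endpoint to HTTPS traffic from inside the VPC
resource "aws_security_group" "bedrock_endpoint" {
  name   = "bedrock-endpoint-sg"
  vpc_id = module.ai_vpc.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [module.ai_vpc.vpc_cidr_block]
  }
}
```

Attach it via the endpoint's `security_group_ids` argument.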
Deploy your FastAPI application as a containerized service with auto-scaling:
```hcl
# ECS service with auto-scaling
resource "aws_ecs_service" "ai_api" {
  name            = "ai-api-service"
  cluster         = aws_ecs_cluster.ai.id
  task_definition = aws_ecs_task_definition.ai_api.arn
  desired_count   = var.min_instances
  launch_type     = "FARGATE"

  # Fargate requires awsvpc networking
  network_configuration {
    subnets          = module.ai_vpc.private_subnets
    assign_public_ip = false
  }
}
```
```hcl
resource "aws_appautoscaling_target" "ai_api" {
  max_capacity       = var.max_instances
  min_capacity       = var.min_instances
  resource_id        = "service/${aws_ecs_cluster.ai.name}/${aws_ecs_service.ai_api.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}
```
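A scaling target does nothing on its own; it needs at least one policy. A sketch of a target-tracking policy on average CPU (the 60% target is an illustrative starting point, not a recommendation):

```hcl
# Scale the service to hold average CPU near the target
resource "aws_appautoscaling_policy" "ai_api_cpu" {
  name               = "ai-api-cpu-target"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ai_api.resource_id
  scalable_dimension = aws_appautoscaling_target.ai_api.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ai_api.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 60
  }
}
```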
Provision Redis for semantic caching, session state, and task queues:
```hcl
resource "aws_elasticache_replication_group" "ai_cache" {
  replication_group_id = "ai-semantic-cache"
  description          = "Redis cache for LLM responses"
  node_type            = "cache.r7g.large"
  num_cache_clusters   = 2
  engine               = "redis"
  engine_version       = "7.0"
  subnet_group_name    = aws_elasticache_subnet_group.ai.name
}
```
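The replication group above references a subnet group. A minimal sketch that places the cache in the VPC's private subnets:

```hcl
# Keep Redis in the private subnets of the AI VPC
resource "aws_elasticache_subnet_group" "ai" {
  name       = "ai-cache-subnets"
  subnet_ids = module.ai_vpc.private_subnets
}
```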
```shell
# Use variable files per environment
terraform workspace select production
terraform apply -var-file=envs/production.tfvars
```

```hcl
# envs/production.tfvars
min_instances       = 2
max_instances       = 10
redis_node_type     = "cache.r7g.large"
bedrock_model_units = 2
```
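Each value in the tfvars file needs a matching declaration. A sketch of the corresponding `variables.tf`, with conservative defaults that an environment file would override (the defaults shown are illustrative):

```hcl
variable "min_instances" {
  type    = number
  default = 1
}

variable "max_instances" {
  type    = number
  default = 4
}

variable "redis_node_type" {
  type    = string
  default = "cache.t4g.medium"
}

variable "bedrock_model_units" {
  type    = number
  default = 1
}
```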
Never hard-code API keys. Use AWS Secrets Manager with Terraform:
```hcl
resource "aws_secretsmanager_secret" "openai_key" {
  name = "ai/openai-api-key"
}
```

```hcl
# Reference inside the ECS task's container definition
secrets = [{
  name      = "OPENAI_API_KEY"
  valueFrom = aws_secretsmanager_secret.openai_key.arn
}]
```
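For ECS to inject the secret at container start, the task execution role must be allowed to read it. A sketch, assuming an execution role named `ecs_execution` is defined elsewhere in your configuration (that name is hypothetical):

```hcl
# Let the (assumed) ECS execution role fetch the API key at container start
resource "aws_iam_role_policy" "read_openai_key" {
  name = "read-openai-key"
  role = aws_iam_role.ecs_execution.id # hypothetical role name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = aws_secretsmanager_secret.openai_key.arn
    }]
  })
}
```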
Choose Terraform if your team spans multiple clouds or prefers declarative configuration; choose AWS CDK if you are all-in on AWS and prefer writing infrastructure in Python or TypeScript. Both work well for AI workloads.
Bedrock model access is a configuration, not a deployment. Changing the model ID in your Terraform variables and applying is a zero-downtime change. For provisioned throughput, plan for a brief provisioning period.
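The model-as-variable pattern might look like this sketch (the default ARN is illustrative):

```hcl
# Swap models by changing one variable and re-applying
variable "bedrock_model_arn" {
  type    = string
  default = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
}
```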
Terraform manages Lambda functions well. See our analysis of when Lambda works for AI to decide if it fits your workload before writing the Terraform.
We build Terraform modules for AI infrastructure. From model endpoints to vector stores to monitoring -- all as code.
Automate Your Infrastructure