Clicking through the AWS console to set up AI infrastructure does not scale. Here is how to manage your entire AI stack as version-controlled, reproducible code.
AI infrastructure is complex. A typical production setup involves model endpoints, vector databases, caching layers, API gateways, VPCs, IAM roles, secrets, auto-scaling rules, and monitoring. Managing this through console clicks leads to undocumented configurations, environment drift, and impossible-to-reproduce deployments. Terraform turns all of this into declarative code that can be reviewed, versioned, and applied consistently across development, staging, and production environments. This guide covers how to use Terraform with AWS Bedrock to build reproducible AI infrastructure.
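Before any resources, you need a provider block. A minimal setup this guide assumes (the version constraints and region are illustrative, not prescriptive):

```hcl
terraform {
  required_version = ">= 1.5"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}
```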
```hcl
# Invocation logging for audit and debugging
resource "aws_bedrock_model_invocation_logging_configuration" "ai_logging" {
  logging_config {
    embedding_data_delivery_enabled = true
    s3_config {
      bucket_name = aws_s3_bucket.ai_logs.id
    }
  }
}
```
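The logging configuration above references an S3 bucket for delivery. A minimal sketch of that bucket (the bucket name is illustrative and must be globally unique):

```hcl
# Destination bucket for Bedrock invocation logs
resource "aws_s3_bucket" "ai_logs" {
  bucket = "my-org-bedrock-invocation-logs" # illustrative; pick a unique name
}
```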
```hcl
# Provisioned throughput for consistent latency
resource "aws_bedrock_provisioned_model_throughput" "claude" {
  provisioned_model_name = "claude-production"
  model_arn              = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
  model_units            = 1
}
```
```hcl
# Isolated VPC for AI workloads
module "ai_vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "ai-production"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  enable_nat_gateway = true
}
```
```hcl
# VPC endpoint for Bedrock (no internet traversal)
resource "aws_vpc_endpoint" "bedrock" {
  vpc_id            = module.ai_vpc.vpc_id
  service_name      = "com.amazonaws.us-east-1.bedrock-runtime"
  vpc_endpoint_type = "Interface"
  subnet_ids        = module.ai_vpc.private_subnets

  # Resolve the default Bedrock endpoint DNS to the private endpoint
  private_dns_enabled = true
}
```
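An interface endpoint should also be locked down so only in-VPC traffic can reach it. A sketch of a security group that allows HTTPS from the VPC CIDR only (the name is illustrative; `vpc_cidr_block` is an output of the community VPC module):

```hcl
# Restrict the Bedrock endpoint to HTTPS traffic from inside the VPC
resource "aws_security_group" "bedrock_endpoint" {
  name   = "bedrock-endpoint-sg"
  vpc_id = module.ai_vpc.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [module.ai_vpc.vpc_cidr_block]
  }
}
```

Attach it via the endpoint's `security_group_ids` argument.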
Deploy your FastAPI application as a containerized service with auto-scaling:
```hcl
# ECS service with auto-scaling
resource "aws_ecs_service" "ai_api" {
  name            = "ai-api-service"
  cluster         = aws_ecs_cluster.ai.id
  task_definition = aws_ecs_task_definition.ai_api.arn
  desired_count   = var.min_instances
  launch_type     = "FARGATE"

  # Fargate requires awsvpc networking
  network_configuration {
    subnets          = module.ai_vpc.private_subnets
    assign_public_ip = false
  }
}
```
```hcl
resource "aws_appautoscaling_target" "ai_api" {
  max_capacity       = var.max_instances
  min_capacity       = var.min_instances
  resource_id        = "service/${aws_ecs_cluster.ai.name}/${aws_ecs_service.ai_api.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}
```
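A scaling target does nothing on its own; it needs at least one policy. A sketch of a target-tracking policy on average CPU (the 60% target is an illustrative starting point, not a recommendation):

```hcl
# Scale the service to hold average CPU near the target
resource "aws_appautoscaling_policy" "ai_api_cpu" {
  name               = "ai-api-cpu-target"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ai_api.resource_id
  scalable_dimension = aws_appautoscaling_target.ai_api.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ai_api.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 60
  }
}
```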
Provision Redis for semantic caching, session state, and task queues:
```hcl
resource "aws_elasticache_replication_group" "ai_cache" {
  replication_group_id = "ai-semantic-cache"
  description          = "Redis cache for LLM responses"
  node_type            = "cache.r7g.large"
  num_cache_clusters   = 2
  engine               = "redis"
  engine_version       = "7.0"
  subnet_group_name    = aws_elasticache_subnet_group.ai.name
}
```
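The replication group above references a subnet group. A minimal sketch that places the cache in the VPC's private subnets:

```hcl
# Keep Redis in the private subnets of the AI VPC
resource "aws_elasticache_subnet_group" "ai" {
  name       = "ai-cache-subnets"
  subnet_ids = module.ai_vpc.private_subnets
}
```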
```shell
# Use variable files per environment
terraform workspace select production
terraform apply -var-file=envs/production.tfvars
```

```hcl
# envs/production.tfvars
min_instances       = 2
max_instances       = 10
redis_node_type     = "cache.r7g.large"
bedrock_model_units = 2
```
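Each value in the tfvars file needs a matching declaration. A sketch of the corresponding `variables.tf`, with conservative defaults that an environment file would override (the defaults shown are illustrative):

```hcl
variable "min_instances" {
  type    = number
  default = 1
}

variable "max_instances" {
  type    = number
  default = 4
}

variable "redis_node_type" {
  type    = string
  default = "cache.t4g.medium"
}

variable "bedrock_model_units" {
  type    = number
  default = 1
}
```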
Never hard-code API keys. Use AWS Secrets Manager with Terraform:
```hcl
resource "aws_secretsmanager_secret" "openai_key" {
  name = "ai/openai-api-key"
}
```

```hcl
# Reference inside the ECS task's container definition
secrets = [{
  name      = "OPENAI_API_KEY"
  valueFrom = aws_secretsmanager_secret.openai_key.arn
}]
```
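For ECS to inject the secret at container start, the task execution role must be allowed to read it. A sketch, assuming an execution role named `ecs_execution` is defined elsewhere in your configuration (that name is hypothetical):

```hcl
# Let the (assumed) ECS execution role fetch the API key at container start
resource "aws_iam_role_policy" "read_openai_key" {
  name = "read-openai-key"
  role = aws_iam_role.ecs_execution.id # hypothetical role name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = aws_secretsmanager_secret.openai_key.arn
    }]
  })
}
```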
Choose Terraform if your team spans multiple clouds or prefers declarative configuration; choose AWS CDK if you are all-in on AWS and prefer writing infrastructure in Python or TypeScript. Both work well for AI workloads.
Bedrock model access is a configuration, not a deployment. Changing the model ID in your Terraform variables and applying is a zero-downtime change. For provisioned throughput, plan for a brief provisioning period.
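The model-as-variable pattern might look like this sketch (the default ARN is illustrative):

```hcl
# Swap models by changing one variable and re-applying
variable "bedrock_model_arn" {
  type    = string
  default = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
}
```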
Terraform manages Lambda functions well. See our analysis of when Lambda works for AI to decide if it fits your workload before writing the Terraform.
We build Terraform modules for AI infrastructure. From model endpoints to vector stores to monitoring -- all as code.
Automate Your Infrastructure