How to Automate Infrastructure Testing

Test your infrastructure like application code. Covers Terraform testing, policy-as-code, drift detection, and CI/CD integration.

“It works in staging” is the infrastructure equivalent of “it works on my machine.” Infrastructure testing prevents the 2 AM incident caused by a misconfigured security group that slipped through code review.


Step 1: Terraform Testing

# Terraform validate — syntax and configuration check
terraform init
terraform validate
terraform plan -out=plan.tfplan

# Automated plan analysis
terraform show -json plan.tfplan | \
  jq -r '.resource_changes[] | select(.change.actions | index("delete")) | .address'
# ^^ Prints the address of every resource the plan would destroy (replacements included).
#    Alert if the output is non-empty.
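
The same plan analysis can be done without jq. The following is a minimal Go sketch, assuming the plan JSON has the `resource_changes[].change.actions` shape used above. The names `destroyedAddresses` and `samplePlan` are illustrative and not part of any Terraform tooling.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// plan mirrors only the fields of `terraform show -json` output this check needs.
type plan struct {
	ResourceChanges []struct {
		Address string `json:"address"`
		Change  struct {
			Actions []string `json:"actions"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// samplePlan is made-up data used when no plan file is passed on the command line.
const samplePlan = `{"resource_changes":[
  {"address":"aws_s3_bucket.logs","change":{"actions":["delete","create"]}},
  {"address":"aws_instance.web","change":{"actions":["update"]}}]}`

// destroyedAddresses returns the address of every resource whose planned
// actions include "delete" (pure destroys and replacements alike).
func destroyedAddresses(data []byte) ([]string, error) {
	var p plan
	if err := json.Unmarshal(data, &p); err != nil {
		return nil, fmt.Errorf("decoding plan JSON: %w", err)
	}
	var out []string
	for _, rc := range p.ResourceChanges {
		for _, action := range rc.Change.Actions {
			if action == "delete" {
				out = append(out, rc.Address)
				break
			}
		}
	}
	return out, nil
}

func main() {
	data := []byte(samplePlan)
	fromFile := len(os.Args) > 1
	if fromFile {
		var err error
		if data, err = os.ReadFile(os.Args[1]); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(2)
		}
	}
	addrs, err := destroyedAddresses(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	for _, a := range addrs {
		fmt.Println("would destroy:", a)
	}
	if fromFile && len(addrs) > 0 {
		os.Exit(1) // non-zero exit lets a CI step fail or trigger an alert
	}
}
```

Saved under a hypothetical name such as `plancheck.go`, it could run as `go run plancheck.go plan.json` after `terraform show -json plan.tfplan > plan.json`. Treating a destroy as a hard failure is a design choice; a team that expects replacements may prefer to only warn.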

Terratest (Go-Based Integration Testing)

package test

import (
    "testing"
    "time"

    http_helper "github.com/gruntwork-io/terratest/modules/http-helper"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestWebServer(t *testing.T) {
    t.Parallel()

    opts := &terraform.Options{
        TerraformDir: "../modules/web-server",
        Vars: map[string]interface{}{
            "instance_type": "t3.micro",
            "environment":   "test",
        },
    }

    // Tear down the test infrastructure even if an assertion fails
    defer terraform.Destroy(t, opts)
    terraform.InitAndApply(t, opts)

    // Verify the web server is reachable: expect HTTP 200 with body "OK",
    // retrying up to 10 times with 5 seconds between attempts.
    // The sleep argument is a time.Duration, so a bare 5 would mean 5 nanoseconds.
    url := terraform.Output(t, opts, "url")
    http_helper.HttpGetWithRetry(t, url, nil, 200, "OK", 10, 5*time.Second)

    // Verify the module exports a security group ID
    // (this does not inspect the group's individual rules)
    sgId := terraform.Output(t, opts, "security_group_id")
    assert.NotEmpty(t, sgId)
}

Step 2: Policy-as-Code

# OPA (Open Policy Agent): Terraform plan validation
# policy/security.rego

package terraform.security

# Enables the `contains` / `if` keywords on OPA 0.59+; redundant on OPA 1.0+
import rego.v1

# Deny public S3 buckets
deny contains msg if {
    resource := input.resource_changes[_]
    resource.type == "aws_s3_bucket"
    resource.change.after.acl == "public-read"
    msg := sprintf("S3 bucket '%s' must not be public", [resource.address])
}

# Deny instances without encryption
deny contains msg if {
    resource := input.resource_changes[_]
    resource.type == "aws_instance"
    resource.change.after != null  # skip resources that are being destroyed
    not resource.change.after.root_block_device[0].encrypted
    msg := sprintf("EC2 '%s' must have encrypted root volume", [resource.address])
}

# Require tags on all resources
deny contains msg if {
    resource := input.resource_changes[_]
    resource.change.after != null  # skip resources that are being destroyed
    not resource.change.after.tags.Environment
    msg := sprintf("Resource '%s' must have 'Environment' tag", [resource.address])
}
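
# Run in CI/CD
terraform plan -out=plan.tfplan
terraform show -json plan.tfplan > plan.json
# Prints the deny set as JSON. opa eval exits 0 even when the set is non-empty,
# so the pipeline has to inspect the output to fail the build.
opa eval --data policy/ --input plan.json "data.terraform.security.deny"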
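
To turn that query result into a pass/fail signal, a small program can read the JSON that `opa eval` prints and fail when any message is present. This is a rough Go sketch, assuming the default JSON output format (`result[].expressions[].value`) and a `deny` set of strings as in the policy above. `denyMessages` and `sampleOutput` are hypothetical names, and the sample is made-up data.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// evalOutput mirrors the parts of `opa eval` JSON output used here.
type evalOutput struct {
	Result []struct {
		Expressions []struct {
			Value json.RawMessage `json:"value"`
		} `json:"expressions"`
	} `json:"result"`
}

// sampleOutput is illustrative data used when no file is passed on the command line.
const sampleOutput = `{"result":[{"expressions":[{
  "value":["S3 bucket 'aws_s3_bucket.assets' must not be public"],
  "text":"data.terraform.security.deny"}]}]}`

// denyMessages collects every string in the evaluated deny set.
func denyMessages(data []byte) ([]string, error) {
	var out evalOutput
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, fmt.Errorf("decoding opa eval output: %w", err)
	}
	var msgs []string
	for _, r := range out.Result {
		for _, e := range r.Expressions {
			if len(e.Value) == 0 {
				continue // expression carried no value
			}
			var vals []string
			if err := json.Unmarshal(e.Value, &vals); err != nil {
				return nil, fmt.Errorf("deny value is not a list of strings: %w", err)
			}
			msgs = append(msgs, vals...)
		}
	}
	return msgs, nil
}

func main() {
	data := []byte(sampleOutput)
	fromFile := len(os.Args) > 1
	if fromFile {
		var err error
		if data, err = os.ReadFile(os.Args[1]); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(2)
		}
	}
	msgs, err := denyMessages(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	for _, m := range msgs {
		fmt.Println("policy violation:", m)
	}
	if fromFile && len(msgs) > 0 {
		os.Exit(1) // fail the pipeline when any policy is violated
	}
}
```

In a pipeline, the `opa eval` output would be redirected to a file (for example `deny.json`, a hypothetical name) and passed as the first argument.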

Step 3: Drift Detection

# Terraform drift detection
terraform plan -detailed-exitcode
# Exit code 0 = no changes
# Exit code 1 = error
# Exit code 2 = changes detected (drift!)
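
The three exit codes are easy to mishandle: a script that only checks for 2 will treat a failed plan (1) the same as a clean one (0). The sketch below, in Go, shows one way to keep the cases apart. `classify` and `planExitCode` are illustrative names, and the program assumes `terraform` is on the PATH when it is given a directory.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// classify maps a `terraform plan -detailed-exitcode` exit code to a status.
// Anything other than 0 or 2 is treated as an error, never as "no drift".
func classify(code int) string {
	switch code {
	case 0:
		return "no changes"
	case 2:
		return "drift detected"
	default:
		return "error"
	}
}

// planExitCode runs terraform plan in dir and returns its exit code.
func planExitCode(dir string) int {
	cmd := exec.Command("terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color")
	cmd.Dir = dir
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	err := cmd.Run()
	if err == nil {
		return 0
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) && exitErr.ExitCode() >= 0 {
		return exitErr.ExitCode()
	}
	return 1 // terraform could not be started or was killed by a signal
}

func main() {
	if len(os.Args) < 2 {
		// No directory given: print the mapping instead of running terraform.
		for _, code := range []int{0, 1, 2} {
			fmt.Printf("exit code %d: %s\n", code, classify(code))
		}
		return
	}
	code := planExitCode(os.Args[1])
	fmt.Println("drift check:", classify(code))
	os.Exit(code)
}
```

Passing a directory, as in `go run driftcheck.go ./terraform` (both names hypothetical), runs the real plan and propagates its exit code, so a scheduler can alert on 2 and page on 1.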

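# Schedule daily drift checks
# .github/workflows/drift-detection.yml
name: Infrastructure Drift Detection
on:
  schedule:
    - cron: '0 8 * * *'  # Daily at 08:00 UTC
jobs:
  detect-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - env:
          # Assumes the webhook URL is stored as a repository secret named SLACK_WEBHOOK
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
        run: |
          EXIT_CODE=0
          terraform plan -detailed-exitcode -input=false || EXIT_CODE=$?
          if [ "$EXIT_CODE" -eq 2 ]; then
            echo "⚠️ DRIFT DETECTED"
            # Send Slack notification
            curl -X POST -H 'Content-Type: application/json' \
              -d '{"text":"⚠️ Infrastructure drift detected!"}' "$SLACK_WEBHOOK"
          elif [ "$EXIT_CODE" -ne 0 ]; then
            # Exit code 1 means the plan itself failed; do not report it as "no drift"
            echo "terraform plan failed with exit code $EXIT_CODE"
            exit "$EXIT_CODE"
          fi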

Step 4: CI/CD Pipeline for Infrastructure

# .github/workflows/infrastructure.yml
name: Infrastructure CI/CD
on:
  pull_request:
    paths: ['terraform/**']
  push:
    branches: [main]
    paths: ['terraform/**']

defaults:
  run:
    working-directory: terraform  # assumes the configuration lives in terraform/, as the path filters imply

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform fmt -check -recursive
      - run: terraform init -backend=false
      - run: terraform validate

  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: tfsec
        uses: aquasecurity/tfsec-action@v1.0.0
      - name: checkov
        uses: bridgecrewio/checkov-action@v12

  plan:
    needs: [validate, security-scan]
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    permissions:
      contents: read
      pull-requests: write  # required to post the plan as a PR comment
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - id: plan
        run: terraform plan -no-color -out=plan.tfplan
      - uses: actions/github-script@v7
        env:
          # setup-terraform's default wrapper exposes the command's stdout as a step output
          PLAN: ${{ steps.plan.outputs.stdout }}
        with:
          script: |
            // Post plan output as PR comment
            const output = `#### Terraform Plan 📖\n\`\`\`\n${process.env.PLAN}\n\`\`\``;
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            });

  apply:
    needs: [validate, security-scan]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform apply -auto-approve

Infrastructure Testing Checklist

  • terraform validate in CI (every PR)
  • terraform fmt enforced (no style drift)
  • Security scanning (tfsec/checkov) in CI
  • Policy-as-code (OPA/Sentinel) for guardrails
  • Integration tests (Terratest) for critical modules
  • Drift detection running daily
  • Plan output posted as PR comments
  • Apply only from main branch (no manual applies)
  • State file encrypted and access-controlled
  • Blast radius limited (state file per environment)

Source: This guide is derived from operational intelligence at Garnet Grid Consulting. For infrastructure advisory, visit garnetgrid.com.