How to Manage Database Schema Migrations Safely

Deploy database changes without downtime. Covers blue-green migrations, backward compatibility, rollback strategies, and migration testing.

Schema migrations are the scariest deploys. A bad migration can take down production, corrupt data, or lock tables for hours. This guide covers how to make them boring and safe.


The Golden Rules

  1. Every migration must be backward compatible — old code must work with new schema
  2. Never rename or drop columns in one step — use expand-and-contract
  3. Always test migrations against a production-size dataset — timing matters
  4. Have a rollback plan before you start — not after it fails

Step 1: Expand and Contract Pattern

Adding a Column (Safe)

-- Step 1: Add column (nullable first)
ALTER TABLE users ADD COLUMN display_name VARCHAR(255);

-- Step 2: Backfill data
UPDATE users SET display_name = first_name || ' ' || last_name
WHERE display_name IS NULL;

-- Step 3: Add a default, then NOT NULL (after backfill completes)
-- Set the default BEFORE the constraint, so concurrent inserts that
-- omit display_name cannot violate it
ALTER TABLE users ALTER COLUMN display_name SET DEFAULT '';
ALTER TABLE users ALTER COLUMN display_name SET NOT NULL;

Renaming a Column (Safe)

-- WRONG: ALTER TABLE users RENAME COLUMN name TO full_name;
-- This breaks all existing code immediately!

-- RIGHT: Expand-and-contract over 3 deploys

-- Deploy 1: Add new column, backfill, dual-write
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
UPDATE users SET full_name = name;
-- App code: write to BOTH columns, read from full_name

-- Deploy 2: Stop reading old column
-- App code: only read/write full_name

-- Deploy 3: Drop old column (after verification period)
ALTER TABLE users DROP COLUMN name;
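During Deploy 1, the application writes both columns and reads the new one with a fallback for rows not yet backfilled. A minimal Python sketch (function names are illustrative; your data-access layer will differ):

```python
def write_user_fields(name: str) -> dict:
    """Deploy 1: dual-write -- populate both the old and new column."""
    return {"name": name, "full_name": name}

def read_full_name(row: dict) -> str:
    """Read the new column, falling back to the old one for rows
    the backfill has not reached yet."""
    return row.get("full_name") or row["name"]
```

Once the backfill is verified complete, Deploy 2 drops the fallback and the dual-write, leaving only `full_name`.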

Step 2: Migration Framework

# Using Alembic (Python/SQLAlchemy)

# alembic/versions/001_add_display_name.py
"""Add display_name column"""

from alembic import op
import sqlalchemy as sa

revision = '001'
down_revision = None

def upgrade():
    op.add_column('users',
        sa.Column('display_name', sa.String(255), nullable=True)
    )

def downgrade():
    op.drop_column('users', 'display_name')

# Migration commands
alembic upgrade head          # Apply all pending migrations
alembic downgrade -1          # Roll back the last migration
alembic history               # Show migration history
alembic current               # Show current version

# Generate migration from model changes
alembic revision --autogenerate -m "add display_name"
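Risky operations can also be flagged mechanically before human review. A rough lint pass over migration source, sketched in Python (the patterns are illustrative and deliberately incomplete, not a substitute for review):

```python
import re

# Operations that violate expand-and-contract when they appear in a
# single migration (illustrative patterns, not exhaustive)
UNSAFE_PATTERNS = [
    (r"\bdrop_column\b", "dropping a column -- schedule for the contract phase"),
    (r"\bRENAME\s+COLUMN\b", "renaming a column -- use expand-and-contract"),
    (r"nullable\s*=\s*False", "adding NOT NULL directly -- backfill first"),
]

def lint_migration(source: str) -> list:
    """Return warnings for risky operations found in a migration file."""
    warnings = []
    for pattern, message in UNSAFE_PATTERNS:
        if re.search(pattern, source, re.IGNORECASE):
            warnings.append(message)
    return warnings
```

Run it in CI over new files under alembic/versions/ and fail the build on any warning.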

Step 3: Testing Migrations

# Test migrations against production-size data
import time
import psycopg2

def test_migration_timing(conn, migration_sql, table_name):
    """Estimate migration time on production data"""
    cur = conn.cursor()

    # Get row count
    cur.execute(f"SELECT COUNT(*) FROM {table_name}")
    row_count = cur.fetchone()[0]

    # Copy a sample into a temp table (not timed -- we only want
    # to measure the migration itself, not the copy)
    sample_size = min(100000, row_count)
    cur.execute(f"""
        CREATE TEMP TABLE test_migration AS
        SELECT * FROM {table_name} LIMIT {sample_size}
    """)

    # Time the migration against the sample
    start = time.time()
    cur.execute(migration_sql.replace(table_name, "test_migration"))
    elapsed = time.time() - start

    # Extrapolate
    estimated_total = (elapsed / sample_size) * row_count
    print(f"Rows: {row_count:,}")
    print(f"Sample time: {elapsed:.2f}s ({sample_size:,} rows)")
    print(f"Estimated full: {estimated_total:.0f}s ({estimated_total/60:.1f} min)")

    if estimated_total > 300:  # > 5 minutes
        print("⚠️  WARNING: Consider batched migration or background job")

    conn.rollback()
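The linear extrapolation above can be factored into a pure function so the estimate logic itself is unit-testable (a sketch; it assumes per-row cost is constant, which understates work that scales superlinearly such as index rebuilds, so treat the result as a lower bound):

```python
def estimate_full_runtime(sample_elapsed: float, sample_rows: int,
                          total_rows: int) -> float:
    """Linearly extrapolate a sampled migration time to the full table."""
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    return (sample_elapsed / sample_rows) * total_rows

def needs_batching(estimated_seconds: float, threshold: float = 300.0) -> bool:
    """Flag migrations expected to exceed the 5-minute budget."""
    return estimated_seconds > threshold
```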

Step 4: Lock-Free Migrations

-- PostgreSQL: CREATE INDEX CONCURRENTLY (no table lock)
CREATE INDEX CONCURRENTLY idx_orders_customer
ON orders (customer_id);

-- Add column with a default (instant in PostgreSQL 11+;
-- earlier versions rewrite the whole table)
ALTER TABLE orders ADD COLUMN priority INTEGER DEFAULT 0;

-- Batch updates instead of a single UPDATE, committing per batch so
-- locks are released as you go (COMMIT inside DO requires
-- PostgreSQL 11+ and must not run inside an outer transaction)
DO $$
DECLARE
    batch_size INT := 10000;
    rows_updated INT;
BEGIN
    LOOP
        UPDATE users
        SET display_name = first_name || ' ' || last_name
        WHERE display_name IS NULL
          AND id IN (
            SELECT id FROM users
            WHERE display_name IS NULL
            LIMIT batch_size
            FOR UPDATE SKIP LOCKED
          );
        GET DIAGNOSTICS rows_updated = ROW_COUNT;
        EXIT WHEN rows_updated = 0;
        RAISE NOTICE 'Updated % rows', rows_updated;
        COMMIT;                 -- Release locks after each batch
        PERFORM pg_sleep(0.1);  -- Brief pause to reduce load
    END LOOP;
END $$;
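The same batching can be driven from the application instead, which keeps transaction control explicit. A minimal keyset-range helper, sketched in Python (names are illustrative); a driver would run one `UPDATE ... WHERE id BETWEEN %s AND %s` per range and `conn.commit()` after each:

```python
from typing import Iterator, Sequence, Tuple

def keyset_batches(ids: Sequence[int],
                   batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (first_id, last_id) ranges over a sorted id list.

    Each range backs one UPDATE committed separately, so locks are
    held only for the duration of a single batch.
    """
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        yield chunk[0], chunk[-1]
```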

Migration Safety Checklist

  • Migration is backward compatible with current code
  • Tested on production-size dataset (timing verified)
  • Rollback migration written and tested
  • No table-level locks on tables > 100K rows
  • CREATE INDEX CONCURRENTLY used (not CREATE INDEX)
  • Batch updates used for large backfills
  • Migration reviewed by DBA or senior engineer
  • Deployed during low-traffic window
  • Monitoring in place during and after migration
  • Verification queries confirm data integrity

:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For database consulting, visit garnetgrid.com.
:::