How to Manage Database Schema Migrations Safely
Deploy database changes without downtime. Covers blue-green migrations, backward compatibility, rollback strategies, and migration testing.
Schema migrations are the scariest deploys. A bad migration can take down production, corrupt data, or lock tables for hours. This guide covers how to make them boring and safe.
The Golden Rules
- Every migration must be backward compatible — old code must work with new schema
- Never rename or drop columns in one step — use expand-and-contract
- Always test migrations against a production-size dataset — timing matters
- Have a rollback plan before you start — not after it fails
Step 1: Expand and Contract Pattern
Adding a Column (Safe)
-- Step 1: Add column (nullable first)
ALTER TABLE users ADD COLUMN display_name VARCHAR(255);
-- Step 2: Backfill data
UPDATE users SET display_name = first_name || ' ' || last_name
WHERE display_name IS NULL;
-- Step 3: Add a default, then NOT NULL (after backfill is complete).
-- Set the default FIRST so inserts from old code don't fail once the
-- NOT NULL constraint lands.
ALTER TABLE users ALTER COLUMN display_name SET DEFAULT '';
ALTER TABLE users ALTER COLUMN display_name SET NOT NULL;
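The same add-backfill-verify sequence can be exercised end to end from application code. A minimal sketch using SQLite (chosen only because it ships with Python; the table shape mirrors the SQL above, and on PostgreSQL you would run the same statements through your driver):

```python
import sqlite3

# In-memory database standing in for production
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO users (first_name, last_name) VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Alan", "Turing")],
)

# Step 1: add the column nullable
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Step 2: backfill
conn.execute("""
    UPDATE users SET display_name = first_name || ' ' || last_name
    WHERE display_name IS NULL
""")

# Verify the backfill is complete BEFORE tightening the constraint
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE display_name IS NULL").fetchone()[0]
assert remaining == 0, "backfill incomplete -- do not set NOT NULL yet"
```

The explicit verification query is the important part: setting NOT NULL while any NULLs remain fails the migration mid-flight.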
Renaming a Column (Safe)
-- WRONG: ALTER TABLE users RENAME COLUMN name TO full_name;
-- This breaks all existing code immediately!
-- RIGHT: Expand-and-contract over 3 deploys
-- Deploy 1: Add new column, backfill, dual-write
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
UPDATE users SET full_name = name;
-- App code: write to BOTH columns, read from full_name
-- Deploy 2: Stop reading old column
-- App code: only read/write full_name
-- Deploy 3: Drop old column (after verification period)
ALTER TABLE users DROP COLUMN name;
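The Deploy 1 dual-write phase can be sketched at the application layer. The helper names (`save_user`, `get_display_name`) are illustrative, not from any framework; SQLite keeps the sketch self-contained, and the `ON CONFLICT ... DO UPDATE` upsert form is the same on PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, full_name TEXT)")

def save_user(conn, user_id, name):
    """Deploy 1: dual-write -- keep old and new columns in sync."""
    conn.execute(
        "INSERT INTO users (id, name, full_name) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name, "
        "full_name = excluded.full_name",
        (user_id, name, name),
    )

def get_display_name(conn, user_id):
    """Deploy 1: read from the NEW column only."""
    row = conn.execute(
        "SELECT full_name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None

save_user(conn, 1, "Grace Hopper")
```

Once reads have been verified against `full_name` in production, Deploy 2 drops the write to `name`, and Deploy 3 drops the column itself.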
Step 2: Migration Framework
# Using Alembic (Python/SQLAlchemy)
# alembic/versions/001_add_display_name.py
"""Add display_name column"""
from alembic import op
import sqlalchemy as sa

revision = '001'
down_revision = None

def upgrade():
    op.add_column('users',
        sa.Column('display_name', sa.String(255), nullable=True)
    )

def downgrade():
    op.drop_column('users', 'display_name')
# Migration commands
alembic upgrade head # Apply all pending migrations
alembic downgrade -1 # Rollback last migration
alembic history # Show migration history
alembic current # Show current version
# Generate migration from model changes
alembic revision --autogenerate -m "add display_name"
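The "rollback migration written" rule can be enforced mechanically in CI by rejecting migration files whose downgrade() is missing or an empty stub. This check is not part of Alembic; it's a sketch using the standard-library ast module:

```python
import ast

def has_real_downgrade(migration_source: str) -> bool:
    """Return True if the migration source defines a downgrade() that
    does more than `pass` -- a cheap CI guard for rollback readiness."""
    tree = ast.parse(migration_source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == "downgrade":
            # Ignore `pass` statements and a docstring-only body
            body = [n for n in node.body if not isinstance(n, ast.Pass)]
            body = [n for n in body
                    if not (isinstance(n, ast.Expr)
                            and isinstance(n.value, ast.Constant))]
            return len(body) > 0
    return False

good = "def downgrade():\n    op.drop_column('users', 'display_name')\n"
bad = "def downgrade():\n    pass\n"
```

Run it over every file in `alembic/versions/` as a pre-merge check, so a migration without a tested rollback never reaches production.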
Step 3: Testing Migrations
# Test migrations against production-size data
import time
import psycopg2

def test_migration_timing(conn, migration_sql, table_name):
    """Estimate migration time on production data"""
    cur = conn.cursor()
    # Get row count
    cur.execute(f"SELECT COUNT(*) FROM {table_name}")
    row_count = cur.fetchone()[0]
    # Time the migration on a sample
    sample_size = min(100000, row_count)
    start = time.time()
    cur.execute(f"""
        CREATE TEMP TABLE test_migration AS
        SELECT * FROM {table_name} LIMIT {sample_size}
    """)
    # Apply migration to sample (naive substitution: assumes the table
    # name appears verbatim in the migration SQL)
    cur.execute(migration_sql.replace(table_name, "test_migration"))
    elapsed = time.time() - start
    # Extrapolate linearly to the full table
    estimated_total = (elapsed / sample_size) * row_count
    print(f"Rows: {row_count:,}")
    print(f"Sample time: {elapsed:.2f}s ({sample_size:,} rows)")
    print(f"Estimated full: {estimated_total:.0f}s ({estimated_total/60:.1f} min)")
    if estimated_total > 300:  # > 5 minutes
        print("⚠️ WARNING: Consider batched migration or background job")
    conn.rollback()  # discard the temp table and sample changes
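The extrapolation at the heart of that function is worth factoring out and unit-testing on its own. Note the built-in assumption: per-row cost is treated as constant, which is reasonable for plain backfills but underestimates index builds (closer to O(n log n)):

```python
def estimate_migration_seconds(sample_elapsed, sample_rows, total_rows):
    """Linear extrapolation from a timed sample to the full table.
    Assumes roughly constant per-row cost."""
    if sample_rows == 0:
        return 0.0
    return (sample_elapsed / sample_rows) * total_rows

# 100k sample rows took 2s; extrapolate to 50M production rows
# (~1000 s, i.e. ~17 minutes -- well past the 5-minute warning threshold)
est = estimate_migration_seconds(2.0, 100_000, 50_000_000)
```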
Step 4: Lock-Free Migrations
-- PostgreSQL: CREATE INDEX CONCURRENTLY (no table lock;
-- note it cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_orders_customer
ON orders (customer_id);
-- Add column with a default (instant on PostgreSQL 11+;
-- earlier versions rewrite the whole table)
ALTER TABLE orders ADD COLUMN priority INTEGER DEFAULT 0;
-- Batch updates instead of single UPDATE
DO $$
DECLARE
  batch_size INT := 10000;
  rows_updated INT;
BEGIN
  LOOP
    UPDATE users
    SET display_name = first_name || ' ' || last_name
    WHERE display_name IS NULL
      AND id IN (
        SELECT id FROM users
        WHERE display_name IS NULL
        LIMIT batch_size
        FOR UPDATE SKIP LOCKED
      );
    GET DIAGNOSTICS rows_updated = ROW_COUNT;
    EXIT WHEN rows_updated = 0;
    PERFORM pg_sleep(0.1); -- Brief pause to reduce load
    RAISE NOTICE 'Updated % rows', rows_updated;
  END LOOP;
END $$;
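The same batching idea can live in application code instead of a DO block, which makes it easier to monitor, pause, and resume. A sketch using SQLite for portability (SQLite has no FOR UPDATE SKIP LOCKED; on PostgreSQL you would keep that clause in the subquery):

```python
import sqlite3
import time

def batched_backfill(conn, batch_size=10_000, pause=0.0):
    """Application-side equivalent of the DO block: update in small
    batches so no single statement holds locks for long. Returns the
    total number of rows updated."""
    total = 0
    while True:
        cur = conn.execute("""
            UPDATE users SET display_name = first_name || ' ' || last_name
            WHERE id IN (
                SELECT id FROM users WHERE display_name IS NULL LIMIT ?
            )
        """, (batch_size,))
        conn.commit()  # commit per batch to release locks
        if cur.rowcount == 0:
            break
        total += cur.rowcount
        time.sleep(pause)  # brief pause to reduce load
    return total
```

Committing between batches is the key design choice: each batch's locks are released before the next begins, so foreground traffic interleaves with the backfill.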
Migration Safety Checklist
- Migration is backward compatible with current code
- Tested on production-size dataset (timing verified)
- Rollback migration written and tested
- No table-level locks on tables > 100K rows
- CREATE INDEX CONCURRENTLY used (not CREATE INDEX)
- Batch updates used for large backfills
- Migration reviewed by DBA or senior engineer
- Deployed during low-traffic window
- Monitoring in place during and after migration
- Verification queries confirm data integrity
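The last checklist item can be automated. A sketch of verification queries for the display_name backfill from Step 1 (SQLite used so the example is self-contained; the checks translate directly to PostgreSQL):

```python
import sqlite3

def verify_backfill(conn):
    """Post-migration integrity checks: no NULLs left, and the new
    column agrees with the source columns row by row."""
    nulls = conn.execute(
        "SELECT COUNT(*) FROM users WHERE display_name IS NULL").fetchone()[0]
    mismatches = conn.execute("""
        SELECT COUNT(*) FROM users
        WHERE display_name != first_name || ' ' || last_name
    """).fetchone()[0]
    return nulls == 0 and mismatches == 0
```

Run the check both immediately after the backfill and again before the final contract step (dropping the old column), since dual-write bugs can reintroduce drift in between.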
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For database consulting, visit garnetgrid.com.
:::