
Data Governance: Building Trust in Your Data

Implement data governance that actually works. Covers data catalog setup, quality rules, ownership models, lineage tracking, and compliance automation.

Data governance fails when it’s treated as a bureaucratic exercise. It succeeds when people see it as “being able to trust the numbers in the dashboard.” This guide gives you the practical implementation path.


Step 1: Establish Data Ownership

Every dataset needs exactly one owner. Not a committee — a person.

| Data Domain | Owner Role       | Steward          | Responsibilities                              |
|-------------|------------------|------------------|-----------------------------------------------|
| Customer    | Head of Sales    | CRM Admin        | Define what "customer" means, quality rules   |
| Financial   | CFO / Controller | Finance Analyst  | Accuracy of reporting figures                 |
| Product     | VP Product       | Product Ops      | Catalog accuracy, pricing integrity           |
| Employee    | CHRO             | HR Systems Admin | PII handling, access controls                 |
| Operational | COO              | Data Engineer    | Pipeline uptime, data freshness               |

RACI for Data Decisions

| Decision                | Owner | Steward | Data Eng | Consumers |
|-------------------------|-------|---------|----------|-----------|
| Define business rules   | A     | R       | C        | I         |
| Data quality thresholds | A     | R       | C        | I         |
| Schema changes          | C     | A       | R        | I         |
| Access requests         | A     | R       | C        | I         |
| Incident response       | I     | A       | R        | C         |
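Ownership assignments only hold up if they are checkable. One lightweight option is a machine-readable registry that enforces exactly one owner per domain and fails loudly for unowned datasets. A minimal sketch (all domain and role names here are illustrative, not from a real system):

```python
# Minimal data-ownership registry: one accountable owner per domain.
# Illustrative only; domain and role names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainOwnership:
    domain: str
    owner: str    # the single accountable person, not a committee
    steward: str  # the day-to-day data steward


REGISTRY = {
    "customer": DomainOwnership("customer", "head_of_sales", "crm_admin"),
    "financial": DomainOwnership("financial", "controller", "finance_analyst"),
}


def owner_for(domain: str) -> str:
    """Return the single accountable owner; fail loudly for unowned domains."""
    try:
        return REGISTRY[domain].owner
    except KeyError:
        raise LookupError(f"domain {domain!r} has no assigned owner") from None
```

Keeping this file in version control also gives you an audit trail of ownership changes for free.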

Step 2: Build Your Data Catalog

# Example: Register datasets in a lightweight catalog,
# using Great Expectations as the documentation and quality layer.
# Note: GX's fluent API changes between releases; the calls below follow
# the 1.x API and may need adjusting for your installed version.

import great_expectations as gx

context = gx.get_context()

# Add a data source
datasource = context.data_sources.add_postgres(
    "production_db",
    connection_string="postgresql://..."
)

# Create an expectation suite (the quality contract for this dataset)
suite = context.suites.add(gx.ExpectationSuite(name="customers_quality"))

# Define expectations
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToNotBeNull(column="customer_id")
)
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeUnique(column="email")
)
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeBetween(
        column="lifetime_value", min_value=0, max_value=10_000_000
    )
)

# Run validation through a checkpoint configured elsewhere to pair
# this suite with a batch of the customers table
checkpoint = context.checkpoints.get("daily_checkpoint")
results = checkpoint.run()
print(f"Validation passed: {results.success}")

Step 3: Implement Data Quality Rules

Quality Dimensions

| Dimension    | Definition                | Example Check                        |
|--------------|---------------------------|--------------------------------------|
| Completeness | No critical nulls         | NOT NULL on required fields          |
| Accuracy     | Values match reality      | Revenue matches source system        |
| Consistency  | Same value everywhere     | Customer name same in CRM + billing  |
| Timeliness   | Data is fresh enough      | Dashboard updates within 1 hour      |
| Uniqueness   | No duplicates             | Primary key uniqueness               |
| Validity     | Conforms to business rules| Email matches regex, age > 0         |
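Several of these dimensions can be measured generically before you reach for a framework. A minimal sketch in plain Python (row shape and column names are hypothetical) computing completeness and uniqueness ratios for a batch of records:

```python
# Compute completeness and uniqueness ratios for one column of a row batch.
# Illustrative only; column names and sample rows are hypothetical.
def completeness(rows, column):
    """Fraction of rows where `column` is present and non-null."""
    if not rows:
        return 1.0
    filled = sum(1 for r in rows if r.get(column) is not None)
    return filled / len(rows)


def uniqueness(rows, column):
    """Fraction of non-null values that are distinct."""
    values = [r.get(column) for r in rows if r.get(column) is not None]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


rows = [
    {"customer_id": 1, "email": "a@x.com"},
    {"customer_id": 2, "email": "a@x.com"},  # duplicate email
    {"customer_id": 3, "email": None},       # missing email
]
print(completeness(rows, "email"))  # 2 of 3 rows filled
print(uniqueness(rows, "email"))    # 1 distinct value of 2 non-null
```

Accuracy and consistency, by contrast, need a second source of truth to compare against, which is why they usually live in cross-system reconciliation jobs rather than single-table checks.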

Automated Quality Pipeline

-- Daily data quality checks (dbt tests pattern)

-- Test: No null customer IDs
SELECT COUNT(*) AS failures
FROM customers
WHERE customer_id IS NULL;

-- Test: Email format validation
SELECT COUNT(*) AS failures
FROM customers
WHERE email NOT LIKE '%_@_%.__%';

-- Test: Revenue must be positive
SELECT COUNT(*) AS failures
FROM orders
WHERE total_amount < 0;

-- Test: Referential integrity
SELECT COUNT(*) AS failures
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;

-- Test: Data freshness (updated within 2 hours)
SELECT CASE
    WHEN MAX(updated_at) < NOW() - INTERVAL '2 hours'
    THEN 1 ELSE 0
END AS stale
FROM orders;
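Checks in this shape (each query returns a single failure count, zero means pass) are easy to wire into a small runner that executes all of them and flags any non-zero result. The sketch below uses an in-memory SQLite database purely for illustration; the check names and sample data are hypothetical:

```python
import sqlite3

# Each check is a query returning a single failure count; 0 means pass.
CHECKS = {
    "no_null_customer_ids":
        "SELECT COUNT(*) FROM customers WHERE customer_id IS NULL",
    "no_negative_orders":
        "SELECT COUNT(*) FROM orders WHERE total_amount < 0",
}


def run_checks(conn):
    """Run every check and return {check_name: failure_count}."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}


# Demo data: one bad row per table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, email TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, total_amount REAL)")
conn.execute("INSERT INTO customers VALUES (1, 'a@x.com'), (NULL, 'b@x.com')")
conn.execute("INSERT INTO orders VALUES (1, 50.0), (2, -10.0)")

results = run_checks(conn)
failed = {name: n for name, n in results.items() if n > 0}
print(failed)  # each check catches its one bad row
```

In production the same loop would run against the warehouse and push non-empty `failed` dicts to your alerting channel.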

Step 4: Track Data Lineage

Understanding where data comes from and where it goes is essential for trust and debugging.

Source Systems          Transformation          Consumption
┌──────────┐     ┌─────────────────┐     ┌──────────────┐
│ CRM      │────▶│  ETL Pipeline   │────▶│  Dashboard   │
│ (SFDC)   │     │  (Airflow/dbt)  │     │  (Power BI)  │
└──────────┘     └─────────────────┘     └──────────────┘
┌──────────┐            │                ┌──────────────┐
│ ERP      │────▶───────┤                │  ML Model    │
│ (D365)   │            │                │  (Forecast)  │
└──────────┘            ▼                └──────────────┘
┌──────────┐     ┌─────────────────┐     ┌──────────────┐
│ Website  │────▶│  Data Warehouse │────▶│  Ad-hoc SQL  │
│ (GA4)    │     │  (Snowflake)    │     │  (Analysts)  │
└──────────┘     └─────────────────┘     └──────────────┘
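Even before adopting a lineage tool, a diagram like the one above can be captured as a simple edge list that answers "what ultimately feeds this asset?" A minimal sketch (node names mirror the diagram; this is illustrative, not a lineage product):

```python
# Lineage as a directed edge list: (upstream, downstream).
EDGES = [
    ("crm", "etl_pipeline"),
    ("erp", "etl_pipeline"),
    ("website", "data_warehouse"),
    ("etl_pipeline", "data_warehouse"),
    ("data_warehouse", "dashboard"),
    ("data_warehouse", "ml_model"),
]


def upstream_sources(asset):
    """Return every transitive upstream dependency of `asset`."""
    parents = {}
    for src, dst in EDGES:
        parents.setdefault(dst, set()).add(src)
    seen, stack = set(), [asset]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


print(sorted(upstream_sources("dashboard")))
```

This answers the two questions lineage exists for: "the dashboard number looks wrong, which sources could be at fault?" and, by walking edges the other way, "if I change this source, what breaks downstream?"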

dbt Lineage

-- dbt model with ownership metadata; lineage is inferred from the ref() calls
-- models/marts/customers.sql
-- (the model description and tests live in the matching schema.yml file)
{{ config(
    materialized='table',
    meta={
        'owner': 'sales_team',
        'sla': 'refreshed by 6 AM daily',
        'pii': true
    }
) }}

SELECT
    c.customer_id,
    c.full_name,
    c.email,
    c.created_at,
    COALESCE(SUM(o.total_amount), 0) AS lifetime_value,
    COUNT(o.order_id) AS total_orders,
    MAX(o.order_date) AS last_order_date
FROM {{ ref('stg_customers') }} c
LEFT JOIN {{ ref('stg_orders') }} o
    ON c.customer_id = o.customer_id
GROUP BY 1, 2, 3, 4

Step 5: Access Control and Classification

Data Classification Tiers

| Tier         | Label | Examples                   | Access              |
|--------------|-------|----------------------------|---------------------|
| Public       | 🟢    | Marketing content, pricing | Anyone              |
| Internal     | 🟡    | Revenue metrics, KPIs      | All employees       |
| Confidential | 🟠    | Customer PII, contracts    | Need-to-know        |
| Restricted   | 🔴    | SSN, payment data, health  | Role-based + audit  |
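Tiers only build trust if something enforces them. Because the tiers are strictly ordered, access checks reduce to a rank comparison. A minimal sketch (tier names from the table above; the role names are hypothetical):

```python
# Ordered classification tiers and role clearances; higher rank sees more.
# Role names are hypothetical examples, not a standard.
TIER_RANK = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}
ROLE_RANK = {"anyone": 0, "employee": 1, "need_to_know": 2, "privileged": 3}


def can_read(role: str, tier: str) -> bool:
    """True if the role's clearance meets or exceeds the tier's rank."""
    return ROLE_RANK[role] >= TIER_RANK[tier]
```

In practice the rank lookup would be backed by your identity provider's group claims, and every read of a restricted tier would also be written to an audit log, per the table's "Role-based + audit" requirement.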

Implementation

-- Row-level security in PostgreSQL
-- (policies have no effect until RLS is enabled on the table)
ALTER TABLE customers ENABLE ROW LEVEL SECURITY;

CREATE POLICY region_access ON customers
    FOR SELECT
    USING (region = current_setting('app.user_region'));

-- Column-level masking: low-privilege roles see masked values,
-- everyone else sees the raw column
CREATE VIEW customers_masked AS
SELECT
    customer_id,
    full_name,
    CASE
        WHEN current_user IN ('analyst', 'report_user')
        THEN '***@***.***'
        ELSE email
    END AS email,
    CASE
        WHEN current_user IN ('analyst', 'report_user')
        THEN 'XXX-XX-' || RIGHT(ssn, 4)
        ELSE ssn
    END AS ssn
FROM customers;

Governance Checklist

  • Data ownership assigned (one owner per domain)
  • Data catalog with searchable metadata
  • Quality rules defined for every critical dataset
  • Automated quality checks running daily
  • Data lineage documented (source → transform → consumption)
  • Classification tiers defined and enforced
  • Row/column-level security implemented for PII
  • Access request process with approval workflow
  • Quarterly data quality review meetings
  • Incident response plan for data quality failures

:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For data strategy consulting, visit garnetgrid.com.
:::