AI Governance vs. AI Evaluation: What's the Difference?

Evaluation measures whether AI systems perform as intended. Governance is the institutional framework ensuring evaluation happens, its results are acted upon, and accountability is maintained. Think of it this way: evaluation is the measurement layer. Governance is the institutional layer that makes measurement coherent and effective.

Evaluation without governance: Teams measure things in isolation. Some teams rigorously evaluate; others don't. Results sit in reports unread. No one enforces standards. Measurement changes nothing.

Governance without evaluation: Bureaucracy that measures nothing. Committees meet, policies exist, but they're divorced from actual measurement. This is box-ticking governance — all structure, no substance.

Governance WITH evaluation: Policies require evaluation. Committees review eval results and make decisions. Standards ensure eval is rigorous and consistent. Accountability flows from measurement.

Key numbers:

  - $5.4B: projected AI governance market by 2028
  - 62%: organizations deploying AI without formal governance
  - 3x: incident reduction with mature governance

The AI Governance Framework Components

AI Inventory

Every deployed AI system documented: name, deployment context (internal/customer-facing/critical), domain, autonomy level. This is foundational. You can't govern what you don't track.
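
To make this concrete, here's a minimal sketch of what an inventory record might look like. The field names and the autonomy scale are illustrative assumptions, not part of any standard:

```python
from dataclasses import dataclass
from enum import Enum

class DeploymentContext(Enum):
    INTERNAL = "internal"
    CUSTOMER_FACING = "customer-facing"
    CRITICAL = "critical"

@dataclass
class AISystemRecord:
    """One inventory entry: enough to answer "what is this system and who owns it?"."""
    name: str
    owner: str                        # named owner, per the Accountability component
    deployment_context: DeploymentContext
    domain: str                       # e.g. "customer support", "credit decisions"
    autonomy_level: int               # 0 = human executes, 3 = acts without review
    risk_tier: int | None = None      # assigned later, during risk stratification

inventory = [
    AISystemRecord("support-chatbot", "jane.doe", DeploymentContext.CUSTOMER_FACING,
                   domain="customer support", autonomy_level=1),
]
```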

Risk Stratification

Classify systems by risk tier. High-risk systems get stringent evaluation and monitoring; low-risk systems get lighter governance. Resource constraints are real, and stratifying by risk focuses governance effort where it matters most.

Risk tier criteria: autonomy level (how much can the AI decide without human intervention?), reversibility of decisions (are they hard to undo?), population affected (how many users?), and regulated domain (healthcare, finance, etc.).
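
As a sketch, the four criteria might be combined into a tier score like this. The weights and thresholds are illustrative assumptions; each organization calibrates its own:

```python
REGULATED_DOMAINS = {"healthcare", "finance", "hiring", "law enforcement"}

def assign_risk_tier(autonomy_level: int, reversible: bool,
                     users_affected: int, domain: str) -> int:
    """Map the four criteria to a tier: 1 = critical, 2 = operational, 3 = low-stake."""
    score = autonomy_level                        # more autonomy, more risk (0-3)
    if not reversible:
        score += 2                                # hard-to-undo decisions weigh heavily
    if users_affected > 10_000:
        score += 2
    elif users_affected > 100:
        score += 1
    if domain in REGULATED_DOMAINS:
        score += 2

    if score >= 5:
        return 1   # stringent evaluation and monitoring
    if score >= 3:
        return 2
    return 3       # lighter-touch governance
```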

Policy Documents

Clear policies covering: model selection (which models can be used?), data governance (how is training data sourced and maintained?), deployment authorization (who approves production deployment?), ongoing monitoring (what metrics must be tracked?), and incident response (what constitutes an AI incident?).

Standards

Technical standards for eval methodology (CS-001 through CS-004 in the eval.qa framework). These ensure consistency across teams and domains.

Processes

Regular review cycles (when is eval done?), escalation paths (if eval uncovers problems, who decides remediation?), and exception handling (when can policies be waived?).

Accountability

Named owners for each AI system. Clear escalation chains. Transparent decision-making. "Who approved this deployment?" must have a clear answer.

Audit Trails

Immutable logs of: evaluation results, deployment decisions, changes to models or data, incident reports. Critical for regulatory compliance and post-mortem analysis.
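
One common way to make such logs tamper-evident is hash chaining, where each entry embeds the hash of the one before it. A minimal sketch, not a production implementation:

```python
import hashlib
import json
import time

class AuditTrail:
    """Append-only, tamper-evident log: editing any past entry breaks the hash chain."""

    def __init__(self):
        self._entries: list[dict] = []

    def append(self, event_type: str, payload: dict) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else "genesis"
        entry = {
            "timestamp": time.time(),
            "event_type": event_type,   # e.g. "eval_result", "deployment_decision"
            "payload": payload,
            "prev_hash": prev_hash,
        }
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; returns False if any entry was altered."""
        prev_hash = "genesis"
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash:
                return False
            if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```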

Risk Classification Frameworks

EU AI Act Risk Tiers

Unacceptable Risk: Banned outright. Example: AI systems for social credit scoring at scale.

High Risk: Subject to strict requirements. Examples: hiring, credit decisions, medical diagnosis, law enforcement. Requires impact assessments, human oversight, clear documentation.

Limited Risk: Transparency obligations. Example: chatbots must disclose that the user is talking to an AI.

Minimal Risk: No requirements.

NIST AI RMF Risk Categories

Not tied to specific risk tiers; instead, the framework categorizes risks (performance, security, resilience, privacy, fairness, accountability, transparency) and asks organizations to assess their risk tolerance for each.

eval.qa Internal Classification

Tier 1 (Critical): Mission-critical, regulated domains, large user populations. Examples: core product recommendation, compliance systems.

Tier 2 (Operational): Customer-facing, operational impact. Example: customer support chatbot.

Tier 3 (Low-Stake): Internal tools, limited impact. Example: internal documentation search.

Policy Architecture for AI Governance

Model Governance Policy

Defines: which AI models can be used, approval process for new models, model update procedures, vendor management for third-party models. Example: "Only models with documented training data and third-party safety audit approval can be deployed to production."

Data Governance for AI

Training data lineage, PII handling, data retention, handling of biased or problematic data. Example: "All training data must be documented with source, date, and any known limitations. Biased data subsets must be documented and handled explicitly."

Evaluation Policy

Minimum eval requirements before deployment, evaluation cadence in production, when to halt updates. Example: "Tier 1 systems require 80+ hours of evaluation before deployment; Tier 2 systems require 30+ hours. Eval must cover core functionality, edge cases, and adversarial scenarios."
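
A deployment gate encoding this example policy might look like the sketch below. The Tier 3 minimum is an assumption, since the policy excerpt only specifies Tiers 1 and 2:

```python
# Minimum pre-deployment eval hours per tier. Tiers 1 and 2 come from the policy
# example; the Tier 3 figure is an illustrative assumption.
MIN_EVAL_HOURS = {1: 80, 2: 30, 3: 8}
REQUIRED_COVERAGE = {"core_functionality", "edge_cases", "adversarial"}

def eval_gate(risk_tier: int, eval_hours: float, covered: set[str]) -> tuple[bool, list[str]]:
    """Return (passed, reasons_for_failure) for the pre-deployment evaluation gate."""
    reasons = []
    required = MIN_EVAL_HOURS[risk_tier]
    if eval_hours < required:
        reasons.append(f"only {eval_hours}h of eval; tier {risk_tier} requires {required}h")
    missing = REQUIRED_COVERAGE - covered
    if missing:
        reasons.append(f"missing coverage: {', '.join(sorted(missing))}")
    return (not reasons, reasons)
```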

Incident Response Policy

What constitutes an AI incident (unintended behavior, security breach, performance degradation), escalation path, notification requirements, remediation timeline. Example: "AI errors affecting >100 customers = critical incident. Notify exec team within 1 hour. Remediate within 24 hours."
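
Encoded as a triage rule, the example policy might look like this. The sub-critical branch is an assumption, since the excerpt only defines the critical case:

```python
from datetime import timedelta

def classify_incident(customers_affected: int) -> dict:
    """Triage per the example policy: >100 affected customers is a critical incident."""
    if customers_affected > 100:
        return {"severity": "critical",
                "notify": "exec team",
                "notify_within": timedelta(hours=1),
                "remediate_within": timedelta(hours=24)}
    # The policy excerpt only defines the critical case; this branch is an assumption.
    return {"severity": "standard",
            "notify": "system owner",
            "notify_within": timedelta(hours=24),
            "remediate_within": timedelta(days=7)}
```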

Vendor Management Policy

For third-party AI systems or models: SLAs, audit rights, data handling requirements, exit procedures. Example: "All AI vendor contracts must include 30-day wind-down clause and require vendors to provide model weights and training data upon contract termination."

Human Override Policy

When must humans be in the loop? What authority do they have? Can humans override AI recommendations? Example: "High-risk decisions must have human review before execution. Humans may override AI with documented justification."
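
Here's a sketch of how an override log entry could enforce the documented-justification requirement. The record fields are illustrative; entries like these would feed the human override audit logs discussed later:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class OverrideRecord:
    """One entry in the human override audit log."""
    system: str
    reviewer: str
    ai_recommendation: str
    human_decision: str
    justification: str
    timestamp: datetime

def record_override(system: str, reviewer: str, ai_recommendation: str,
                    human_decision: str, justification: str) -> OverrideRecord:
    """Refuse undocumented overrides: the policy requires written justification."""
    if not justification.strip():
        raise ValueError("override rejected: documented justification is required")
    return OverrideRecord(system, reviewer, ai_recommendation, human_decision,
                          justification, datetime.now(timezone.utc))
```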

The AI Governance Committee

Charter

Formal charter defining: authority (can the committee block deployments?), scope (all AI systems or only certain domains?), membership, meeting cadence, decision-making process.

Recommended Composition (8-12 people)

CTO or CAIO (chair), Legal, Risk, Data Privacy, AI Engineering lead, Product, External ethics advisor. This mix balances technical expertise, business perspective, and governance concerns.

Responsibilities

Review eval results, approve or block production deployments, grant and track policy exceptions, and oversee incident response. The committee is where measurement turns into decisions.

Documentation

Meeting minutes (decisions made, dissenting opinions), decision rationale (why was this deployment approved?), and action items. This creates accountability and supports audit trails.

Operating Cadence

Monthly full committee meetings. Weekly chair + CTO syncs. Quarterly all-hands review.

Audit-Ready AI Governance

If regulators audit your AI program, what will they want to see?

What Regulators Look For

EU AI Act (Articles 9-17): Risk management (was the system identified as high-risk and subjected to assessments?), technical documentation (is there clear documentation of training data, model architecture, eval results?), human oversight (are humans involved in key decisions?), transparency (are users told when interacting with AI?).

FDA SaMD Guidance: AI/ML-based medical software must have: performance specifications (how accurate is it?), benefit/risk analysis, validation evidence (testing and eval), post-market surveillance plan.

The 12-Document Governance Evidence Pack

Organizations facing an audit should have these 12 documents ready:

  1. AI System Inventory and Risk Stratification
  2. AI Governance Policy Framework
  3. AI Governance Committee Charter
  4. Evaluation Standards and Procedures (CS-001 through CS-004)
  5. Sample Deployment Clearance Reports (DCRs)
  6. Incident Response Logs (last 12 months)
  7. Model Training Data Documentation
  8. Third-Party Vendor Contracts
  9. Human Override Audit Logs
  10. Governance Committee Meeting Minutes (last 12 months)
  11. Post-Deployment Monitoring Dashboards
  12. Training Materials for AI Users and Developers
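
A trivial readiness check, assuming you track which documents exist (titles abbreviated from the list above):

```python
EVIDENCE_PACK = [
    "AI System Inventory and Risk Stratification",
    "AI Governance Policy Framework",
    "AI Governance Committee Charter",
    "Evaluation Standards and Procedures",
    "Sample Deployment Clearance Reports",
    "Incident Response Logs",
    "Model Training Data Documentation",
    "Third-Party Vendor Contracts",
    "Human Override Audit Logs",
    "Governance Committee Meeting Minutes",
    "Post-Deployment Monitoring Dashboards",
    "Training Materials",
]

def audit_readiness(available: set[str]) -> list[str]:
    """Return the evidence-pack documents you cannot yet produce."""
    return [doc for doc in EVIDENCE_PACK if doc not in available]
```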

Governance Maturity Model

Level 1 — Ad Hoc: No formal AI governance. Decisions made informally. No documentation. High risk.

Level 2 — Developing: Basic AI inventory exists. Some policies written. Inconsistent enforcement. Governance Committee meets irregularly.

Level 3 — Defined: Formal policies for all AI systems. Committee structure in place. Regular reviews. Documented decisions.

Level 4 — Managed: Metrics-driven governance. Quantitative oversight of AI system health. Integrated risk management with other enterprise risk frameworks.

Level 5 — Optimizing: Continuous improvement of governance. Predictive risk management (flagging problems before they emerge). Industry thought leadership.

Most organizations are at Level 1-2. Moving to Level 3 (defined) is achievable in 12-18 months with dedicated effort.

The AI Governance Evaluation Stack: Policies to Metrics

Layer 1: Policies — "What do we believe about AI quality? What are our principles?"

Example: "We believe all AI systems must be fair. Gender disparity <2pp is acceptable."

Layer 2: Processes — "How do we implement policies?"

Example: "Fairness audits conducted quarterly. Gender disparity measured on all systems."

Layer 3: Controls — "What gates prevent bad systems from reaching production?"

Example: "Systems with >2pp gender disparity blocked from deploy. Escalate to governance committee."

Layer 4: Metrics — "How do we measure if controls are working?"

Example: "% of systems passing fairness gate. Median disparity of deployed systems. Time to remediation for failed audits."

Layer 5: Reporting — "Who knows about this? What actions result?"

Example: "Quarterly governance report to board. Annual external audit. Public AI fairness commitment."

Evaluating AI Governance Programs: Are You Actually Governing?

The Problem: Organizations claim good governance but don't actually enforce it. They have policies but no accountability.

Audit Questions:

  1. Do you have a documented AI governance policy? (Can you show it?)
  2. Who is accountable for governance? (Named person/committee?)
  3. What happens if a system violates policy? (Consequences?) — If answer is "nothing" or "unclear," governance is performative
  4. Do you measure governance metrics? (Dashboards?) — If no metrics, no governance
  5. Has a system ever been blocked from deployment due to governance? (Yes? Then governance is real. No? Then it's not.)
  6. Can you point to recent incidents and how you resolved them? (Documented?) — If no incident documentation, governance is missing
  7. Do you conduct annual independent audits? (Third-party validation?)

If you answer "no" to more than two of these questions, your governance is weak or performative.
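
Scored as code, the rule reads like this. The question phrasings are condensed from the list above, and the "credible" label is shorthand, not a certification:

```python
AUDIT_QUESTIONS = [
    "Documented AI governance policy exists and can be shown",
    "A named person or committee is accountable for governance",
    "Policy violations carry defined consequences",
    "Governance metrics are measured and dashboarded",
    "At least one deployment has been blocked on governance grounds",
    "Recent incidents are documented along with their resolutions",
    "Annual independent audits are conducted",
]

def assess_governance(answers: list[bool]) -> str:
    """More than two "no" answers means governance is weak or performative."""
    return "weak/performative" if answers.count(False) > 2 else "credible"
```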

Incident Response Governance: When AI Fails in Production

Three-Phase Framework:

Phase 1: Detection & Containment (0-4 hours). Detect the failure (monitoring alerts, user reports), assess severity, and contain the damage: roll back, disable the feature, or route affected traffic to a human fallback.

Phase 2: Investigation & Remediation (4-48 hours). Identify the root cause, fix the underlying issue, and verify the fix with targeted evaluation before restoring normal operation.

Phase 3: Learning & Prevention (1-4 weeks). Run a post-mortem, update eval suites so this failure class is caught pre-deployment, and amend policies or controls as needed.

AI Governance Audit Methodology: How Third Parties Evaluate Governance

Audit Scope: Documentation review, interviews, system testing, metrics analysis

Audit Questions: Third-party auditors typically work through the same seven questions from the self-assessment above, but verify the evidence behind each answer rather than taking it on faith.

Audit Output: Report with findings, risks, recommendations. Remediation roadmap.
