Test Suite Quality Assurance

This guide explains how to ensure test quality through mutation testing and coverage analysis.


Overview

Tools and techniques for evaluating the quality and effectiveness of your test suite. These are NOT test types - they are meta-testing practices that evaluate your existing tests.


1. Mutation Testing (Mutmut)

What it is

A technique that evaluates test quality by mutating production code and verifying that tests fail.

How it works

  1. Mutmut introduces small changes to your code (e.g., + becomes -, > becomes >=, True becomes False)

  2. Runs your existing test suite against each mutation

  3. Verifies that at least one test fails for each mutation

  4. Mutations that do not cause any test failure are called surviving mutants; each one points to a potential gap in test coverage
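The loop above can be sketched in a few lines of plain Python. This is a toy illustration of the principle only, not how mutmut works internally: the "mutants" here are hand-written stand-ins for the operator swaps mutmut generates automatically.

```python
# Toy mutation-testing loop: patch in each mutant, rerun the test,
# and record whether the mutant was killed (test failed) or survived.

def calculate_margin(home_score, away_score):
    return home_score - away_score

def test_margin():
    assert calculate_margin(100, 80) == 20

# Hand-written mutants mimicking mutmut's operator swaps (- to +, etc.).
mutants = {
    "minus_to_plus": lambda h, a: h + a,
    "swap_operands": lambda h, a: a - h,
}

def run_mutation_check():
    results = {}
    original = calculate_margin
    for name, mutant in mutants.items():
        globals()["calculate_margin"] = mutant  # install the mutant
        try:
            test_margin()
            results[name] = "survived"  # test passed: coverage gap
        except AssertionError:
            results[name] = "killed"    # test failed: mutant caught
        finally:
            globals()["calculate_margin"] = original
    return results

print(run_mutation_check())
# -> {'minus_to_plus': 'killed', 'swap_operands': 'killed'}
```

Both mutants are killed here because the test pins an exact expected value; a weaker assertion (e.g., checking only that the result is not None) would let both survive.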

Purpose

Verify that your tests are effective at catching bugs, not just achieving high coverage.

When to use

  • After the initial test suite is written - Mutation testing measures test quality, not code quality

  • Validating test quality for high-risk modules - Ensure critical code has strong test coverage

  • Periodic quality audits - Not every PR (too slow), but periodically to validate test effectiveness

  • Before major refactoring - Ensure tests will catch regressions

NOT a test type

Mutation testing doesn’t write tests - it evaluates your existing tests.

Characteristics

  • Extremely slow: Mutates code and reruns entire test suite for each mutation

  • Quality signal: Measures test effectiveness (do tests catch bugs?), not just coverage (are lines executed?)

  • Selective application: Only run on designated high-priority modules

  • Orthogonal to test dimensions: Works with any test scope, approach, or purpose

Priority Tiers

Tier               Modules                                           Frequency                  Rationale
Tier 1 (Always)    evaluation/metrics.py, evaluation/simulation.py   Every PR                   Critical for correctness - errors invalidate all evaluations
Tier 2 (Periodic)  model/, transform/                                Weekly or before release   Important but less critical than metrics
Tier 3 (Rare)      ingest/, utils/                                   Monthly or as needed       Lower priority, test coverage may suffice

Usage

# Run mutation testing on high-priority module
mutmut run --paths-to-mutate=src/ncaa_eval/evaluation/metrics.py

# Review results
mutmut results

# Show details of surviving mutants (gaps in test coverage)
mutmut show <mutant-id>

# Run only against fast tests (for quicker feedback)
mutmut run --paths-to-mutate=src/ncaa_eval/evaluation/metrics.py --runner="pytest -m smoke"
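Rather than retyping the flags, the same settings can live in setup.cfg, which mutmut reads automatically. This is a sketch against mutmut 2.x; check mutmut --help for the options your installed version supports (the smoke marker is the same one used in the runner flag above):

```ini
# setup.cfg
[mutmut]
paths_to_mutate=src/ncaa_eval/evaluation/metrics.py
runner=python -m pytest -x -m smoke
tests_dir=tests/
```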

Interpreting results

  • Mutation Score = (Killed Mutants / Total Mutants) × 100%

  • Target: 80%+ for Tier 1 modules, 70%+ for Tier 2

  • Surviving mutants indicate:

    • Missing test cases

    • Weak assertions (e.g., assert result is not None instead of assert result == expected)

    • Dead code (code that can be changed without affecting behavior)
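The score formula is simple arithmetic; a small helper (hypothetical, not part of mutmut) makes the tier targets from this guide executable:

```python
# Mutation Score = (Killed Mutants / Total Mutants) x 100%
def mutation_score(killed: int, total: int) -> float:
    return 100.0 * killed / total

# Targets from this guide: 80%+ for Tier 1 modules, 70%+ for Tier 2.
TARGETS = {"tier1": 80.0, "tier2": 70.0}

def meets_target(killed: int, total: int, tier: str) -> bool:
    return mutation_score(killed, total) >= TARGETS[tier]

print(mutation_score(45, 50))         # -> 90.0
print(meets_target(30, 50, "tier1"))  # 60% < 80%, so -> False
```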

Example surviving mutant (test gap)

# Original code
def calculate_margin(home_score, away_score):
    return home_score - away_score

# Mutant (changed - to +)
def calculate_margin(home_score, away_score):
    return home_score + away_score  # Mutmut changed this

# If this mutant survives, it means no test verifies the margin calculation is correct!
# Solution: Add a test with known inputs/outputs
def test_calculate_margin_correctness():
    assert calculate_margin(100, 80) == 20  # Would catch the + mutation

Marker

No dedicated marker — mutation testing is run via mutmut as a quality tool, not as a pytest marker.

Execution Tier

  • Tier 1 (Pre-commit): ❌ NO (extremely slow)

  • Tier 2 (PR/CI): ✅ YES, but only for designated Tier 1 modules

See Execution Guide for details on execution tiers.


2. Coverage Analysis

What it is

Measurement of what percentage of your code is executed during test runs.

Types of coverage

  • Line coverage: Percentage of code lines executed

  • Branch coverage: Percentage of conditional branches (if/else) taken

  • Function coverage: Percentage of functions called
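The difference between line and branch coverage is easiest to see on a function with an implicit else. In this hypothetical example, a single test executes every line (100% line coverage), yet the branch where the condition is false is never exercised, which only branch coverage would flag:

```python
def clamp_margin(margin, cap=50):
    # Implicit else: when margin <= cap, the if-body is simply skipped.
    if margin > cap:
        margin = cap
    return margin

# This one test runs every line of clamp_margin (the if-body executes),
# so line coverage reports 100%. The False arm of the condition is never
# taken, so branch coverage reports a missing partial branch.
def test_clamp_caps_large_margin():
    assert clamp_margin(80) == 50
```

Adding a second test such as assert clamp_margin(10) == 10 exercises the missing branch.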

Purpose

Identify untested code paths, NOT a measure of test quality.

Important distinction

  • High coverage ≠ Good tests: You can have 100% coverage with weak assertions

  • Mutation testing verifies test quality; coverage only verifies code execution
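A concrete (hypothetical) illustration of that distinction: both tests below execute every line of calculate_margin, so each alone yields 100% coverage, but only the second would kill the "- to +" mutant from the earlier example.

```python
def calculate_margin(home_score, away_score):
    return home_score - away_score

# Weak test: 100% line coverage, but it still passes if `-` mutates
# to `+`, because 180 is also "not None".
def test_margin_weak():
    assert calculate_margin(100, 80) is not None

# Strong test: identical coverage, but it pins the exact expected value,
# so an operator mutation makes it fail.
def test_margin_strong():
    assert calculate_margin(100, 80) == 20
```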

Tool

pytest-cov

Usage

# Generate coverage report
pytest --cov=src/ncaa_eval --cov-report=term-missing

# HTML report for detailed analysis
pytest --cov=src/ncaa_eval --cov-report=html
open htmlcov/index.html

# Branch coverage (more thorough than line coverage)
pytest --cov=src/ncaa_eval --cov-branch --cov-report=term-missing
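Branch coverage and the source path can also be set once in configuration so every pytest run picks them up. A sketch using coverage.py's pyproject.toml support (the keys shown are standard coverage.py options; adjust paths to your layout):

```toml
# pyproject.toml
[tool.coverage.run]
source = ["src/ncaa_eval"]
branch = true

[tool.coverage.report]
show_missing = true
```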

Coverage is a signal, not a gate

Use coverage to identify gaps, but don’t block PRs solely for coverage numbers. A few well-written tests with 70% coverage are better than many weak tests with 100% coverage.


3. Combining Quality Assurance Tools

Use both mutation testing and coverage analysis together for comprehensive quality assessment:

Workflow

  1. Write tests for your feature

  2. Check coverage - Identify untested code paths

  3. Add tests to cover gaps

  4. Run mutation testing - Verify tests actually catch bugs

  5. Fix surviving mutants - Add stronger assertions or test cases

Example - Comprehensive quality check

# Step 1: Run tests with coverage
pytest --cov=src/ncaa_eval/evaluation/metrics.py --cov-report=term-missing

# Coverage: 95% ✓ (high coverage)

# Step 2: Run mutation testing
mutmut run --paths-to-mutate=src/ncaa_eval/evaluation/metrics.py

# Mutation Score: 60% ✗ (many surviving mutants = weak tests!)

# Step 3: Review surviving mutants
mutmut results
mutmut show 42  # Mutation: changed `>` to `>=` and tests still passed

# Step 4: Add test to catch this mutation
def test_threshold_boundary():
    """Regression test: Ensure > is used, not >= (catches mutant #42)."""
    result = is_above_threshold(value=50, threshold=50)
    assert result is False  # Should be False because 50 is NOT > 50

Key Principles

  1. Coverage shows what code runs - It identifies untested code paths

  2. Mutation testing shows test effectiveness - It verifies tests catch bugs

  3. Use both together - Coverage finds gaps, mutation testing validates quality

  4. Selective mutation testing - Only run on high-priority modules (too slow for everything)

  5. Coverage is a signal, not a gate - Don’t block PRs solely on coverage numbers

  6. Target 80%+ mutation score - For critical modules (Tier 1)

  7. Weak assertions fail mutation tests - Use specific assertions, not just is not None


See Also