Test Suite Quality Assurance¶
This guide explains how to ensure test quality through mutation testing and coverage analysis.
Overview¶
Tools and techniques for evaluating the quality and effectiveness of your test suite. These are NOT test types - they are meta-testing practices that evaluate your existing tests.
1. Mutation Testing (Mutmut)¶
What it is¶
A technique that evaluates test quality by mutating production code and verifying that tests fail.
How it works¶
Mutmut introduces small changes to your code (e.g., `+` → `-`, `>` → `>=`, `True` → `False`)
Runs your existing test suite against each mutation
Verifies that at least one test fails for each mutation
Surviving mutants = mutations that didn’t cause test failures = potential gaps in test coverage
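The loop above can be sketched in a few lines of Python. This is a hand-rolled illustration of the concept, not mutmut's actual implementation (mutmut rewrites source at the AST level); `is_winner` and the mutants are hypothetical:

```python
def is_winner(margin):
    return margin > 0  # original code

# Mutants a tool might generate by tweaking the comparison operator:
mutants = {
    "gt_to_ge": lambda margin: margin >= 0,  # > changed to >=
    "gt_to_lt": lambda margin: margin < 0,   # > changed to <
}

# A small test suite: each test takes the function under test and
# returns True if it passes. Note there is no test for margin == 0.
tests = [
    lambda fn: fn(5) is True,
    lambda fn: fn(-3) is False,
]

survivors = []
for name, mutant in mutants.items():
    killed = any(not test(mutant) for test in tests)  # at least one test fails
    if not killed:
        survivors.append(name)

print(survivors)  # ['gt_to_ge'] -- the >= mutant survives: the boundary is untested
```

The surviving `gt_to_ge` mutant points at the gap: no test pins down the behavior at `margin == 0`.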
Purpose¶
Verify that your tests are effective at catching bugs, not just achieving high coverage.
When to use¶
After the initial test suite is written - Mutation testing measures test quality, not code quality
Validating test quality for high-risk modules - Ensure critical code has strong test coverage
Periodic quality audits - Not every PR (too slow), but periodically to validate test effectiveness
Before major refactoring - Ensure tests will catch regressions
NOT a test type¶
Mutation testing doesn’t write tests - it evaluates your existing tests.
Characteristics¶
Extremely slow: Mutates code and reruns entire test suite for each mutation
Quality signal: Measures test effectiveness (do tests catch bugs?), not just coverage (are lines executed?)
Selective application: Only run on designated high-priority modules
Orthogonal to test dimensions: Works with any test scope, approach, or purpose
Priority Tiers¶
| Tier | Modules | Frequency | Rationale |
|---|---|---|---|
| Tier 1 (Always) | | Every PR | Critical for correctness - errors invalidate all evaluations |
| Tier 2 (Periodic) | | Weekly or before release | Important but less critical than metrics |
| Tier 3 (Rare) | | Monthly or as needed | Lower priority, test coverage may suffice |
Usage¶
```shell
# Run mutation testing on high-priority module
mutmut run --paths-to-mutate=src/ncaa_eval/evaluation/metrics.py

# Review results
mutmut results

# Show details of surviving mutants (gaps in test coverage)
mutmut show <mutant-id>

# Run only against fast tests (for quicker feedback)
mutmut run --paths-to-mutate=src/ncaa_eval/evaluation/metrics.py --runner="pytest -m smoke"
```
Interpreting results¶
Mutation Score = (Killed Mutants / Total Mutants) × 100%
Target: 80%+ for Tier 1 modules, 70%+ for Tier 2
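The formula above is straightforward to compute; the counts below are hypothetical, not real mutmut output:

```python
# Mutation score: killed mutants as a percentage of total mutants.
def mutation_score(killed: int, total: int) -> float:
    return killed / total * 100

# Hypothetical run: 48 of 60 mutants killed.
print(f"{mutation_score(killed=48, total=60):.0f}%")  # 80% -- meets the Tier 1 target
```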
Surviving mutants indicate:
Missing test cases
Weak assertions (e.g., `assert result is not None` instead of `assert result == expected`)
Dead code (code that can be changed without affecting behavior)
Example surviving mutant (test gap)¶
```python
# Original code
def calculate_margin(home_score, away_score):
    return home_score - away_score

# Mutant (changed - to +)
def calculate_margin(home_score, away_score):
    return home_score + away_score  # Mutmut changed this

# If this mutant survives, it means no test verifies the margin calculation is correct!
# Solution: Add a test with known inputs/outputs
def test_calculate_margin_correctness():
    assert calculate_margin(100, 80) == 20  # Would catch the + mutation
```
Marker¶
No dedicated marker — mutation testing is run via mutmut as a quality tool, not as a pytest marker.
Execution Tier¶
Tier 1 (Pre-commit): ❌ NO (extremely slow)
Tier 2 (PR/CI): ✅ YES, but only for designated Tier 1 modules
See Execution Guide for details on execution tiers.
2. Coverage Analysis¶
What it is¶
Measurement of what percentage of your code is executed during test runs.
Types of coverage¶
Line coverage: Percentage of code lines executed
Branch coverage: Percentage of conditional branches (if/else) taken
Function coverage: Percentage of functions called
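The difference between line and branch coverage is easiest to see on a one-branch function. A minimal sketch, with the hypothetical `clamp_score` standing in for real code:

```python
def clamp_score(score, maximum=100):
    if score > maximum:   # this line runs for every input
        score = maximum   # this line runs only when the branch is taken
    return score

# This single test executes every LINE (100% line coverage)...
assert clamp_score(150) == 100

# ...but branch coverage (--cov-branch) still flags the untaken
# "condition False" path until a test like this exists:
assert clamp_score(80) == 80
```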
Purpose¶
Identify untested code paths, NOT a measure of test quality.
Important distinction¶
High coverage ≠ Good tests: You can have 100% coverage with weak assertions
Mutation testing verifies test quality; coverage only verifies code execution
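A minimal illustration of the distinction, reusing the guide's margin example: both tests below give full line coverage, but only the strong one would kill a `-` → `+` mutant:

```python
def calculate_margin(home_score, away_score):
    return home_score - away_score

def test_weak():
    # Executes every line (100% coverage) but passes for almost any mutant.
    assert calculate_margin(100, 80) is not None

def test_strong():
    # Pins the exact value: fails if "-" is mutated to "+".
    assert calculate_margin(100, 80) == 20
```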
Tool¶
pytest-cov
Usage¶
```shell
# Generate coverage report
pytest --cov=src/ncaa_eval --cov-report=term-missing

# HTML report for detailed analysis
pytest --cov=src/ncaa_eval --cov-report=html
open htmlcov/index.html

# Branch coverage (more thorough than line coverage)
pytest --cov=src/ncaa_eval --cov-branch --cov-report=term-missing
```
Coverage is a signal, not a gate¶
Use coverage to identify gaps, but don’t block PRs solely for coverage numbers. A few well-written tests with 70% coverage are better than many weak tests with 100% coverage.
3. Combining Quality Assurance Tools¶
Use both mutation testing and coverage analysis together for comprehensive quality assessment:
Workflow¶
Write tests for your feature
Check coverage - Identify untested code paths
Add tests to cover gaps
Run mutation testing - Verify tests actually catch bugs
Fix surviving mutants - Add stronger assertions or test cases
Example - Comprehensive quality check¶
# Step 1: Run tests with coverage
pytest --cov=src/ncaa_eval/evaluation/metrics.py --cov-report=term-missing
# Coverage: 95% ✓ (high coverage)
# Step 2: Run mutation testing
mutmut run --paths-to-mutate=src/ncaa_eval/evaluation/metrics.py
# Mutation Score: 60% ✗ (many surviving mutants = weak tests!)
# Step 3: Review surviving mutants
mutmut results
mutmut show 42 # Mutation: changed `>` to `>=` and tests still passed
# Step 4: Add test to catch this mutation
def test_threshold_boundary():
"""Regression test: Ensure > is used, not >= (catches mutant #42)."""
result = is_above_threshold(value=50, threshold=50)
assert result is False # Should be False because 50 is NOT > 50
Key Principles¶
✅ Coverage shows what code runs - It identifies untested code paths
✅ Mutation testing shows test effectiveness - It verifies tests catch bugs
✅ Use both together - Coverage finds gaps, mutation testing validates quality
✅ Selective mutation testing - Only run on high-priority modules (too slow for everything)
✅ Coverage is a signal, not a gate - Don’t block PRs solely on coverage numbers
✅ Target 80%+ mutation score - For critical modules (Tier 1)
✅ Weak assertions fail mutation tests - Use specific assertions, not just `is not None`
See Also¶
Test Scope Guide - Unit vs Integration tests
Test Approach Guide - Example-based vs Property-based
Test Purpose Guide - Functional, Performance, Regression
Execution Guide - When tests and checks run (4-tier model)
Conventions Guide - Fixtures, markers, organization, coverage targets
Domain Testing Guide - Performance and data leakage testing