Test Purpose Guide: Functional, Performance, and Regression Testing¶
This guide explains why you’re writing a test - the purpose or goal of the test.
Overview¶
Test purpose is one of the four orthogonal dimensions of testing. These categories are orthogonal to both scope and approach - a single test can serve multiple purposes.
Functional Testing¶
What it is¶
Tests that verify the system behaves correctly according to functional requirements.
Purpose¶
Ensure the code produces the correct output for given inputs and meets business/domain requirements.
When to use¶
Always - This is the primary purpose of most tests
Verifying business logic correctness
Validating API contracts and interfaces
Ensuring data transformations produce expected results
Characteristics¶
Tests correctness of behavior
Verifies outputs match specifications
Can be any scope (unit, integration) and any approach (example-based, property-based)
Examples¶
Functional unit test (example-based):
def test_brier_score_correct_formula():
"""Verify Brier score calculation follows the correct mathematical formula."""
predictions = np.array([0.8, 0.3, 0.6])
actuals = np.array([1, 0, 1])
# Brier = mean((prediction - actual)^2)
expected = ((0.8-1)**2 + (0.3-0)**2 + (0.6-1)**2) / 3
result = brier_score(predictions, actuals)
assert abs(result - expected) < 1e-10
Functional integration test (property-based):
@pytest.mark.integration
@pytest.mark.property
@given(season=st.integers(2015, 2025))
def test_games_loaded_have_required_fields(season):
"""Verify all loaded games contain required fields (functional correctness)."""
games = load_games_for_season(season)
required_fields = ["season", "day_num", "w_team_id", "l_team_id", "w_score", "l_score"]
for game in games:
for field in required_fields:
assert hasattr(game, field), f"Game missing required field: {field}"
Marker¶
No specific marker (default assumption for all tests). Combine with scope/approach markers.
Performance Testing¶
What it is¶
Tests that verify the system meets performance requirements (speed, memory, throughput).
Purpose¶
Ensure code executes within acceptable time/resource bounds, especially for performance-critical operations.
When to use¶
Vectorization compliance - Verify critical paths don’t use Python loops (NFR1)
Benchmark targets - Ensure operations meet speed requirements (e.g., 60-second backtest)
Regression prevention - Detect performance degradation over time
Resource limits - Verify memory usage stays within bounds
Characteristics¶
Tests speed/efficiency of execution
Often includes timing assertions or benchmarks
Critical for this project due to 60-second backtest target
Typically marked as
@pytest.mark.slow(excluded from pre-commit)
Examples¶
Performance unit test (assertion-based):
import timeit
import pytest
@pytest.mark.slow
@pytest.mark.performance
def test_brier_score_vectorized_performance():
"""Verify Brier score calculation meets performance target (vectorized)."""
predictions = np.random.rand(100_000)
actuals = np.random.randint(0, 2, 100_000)
# Should complete in < 10ms for 100k predictions (vectorized)
time_taken = timeit.timeit(
lambda: brier_score(predictions, actuals),
number=10
) / 10 # Average per iteration
assert time_taken < 0.01, f"Brier score too slow: {time_taken:.4f}s (target: < 0.01s)"
Performance integration test (benchmark-based):
@pytest.mark.slow
@pytest.mark.performance
@pytest.mark.integration
def test_full_backtest_meets_60_second_target():
"""Verify 10-year Elo backtest completes within 60 seconds (NFR1)."""
games = load_games_for_years(range(2013, 2023))
start_time = timeit.default_timer()
model = EloModel()
model.fit(games)
predictions = model.predict(games)
elapsed = timeit.default_timer() - start_time
assert elapsed < 60.0, f"Backtest too slow: {elapsed:.2f}s (target: < 60s)"
Vectorization compliance test:
import inspect
@pytest.mark.smoke
@pytest.mark.performance
def test_metric_calculations_are_vectorized():
"""Verify critical metric calculations don't use Python loops (NFR1)."""
from ncaa_eval.evaluation import metrics
# Check that metric module doesn't contain for loops over DataFrames
source = inspect.getsource(metrics)
# This is a heuristic - manual review is still needed
forbidden_patterns = [
"for _ in df.iterrows()",
".iterrows()",
".itertuples()",
"for row in df",
]
for pattern in forbidden_patterns:
assert pattern not in source, \
f"Metrics module contains non-vectorized pattern: {pattern}"
Note: The ingest layer (
src/ncaa_eval/ingest/) is excluded from theiterrows()forbidden-pattern check only.iterrows()is explicitly permitted there for one-time-per-sync data transformation (see Style Guide Section 5, Exception #4). The.itertuples()andfor row in dfpatterns remain prohibited in business-logic and metric-calculation code; their use elsewhere (ingest connectors, model prediction loops) is evaluated case-by-case against the vectorization mandate.
Marker¶
@pytest.mark.performance (combine with scope markers like @pytest.mark.integration)
Pre-commit¶
Generally ❌ NO (too slow), except for quick vectorization smoke checks
PR-time¶
✅ YES - Essential for NFR1 compliance
Regression Testing¶
What it is¶
Tests written specifically to prevent previously discovered bugs from recurring.
Purpose¶
Ensure fixed bugs stay fixed as the codebase evolves.
When to use¶
After fixing a bug - Always write a regression test that would have caught the bug
For critical bugs - High-severity or hard-to-debug issues need regression coverage
Before refactoring - Write regression tests for existing behavior before major changes
Characteristics¶
Tests specific scenarios that previously failed
Often includes a reference to the bug/issue (e.g., “Regression test for #42”)
Can be any scope (unit, integration) and any approach (example-based, property-based)
Usually example-based (tests the specific failing case)
Examples¶
Regression unit test (example-based):
@pytest.mark.regression
def test_elo_rating_never_negative_after_extreme_losses():
"""Regression test: Elo ratings went negative after 50+ consecutive losses.
Bug: Issue #42 - Elo.update() didn't properly floor ratings at minimum value.
Fixed: 2026-01-15 - Added max(rating, MIN_RATING) in update logic.
"""
engine = EloFeatureEngine(elo_config)
# Simulate 100 consecutive losses to much higher-rated opponent
for game in losing_streak_games:
engine.update_game(game)
# Should never go below minimum rating (e.g., 0 or 100)
assert all(r >= 0 for r in engine.ratings.values()), "Elo rating went negative"
Regression integration test (example-based):
@pytest.mark.regression
@pytest.mark.integration
def test_chronological_api_rejects_exact_cutoff_date():
"""Regression test: API allowed games ON cutoff date (data leakage).
Bug: Issue #87 - get_games_before() used <= instead of < for date comparison.
Fixed: 2026-02-10 - Changed to strict less-than comparison.
"""
api = ChronologicalDataServer()
cutoff = "2023-03-15"
# Load games before cutoff
games = api.get_games_before(cutoff_date=cutoff)
# NO game should have date >= cutoff (strict less-than)
from datetime import datetime
cutoff_dt = datetime.fromisoformat(cutoff)
for game in games:
game_date = datetime.fromisoformat(game.date)
assert game_date < cutoff_dt, \
f"Game on/after cutoff leaked through: {game.date} (cutoff: {cutoff})"
Regression test with property-based approach:
@pytest.mark.regression
@pytest.mark.property
@given(k_factor=st.floats(1.0, 100.0))
def test_elo_update_stable_for_all_k_factors(k_factor):
"""Regression test: Elo exploded with large k_factors.
Bug: Issue #56 - No upper bound on rating change led to overflow.
Fixed: 2026-01-20 - Clamped rating changes to reasonable bounds.
"""
config = EloConfig(k_factor=k_factor)
engine = EloFeatureEngine(config)
engine.update_game(sample_game)
# Rating should stay within reasonable bounds
for rating in engine.ratings.values():
assert 0 <= rating <= 3000, f"Rating exploded with k_factor={k_factor}: {rating}"
Best Practice - Always include bug context¶
@pytest.mark.regression
def test_specific_bug_fixed():
"""Regression test: [Brief description of bug]
Bug: Issue #[number] or [Description of when it was discovered]
Fixed: [Date] - [Brief description of fix]
Test: [What this test verifies]
"""
# Test code that would have failed before the fix
pass
Marker¶
@pytest.mark.regression (combine with scope markers)
Pre-commit¶
✅ YES if fast (mark as @pytest.mark.smoke)
❌ NO if slow
PR-time¶
✅ YES - Always run regression tests
Combining Multiple Purposes¶
A single test can have multiple purposes:
Functional + Performance:
@pytest.mark.performance
def test_brier_score_correct_and_fast():
"""Verify Brier score is correct AND meets performance target."""
predictions = np.array([0.8, 0.3])
actuals = np.array([1, 0])
# Functional: Verify correctness
expected = ((0.8-1)**2 + (0.3-0)**2) / 2
result = brier_score(predictions, actuals)
assert abs(result - expected) < 1e-10
# Performance: Verify speed for large datasets
large_preds = np.random.rand(100_000)
large_actuals = np.random.randint(0, 2, 100_000)
import timeit
time_taken = timeit.timeit(
lambda: brier_score(large_preds, large_actuals),
number=10
) / 10
assert time_taken < 0.01 # < 10ms for 100k predictions
Functional + Regression:
@pytest.mark.regression
@pytest.mark.integration
def test_walk_forward_cv_prevents_leakage_bug():
"""Regression + Functional: Verify walk-forward CV doesn't leak data.
Bug: Issue #92 - CV splitter leaked one day of overlap between train/test.
Fixed: 2026-02-12 - Changed to strict < comparison for split boundaries.
"""
splitter = WalkForwardSplitter(years=range(2015, 2020))
games = load_games_for_years(range(2015, 2020))
for train_data, test_data, year in splitter.split(games):
train_max_date = train_data['date'].max()
test_min_date = test_data['date'].min()
# Functional: Verify no overlap
assert train_max_date < test_min_date
# Regression: Specific case that failed before fix
if year == 2017:
# This specific split had overlap before the fix
assert train_max_date != test_min_date
All three purposes:
@pytest.mark.integration
@pytest.mark.property
@pytest.mark.performance
@pytest.mark.regression
@given(season=st.integers(2015, 2025))
def test_game_loading_comprehensive(season):
"""Functional + Performance + Regression: Comprehensive game loading test.
Functional: Verify all games have required fields
Performance: Verify loading completes in reasonable time
Regression: Issue #103 - Loading 2019 season caused memory overflow
"""
import timeit
# Performance: Time the loading
start = timeit.default_timer()
games = load_games_for_season(season)
elapsed = timeit.default_timer() - start
# Functional: Verify correctness
assert len(games) > 0
required_fields = ["season", "day_num", "w_team_id", "l_team_id"]
for game in games:
for field in required_fields:
assert hasattr(game, field)
# Performance: Should complete in < 5 seconds for any season
assert elapsed < 5.0
# Regression: 2019 season specifically caused issues
if season == 2019:
import sys
memory_mb = sys.getsizeof(games) / (1024 * 1024)
assert memory_mb < 100 # Should stay under 100MB
See Also¶
Test Scope Guide - Unit vs Integration tests
Test Approach Guide - Example-based vs Property-based
Domain Testing Guide - Performance and data leakage testing details
Execution Guide - When tests/checks run (4-tier model)
Quality Assurance Guide - Mutation testing, coverage analysis