# Testing Conventions
This guide covers test organization, fixtures, markers, and naming conventions.
## Test Organization

### Directory Structure
```text
tests/
├── __init__.py
├── conftest.py                              # Shared fixtures
├── fixtures/
│   ├── .gitkeep
│   └── kaggle/
│       ├── MNCAATourneyCompactResults.csv
│       ├── MRegularSeasonCompactResults.csv
│       ├── MSeasons.csv
│       └── MTeams.csv
├── integration/
│   ├── __init__.py
│   ├── test_documented_commands.py          # E2E CLI documentation tests
│   ├── test_elo_integration.py              # Integration: Elo pipeline
│   ├── test_feature_serving_integration.py  # Integration: feature serving
│   └── test_sync.py                         # Integration: ingest → storage
└── unit/
    ├── __init__.py
    ├── test_bracket_page.py
    ├── test_bracket_renderer.py
    ├── test_calibration.py
    ├── test_chronological_serving.py
    ├── test_cli_train.py
    ├── test_connector_base.py
    ├── test_dashboard_app.py
    ├── test_dashboard_filters.py
    ├── test_deep_dive_page.py
    ├── test_elo.py
    ├── test_espn_connector.py
    ├── test_evaluation_backtest.py
    ├── test_evaluation_metrics.py
    ├── test_evaluation_plotting.py
    ├── test_evaluation_simulation.py
    ├── test_evaluation_splitter.py
    ├── test_feature_serving.py
    ├── test_framework_validation.py
    ├── test_fuzzy.py
    ├── test_graph.py
    ├── test_home_page.py
    ├── test_imports.py
    ├── test_kaggle_connector.py
    ├── test_leaderboard_page.py
    ├── test_logger.py
    ├── test_model_base.py
    ├── test_model_elo.py
    ├── test_model_logistic_regression.py
    ├── test_model_registry.py
    ├── test_model_tracking.py
    ├── test_model_xgboost.py
    ├── test_normalization.py
    ├── test_opponent.py
    ├── test_package_structure.py
    ├── test_pool_scorer_page.py
    ├── test_repository.py
    ├── test_run_store_metrics.py
    ├── test_schema.py
    └── test_sequential.py
```
### Naming Conventions

| Entity | Convention | Example |
|---|---|---|
| Test files | `test_*.py` | `test_elo.py` |
| Test functions | `test_*()` | `test_import_package()` |
| Fixture functions | Descriptive name (no suffix) | `sample_teams()` |
| Test classes | `Test*` | `TestEloRatings` |
### Pytest Discovery

Pytest automatically discovers:

- Files matching `test_*.py` or `*_test.py`
- Functions matching `test_*()`
- Classes matching `Test*`

No custom discovery configuration is needed (already set in `pyproject.toml`).
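A minimal test module satisfying all three discovery rules (the file and identifier names below are illustrative, not from the project):

```python
# Hypothetical file: tests/unit/test_example.py (matches the test_*.py pattern)


def test_addition() -> None:
    # Matches test_*(), so pytest collects it as a test
    assert 1 + 1 == 2


class TestDiscovery:
    # Matches Test*, so pytest collects the class
    def test_method(self) -> None:
        # Methods also need the test_ prefix to be collected
        assert "abc".upper() == "ABC"
```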
## Fixture Conventions

### Fixture Scope

| Scope | Lifetime | Use Case | Example |
|---|---|---|---|
| `function` | Per test function (default) | Independent test data, reset each test | `@pytest.fixture` |
| `class` | Per test class | Shared setup for class methods | `@pytest.fixture(scope="class")` |
| `module` | Per test file | Expensive setup, reused across file | `@pytest.fixture(scope="module")` |
| `session` | Per test session | One-time setup for all tests | `@pytest.fixture(scope="session")` |
### Fixture Organization

- `tests/conftest.py`: Project-wide fixtures (e.g., `sample_teams()`, `elo_config()`, `temp_data_dir()`)
- Inline fixtures: Simple fixtures can be defined in test files when not reused elsewhere
### Fixture Best Practices

1. Type annotations: All fixtures must have return type annotations (mypy `--strict` compliance)

```python
from __future__ import annotations

import pytest

from ncaa_eval.ingest.schema import Game


@pytest.fixture
def sample_game() -> Game:
    """Provide a sample game for testing."""
    return Game(season=2023, day_num=100, w_team_id=1234, l_team_id=5678, w_score=75, l_score=70)
```
2. Parametrized fixtures: Use `@pytest.fixture(params=[...])` for testing multiple scenarios

```python
@pytest.fixture(params=[1500, 1800, 2100])
def elo_rating(request: pytest.FixtureRequest) -> int:
    """Provide different Elo ratings for testing."""
    return request.param
```
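A test that requests a parametrized fixture runs once per parameter, so the single consumer function below produces three test cases (the consumer shown is a hypothetical example):

```python
import pytest


@pytest.fixture(params=[1500, 1800, 2100])
def elo_rating(request: pytest.FixtureRequest) -> int:
    """Provide different Elo ratings for testing."""
    return request.param


def test_rating_in_plausible_range(elo_rating: int) -> None:
    # Executed three times: once each for 1500, 1800, and 2100
    assert 0 < elo_rating < 3000
```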
3. Teardown with `yield`: Use `yield` for setup/teardown patterns

```python
from pathlib import Path
from typing import Iterator
import shutil

import pytest


@pytest.fixture
def temp_data_dir() -> Iterator[Path]:
    """Create temporary directory for test data."""
    temp_dir = Path("test_data_temp")
    temp_dir.mkdir(exist_ok=True)
    yield temp_dir
    # Teardown: clean up after the test
    shutil.rmtree(temp_dir)
```
## Test Markers

Pytest markers enable selective test execution for the pre-commit vs. PR-time distinction. Markers can be combined across all dimensions.

### Marker Definitions

| Marker | Dimension | Purpose | Command |
|---|---|---|---|
| `smoke` | Speed | Fast smoke tests for pre-commit (< 1s each; smoke subset < 5s; Tier 1 overall < 10s) | `pytest -m smoke` |
| `slow` | Speed | Slow tests excluded from pre-commit (> 5 seconds each) | `pytest -m "not slow"` |
| `unit` | Scope | Pure unit tests with no I/O or external dependencies | `pytest -m unit` |
| `integration` | Scope | Integration tests (I/O, database) | `pytest -m integration` |
| `property` | Approach | Property-based tests (Hypothesis) | `pytest -m property` |
| `performance` | Purpose | Performance/benchmark tests | `pytest -m performance` |
| `regression` | Purpose | Regression tests (prevent bug recurrence) | `pytest -m regression` |
| `no_mutation` | Quality | Tests incompatible with mutmut runner directory (`Path(__file__)`-dependent structural tests) | N/A |
### Marker Configuration

Markers are configured in `pyproject.toml`:

```toml
[tool.pytest.ini_options]
markers = [
    "smoke: Fast smoke tests for pre-commit (< 5s smoke subset; Tier 1 overall < 10s)",
    "slow: Slow tests excluded from pre-commit (> 5 seconds each)",
    "integration: Integration tests with I/O or external dependencies",
    "property: Hypothesis property-based tests",
    "performance: Performance and benchmark tests",
    "regression: Regression tests to prevent bug recurrence",
    "no_mutation: Tests incompatible with mutmut runner directory (Path(__file__)-dependent structural tests)",
    "unit: Pure unit tests with no I/O or external dependencies",
]
```
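One optional hardening (an assumption, not necessarily part of this project's configuration): pytest's `--strict-markers` flag turns any unregistered marker, such as a typo like `@pytest.mark.smok`, into a collection error rather than a silent no-op. It can be enabled alongside the marker list:

```toml
[tool.pytest.ini_options]
addopts = "--strict-markers"
```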
### Marker Usage Examples

```python
import numpy as np
import pytest
from hypothesis import given, strategies as st


# Speed marker only (fast unit test)
@pytest.mark.smoke
def test_import_package():
    """Verify package can be imported without errors."""
    import ncaa_eval  # noqa: F401 — import itself is the assertion


# Scope + Speed markers (slow integration test, example-based)
@pytest.mark.integration
@pytest.mark.slow
def test_full_season_processing(large_dataset_fixture):
    """Process a full season of games (slow due to data volume)."""
    result = process_season(large_dataset_fixture, season=2023)
    assert len(result) > 1000


# Approach marker only (property-based unit test)
@pytest.mark.property
@given(probs=st.lists(st.floats(0.0, 1.0), min_size=1))
def test_brier_score_is_bounded(probs):
    """Verify Brier score is always in [0, 1] (invariant)."""
    preds = np.array(probs)
    actuals = np.ones(len(probs))
    score = brier_score(preds, actuals)
    assert 0.0 <= score <= 1.0


# Scope + Approach markers (property-based integration test)
@pytest.mark.integration
@pytest.mark.property
@given(cutoff_year=st.integers(2015, 2025))
def test_temporal_boundary_invariant(cutoff_year):
    """Verify API enforces temporal boundaries (integration + invariant)."""
    api = ChronologicalDataServer()
    games = api.get_games_before(cutoff_year=cutoff_year)
    assert all(game.season <= cutoff_year for game in games)


# Purpose markers (regression test)
@pytest.mark.regression
def test_elo_never_negative(elo_config, sample_games):
    """Regression test: Prevent Issue #42 (negative Elo ratings)."""
    engine = EloFeatureEngine(elo_config)
    for game in sample_games:
        engine.update_game(game)
    assert all(r >= 0 for r in engine.ratings.values())


# All dimensions combined (integration + property + performance)
@pytest.mark.integration
@pytest.mark.property
@pytest.mark.performance
@pytest.mark.slow
@given(season=st.integers(2015, 2025))
def test_game_loading_fast_and_correct(season):
    """Verify game loading is correct AND performant (all dimensions)."""
    import timeit

    start = timeit.default_timer()
    games = load_games_for_season(season)
    elapsed = timeit.default_timer() - start
    # Functional correctness
    assert len(games) > 0
    # Performance requirement
    assert elapsed < 5.0
```
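The `brier_score` referenced above is project code; as a reference for the invariant being tested, a minimal sketch of the metric (mean squared error between predicted probabilities and binary outcomes, which is bounded by [0, 1] when both inputs lie in [0, 1]) might look like:

```python
import numpy as np


def brier_score(preds: np.ndarray, actuals: np.ndarray) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return float(np.mean((preds - actuals) ** 2))
```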
## Test Execution Commands

| Context | Command | What Runs |
|---|---|---|
| Pre-commit | `pytest -m smoke` | Smoke tests only (< 5s) |
| Local full suite | `pytest` | All tests (no filter) |
| Local with coverage | `pytest --cov --cov-report=html` | All tests + HTML coverage report |
| Exclude slow tests | `pytest -m "not slow"` | All except slow tests |
| Integration only | `pytest -m integration` | Integration tests only (scope filter) |
| Property-based only | `pytest -m property` | Property-based tests only (approach filter) |
| Performance only | `pytest -m performance` | Performance tests only (purpose filter) |
| Regression only | `pytest -m regression` | Regression tests only (purpose filter) |
| Combined filters | `pytest -m "integration and regression"` | Integration regression tests |
| CI/PR | `pytest --cov --cov-report=term-missing` | All tests + terminal coverage |
## Coverage Targets

Coverage is a quality signal, not a binary gate. Targets guide development but are not enforced as strict gates.
### Module-Specific Targets

| Module | Line Coverage | Branch Coverage | Rationale |
|---|---|---|---|
| Evaluation metrics | 95% | 90% | Critical for correctness (LogLoss, Brier, ECE). Errors invalidate all model evaluations. |
| Tournament simulation | 90% | 85% | Monte Carlo simulator (Epic 6). Errors affect tournament strategy. |
| Model base abstraction | 90% | 85% | Core abstraction for all models. Errors cascade to all implementations. |
| Feature engineering | 85% | 80% | Feature correctness impacts model quality. Data leakage prevention critical. |
| Data ingest | 80% | 75% | Data quality impacts everything downstream. |
| Remaining modules | 75% | 70% | Lower priority than core logic but still important. |
| Overall project | 80% | 75% | Balanced target: rigorous without being burdensome. |
### Enforcement Approach

- Pre-commit: NO coverage enforcement (would slow the development loop)
- PR-time: Coverage report generated (`pytest --cov`) but NOT enforced as a gate (informational only)
- Rationale: Coverage highlights gaps but shouldn't block PRs if tests are high-quality. Manual review of coverage reports is more valuable than automated enforcement.
### Coverage Tooling

- Tool: `pytest-cov` plugin (configured in `pyproject.toml`)
- HTML reports: `pytest --cov --cov-report=html` (local debugging)
- Terminal reports: `pytest --cov --cov-report=term-missing` (CI)
- Branch coverage: `pytest --cov=src/ncaa_eval --cov-branch` (measures both line and branch coverage)
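A `coverage.py` configuration block consistent with the commands above (a sketch; the project's actual `pyproject.toml` settings may differ) would be:

```toml
[tool.coverage.run]
source = ["src/ncaa_eval"]
branch = true

[tool.coverage.report]
show_missing = true
```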
## Development Workflow Integration

### Nox Workflow

The testing strategy integrates into the nox-orchestrated development pipeline:

- Command: `nox`
- Workflow: Ruff (lint/format) → Mypy (strict) → Pytest (full suite)
```python
import nox


# Actual noxfile.py tests session (python=False uses active conda env)
@nox.session(python=False)
def tests(session):
    """Run the full pytest test suite."""
    session.run("pytest", "--tb=short", *session.posargs)
```
### Pre-commit Hook Integration

Pre-commit hooks are configured in `.pre-commit-config.yaml`:

```yaml
# .pre-commit-config.yaml (excerpt — pytest-smoke hook)
- repo: local
  hooks:
    - id: pytest-smoke
      name: pytest-smoke
      entry: poetry run pytest -m smoke --tb=short -q
      language: system
      types: [python]
      pass_filenames: false
      stages: [commit]
```
## See Also

- Test Scope Guide - Unit vs Integration tests
- Test Approach Guide - Example-based vs Property-based
- Test Purpose Guide - Functional, Performance, Regression
- Execution Guide - When tests/checks run (4-tier model)
- Quality Assurance Guide - Mutation testing, coverage analysis
- Domain Testing Guide - Performance and data leakage testing