Testing Conventions

This guide covers test organization, fixtures, markers, and naming conventions.


Test Organization

Directory Structure

```
tests/
├── __init__.py
├── conftest.py                          # Shared fixtures
├── fixtures/
│   ├── .gitkeep
│   └── kaggle/
│       ├── MNCAATourneyCompactResults.csv
│       ├── MRegularSeasonCompactResults.csv
│       ├── MSeasons.csv
│       └── MTeams.csv
├── integration/
│   ├── __init__.py
│   ├── test_documented_commands.py      # E2E CLI documentation tests
│   ├── test_elo_integration.py          # Integration: Elo pipeline
│   ├── test_feature_serving_integration.py  # Integration: feature serving
│   └── test_sync.py                     # Integration: ingest → storage
└── unit/
    ├── __init__.py
    ├── test_bracket_page.py
    ├── test_bracket_renderer.py
    ├── test_calibration.py
    ├── test_chronological_serving.py
    ├── test_cli_train.py
    ├── test_connector_base.py
    ├── test_dashboard_app.py
    ├── test_dashboard_filters.py
    ├── test_deep_dive_page.py
    ├── test_elo.py
    ├── test_espn_connector.py
    ├── test_evaluation_backtest.py
    ├── test_evaluation_metrics.py
    ├── test_evaluation_plotting.py
    ├── test_evaluation_simulation.py
    ├── test_evaluation_splitter.py
    ├── test_feature_serving.py
    ├── test_framework_validation.py
    ├── test_fuzzy.py
    ├── test_graph.py
    ├── test_home_page.py
    ├── test_imports.py
    ├── test_kaggle_connector.py
    ├── test_leaderboard_page.py
    ├── test_logger.py
    ├── test_model_base.py
    ├── test_model_elo.py
    ├── test_model_logistic_regression.py
    ├── test_model_registry.py
    ├── test_model_tracking.py
    ├── test_model_xgboost.py
    ├── test_normalization.py
    ├── test_opponent.py
    ├── test_package_structure.py
    ├── test_pool_scorer_page.py
    ├── test_repository.py
    ├── test_run_store_metrics.py
    ├── test_schema.py
    └── test_sequential.py
```

Naming Conventions

| Entity | Convention | Example |
|---|---|---|
| Test files | `test_<module_name>.py` | `test_metrics.py` for `src/ncaa_eval/evaluation/metrics.py` |
| Test functions | `test_<function>_<scenario>()` | `test_brier_score_perfect_prediction()` |
| Fixture functions | Descriptive name (no suffix) | `sample_teams()`, `elo_config()`, `temp_data_dir()` |
| Test classes | `Test<ClassName>` | `class TestEloModel:` |
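
Put together, a hypothetical `tests/unit/test_metrics.py` skeleton following these conventions might look like the sketch below. The Brier computation is inlined here as a stand-in for the real `evaluation/metrics.py` helper, purely for illustration:

```python
# Hypothetical skeleton of tests/unit/test_metrics.py, combining the
# file, class, and function naming conventions above.


class TestBrierScore:
    """Tests grouped under a Test<ClassName> class."""

    def test_brier_score_perfect_prediction(self) -> None:
        # test_<function>_<scenario>: perfect predictions score 0.0.
        preds = [1.0, 0.0]
        actuals = [1, 0]
        score = sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(preds)
        assert score == 0.0
```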

Pytest Discovery

Pytest automatically discovers:

  • Files matching test_*.py or *_test.py

  • Functions matching test_*()

  • Classes matching Test*

No custom discovery configuration is needed beyond what is already set in pyproject.toml.
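
For reference, explicit discovery settings in pyproject.toml would look roughly like this (illustrative only; these mirror pytest's defaults, so check the project's actual pyproject.toml for authoritative values):

```toml
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py", "*_test.py"]
python_functions = ["test_*"]
python_classes = ["Test*"]
```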


Fixture Conventions

Fixture Scope

| Scope | Lifetime | Use Case | Example |
|---|---|---|---|
| `function` | Per test function (default) | Independent test data, reset each test | `@pytest.fixture def sample_game(): ...` |
| `class` | Per test class | Shared setup for class methods | `@pytest.fixture(scope="class") def db_connection(): ...` |
| `module` | Per test file | Expensive setup, reused across file | `@pytest.fixture(scope="module") def trained_model(): ...` |
| `session` | Per test session | One-time setup for all tests | `@pytest.fixture(scope="session") def test_database(): ...` |

Fixture Organization

  • tests/conftest.py: Project-wide fixtures (e.g., sample_teams(), elo_config(), temp_data_dir())

  • Inline fixtures: Simple fixtures can be defined in test files if not reused elsewhere

Fixture Best Practices

1. Type annotations: All fixtures must have return type annotations (mypy --strict compliance)

```python
from __future__ import annotations

import pytest
from ncaa_eval.ingest.schema import Game

@pytest.fixture
def sample_game() -> Game:
    """Provide a sample game for testing."""
    return Game(season=2023, day_num=100, w_team_id=1234, l_team_id=5678, w_score=75, l_score=70)
```

2. Parametrized fixtures: Use @pytest.fixture(params=[...]) for testing multiple scenarios

```python
@pytest.fixture(params=[1500, 1800, 2100])
def elo_rating(request: pytest.FixtureRequest) -> int:
    """Provide different Elo ratings for testing."""
    return request.param
```

3. Teardown with yield: Use yield for setup/teardown patterns

```python
from typing import Iterator
from pathlib import Path
import shutil

import pytest

@pytest.fixture
def temp_data_dir() -> Iterator[Path]:
    """Create temporary directory for test data."""
    temp_dir = Path("test_data_temp")
    temp_dir.mkdir(exist_ok=True)
    yield temp_dir
    # Teardown: clean up after test
    shutil.rmtree(temp_dir)
```

Test Markers

Pytest markers enable selective test execution, supporting the distinction between pre-commit and PR-time runs. Markers can be combined across all dimensions.

Marker Definitions

| Marker | Dimension | Purpose | Command |
|---|---|---|---|
| `@pytest.mark.smoke` | Speed | Fast smoke tests for pre-commit (< 1s each; smoke subset < 5s; Tier 1 overall < 10s) | `pytest -m smoke` |
| `@pytest.mark.slow` | Speed | Slow tests excluded from pre-commit (> 5 seconds each) | `pytest -m "not slow"` |
| `@pytest.mark.unit` | Scope | Pure unit tests with no I/O or external dependencies | `pytest -m unit` |
| `@pytest.mark.integration` | Scope | Integration tests (I/O, database) | `pytest -m integration` |
| `@pytest.mark.property` | Approach | Property-based tests (Hypothesis) | `pytest -m property` |
| `@pytest.mark.performance` | Purpose | Performance/benchmark tests | `pytest -m performance` |
| `@pytest.mark.regression` | Purpose | Regression tests (prevent bug recurrence) | `pytest -m regression` |
| `@pytest.mark.no_mutation` | Quality | Tests incompatible with the mutmut runner directory (`Path(__file__)`-dependent) | N/A |

Marker Configuration

Markers are configured in pyproject.toml:

```toml
[tool.pytest.ini_options]
markers = [
    "smoke: Fast smoke tests for pre-commit (< 5s smoke subset; Tier 1 overall < 10s)",
    "slow: Slow tests excluded from pre-commit (> 5 seconds each)",
    "integration: Integration tests with I/O or external dependencies",
    "property: Hypothesis property-based tests",
    "performance: Performance and benchmark tests",
    "regression: Regression tests to prevent bug recurrence",
    "no_mutation: Tests incompatible with mutmut runner directory (Path(__file__)-dependent structural tests)",
    "unit: Pure unit tests with no I/O or external dependencies",
]
```

Marker Usage Examples

```python
import numpy as np
import pytest
from hypothesis import given, strategies as st

# Speed marker only (fast unit test)
@pytest.mark.smoke
def test_import_package():
    """Verify package can be imported without errors."""
    import ncaa_eval  # noqa: F401 - import itself is the assertion

# Scope + Speed markers (slow integration test, example-based)
@pytest.mark.integration
@pytest.mark.slow
def test_full_season_processing(large_dataset_fixture):
    """Process a full season of games (slow due to data volume)."""
    result = process_season(large_dataset_fixture, season=2023)
    assert len(result) > 1000

# Approach marker only (property-based unit test)
@pytest.mark.property
@given(probs=st.lists(st.floats(0.0, 1.0), min_size=1))
def test_brier_score_is_bounded(probs):
    """Verify Brier score is always in [0, 1] (invariant)."""
    preds = np.array(probs)
    actuals = np.ones(len(probs))
    score = brier_score(preds, actuals)
    assert 0.0 <= score <= 1.0

# Scope + Approach markers (property-based integration test)
@pytest.mark.integration
@pytest.mark.property
@given(cutoff_year=st.integers(2015, 2025))
def test_temporal_boundary_invariant(cutoff_year):
    """Verify API enforces temporal boundaries (integration + invariant)."""
    api = ChronologicalDataServer()
    games = api.get_games_before(cutoff_year=cutoff_year)
    assert all(game.season <= cutoff_year for game in games)

# Purpose markers (regression test)
@pytest.mark.regression
def test_elo_never_negative(elo_config, sample_games):
    """Regression test: Prevent Issue #42 (negative Elo ratings)."""
    engine = EloFeatureEngine(elo_config)
    for game in sample_games:
        engine.update_game(game)
    assert all(r >= 0 for r in engine.ratings.values())

# All dimensions combined (integration + property + performance)
@pytest.mark.integration
@pytest.mark.property
@pytest.mark.performance
@pytest.mark.slow
@given(season=st.integers(2015, 2025))
def test_game_loading_fast_and_correct(season):
    """Verify game loading is correct AND performant (all dimensions)."""
    import timeit
    start = timeit.default_timer()
    games = load_games_for_season(season)
    elapsed = timeit.default_timer() - start

    # Functional correctness
    assert len(games) > 0
    # Performance requirement
    assert elapsed < 5.0
```
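
The examples above reference project helpers such as `brier_score()`. A minimal stand-in consistent with the bounded-in-[0, 1] invariant (an illustrative assumption, not the project's actual implementation in `evaluation/metrics.py`) would be:

```python
import numpy as np


def brier_score(preds: np.ndarray, actuals: np.ndarray) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return float(np.mean((preds - actuals) ** 2))
```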

Test Execution Commands

| Context | Command | What Runs |
|---|---|---|
| Pre-commit | `pytest -m smoke` | Smoke tests only (< 5s) |
| Local full suite | `pytest` | All tests (no filter) |
| Local with coverage | `pytest --cov=src/ncaa_eval --cov-report=html` | All tests + HTML coverage report |
| Exclude slow tests | `pytest -m "not slow"` | All except slow tests |
| Integration only | `pytest -m integration` | Integration tests only (scope filter) |
| Property-based only | `pytest -m property` | Property-based tests only (approach filter) |
| Performance only | `pytest -m performance` | Performance tests only (purpose filter) |
| Regression only | `pytest -m regression` | Regression tests only (purpose filter) |
| Combined filters | `pytest -m "integration and regression"` | Integration regression tests |
| CI/PR | `pytest --cov=src/ncaa_eval --cov-report=term-missing` | All tests + terminal coverage |


Coverage Targets

Coverage is a quality signal, not a binary gate: the targets below guide development but are not strictly enforced.

Module-Specific Targets

| Module | Line Coverage | Branch Coverage | Rationale |
|---|---|---|---|
| `evaluation/metrics.py` | 95% | 90% | Critical for correctness (LogLoss, Brier, ECE). Errors invalidate all model evaluations. |
| `evaluation/simulation.py` | 90% | 85% | Monte Carlo simulator (Epic 6). Errors affect tournament strategy. |
| `model/` (Model ABC) | 90% | 85% | Core abstraction for all models. Errors cascade to all implementations. |
| `transform/` (Features) | 85% | 80% | Feature correctness impacts model quality. Data leakage prevention critical. |
| `ingest/` (Data Ingestion) | 80% | 75% | Data quality impacts everything downstream. |
| `utils/` (Utilities) | 75% | 70% | Lower priority than core logic but still important. |
| Overall Project | 80% | 75% | Balanced target: rigorous without being burdensome. |
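
The difference between the line and branch targets can be seen in a small sketch (illustrative code, not from the project):

```python
def clamp_rating(rating: float) -> float:
    """Clamp a rating at zero (illustrative helper)."""
    if rating < 0:
        rating = 0.0
    return rating


# A single test such as clamp_rating(-5.0) executes every line
# (100% line coverage) but never takes the rating >= 0 path,
# leaving branch coverage at 50% until clamp_rating(10.0) is also tested.
```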

Enforcement Approach

  • Pre-commit: NO coverage enforcement (would slow development loop)

  • PR-time: Coverage report generated (pytest --cov) but NOT enforced as gate (informational only)

  • Rationale: Coverage highlights gaps but shouldn’t block PRs if tests are high-quality. Manual review of coverage reports is more valuable than automated enforcement.

Coverage Tooling

  • Tool: pytest-cov plugin (configured in pyproject.toml)

  • HTML reports: pytest --cov --cov-report=html (local debugging)

  • Terminal reports: pytest --cov --cov-report=term-missing (CI)

  • Branch coverage: pytest --cov=src/ncaa_eval --cov-branch (measures both line and branch coverage)
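
A corresponding pyproject.toml excerpt might look like this (a sketch using standard coverage.py options, not the project's verified configuration):

```toml
[tool.coverage.run]
source = ["src/ncaa_eval"]
branch = true

[tool.coverage.report]
show_missing = true
```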


Development Workflow Integration

Nox Workflow

The testing strategy integrates into the nox-orchestrated development pipeline:

Command: `nox`. Workflow: Ruff (lint/format) → Mypy (strict) → Pytest (full suite).

```python
# Actual noxfile.py tests session (python=False uses active conda env)
import nox

@nox.session(python=False)
def tests(session):
    """Run the full pytest test suite."""
    session.run("pytest", "--tb=short", *session.posargs)
```

Because the session forwards `session.posargs`, extra pytest arguments can be passed through nox, e.g. `nox -s tests -- -m "not slow"`.

Pre-commit Hook Integration

Pre-commit hooks are configured in .pre-commit-config.yaml:

```yaml
# .pre-commit-config.yaml (excerpt — pytest-smoke hook)
  - repo: local
    hooks:
      - id: pytest-smoke
        name: pytest-smoke
        entry: poetry run pytest -m smoke --tb=short -q
        language: system
        types: [python]
        pass_filenames: false
        stages: [commit]
```

See Also