# Testing Strategy

Quick reference for the ncaa_eval project's testing approach. For detailed explanations and examples, see the testing guides.
## Overview

### Key Principles

- ✅ Fast feedback via Tier 1 (pre-commit, < 10s total)
- ✅ Thorough validation via Tier 2 (PR/CI, complete suite)
- ✅ Four orthogonal dimensions: choose the appropriate combination for each test
- ✅ Coverage is a signal, not a gate: identify gaps, don't block PRs
- ✅ Mutation testing evaluates test quality (critical modules only)
- ✅ Vectorization compliance via performance testing (NFR1)
- ✅ Temporal integrity via data-leakage testing (NFR4)
- ✅ 4-tier execution model: Tier 1 (pre-commit) → Tier 2 (PR/CI) → Tier 3 (AI review) → Tier 4 (owner review)
## Four Orthogonal Dimensions

This strategy separates four independent dimensions of testing. Choose the appropriate combination for each test case:

1. **Test Scope** - *what* you're testing → Scope Guide
   - Unit: Single function/class in isolation
   - Integration: Multiple components working together
2. **Test Approach** - *how* you write the test → Approach Guide
   - Example-based: Concrete inputs → expected outputs
   - Property-based (Hypothesis): Invariants that should hold for all inputs
   - Fuzz-based (Hypothesis): Random/mutated inputs to find crashes and error-handling gaps (no dedicated marker; use `@pytest.mark.slow`)
3. **Test Purpose** - *why* you're writing the test → Purpose Guide
   - Functional: Correctness of behavior (default)
   - Performance: Speed/efficiency compliance (NFR1: vectorization)
   - Regression: Prevent previously fixed bugs from recurring
4. **Execution Scope** - *when* tests/checks run → Execution Guide
   - Tier 1 (Pre-commit): Smoke tests + fast checks (< 10s total)
   - Tier 2 (PR/CI): Complete suite + coverage + mutation
   - Tier 3/4: AI + Owner review

**Note:** Mutation testing and coverage are not test types; they are quality-assurance tools. See the Quality Assurance Guide.
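A single test is classified along all four dimensions at once. A hypothetical sketch (the test name and scenario are invented; the "functional" purpose is the unmarked default, while this one instead pins a fixed bug):

```python
import pytest

@pytest.mark.integration   # scope: multiple components working together
@pytest.mark.regression    # purpose: prevent a fixed bug from recurring
@pytest.mark.slow          # execution: Tier 2 (PR/CI) only
def test_sync_recovers_from_duplicate_team_rows():
    """Example-based approach: one concrete scenario, concrete assertion."""
    assert True  # placeholder body for the sketch
```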
## Execution Tiers (When Checks Run)

The project uses a 4-tier execution model that balances speed with thoroughness.

### Tier 1: Pre-Commit (< 10s total)

Fast, local checks that run on every commit:

| Check | Tool | What It Catches |
|---|---|---|
| Lint | | Style violations, import issues |
| Format | | Inconsistent formatting |
| Type-check | | Missing annotations, type errors |
| Smoke tests | | Broken imports, sanity failures |

**Rationale:** Catch 80% of issues in seconds, before code leaves your machine.
### Tier 2: PR/CI (minutes)

Comprehensive validation before merge:

| Check | Tool | What It Catches |
|---|---|---|
| Full test suite | | All regressions, edge cases |
| Integration tests | | Component interaction failures |
| Property-based | | Invariant violations |
| Performance | | Vectorization violations, speed regressions |
| Coverage | | Untested code paths |
| Mutation (Tier 1 modules) | | Weak tests, coverage gaps |

**Rationale:** Catch the remaining 20%, which requires full project context.
### Tier 3: AI Code Review

Docstring quality, vectorization compliance, architecture alignment, test quality, design intent.

### Tier 4: Owner Review

Functional correctness, strategic alignment, complexity appropriateness, scope-creep prevention.
See Execution Guide for complete details on each tier.
## Detailed Guides

For comprehensive explanations, examples, and best practices:

- **Test Scope Guide** - Unit vs Integration tests
- **Test Approach Guide** - Example-based vs Property-based
- **Test Purpose Guide** - Functional, Performance, Regression
- **Execution Guide** - When tests/checks run (4-tier model)
- **Quality Assurance Guide** - Mutation testing, coverage analysis
- **Conventions Guide** - Fixtures, markers, organization, coverage targets
- **Domain Testing Guide** - Performance testing, data-leakage prevention
## Quick Decision Trees

### Which test scope?

```mermaid
flowchart TD
    Start{Does it interact with<br/>external systems?<br/>files, DB, network}
    Start -->|YES| Integration[Integration test<br/>@pytest.mark.integration<br/>PR-time only]
    Start -->|NO| Unit[Unit test<br/>fast, pre-commit eligible if smoke]
```
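The two branches above can be contrasted in code. A hedged sketch with invented names (`team_key` stands in for a real unit under test):

```python
import json
import pytest

def team_key(name: str) -> str:
    """Normalize a display name into a lookup key (toy function under test)."""
    return name.strip().lower()

def test_team_key_strips_and_lowercases():
    # Unit: pure function, no I/O -- fast, pre-commit eligible if smoke.
    assert team_key("  Duke ") == "duke"

@pytest.mark.integration
def test_team_list_roundtrips_through_disk(tmp_path):
    # Integration: touches the filesystem, so it runs at PR time only.
    path = tmp_path / "teams.json"
    path.write_text(json.dumps(["Duke", "Gonzaga"]))
    assert json.loads(path.read_text()) == ["Duke", "Gonzaga"]
```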
### Which approach?

```mermaid
flowchart TD
    Start{Testing error handling<br/>or crash resilience?}
    Start -->|YES| Fuzz[Fuzz-based<br/>Hypothesis st.text/st.binary]
    Start -->|NO| Known{Have specific<br/>known scenarios?}
    Known -->|YES| Example[Example-based<br/>parametrize for multiple cases]
    Known -->|NO| Invariant{Can you state<br/>an invariant?}
    Invariant -->|YES| Property[Property-based<br/>@pytest.mark.property<br/>Hypothesis]
    Invariant -->|NO| ExampleAlt[Example-based<br/>test specific examples]
```
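One sketch of the property-based branch, assuming Hypothesis is available as a dev dependency (`win_prob` is an invented Elo-style stand-in, not the project's model):

```python
import math
import pytest
from hypothesis import given, strategies as st

def win_prob(r_a: float, r_b: float) -> float:
    """Logistic (Elo-style) probability that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

@pytest.mark.property
@given(st.floats(0, 3000), st.floats(0, 3000))
def test_win_prob_is_symmetric(r_a, r_b):
    # Invariant: the two teams' win probabilities always sum to 1.
    assert math.isclose(win_prob(r_a, r_b) + win_prob(r_b, r_a), 1.0, rel_tol=1e-9)
```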
### Which execution tier?

```mermaid
flowchart TD
    Start{Is test fast?<br/>under 1 second}
    Start -->|NO| Tier2Slow[Tier 2 only<br/>@pytest.mark.slow]
    Start -->|YES| Critical{Import/sanity/schema check<br/>OR critical regression?}
    Critical -->|YES| Tier1[Tier 1 eligible<br/>@pytest.mark.smoke]
    Critical -->|NO| Tier2Fast[Tier 2 only<br/>save pre-commit budget]
```
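A minimal sketch of a Tier 1-eligible smoke test: import sanity only, so it stays well under the pre-commit budget (the stdlib module names stand in for the project's real packages):

```python
import importlib
import pytest

@pytest.mark.smoke
def test_core_modules_import():
    # Broken imports are the cheapest failure to catch before commit.
    for mod in ("json", "csv"):  # stand-ins for the project's own modules
        assert importlib.import_module(mod) is not None
```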
## Test Markers Reference

| Marker | Dimension | Command |
|---|---|---|
| `@pytest.mark.smoke` | Speed | |
| `@pytest.mark.slow` | Speed | |
| `@pytest.mark.integration` | Scope | |
| | Scope | |
| `@pytest.mark.property` | Approach | |
| `@pytest.mark.performance` | Purpose | |
| `@pytest.mark.regression` | Purpose | |
| | Quality | Tests incompatible with mutmut runner |
Combine markers across dimensions:

```python
@pytest.mark.integration
@pytest.mark.property
@pytest.mark.regression
```
## Test Commands Reference

| Context | Command | What Runs |
|---|---|---|
| Tier 1 (Pre-commit) | | Smoke tests only (< 5s; Tier 1 overall < 10s) |
| Tier 2 (PR/CI, full) | | All tests |
| Tier 2 (PR/CI, coverage) | | All tests + coverage report |
| Tier 2 (exclude slow) | | All except slow tests |
| Filter by dimension | | Filter by marker |
| Combined filters | | Intersection |
## Test Organization

```text
tests/
├── __init__.py
├── conftest.py          # Shared fixtures
├── fixtures/
│   ├── .gitkeep
│   └── kaggle/
│       ├── MNCAATourneyCompactResults.csv
│       ├── MRegularSeasonCompactResults.csv
│       ├── MSeasons.csv
│       └── MTeams.csv
├── integration/
│   ├── __init__.py
│   ├── test_documented_commands.py
│   ├── test_elo_integration.py
│   ├── test_feature_serving_integration.py
│   └── test_sync.py
└── unit/
    ├── __init__.py
    ├── test_bracket_page.py
    ├── test_bracket_renderer.py
    ├── test_calibration.py
    ├── test_chronological_serving.py
    ├── test_cli_train.py
    ├── test_connector_base.py
    ├── test_dashboard_app.py
    ├── test_dashboard_filters.py
    ├── test_deep_dive_page.py
    ├── test_elo.py
    ├── test_espn_connector.py
    ├── test_evaluation_backtest.py
    ├── test_evaluation_metrics.py
    ├── test_evaluation_plotting.py
    ├── test_evaluation_simulation.py
    ├── test_evaluation_splitter.py
    ├── test_feature_serving.py
    ├── test_framework_validation.py
    ├── test_fuzzy.py
    ├── test_graph.py
    ├── test_home_page.py
    ├── test_imports.py
    ├── test_kaggle_connector.py
    ├── test_leaderboard_page.py
    ├── test_logger.py
    ├── test_model_base.py
    ├── test_model_elo.py
    ├── test_model_logistic_regression.py
    ├── test_model_registry.py
    ├── test_model_tracking.py
    ├── test_model_xgboost.py
    ├── test_normalization.py
    ├── test_opponent.py
    ├── test_package_structure.py
    ├── test_pool_scorer_page.py
    ├── test_repository.py
    ├── test_run_store_metrics.py
    ├── test_schema.py
    └── test_sequential.py
```
**Naming conventions:**

- Test files: `test_<module_name>.py`
- Test functions: `test_<function>_<scenario>()`
- Fixtures: descriptive names (e.g., `sample_teams`, `elo_config`, `temp_data_dir`)

See Conventions Guide for details.
## Coverage Targets

| Module | Line | Branch | Rationale |
|---|---|---|---|
| | 95% | 90% | Critical; errors invalidate all evaluations |
| | 90% | 85% | Monte Carlo simulator |
| | 90% | 85% | Core abstraction |
| | 85% | 80% | Feature correctness, leakage prevention |
| | 80% | 75% | Data quality |
| | 75% | 70% | Lower priority |
| Overall | 80% | 75% | Balanced |

Coverage is a signal, not a gate: use it to identify gaps, not to block PRs.

See Conventions Guide for details.
## Testing Tools

| Tool | Purpose | Configuration |
|---|---|---|
| Pytest | Testing framework | |
| Hypothesis | Property-based + fuzz testing | Dev dependency |
| Mutmut | Mutation testing (quality) | Dev dependency |
| pytest-cov | Coverage reporting | |
| Nox | Session orchestration | |
## Domain-Specific Testing

### Performance Testing (NFR1: Vectorization)

- Smoke: assertion-based vectorization checks (< 1s)
- PR-time: performance benchmarks, 60-second backtest target

```python
@pytest.mark.smoke
@pytest.mark.performance
def test_metrics_are_vectorized():
    """Quick check: no .iterrows() in metrics."""
    # See domain-testing.md for example
```
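One possible shape for such an assertion-based check, shown as a hedged sketch rather than the project's actual implementation (domain-testing.md has the real example): it scans a metric's compiled code for row-wise pandas method names.

```python
def assert_no_row_iteration(func):
    """Fail if func references .iterrows()/.itertuples() anywhere."""
    banned = {"iterrows", "itertuples"}
    # co_names includes attribute names the compiled function loads.
    used = banned & set(func.__code__.co_names)
    assert not used, f"{func.__name__} uses row-wise iteration: {sorted(used)}"

def brier_score_vectorized(preds, outcomes):
    # Toy stand-in for a vectorized metric: no per-row pandas iteration.
    return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(preds)

def mean_loop(df):
    # Deliberately row-wise: this one would fail the check.
    return sum(row for _, row in df.iterrows())

assert_no_row_iteration(brier_score_vectorized)  # passes
```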
### Data Leakage Prevention (NFR4: Temporal Boundaries)

- Smoke: API-contract unit tests (fast)
- PR-time: end-to-end workflow tests, property-based invariants

```python
@pytest.mark.smoke
def test_api_enforces_cutoff():
    """Quick check: API rejects future data."""
    # See domain-testing.md for example
```

See the Domain Testing Guide for comprehensive examples.
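A hedged sketch of the cutoff contract itself (`FeatureStore` and `as_of` are invented names, not the project's real API): serving must never return rows dated after the requested cutoff.

```python
import datetime as dt

class FeatureStore:
    """Toy store holding (game_date, features) pairs."""
    def __init__(self, rows):
        self.rows = rows

    def as_of(self, cutoff):
        """Return only rows visible on or before the cutoff date."""
        return [(d, f) for d, f in self.rows if d <= cutoff]

store = FeatureStore([
    (dt.date(2023, 3, 1), "pre-tournament form"),
    (dt.date(2023, 4, 1), "post-tournament result"),  # future data
])
served = store.as_of(dt.date(2023, 3, 15))
# Temporal invariant: nothing after the cutoff leaks through.
assert all(d <= dt.date(2023, 3, 15) for d, _ in served)
```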
## References

- `STYLE_GUIDE.md` - Coding standards, vectorization rules
- `specs/05-architecture-fullstack.md` - Architecture, nox workflow
- `specs/03-prd.md` - Non-functional requirements (NFR1-NFR5)
- `pyproject.toml` - Pytest configuration
- `.github/pull_request_template.md` - PR checklist