# Testing Strategy

**Quick Reference** for the `ncaa_eval` project testing approach. For detailed explanations and examples, see the [testing guides](testing/).

---

## Table of Contents

1. [Overview](#overview)
   - [Key Principles](#key-principles)
   - [Four Orthogonal Dimensions](#four-orthogonal-dimensions)
   - [Execution Tiers (When Checks Run)](#execution-tiers-when-checks-run)
     - [Tier 1: Pre-Commit](#tier-1-pre-commit--10s-total)
     - [Tier 2: PR/CI](#tier-2-prci-minutes)
     - [Tier 3: AI Code Review](#tier-3-ai-code-review)
     - [Tier 4: Owner Review](#tier-4-owner-review)
2. [Detailed Guides](#detailed-guides)
3. [Quick Decision Trees](#quick-decision-trees)
   - [Which test scope?](#which-test-scope)
   - [Which approach?](#which-approach)
   - [Which execution tier?](#which-execution-tier)
4. [Test Markers Reference](#test-markers-reference)
5. [Test Commands Reference](#test-commands-reference)
6. [Test Organization](#test-organization)
7. [Coverage Targets](#coverage-targets)
8. [Testing Tools](#testing-tools)
9. [Domain-Specific Testing](#domain-specific-testing)
   - [Performance Testing (NFR1: Vectorization)](#performance-testing-nfr1-vectorization)
   - [Data Leakage Prevention (NFR4: Temporal Boundaries)](#data-leakage-prevention-nfr4-temporal-boundaries)
10. [References](#references)

---

## Overview

### Key Principles

1. ✅ **Fast feedback** via Tier 1 (pre-commit, < 10s total)
2. ✅ **Thorough validation** via Tier 2 (PR/CI, complete suite)
3. ✅ **Four orthogonal dimensions** - choose the appropriate combination
4. ✅ **Coverage is a signal, not a gate** - identify gaps, don't block
5. ✅ **Mutation testing** evaluates test quality (critical modules only)
6. ✅ **Vectorization compliance** via performance testing (NFR1)
7. ✅ **Temporal integrity** via data leakage testing (NFR4)
8. ✅ **4-tier execution model** - Tier 1 (pre-commit) → Tier 2 (PR/CI) → Tier 3 (AI review) → Tier 4 (owner review)

### Four Orthogonal Dimensions

This strategy separates **four independent dimensions** of testing. Choose the appropriate combination for each test case:

1. **Test Scope** - *What* you're testing → [Scope Guide](testing/test-scope-guide.md)
   - **Unit:** Single function/class in isolation
   - **Integration:** Multiple components working together
2. **Test Approach** - *How* you write the test → [Approach Guide](testing/test-approach-guide.md)
   - **Example-based:** Concrete inputs → expected outputs
   - **Property-based (Hypothesis):** Invariants that should hold for all inputs
   - **Fuzz-based (Hypothesis):** Random/mutated inputs to find crashes and error handling gaps (no dedicated marker — use `@pytest.mark.slow`)
3. **Test Purpose** - *Why* you're writing the test → [Purpose Guide](testing/test-purpose-guide.md)
   - **Functional:** Correctness of behavior (default)
   - **Performance:** Speed/efficiency compliance (NFR1: vectorization)
   - **Regression:** Prevent previously fixed bugs from recurring
4. **Execution Scope** - *When* tests/checks run → [Execution Guide](testing/execution.md)
   - **Tier 1 (Pre-commit):** Smoke tests + fast checks (< 10s total)
   - **Tier 2 (PR/CI):** Complete suite + coverage + mutation
   - **Tier 3/4:** AI + Owner review

**Note:** Mutation testing and coverage are **not test types** - they're quality assurance tools. See [Quality Assurance Guide](testing/quality.md).
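To see how the dimensions compose on a single test, here is a minimal, self-contained sketch. `normalize_team_name` is a stand-in helper written for illustration, not the project's real normalizer — the marker usage is the point. The test is unit-scoped, property-based, and regression-purposed; carrying neither `smoke` nor `slow`, it runs in the Tier 2 suite.

```python
import pytest
from hypothesis import given, strategies as st


def normalize_team_name(name: str) -> str:
    """Stand-in normalizer used only for illustration."""
    return " ".join(name.lower().split())


@pytest.mark.unit        # Scope: single function in isolation
@pytest.mark.property    # Approach: invariant over generated inputs
@pytest.mark.regression  # Purpose: pins down a previously fixed bug
@given(name=st.text())
def test_normalization_is_idempotent(name: str) -> None:
    """Normalizing an already-normalized name must be a no-op."""
    once = normalize_team_name(name)
    assert normalize_team_name(once) == once
```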
### Execution Tiers (When Checks Run)

The project uses a **4-tier execution model** that balances speed with thoroughness:

#### Tier 1: Pre-Commit (< 10s total)

Fast, local checks that run on every commit:

| Check | Tool | What It Catches |
|-------|------|-----------------|
| Lint | `ruff check .` | Style violations, import issues |
| Format | `ruff format --check .` | Inconsistent formatting |
| Type-check | `mypy` (strict) | Missing annotations, type errors |
| Smoke tests | `pytest -m smoke` | Broken imports, sanity failures |

**Rationale:** Catch 80% of issues in seconds before code leaves your machine.
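A smoke test should be nothing more than an import/sanity probe so the tier stays under budget. A minimal sketch, in the spirit of `tests/unit/test_imports.py`:

```python
import importlib

import pytest


@pytest.mark.smoke
def test_package_imports() -> None:
    """Sanity check: the top-level package imports cleanly, in milliseconds."""
    assert importlib.import_module("ncaa_eval") is not None
```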
#### Tier 2: PR/CI (minutes)

Comprehensive validation before merge:

| Check | Tool | What It Catches |
|-------|------|-----------------|
| Full test suite | `pytest` | All regressions, edge cases |
| Integration tests | `pytest -m integration` | Component interaction failures |
| Property-based | `pytest -m property` | Invariant violations |
| Performance | `pytest -m performance` | Vectorization violations, speed regressions |
| Coverage | `pytest-cov` | Untested code paths |
| Mutation (Tier 1 modules) | `mutmut` | Weak tests, coverage gaps |

**Rationale:** Catch the remaining 20% of issues — the ones that require full project context.
#### Tier 3: AI Code Review

Docstring quality, vectorization compliance, architecture alignment, test quality, design intent.

#### Tier 4: Owner Review

Functional correctness, strategic alignment, complexity appropriateness, scope creep prevention.

See [Execution Guide](testing/execution.md) for complete details on each tier.
---

## Detailed Guides

For comprehensive explanations, examples, and best practices:

- **[Test Scope Guide](testing/test-scope-guide.md)** - Unit vs Integration tests
- **[Test Approach Guide](testing/test-approach-guide.md)** - Example-based vs Property-based
- **[Test Purpose Guide](testing/test-purpose-guide.md)** - Functional, Performance, Regression
- **[Execution Guide](testing/execution.md)** - When tests/checks run (4-tier model)
- **[Quality Assurance Guide](testing/quality.md)** - Mutation testing, coverage analysis
- **[Conventions Guide](testing/conventions.md)** - Fixtures, markers, organization, coverage targets
- **[Domain Testing Guide](testing/domain-testing.md)** - Performance testing, Data leakage prevention
---

## Quick Decision Trees

### Which test scope?

```mermaid
flowchart TD
    Start{"Does it interact with<br/>external systems?<br/>files, DB, network"}
    Start -->|YES| Integration["Integration test<br/>@pytest.mark.integration<br/>PR-time only"]
    Start -->|NO| Unit["Unit test<br/>fast, pre-commit eligible if smoke"]
```
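For the YES branch, an integration-scoped test touches a real external system — here the filesystem, via pytest's built-in `tmp_path` fixture. The example below is illustrative rather than taken from the project's suite:

```python
import json
from pathlib import Path

import pytest


@pytest.mark.integration
def test_round_trips_through_disk(tmp_path: Path) -> None:
    """Touches the filesystem, so it is integration-scoped and PR-time only."""
    path = tmp_path / "teams.json"
    path.write_text(json.dumps({"1101": "Abilene Chr"}))
    assert json.loads(path.read_text()) == {"1101": "Abilene Chr"}
```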
### Which approach?

```mermaid
flowchart TD
    Start{"Testing error handling<br/>or crash resilience?"}
    Start -->|YES| Fuzz["Fuzz-based<br/>Hypothesis st.text/st.binary"]
    Start -->|NO| Known{"Have specific<br/>known scenarios?"}
    Known -->|YES| Example["Example-based<br/>parametrize for multiple cases"]
    Known -->|NO| Invariant{"Can you state<br/>an invariant?"}
    Invariant -->|YES| Property["Property-based<br/>@pytest.mark.property<br/>Hypothesis"]
    Invariant -->|NO| ExampleAlt["Example-based<br/>test specific examples"]
```
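A fuzz-based test drives a function with arbitrary Hypothesis inputs and asserts that only the documented exception can escape. `parse_team_row` below is a hypothetical stand-in, not a real `ncaa_eval` function; note the `slow` marker, since fuzz tests have no dedicated marker of their own:

```python
import pytest
from hypothesis import given, strategies as st


def parse_team_row(raw: str) -> tuple[int, str]:
    """Hypothetical stand-in parser, for illustration only."""
    team_id, _, name = raw.partition(",")
    return int(team_id), name.strip()


@pytest.mark.slow  # fuzz tests run at Tier 2 only
@given(raw=st.text())
def test_parser_never_crashes_unexpectedly(raw: str) -> None:
    """Arbitrary input must either parse or raise ValueError -- nothing else."""
    try:
        parse_team_row(raw)
    except ValueError:
        pass  # documented failure mode; any other exception fails the test
```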
### Which execution tier?

```mermaid
flowchart TD
    Start{"Is test fast?<br/>under 1 second"}
    Start -->|NO| Tier2Slow["Tier 2 only<br/>@pytest.mark.slow"]
    Start -->|YES| Critical{"Import/sanity/schema check<br/>OR critical regression?"}
    Critical -->|YES| Tier1["Tier 1 eligible<br/>@pytest.mark.smoke"]
    Critical -->|NO| Tier2Fast["Tier 2 only<br/>save pre-commit budget"]
```
---

## Test Markers Reference

| Marker | Dimension | Command |
|--------|-----------|---------|
| `@pytest.mark.smoke` | Speed | `pytest -m smoke` |
| `@pytest.mark.slow` | Speed | `pytest -m "not slow"` |
| `@pytest.mark.unit` | Scope | `pytest -m unit` |
| `@pytest.mark.integration` | Scope | `pytest -m integration` |
| `@pytest.mark.property` | Approach | `pytest -m property` |
| `@pytest.mark.performance` | Purpose | `pytest -m performance` |
| `@pytest.mark.regression` | Purpose | `pytest -m regression` |
| `@pytest.mark.no_mutation` | Quality | Tests incompatible with mutmut runner |

**Combine markers across dimensions:**

```python
@pytest.mark.integration
@pytest.mark.property
@pytest.mark.regression
```

---

## Test Commands Reference

| Context | Command | What Runs |
|---------|---------|-----------|
| **Tier 1 (Pre-commit)** | `pytest -m smoke` | Smoke tests only (< 5s; Tier 1 overall < 10s) |
| **Tier 2 (PR/CI - full)** | `pytest` | All tests |
| **Tier 2 (PR/CI - coverage)** | `pytest --cov=src/ncaa_eval --cov-report=term-missing` | All + coverage report |
| **Tier 2 (exclude slow)** | `pytest -m "not slow"` | All except slow tests |
| **Filter by dimension** | `pytest -m integration` | Filter by marker |
| **Combined filters** | `pytest -m "integration and regression"` | Intersection |
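These commands are typically wrapped in Nox sessions (see [Testing Tools](#testing-tools)). The sketch below shows the shape such a `noxfile.py` might take; the session names and install arguments are assumptions, not the project's actual configuration:

```python
import nox


@nox.session
def smoke(session: nox.Session) -> None:
    """Tier 1: smoke tests only."""
    session.install("-e", ".[dev]")  # assumed dev extra
    session.run("pytest", "-m", "smoke")


@nox.session
def tests(session: nox.Session) -> None:
    """Tier 2: full suite with coverage."""
    session.install("-e", ".[dev]")  # assumed dev extra
    session.run("pytest", "--cov=src/ncaa_eval", "--cov-report=term-missing")
```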
---

## Test Organization

```
tests/
├── __init__.py
├── conftest.py          # Shared fixtures
├── fixtures/
│   ├── .gitkeep
│   └── kaggle/
│       ├── MNCAATourneyCompactResults.csv
│       ├── MRegularSeasonCompactResults.csv
│       ├── MSeasons.csv
│       └── MTeams.csv
├── integration/
│   ├── __init__.py
│   ├── test_documented_commands.py
│   ├── test_elo_integration.py
│   ├── test_feature_serving_integration.py
│   └── test_sync.py
└── unit/
    ├── __init__.py
    ├── test_bracket_page.py
    ├── test_bracket_renderer.py
    ├── test_calibration.py
    ├── test_chronological_serving.py
    ├── test_cli_train.py
    ├── test_connector_base.py
    ├── test_dashboard_app.py
    ├── test_dashboard_filters.py
    ├── test_deep_dive_page.py
    ├── test_elo.py
    ├── test_espn_connector.py
    ├── test_evaluation_backtest.py
    ├── test_evaluation_metrics.py
    ├── test_evaluation_plotting.py
    ├── test_evaluation_simulation.py
    ├── test_evaluation_splitter.py
    ├── test_feature_serving.py
    ├── test_framework_validation.py
    ├── test_fuzzy.py
    ├── test_graph.py
    ├── test_home_page.py
    ├── test_imports.py
    ├── test_kaggle_connector.py
    ├── test_leaderboard_page.py
    ├── test_logger.py
    ├── test_model_base.py
    ├── test_model_elo.py
    ├── test_model_logistic_regression.py
    ├── test_model_registry.py
    ├── test_model_tracking.py
    ├── test_model_xgboost.py
    ├── test_normalization.py
    ├── test_opponent.py
    ├── test_package_structure.py
    ├── test_pool_scorer_page.py
    ├── test_repository.py
    ├── test_run_store_metrics.py
    ├── test_schema.py
    └── test_sequential.py
```

**Naming conventions:**

- Test files: `test_<module>.py`
- Test functions: `test_<function>_<scenario>()`
- Fixtures: Descriptive names (e.g., `sample_teams`, `elo_config`, `temp_data_dir`)

See [Conventions Guide](testing/conventions.md) for details.
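Shared fixtures live in `tests/conftest.py`. A hedged sketch of what two of the fixtures named above might look like — the exact columns and shapes are assumptions, not the project's real fixtures:

```python
# Illustrative conftest.py sketch; column names follow the Kaggle MTeams.csv schema.
from pathlib import Path

import pandas as pd
import pytest


@pytest.fixture
def sample_teams() -> pd.DataFrame:
    """Small, deterministic team table for unit tests."""
    return pd.DataFrame({"TeamID": [1101, 1102], "TeamName": ["Abilene Chr", "Air Force"]})


@pytest.fixture
def temp_data_dir(tmp_path: Path) -> Path:
    """Isolated directory for tests that write intermediate files."""
    data_dir = tmp_path / "data"
    data_dir.mkdir()
    return data_dir
```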
---

## Coverage Targets

| Module | Line | Branch | Rationale |
|--------|------|--------|-----------|
| `evaluation/metrics.py` | 95% | 90% | Critical - errors invalidate all evaluations |
| `evaluation/simulation.py` | 90% | 85% | Monte Carlo simulator |
| `model/` | 90% | 85% | Core abstraction |
| `transform/` | 85% | 80% | Feature correctness, leakage prevention |
| `ingest/` | 80% | 75% | Data quality |
| `utils/` | 75% | 70% | Lower priority |
| **Overall** | **80%** | **75%** | Balanced |

**Coverage is a signal, not a gate.** Use it to identify gaps, not to block PRs.

See [Conventions Guide](testing/conventions.md#coverage-targets) for details.

---

## Testing Tools

| Tool | Purpose | Configuration |
|------|---------|---------------|
| **Pytest** | Testing framework | `pyproject.toml` `[tool.pytest.ini_options]` |
| **Hypothesis** | Property-based + Fuzz testing | Dev dependency |
| **Mutmut** | Mutation testing (quality) | Dev dependency |
| **pytest-cov** | Coverage reporting | `[tool.coverage.report]` |
| **Nox** | Session orchestration | `noxfile.py` |

---

## Domain-Specific Testing

### Performance Testing (NFR1: Vectorization)

- **Smoke:** Assertion-based vectorization checks (< 1s)
- **PR-time:** Performance benchmarks, 60-second backtest target

```python
@pytest.mark.smoke
@pytest.mark.performance
def test_metrics_are_vectorized():
    """Quick check: no .iterrows() in metrics."""
    # See domain-testing.md for example
```

### Data Leakage Prevention (NFR4: Temporal Boundaries)

- **Smoke:** API contract unit tests (fast)
- **PR-time:** End-to-end workflow tests, property-based invariants

```python
@pytest.mark.smoke
def test_api_enforces_cutoff():
    """Quick check: API rejects future data."""
    # See domain-testing.md for example
```

See [Domain Testing Guide](testing/domain-testing.md) for comprehensive examples.

---

## References

- [`STYLE_GUIDE.md`](STYLE_GUIDE.md) - Coding standards, vectorization rule
- `specs/05-architecture-fullstack.md` - Architecture, nox workflow
- `specs/03-prd.md` - Non-functional requirements (NFR1-NFR5)
- `pyproject.toml` - Pytest configuration
- `.github/pull_request_template.md` - PR checklist