# Testing Strategy

**Quick Reference** for the `ncaa_eval` project testing approach. For detailed explanations and examples, see the [testing guides](testing/).

---

## Table of Contents

1. [Overview](#overview)
   - [Key Principles](#key-principles)
   - [Four Orthogonal Dimensions](#four-orthogonal-dimensions)
   - [Execution Tiers (When Checks Run)](#execution-tiers-when-checks-run)
     - [Tier 1: Pre-Commit](#tier-1-pre-commit--10s-total)
     - [Tier 2: PR/CI](#tier-2-prci-minutes)
     - [Tier 3: AI Code Review](#tier-3-ai-code-review)
     - [Tier 4: Owner Review](#tier-4-owner-review)
2. [Detailed Guides](#detailed-guides)
3. [Quick Decision Trees](#quick-decision-trees)
   - [Which test scope?](#which-test-scope)
   - [Which approach?](#which-approach)
   - [Which execution tier?](#which-execution-tier)
4. [Test Markers Reference](#test-markers-reference)
5. [Test Commands Reference](#test-commands-reference)
6. [Test Organization](#test-organization)
7. [Coverage Targets](#coverage-targets)
8. [Testing Tools](#testing-tools)
9. [Domain-Specific Testing](#domain-specific-testing)
   - [Performance Testing (NFR1: Vectorization)](#performance-testing-nfr1-vectorization)
   - [Data Leakage Prevention (NFR4: Temporal Boundaries)](#data-leakage-prevention-nfr4-temporal-boundaries)
10. [References](#references)

---

## Overview

### Key Principles

1. ✅ **Fast feedback** via Tier 1 (pre-commit, < 10s total)
2. ✅ **Thorough validation** via Tier 2 (PR/CI, complete suite)
3. ✅ **Four orthogonal dimensions** - choose the appropriate combination
4. ✅ **Coverage is a signal, not a gate** - identify gaps, don't block
5. ✅ **Mutation testing** evaluates test quality (critical modules only)
6. ✅ **Vectorization compliance** via performance testing (NFR1)
7. ✅ **Temporal integrity** via data leakage testing (NFR4)
8. ✅ **4-tier execution model** - Tier 1 (pre-commit) → Tier 2 (PR/CI) → Tier 3 (AI review) → Tier 4 (owner review)

### Four Orthogonal Dimensions

This strategy separates **four independent dimensions** of testing. Choose the appropriate combination for each test case:

1. **Test Scope** - *What* you're testing → [Scope Guide](testing/test-scope-guide.md)
   - **Unit:** Single function/class in isolation
   - **Integration:** Multiple components working together
2. **Test Approach** - *How* you write the test → [Approach Guide](testing/test-approach-guide.md)
   - **Example-based:** Concrete inputs → expected outputs
   - **Property-based (Hypothesis):** Invariants that should hold for all inputs
   - **Fuzz-based (Hypothesis):** Random/mutated inputs to find crashes and error handling gaps (no dedicated marker — use `@pytest.mark.slow`)
3. **Test Purpose** - *Why* you're writing the test → [Purpose Guide](testing/test-purpose-guide.md)
   - **Functional:** Correctness of behavior (default)
   - **Performance:** Speed/efficiency compliance (NFR1: vectorization)
   - **Regression:** Prevent previously fixed bugs from recurring
4. **Execution Scope** - *When* tests/checks run → [Execution Guide](testing/execution.md)
   - **Tier 1 (Pre-commit):** Smoke tests + fast checks (< 10s total)
   - **Tier 2 (PR/CI):** Complete suite + coverage + mutation
   - **Tier 3/4:** AI + Owner review

**Note:** Mutation testing and coverage are **not test types** - they're quality assurance tools. See [Quality Assurance Guide](testing/quality.md).
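To see how the dimensions compose on a single test, here is a minimal, self-contained sketch. `normalize_team_name` is a stand-in helper written for illustration, not the project's real normalizer — the marker usage is the point. The test is unit-scoped, property-based, and regression-purposed; carrying neither `smoke` nor `slow`, it runs in the Tier 2 suite.

```python
import pytest
from hypothesis import given, strategies as st


def normalize_team_name(name: str) -> str:
    """Stand-in normalizer used only for illustration."""
    return " ".join(name.lower().split())


@pytest.mark.unit        # Scope: single function in isolation
@pytest.mark.property    # Approach: invariant over generated inputs
@pytest.mark.regression  # Purpose: pins down a previously fixed bug
@given(name=st.text())
def test_normalization_is_idempotent(name: str) -> None:
    """Normalizing an already-normalized name must be a no-op."""
    once = normalize_team_name(name)
    assert normalize_team_name(once) == once
```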
### Execution Tiers (When Checks Run)

The project uses a **4-tier execution model** that balances speed with thoroughness:

#### Tier 1: Pre-Commit (< 10s total)

Fast, local checks that run on every commit:

| Check | Tool | What It Catches |
|-------|------|-----------------|
| Lint | `ruff check .` | Style violations, import issues |
| Format | `ruff format --check .` | Inconsistent formatting |
| Type-check | `mypy` (strict) | Missing annotations, type errors |
| Smoke tests | `pytest -m smoke` | Broken imports, sanity failures |

**Rationale:** Catch 80% of issues in seconds before code leaves your machine.
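A smoke test should be nothing more than an import/sanity probe so the tier stays under budget. A minimal sketch, in the spirit of `tests/unit/test_imports.py`:

```python
import importlib

import pytest


@pytest.mark.smoke
def test_package_imports() -> None:
    """Sanity check: the top-level package imports cleanly, in milliseconds."""
    assert importlib.import_module("ncaa_eval") is not None
```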
#### Tier 2: PR/CI (minutes)

Comprehensive validation before merge:

| Check | Tool | What It Catches |
|-------|------|-----------------|
| Full test suite | `pytest` | All regressions, edge cases |
| Integration tests | `pytest -m integration` | Component interaction failures |
| Property-based | `pytest -m property` | Invariant violations |
| Performance | `pytest -m performance` | Vectorization violations, speed regressions |
| Coverage | `pytest-cov` | Untested code paths |
| Mutation (Tier 1 modules) | `mutmut` | Weak tests, coverage gaps |

**Rationale:** Catch the remaining 20% of issues — the ones that require full project context.
#### Tier 3: AI Code Review

Docstring quality, vectorization compliance, architecture alignment, test quality, design intent.

#### Tier 4: Owner Review

Functional correctness, strategic alignment, complexity appropriateness, scope creep prevention.

See [Execution Guide](testing/execution.md) for complete details on each tier.
---

## Detailed Guides

For comprehensive explanations, examples, and best practices:

- **[Test Scope Guide](testing/test-scope-guide.md)** - Unit vs Integration tests
- **[Test Approach Guide](testing/test-approach-guide.md)** - Example-based vs Property-based
- **[Test Purpose Guide](testing/test-purpose-guide.md)** - Functional, Performance, Regression
- **[Execution Guide](testing/execution.md)** - When tests/checks run (4-tier model)
- **[Quality Assurance Guide](testing/quality.md)** - Mutation testing, coverage analysis
- **[Conventions Guide](testing/conventions.md)** - Fixtures, markers, organization, coverage targets
- **[Domain Testing Guide](testing/domain-testing.md)** - Performance testing, Data leakage prevention
---

## Quick Decision Trees

### Which test scope?

```mermaid
flowchart TD
    Start{"Does it interact with<br/>external systems?<br/>files, DB, network"}
    Start -->|YES| Integration["Integration test<br/>@pytest.mark.integration<br/>PR-time only"]
    Start -->|NO| Unit["Unit test<br/>fast, pre-commit eligible if smoke"]
```
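For the YES branch, an integration-scoped test touches a real external system — here the filesystem, via pytest's built-in `tmp_path` fixture. The example below is illustrative rather than taken from the project's suite:

```python
import json
from pathlib import Path

import pytest


@pytest.mark.integration
def test_round_trips_through_disk(tmp_path: Path) -> None:
    """Touches the filesystem, so it is integration-scoped and PR-time only."""
    path = tmp_path / "teams.json"
    path.write_text(json.dumps({"1101": "Abilene Chr"}))
    assert json.loads(path.read_text()) == {"1101": "Abilene Chr"}
```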
### Which approach?

```mermaid
flowchart TD
    Start{"Testing error handling<br/>or crash resilience?"}
    Start -->|YES| Fuzz["Fuzz-based<br/>Hypothesis st.text/st.binary"]
    Start -->|NO| Known{"Have specific<br/>known scenarios?"}
    Known -->|YES| Example["Example-based<br/>parametrize for multiple cases"]
    Known -->|NO| Invariant{"Can you state<br/>an invariant?"}
    Invariant -->|YES| Property["Property-based<br/>@pytest.mark.property<br/>Hypothesis"]
    Invariant -->|NO| ExampleAlt["Example-based<br/>test specific examples"]
```
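A fuzz-based test drives a function with arbitrary Hypothesis inputs and asserts that only the documented exception can escape. `parse_team_row` below is a hypothetical stand-in, not a real `ncaa_eval` function; note the `slow` marker, since fuzz tests have no dedicated marker of their own:

```python
import pytest
from hypothesis import given, strategies as st


def parse_team_row(raw: str) -> tuple[int, str]:
    """Hypothetical stand-in parser, for illustration only."""
    team_id, _, name = raw.partition(",")
    return int(team_id), name.strip()


@pytest.mark.slow  # fuzz tests run at Tier 2 only
@given(raw=st.text())
def test_parser_never_crashes_unexpectedly(raw: str) -> None:
    """Arbitrary input must either parse or raise ValueError -- nothing else."""
    try:
        parse_team_row(raw)
    except ValueError:
        pass  # documented failure mode; any other exception fails the test
```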
### Which execution tier?

```mermaid
flowchart TD
    Start{"Is test fast?<br/>under 1 second"}
    Start -->|NO| Tier2Slow["Tier 2 only<br/>@pytest.mark.slow"]
    Start -->|YES| Critical{"Import/sanity/schema check<br/>OR critical regression?"}
    Critical -->|YES| Tier1["Tier 1 eligible<br/>@pytest.mark.smoke"]
    Critical -->|NO| Tier2Fast["Tier 2 only<br/>save pre-commit budget"]
```
---

## Test Markers Reference

| Marker | Dimension | Command |
|--------|-----------|---------|
| `@pytest.mark.smoke` | Speed | `pytest -m smoke` |
| `@pytest.mark.slow` | Speed | `pytest -m "not slow"` |
| `@pytest.mark.unit` | Scope | `pytest -m unit` |
| `@pytest.mark.integration` | Scope | `pytest -m integration` |
| `@pytest.mark.property` | Approach | `pytest -m property` |
| `@pytest.mark.performance` | Purpose | `pytest -m performance` |
| `@pytest.mark.regression` | Purpose | `pytest -m regression` |
| `@pytest.mark.no_mutation` | Quality | Tests incompatible with mutmut runner |

**Combine markers across dimensions:**

```python
@pytest.mark.integration
@pytest.mark.property
@pytest.mark.regression
```

---

## Test Commands Reference

| Context | Command | What Runs |
|---------|---------|-----------|
| **Tier 1 (Pre-commit)** | `pytest -m smoke` | Smoke tests only (< 5s; Tier 1 overall < 10s) |
| **Tier 2 (PR/CI - full)** | `pytest` | All tests |
| **Tier 2 (PR/CI - coverage)** | `pytest --cov=src/ncaa_eval --cov-report=term-missing` | All + coverage report |
| **Tier 2 (exclude slow)** | `pytest -m "not slow"` | All except slow tests |
| **Filter by dimension** | `pytest -m integration` | Filter by marker |
| **Combined filters** | `pytest -m "integration and regression"` | Intersection |
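These commands are typically wrapped in Nox sessions (see [Testing Tools](#testing-tools)). The sketch below shows the shape such a `noxfile.py` might take; the session names and install arguments are assumptions, not the project's actual configuration:

```python
import nox


@nox.session
def smoke(session: nox.Session) -> None:
    """Tier 1: smoke tests only."""
    session.install("-e", ".[dev]")  # assumed dev extra
    session.run("pytest", "-m", "smoke")


@nox.session
def tests(session: nox.Session) -> None:
    """Tier 2: full suite with coverage."""
    session.install("-e", ".[dev]")  # assumed dev extra
    session.run("pytest", "--cov=src/ncaa_eval", "--cov-report=term-missing")
```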
---

## Test Organization

```
tests/
├── __init__.py
├── conftest.py          # Shared fixtures
├── fixtures/
│   ├── .gitkeep
│   └── kaggle/
│       ├── MNCAATourneyCompactResults.csv
│       ├── MRegularSeasonCompactResults.csv
│       ├── MSeasons.csv
│       └── MTeams.csv
├── integration/
│   ├── __init__.py
│   ├── test_documented_commands.py
│   ├── test_elo_integration.py
│   ├── test_feature_serving_integration.py
│   └── test_sync.py
└── unit/
    ├── __init__.py
    ├── test_bracket_page.py
    ├── test_bracket_renderer.py
    ├── test_calibration.py
    ├── test_chronological_serving.py
    ├── test_cli_train.py
    ├── test_connector_base.py
    ├── test_dashboard_app.py
    ├── test_dashboard_filters.py
    ├── test_deep_dive_page.py
    ├── test_elo.py
    ├── test_espn_connector.py
    ├── test_evaluation_backtest.py
    ├── test_evaluation_metrics.py
    ├── test_evaluation_plotting.py
    ├── test_evaluation_simulation.py
    ├── test_evaluation_splitter.py
    ├── test_feature_serving.py
    ├── test_framework_validation.py
    ├── test_fuzzy.py
    ├── test_graph.py
    ├── test_home_page.py
    ├── test_imports.py
    ├── test_kaggle_connector.py
    ├── test_leaderboard_page.py
    ├── test_logger.py
    ├── test_model_base.py
    ├── test_model_elo.py
    ├── test_model_logistic_regression.py
    ├── test_model_registry.py
    ├── test_model_tracking.py
    ├── test_model_xgboost.py
    ├── test_normalization.py
    ├── test_opponent.py
    ├── test_package_structure.py
    ├── test_pool_scorer_page.py
    ├── test_repository.py
    ├── test_run_store_metrics.py
    ├── test_schema.py
    └── test_sequential.py
```

**Naming conventions:**

- Test files: `test_<module>.py`
- Test functions: `test_<function>_<scenario>()`
- Fixtures: Descriptive names (e.g., `sample_teams`, `elo_config`, `temp_data_dir`)

See [Conventions Guide](testing/conventions.md) for details.
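Shared fixtures live in `tests/conftest.py`. A hedged sketch of what two of the fixtures named above might look like — the exact columns and shapes are assumptions, not the project's real fixtures:

```python
# Illustrative conftest.py sketch; column names follow the Kaggle MTeams.csv schema.
from pathlib import Path

import pandas as pd
import pytest


@pytest.fixture
def sample_teams() -> pd.DataFrame:
    """Small, deterministic team table for unit tests."""
    return pd.DataFrame({"TeamID": [1101, 1102], "TeamName": ["Abilene Chr", "Air Force"]})


@pytest.fixture
def temp_data_dir(tmp_path: Path) -> Path:
    """Isolated directory for tests that write intermediate files."""
    data_dir = tmp_path / "data"
    data_dir.mkdir()
    return data_dir
```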
---

## Coverage Targets

| Module | Line | Branch | Rationale |
|--------|------|--------|-----------|
| `evaluation/metrics.py` | 95% | 90% | Critical - errors invalidate all evaluations |
| `evaluation/simulation.py` | 90% | 85% | Monte Carlo simulator |
| `model/` | 90% | 85% | Core abstraction |
| `transform/` | 85% | 80% | Feature correctness, leakage prevention |
| `ingest/` | 80% | 75% | Data quality |
| `utils/` | 75% | 70% | Lower priority |
| **Overall** | **80%** | **75%** | Balanced |

**Coverage is a signal, not a gate.** Use it to identify gaps, not to block PRs.

See [Conventions Guide](testing/conventions.md#coverage-targets) for details.

---

## Testing Tools

| Tool | Purpose | Configuration |
|------|---------|---------------|
| **Pytest** | Testing framework | `pyproject.toml` `[tool.pytest.ini_options]` |
| **Hypothesis** | Property-based + Fuzz testing | Dev dependency |
| **Mutmut** | Mutation testing (quality) | Dev dependency |
| **pytest-cov** | Coverage reporting | `[tool.coverage.report]` |
| **Nox** | Session orchestration | `noxfile.py` |

---

## Domain-Specific Testing

### Performance Testing (NFR1: Vectorization)

- **Smoke:** Assertion-based vectorization checks (< 1s)
- **PR-time:** Performance benchmarks, 60-second backtest target

```python
@pytest.mark.smoke
@pytest.mark.performance
def test_metrics_are_vectorized():
    """Quick check: no .iterrows() in metrics."""
    # See domain-testing.md for example
```

### Data Leakage Prevention (NFR4: Temporal Boundaries)

- **Smoke:** API contract unit tests (fast)
- **PR-time:** End-to-end workflow tests, property-based invariants

```python
@pytest.mark.smoke
def test_api_enforces_cutoff():
    """Quick check: API rejects future data."""
    # See domain-testing.md for example
```

See [Domain Testing Guide](testing/domain-testing.md) for comprehensive examples.

---

## References

- [`STYLE_GUIDE.md`](STYLE_GUIDE.md) - Coding standards, vectorization rule
- `specs/05-architecture-fullstack.md` - Architecture, nox workflow
- `specs/03-prd.md` - Non-functional requirements (NFR1-NFR5)
- `pyproject.toml` - Pytest configuration
- `.github/pull_request_template.md` - PR checklist