# Domain-Specific Testing

This guide covers testing approaches specific to the `ncaa_eval` project's domain requirements: **performance testing** (vectorization compliance) and **data leakage prevention**.

---

## Performance Testing Guidelines

### Context: Vectorization First Rule

The Architecture mandates **"Vectorization First"** (Section 12):

> **Reject PRs that use `for` loops over Pandas DataFrames for metric calculations.**

This project has a **60-second backtest target** (10-year Elo training + inference). NFR1 mandates vectorization; NFR2 mandates parallelism. Performance testing ensures these requirements are met.

---

### How to Test Vectorization

#### 1. Assertion-Based Unit Tests (Quick Smoke Check)

**Purpose:** Catch obvious violations quickly during pre-commit

```python
import inspect

import pytest

from ncaa_eval.evaluation.metrics import brier_score  # assumed location


@pytest.mark.smoke
def test_brier_score_is_vectorized():
    """Verify Brier score uses vectorized operations (no Python loops)."""
    source = inspect.getsource(brier_score)

    # Check that the implementation doesn't contain for loops over data.
    # (A "# vectorized" marker comment is the escape hatch for a justified loop.)
    assert "for " not in source or "# vectorized" in source, \
        "Brier score must use vectorized numpy/pandas operations"
```

**Pros:** Fast, catches obvious violations
**Cons:** Brittle heuristic (the word `for` in a comment or docstring triggers a false positive)

**Pre-commit:** ✅ **YES** - Mark as `@pytest.mark.smoke`

---

#### 2. Performance Benchmark Integration Tests

**Purpose:** Verify actual performance meets targets

```python
import timeit

import numpy as np
import pytest

from ncaa_eval.evaluation.metrics import brier_score  # assumed location


@pytest.mark.slow
@pytest.mark.integration
def test_brier_score_performance():
    """Verify Brier score meets performance target."""
    predictions = np.random.rand(10_000)
    actuals = np.random.randint(0, 2, 10_000)

    time_taken = timeit.timeit(
        lambda: brier_score(predictions, actuals),
        number=100,
    )

    # Should complete 100 iterations in < 100ms (1ms per call for 10k predictions)
    assert time_taken < 0.1, f"Brier score too slow: {time_taken:.3f}s for 100 iterations"
```

**Pros:** Verifies actual performance, catches regressions
**Cons:** Slow, results vary by machine (need generous thresholds)

**Pre-commit:** ❌ **NO** (too slow)
**PR-time:** ✅ **YES** - Mark as `@pytest.mark.slow` and `@pytest.mark.performance`
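Both tests above assume `brier_score` reduces to a single array expression. For reference, a minimal sketch of what a compliant implementation could look like (hypothetical code; the real function's signature and location may differ):

```python
import numpy as np
import numpy.typing as npt


def brier_score(predictions: npt.NDArray[np.float64], actuals: npt.NDArray[np.int_]) -> float:
    """Mean squared difference between predicted win probabilities and 0/1 outcomes.

    Fully vectorized: one NumPy expression over the whole array, with no
    Python-level loop over individual predictions.
    """
    predictions = np.asarray(predictions, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    return float(np.mean((predictions - actuals) ** 2))
```

An implementation along these lines should satisfy both the source-inspection heuristic and the timing threshold above.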
---

#### 3. pytest-benchmark (Optional, for Critical Paths)

> ⚠️ **NOT INSTALLED:** `pytest-benchmark` is optional and NOT currently in `pyproject.toml`.
> Use timeit-based benchmarks (Option 2) for now.

**Purpose:** Track performance over time with detailed statistics

```python
import numpy as np

from ncaa_eval.evaluation.metrics import brier_score  # assumed location


def test_brier_score_benchmark(benchmark):
    """Benchmark Brier score calculation."""
    predictions = np.random.rand(10_000)
    actuals = np.random.randint(0, 2, 10_000)

    result = benchmark(brier_score, predictions, actuals)

    # Benchmark provides detailed statistics (mean, stddev, percentiles)
    assert result is not None
```

**Pros:** Detailed statistics, integrated with pytest, tracks performance over time
**Cons:** Requires `pytest-benchmark` plugin (not currently installed)

**Pre-commit:** ❌ **NO**
**PR-time:** ✅ **YES** (if plugin is installed)

---

### Comprehensive Performance Test Example

```python
import timeit

import pytest


@pytest.mark.slow
@pytest.mark.performance
@pytest.mark.integration
def test_full_backtest_meets_60_second_target():
    """Verify 10-year Elo backtest completes within 60 seconds (NFR1)."""
    games = load_games_for_years(range(2013, 2023))

    start_time = timeit.default_timer()

    model = EloModel()
    model.fit(games)
    predictions = model.predict(games)

    elapsed = timeit.default_timer() - start_time

    assert elapsed < 60.0, f"Backtest too slow: {elapsed:.2f}s (target: < 60s)"
```

---

### Vectorization Smoke Test Pattern

```python
import inspect

import pytest


@pytest.mark.smoke
@pytest.mark.performance
def test_metric_calculations_are_vectorized():
    """Verify critical metric calculations don't use Python loops (NFR1)."""
    from ncaa_eval.evaluation import metrics

    # Check that the metrics module doesn't contain for loops over DataFrames
    source = inspect.getsource(metrics)

    # This is a heuristic - manual review is still needed
    forbidden_patterns = [
        "for _ in df.iterrows()",
        ".iterrows()",
        ".itertuples()",
        "for row in df",
    ]

    for pattern in forbidden_patterns:
        assert pattern not in source, \
            f"Metrics module contains non-vectorized pattern: {pattern}"
```

> **Note:** The ingest layer (`src/ncaa_eval/ingest/`) is excluded from the `.iterrows()`
> forbidden-pattern check only: `iterrows()` is explicitly permitted there for one-time-per-sync
> data transformation (see Style Guide Section 5, Exception #4). The `.itertuples()` and
> `for row in df` patterns remain prohibited in business-logic and metric-calculation code;
> their use elsewhere (ingest connectors, model prediction loops) is evaluated case by case
> against the vectorization mandate.

---

### Recommendation

- Use **assertion-based tests** for quick smoke checks (mark as `@pytest.mark.smoke`)
- Use **performance benchmarks** for critical modules (`evaluation/metrics.py`, `evaluation/simulation.py`) as `@pytest.mark.slow` tests (PR-time only)
- Consider **pytest-benchmark** for performance regression tracking (optional, not currently installed)

---

## Data Leakage Prevention Testing

### Context: Temporal Boundary Enforcement

The Architecture mandates **"Data Safety: Temporal boundaries enforced strictly in the API to prevent data leakage"** (Section 11.2).

NFR4 requires strict temporal boundaries: **training data must never include information from future games**. This is critical for model validity.
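In code, a strict temporal boundary means every read takes an explicit cutoff and filters with a strict less-than comparison. A minimal sketch of such a guard, assuming a DataFrame with a `date` column (hypothetical helper; the real `ChronologicalDataServer` interface may differ):

```python
from datetime import date

import pandas as pd


def games_before(games: pd.DataFrame, cutoff: date) -> pd.DataFrame:
    """Return only games played strictly before the cutoff date.

    Strict less-than: games on the cutoff date itself are excluded, since
    their outcomes are not yet known "as of" that morning.
    """
    game_dates = pd.to_datetime(games["date"]).dt.date
    return games.loc[game_dates < cutoff]
```

The tests below verify this behaviour from the outside rather than relying on implementers to remember it.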
---

### How to Test Temporal Integrity

#### 1. Unit Tests (API Contract)

**Purpose:** Verify API rejects requests for future data

```python
import pytest


def test_get_chronological_season_enforces_cutoff():
    """Verify chronological API rejects requests for future data."""
    api = ChronologicalDataServer()

    with pytest.raises(ValueError, match="Cannot access future data"):
        api.get_games_before(date="2025-12-31", cutoff_date="2025-01-01")
```

**Tests:** API raises error when requesting data beyond cutoff

**Pre-commit:** ✅ **YES** (fast, no I/O) - Mark as `@pytest.mark.smoke`

---

#### 2. Integration Tests (End-to-End Workflow)

**Purpose:** Verify end-to-end workflows prevent leakage

```python
import pytest


@pytest.mark.integration
def test_walk_forward_validation_prevents_leakage(sample_games):
    """Verify walk-forward CV never trains on future data."""
    splitter = WalkForwardSplitter(years=range(2015, 2025))

    for train_data, test_data, year in splitter.split(sample_games):
        # Verify all training games occur before test games
        train_max_date = train_data['date'].max()
        test_min_date = test_data['date'].min()

        assert train_max_date < test_min_date, \
            f"Data leakage detected in {year} fold: train_max={train_max_date}, test_min={test_min_date}"
```

**Tests:** Cross-validation splitter never leaks future data into training

**Pre-commit:** ❌ **NO** (too slow)
**PR-time:** ✅ **YES** - Mark as `@pytest.mark.integration`
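For context, the splitter exercised above only needs to guarantee that every training row precedes every test row. A minimal season-based sketch (hypothetical; assumes a `season` column, and the real `WalkForwardSplitter` API may differ):

```python
from collections.abc import Iterator

import pandas as pd


class WalkForwardSplitter:
    """Yield (train, test, year) folds where training data always precedes the test season."""

    def __init__(self, years: range) -> None:
        self.years = years

    def split(self, games: pd.DataFrame) -> Iterator[tuple[pd.DataFrame, pd.DataFrame, int]]:
        for year in self.years:
            train = games[games["season"] < year]   # strictly earlier seasons only
            test = games[games["season"] == year]   # the held-out season
            if not train.empty and not test.empty:
                yield train, test, year
```

Splitting on whole seasons keeps the temporal boundary coarse and easy to audit; the integration test above then checks the finer-grained date ordering.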
---

#### 3. Property-Based Tests (Invariants)

**Purpose:** Verify temporal boundary holds across all possible inputs

```python
import pytest
from hypothesis import given, strategies as st


@pytest.mark.property
@given(cutoff_year=st.integers(2015, 2025))
def test_temporal_boundary_invariant(cutoff_year):
    """Verify no API call can access data beyond cutoff year."""
    api = ChronologicalDataServer()
    games = api.get_games_before(cutoff_year=cutoff_year)

    # Invariant: ALL games must be from or before cutoff year
    assert all(game.season <= cutoff_year for game in games), \
        f"API returned games beyond cutoff year {cutoff_year}"
```

**Tests:** Temporal boundary holds across all possible cutoff years

**Pre-commit:** ❌ **NO** (Hypothesis is slow)
**PR-time:** ✅ **YES** - Mark as `@pytest.mark.property`

---

### Comprehensive Data Leakage Test Example

```python
from datetime import datetime

import pytest


@pytest.mark.regression
@pytest.mark.integration
def test_chronological_api_rejects_exact_cutoff_date():
    """Regression test: API allowed games ON cutoff date (data leakage).

    Bug: Issue #87 - get_games_before() used <= instead of < for date comparison.
    Fixed: 2026-02-10 - Changed to strict less-than comparison.
    """
    api = ChronologicalDataServer()
    cutoff = "2023-03-15"

    # Load games before cutoff
    games = api.get_games_before(cutoff_date=cutoff)

    # NO game should have date >= cutoff (strict less-than)
    cutoff_dt = datetime.fromisoformat(cutoff)
    for game in games:
        game_date = datetime.fromisoformat(game.date)
        assert game_date < cutoff_dt, \
            f"Game on/after cutoff leaked through: {game.date} (cutoff: {cutoff})"
```

---

### Recommendation

- Add **unit tests** for API contract violations (fast, pre-commit safe)
- Add **integration tests** for end-to-end workflows like walk-forward CV (PR-time only)
- Add **property-based tests** for temporal boundary invariants (PR-time only)

---

## Summary

### Performance Testing Checklist

- ✅ Assertion-based smoke tests for vectorization compliance (`@pytest.mark.smoke`)
- ✅ Performance benchmarks for critical paths (`@pytest.mark.slow`, `@pytest.mark.performance`)
- ✅ Full backtest timing test (60-second target)
- ✅ Forbidden pattern detection (`.iterrows()`, etc.)

### Data Leakage Prevention Checklist

- ✅ Unit tests for API contract enforcement (`@pytest.mark.smoke`)
- ✅ Integration tests for end-to-end workflows (`@pytest.mark.integration`)
- ✅ Property-based tests for temporal boundary invariants (`@pytest.mark.property`)
- ✅ Regression tests for previously discovered leakage bugs (`@pytest.mark.regression`)

---

## See Also

- [Test Purpose Guide](test-purpose-guide.md) - Performance and Regression testing
- [Test Scope Guide](test-scope-guide.md) - Unit vs Integration tests
- [Test Approach Guide](test-approach-guide.md) - Example-based vs Property-based
- [Execution Guide](execution.md) - When tests/checks run (4-tier model)
- [Quality Assurance Guide](quality.md) - Mutation testing, coverage analysis
- [STYLE_GUIDE.md](../STYLE_GUIDE.md) Section 5 - Vectorization First rule