# NCAA Eval Style Guide

This guide documents the coding standards and conventions for the `ncaa_eval` project. All rules here reflect decisions already configured in `pyproject.toml`. Tooling enforcement is handled by separate stories (1.4-1.6); this document captures the **what** and **why**.

---

## 1. Docstring Convention (Google Style)

**Choice:** Google-style docstrings.

**Configured in:** `[tool.ruff.lint.pydocstyle] convention = "google"`

### Rationale

- Google style reads naturally in source code and renders well in Sphinx via `napoleon`.
- With PEP 484 type annotations present everywhere (mypy `--strict`), repeating types inside docstrings is redundant. Google style keeps the prose concise.

### Rules

1. **Do not duplicate types** in `Args:` / `Returns:` sections. Type information lives in annotations only.
2. Include `Args:`, `Returns:` (unless `None`), and `Raises:` (if applicable).
3. Add an `Examples:` section for complex public functions.
4. Docstrings are **encouraged but not enforced** on every entity. When pydocstyle (`D`) rules are enabled (Story 1.4), `D1` (missing docstring) and `D415` (first-line punctuation) will be suppressed via the Ruff `ignore` list. **Note:** `D` rules are *not currently active* — the `convention = "google"` setting and ignore entries in `pyproject.toml` are preparatory config that takes effect once `D` is added to `extend-select` in Story 1.4. At minimum, every public module and every public class should have a docstring.
5. **Detailed description required** when a function performs 3 or more operations. The description paragraph (after the summary line) must explain *how* the function implements its purpose — not just restate the summary. This is enforced during code review (PR checklist).

### Example

```python
from __future__ import annotations

import pandas as pd


def rolling_efficiency(
    scores: pd.Series,
    window: int,
    min_periods: int = 1,
) -> pd.Series:
    """Compute rolling offensive efficiency over a sliding window.

    Applies an exponentially-weighted mean to per-game point totals,
    normalized by possessions.

    Args:
        scores: Raw per-game point totals.
        window: Number of games in the rolling window.
        min_periods: Minimum observations required for a valid result.

    Returns:
        Rolling efficiency values aligned to the input index.

    Raises:
        ValueError: If *window* is less than 1.

    Examples:
        >>> rolling_efficiency(pd.Series([70, 80, 90]), window=2)
        0     NaN
        1    75.0
        2    85.0
        dtype: float64
    """
```

---

## 2. Naming Conventions

| Entity | Convention | Example |
|---|---|---|
| **Modules / packages** | `snake_case` | `feature_pipeline.py`, `ncaa_eval/` |
| **Classes** | `PascalCase` | `EloModel`, `TeamStats` |
| **Functions / methods** | `snake_case` | `compute_margin`, `get_team_id` |
| **Variables** | `snake_case` | `home_score`, `season_df` |
| **Constants** | `UPPER_SNAKE_CASE` | `DEFAULT_K_FACTOR`, `MIN_GAMES` |
| **Type aliases** | `PascalCase` (use `type` statement) | `type TeamId = int` |
| **Private members** | Single leading underscore | `_cache`, `_validate_input()` |
| **Test files** | `test_*.py` | `test_elo_model.py` |
| **Test functions** | `test_*` | `test_margin_positive_when_home_wins` |

### Additional Rules

- **No abbreviations** in public APIs unless universally understood (`id`, `df`, `url`).
- **Boolean names** should read as predicates: `is_valid`, `has_played`, `should_normalize`.
- **DataFrame variables** should indicate their grain: `game_df`, `season_stats_df`, `team_features_df`.

---

## 3. Import Ordering

**Configured in:** `[tool.ruff.lint.isort]`

Imports are ordered in three groups separated by blank lines:

1. **Standard library** (`os`, `pathlib`, `typing`, ...)
2. **Third-party** (`pandas`, `numpy`, `xgboost`, ...)
3. **Local / first-party** (`ncaa_eval`, `tests`)

### Mandatory Rules

| Rule | Detail | Config Reference |
|---|---|---|
| Future annotations | Every file must start with `from __future__ import annotations` | `required-imports` |
| Combine as-imports | `from foo import bar, baz as B` on one line | `combine-as-imports = true` |
| First-party detection | `tests` is known first-party | `known-first-party = ["tests"]` |

### Example

```python
from __future__ import annotations

import logging
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

from ncaa_eval.model import Model
from ncaa_eval.transform import compute_rolling_stats
```

### Active Ruff Rules

Beyond isort, `pyproject.toml` extends the default Ruff rules (`E`, `F`) with:

| Rule | Category | What It Enforces |
|---|---|---|
| `I` | isort | Import ordering (see above) |
| `UP` | pyupgrade | Modern Python syntax — e.g., `list[int]` not `List[int]`, `X \| None` not `Optional[X]` |
| `PT` | flake8-pytest-style | Pytest best practices — e.g., `@pytest.fixture` conventions, parametrize style |
| `TID25` | tidy-imports | Import hygiene — bans relative imports from parent packages |
| `C90` | mccabe | McCabe cyclomatic complexity ≤ 10 (see Section 6.1) |
| `PLR0911` | pylint-refactor | Max 6 return statements per function |
| `PLR0912` | pylint-refactor | Max 12 branches per function |
| `PLR0913` | pylint-refactor | Max 5 function arguments |

### Suppressed Rules

| Rule | Why Suppressed |
|---|---|
| `E501` | Line length is enforced by the Ruff **formatter** (110 chars), not the linter. Suppressing `E501` avoids double-reporting. |
| `D1` | Missing-docstring warnings. Not yet active (see Section 1 note). |
| `D415` | First-line punctuation. Not yet active (see Section 1 note). |
| `PLR2004` | Magic value comparison — too aggressive for data science code with domain-specific constants (e.g., tournament field size `64`, coin-flip probability `0.5`). See PEP 20 "Practicality Beats Purity" in Section 6. |

### Lint Suppression Policy

When `# noqa` or `# type: ignore` is unavoidable, follow these rules:

1. **Always include the specific error code.** Never use bare `# noqa` or bare `# type: ignore` — always specify which rule is being suppressed.

   ```python
   # GOOD: Specific suppression with rationale
   import joblib  # type: ignore[import-untyped] — no type stubs available

   # BAD: Bare suppression — which rule? why?
   import joblib  # type: ignore
   ```

2. **Prefer refactoring over suppression.** The escalation path:
   - First: Refactor to eliminate the violation (extract function, split class, etc.)
   - Second: If refactoring is impractical, suppress with specific code and comment
   - Never: Suppress because "it's faster than fixing"

3. **Complexity-class codes require PO approval.** Inline `# noqa` for the following codes is allowed only with explicit PO approval:
   - `PLR0911` — Too many return statements (max 6)
   - `PLR0912` — Too many branches (max 12)
   - `PLR0913` — Too many arguments (max 5)
   - `C901` — McCabe cyclomatic complexity (max 10)

   When a suppression is approved, the inline comment must include a rationale:

   ```python
   # Approved — rationale
   def update_game(  # noqa: PLR0913 — game data has inherent dimensionality

   # Pending refactoring — tracked to a story
   def run_training(  # noqa: PLR0913, C901, PLR0912 — REFACTOR Story 8.1
   ```

4. **Acceptable suppression scenarios:**
   - `# type: ignore[import-untyped]` — Third-party libraries without type stubs
   - `# type: ignore[no-untyped-def]` — Callback signatures imposed by frameworks (e.g., Streamlit event handlers)
   - `# noqa: BLE001` — Broad exception handling with documented rationale
   - `# noqa: PLR0913` (etc.) — With PO approval and rationale (see rule 3)

5. **Unacceptable suppression scenarios:**
   - Any `# noqa` for `PLR0911`, `PLR0912`, `PLR0913`, or `C901` **without** PO approval
   - `# type: ignore` on return types — fix the return type annotation instead
   - Any suppression without the specific error code

---

## 4. Type Annotation Standards

**Configured in:** `[tool.mypy] strict = true`

### What `--strict` Means for Developers

Every function signature, variable, and return type must be annotated. The strict flag enables all of the following (as of mypy 1.x — the exact set may change with future versions; run `mypy --help` to see the current list):

- `--disallow-untyped-defs` — every function needs annotations
- `--disallow-any-generics` — no bare `list`, `dict`; use `list[int]`, `dict[str, float]`
- `--warn-return-any` — functions must not silently return `Any`
- `--no-implicit-reexport` — explicit `__all__` required for re-exports
- `--strict-bytes` — distinguishes `bytes` from `str`

### Practical Guidelines

1. Use `from __future__ import annotations` (already enforced) so all annotations are strings and forward references work.
2. Use modern syntax: `list[int]` not `List[int]`, `X | None` not `Optional[X]`.
3. Use the `type` statement for aliases: `type TeamId = int`.
4. For third-party libraries with incomplete stubs (numpy, pandas, xgboost), the config uses `follow_imports = "silent"` to suppress errors from untyped dependencies. Add `# type: ignore[<code>]` only when truly necessary, and always include the specific error code.
5. The `py.typed` marker at `src/ncaa_eval/py.typed` signals PEP 561 compliance.

### Pydantic Integration

**Configured in:** `[tool.mypy] plugins` and `[tool.pydantic-mypy]`

Pydantic is the project's primary data modeling library (shared types between Logic and UI layers).
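A shared model follows the standard Pydantic `BaseModel` pattern. A minimal sketch — the `GameResult` name and its fields are hypothetical illustrations, not the project's actual `Game` schema:

```python
from __future__ import annotations

import datetime

from pydantic import BaseModel


class GameResult(BaseModel):
    """Hypothetical shared type between Logic and UI layers (illustrative fields)."""

    game_id: int
    home_team: str
    away_team: str
    home_score: int
    away_score: int
    played_on: datetime.date

    @property
    def margin(self) -> int:
        """Point margin (positive = home win)."""
        return self.home_score - self.away_score
```

Because the dashboard and the library share this one definition, field renames are caught statically on both sides rather than surfacing as runtime `KeyError`s.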
The mypy plugin provides enhanced type checking for Pydantic models:

```toml
[tool.mypy]
plugins = ["pydantic.mypy"]

[tool.pydantic-mypy]
init_typed = true                      # __init__ params get field types
init_forbid_extra = true               # error on unknown fields in __init__
warn_required_dynamic_aliases = true   # warn if alias without default
```

**What this means for developers:**

- `init_typed = true` — mypy understands `Model(field=value)` constructor calls and validates argument types against field definitions.
- `init_forbid_extra = true` — passing an unknown field name to a model constructor is a type error, not a silent runtime discard.
- `warn_required_dynamic_aliases = true` — alerts when a field with `alias=` has no default value, which can cause confusing `ValidationError` messages.

---

## 5. Vectorization First

> **Reject PRs that use `for` loops over Pandas DataFrames for metric calculations.**
>
> — Architecture Section 12

This is a **hard rule**. Scientific computation in this project must use vectorized operations.

### Why

- Vectorized pandas/numpy operations are 10-100x faster than Python loops.
- They are easier to reason about for correctness.
- They compose cleanly with `.pipe()` chains.

### Correct Pattern

```python
# Vectorized — computes margin for every game in one shot
game_df["margin"] = game_df["home_score"] - game_df["away_score"]

# Vectorized rolling mean
game_df["rolling_ppg"] = game_df.groupby("team_id")["points"].transform(
    lambda s: s.rolling(window=5, min_periods=1).mean()
)
```

### Incorrect Pattern

```python
# WRONG — for loop over DataFrame rows
margins = []
for _, row in game_df.iterrows():
    margins.append(row["home_score"] - row["away_score"])
game_df["margin"] = margins
```

### Exceptions

Explicit `for` loops are acceptable only when:

1. Operating on a **small, fixed collection** (e.g., iterating over 5 model configs, not 10,000 game rows).
2. Performing **graph traversal** (e.g., NetworkX operations that have no vectorized equivalent).
3. The loop body involves **side effects** that cannot be vectorized (e.g., writing files per team).
4. **Ingest-layer one-time-per-sync operations** — `src/ncaa_eval/ingest/connectors/kaggle.py` uses `iterrows()` in `load_day_zeros`, `fetch_teams`, and `_parse_games_csv` for per-row data transformation: constructing Pydantic models (`Game`, `Team`) in `fetch_teams` and `_parse_games_csv`, and parsing dates with per-row error handling in `load_day_zeros`. These run exactly once per `sync` operation and are not in any hot path. Pandera schema validation (Story 9.14) guards the DataFrame before iteration begins. *(PO Decision C — Audit item 2.4, 2026-03-11)*

In these cases, add a brief comment explaining why the loop is necessary.

---

## 6. Design Philosophy (PEP 20 - The Zen of Python)

> "Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex."
>
> — PEP 20

While PEP 20 can't be fully automated, we enforce its core principles through code review and tooling where possible.

### Enforced Principles

#### Simple is Better Than Complex

- **Tooling:** Ruff complexity checks (see Section 6.1)
- **Review:** Max cyclomatic complexity = 10, max function length = 50 lines
- **Guideline:** If a function is hard to name, it's doing too much — split it

```python
# GOOD: Simple, single-purpose function
def calculate_margin(home_score: int, away_score: int) -> int:
    """Calculate point margin (positive = home win)."""
    return home_score - away_score


# BAD: Too complex, doing multiple things
def process_game(game_data: dict) -> dict:
    """Process game... but what does this actually do?"""
    # 80 lines of nested logic with unclear responsibilities
    ...
```

#### Explicit is Better Than Implicit

- **Tooling:** Mypy strict mode enforces explicit types
- **Review:** No magic numbers, no implicit state changes, clear function names
- **Guideline:** Code should read like prose — the reader shouldn't have to guess

```python
# GOOD: Explicit parameters and behavior
def calculate_elo_change(
    rating: int,
    opponent_rating: int,
    won: bool,
    k_factor: int = 32,
) -> int:
    """Calculate Elo rating change after a game."""
    expected = 1 / (1 + 10 ** ((opponent_rating - rating) / 400))
    actual = 1 if won else 0
    return int(k_factor * (actual - expected))


# BAD: Implicit behavior, magic numbers
def adjust_rating(r: int, o: int, w: bool) -> int:
    return int(32 * (w - 1 / (1 + 10 ** ((o - r) / 400))))
```

#### Readability Counts

- **Tooling:** Ruff formatting (110 char lines), naming conventions
- **Review:** Variable names reflect domain concepts, not abbreviations
- **Guideline:** Write for humans first, computers second

```python
# GOOD: Clear domain language
team_offensive_efficiency = total_points / total_possessions

# BAD: Abbreviations require mental translation
off_eff = pts / poss
```

#### Flat is Better Than Nested

- **Tooling:** Ruff detects excessive nesting
- **Review:** Max nesting depth = 3
- **Guideline:** Use early returns to reduce nesting

```python
# GOOD: Flat structure with early returns
def validate_game(game: Game) -> None:
    """Validate game data."""
    if game.home_score < 0:
        raise ValueError("Home score cannot be negative")
    if game.away_score < 0:
        raise ValueError("Away score cannot be negative")
    if game.date > datetime.now():
        raise ValueError("Game cannot be in the future")


# BAD: Nested structure
def validate_game(game: Game) -> None:
    if game.home_score >= 0:
        if game.away_score >= 0:
            if game.date <= datetime.now():
                return
            else:
                raise ValueError("Game cannot be in the future")
        else:
            raise ValueError("Away score cannot be negative")
    else:
        raise ValueError("Home score cannot be negative")
```

#### There Should Be One Obvious Way to Do It

- **Review:** Follow established project patterns (e.g., vectorization for calculations)
- **Guideline:** Check existing code before inventing new approaches

```python
# GOOD: Follows project pattern (vectorized operations)
game_df["margin"] = game_df["home_score"] - game_df["away_score"]

# BAD: Invents custom approach (loops)
for idx in range(len(game_df)):
    game_df.loc[idx, "margin"] = game_df.loc[idx, "home_score"] - game_df.loc[idx, "away_score"]
```

#### Complex is Better Than Complicated

- **Review:** Accept genuine complexity; reject unnecessary complication
- **Guideline:** Some systems are inherently complex (simulation engines, Elo rating chains, cross-validation orchestrators). The goal is *managed complexity* — complex internals behind simple interfaces — not *complication* from tangled logic.

```python
# GOOD: Complex but not complicated — clear interface, complex internals
class MonteCarloSimulator:
    """Simulate tournament outcomes via Monte Carlo sampling.

    The algorithm is inherently complex (bracket traversal, probability
    propagation, seeded randomness), but the interface is simple.
    """

    def simulate(self, bracket: Bracket, n_simulations: int = 10_000) -> SimulationResult:
        ...


# BAD: Complicated — unnecessary indirection, unclear purpose
def run_sim(data, opts, flags, ctx, mode="default", **kwargs):
    if mode == "default":
        ...  # 200 lines of nested conditionals
```

#### Sparse is Better Than Dense

- **Tooling:** 110-character line length limit, Ruff formatting
- **Review:** One operation per line, generous whitespace, no clever one-liners
- **Guideline:** When a line does too much, split it. When a function packs too many operations together, extract named intermediates.
```python
# GOOD: Sparse — each step is clear
expected_win_rate = 1 / (1 + 10 ** ((opponent_rating - team_rating) / 400))
actual_outcome = 1.0 if won else 0.0
rating_change = k_factor * (actual_outcome - expected_win_rate)

# BAD: Dense — everything crammed together
return int(32*(int(w)-1/(1+10**((o-r)/400))))
```

#### Special Cases Aren't Special Enough / Practicality Beats Purity

- **Review:** Allow pragmatic exceptions when properly documented
- **Guideline:** The vectorization-first rule (Section 5) has legitimate exceptions: graph traversal, small fixed collections, side-effect loops. Data science code has domain-specific constants (`0.5` for coin-flip probability, `64` for tournament teams) that PLR2004 would flag — we suppress PLR2004 because practicality beats purity for these cases. The key: every exception must be *documented and intentional*, never silent.

```python
# GOOD: Pragmatic exception with justification
# PLR2004 suppressed project-wide — domain constants like tournament size
# are clearer inline than as named constants
def build_bracket(teams: list[Team]) -> Bracket:
    if len(teams) != 64:  # NCAA tournament field size
        raise ValueError(f"Expected 64 teams, got {len(teams)}")
    ...


# BAD: Bare magic number with no context
def process(data):
    if len(data) > 42:  # What is 42?
        data = data[:42]
```

#### Errors Should Never Pass Silently / Unless Explicitly Silenced

- **Review:** Every exception handler must log or re-raise; no bare `except: pass`
- **Guideline:** This is the project's most-violated PEP 20 principle (see Pattern D from the codebase audit). The convention:
  1. **Default:** Log at WARNING level with structured context, then re-raise or return a sentinel that callers check.
  2. **Explicitly silenced:** Log at DEBUG level with a code comment explaining *why* silence is acceptable. Use `# noqa` or `# type: ignore` with specific codes.
  3. **Never:** Bare `except Exception: pass` or `except Exception` that substitutes a value without logging.
```python
# GOOD: Error surfaced with context
try:
    response = fetch_team_schedule(team_id)
except RequestError as exc:
    logger.warning("Failed to fetch schedule for team %d: %s", team_id, exc)
    raise

# GOOD: Explicitly silenced with rationale
try:
    optional_enrichment = fetch_extra_stats(team_id)
except RequestError:
    # Enrichment is non-critical; base stats are sufficient for predictions.
    logger.debug("Optional enrichment unavailable for team %d", team_id)
    optional_enrichment = None

# BAD: Silent swallowing — Pattern D violation
try:
    data = fetch_team_schedule(team_id)
except Exception:
    data = pd.DataFrame()  # Caller has no idea data is missing
```

#### Refuse the Temptation to Guess

- **Tooling:** `mypy --strict` — forces explicit types everywhere
- **Review:** No implicit type coercions, no unvalidated assumptions
- **Guideline:** This aphorism is the philosophical foundation for `mypy --strict`. When types are explicit, the compiler catches wrong assumptions before runtime. When types are `Any` or missing, the code *guesses* — and guesses wrong at the worst possible time.

```python
# GOOD: Explicit types — no guessing
def get_team_rating(team_id: int, ratings: dict[int, float]) -> float | None:
    return ratings.get(team_id)


# BAD: Implicit types — code guesses about structure
def get_team_rating(team_id, ratings):
    return ratings[team_id]  # KeyError? ratings is a list? Who knows.
```

#### If the Implementation Is Hard to Explain, It's a Bad Idea / If It's Easy to Explain, It May Be a Good Idea

- **Tooling:** McCabe complexity ≤ 10, max 50 lines per function
- **Review:** If you can't describe a function in one sentence, it needs splitting
- **Guideline:** This reinforces the complexity limits. If a code reviewer asks "what does this do?" and the answer requires a paragraph, the function is doing too much. The `run_training()` function in `cli/train.py` (70+ statements, 3 `noqa` suppressions) is the canonical example of code that is hard to explain.
```python
# GOOD: Easy to explain — "calculates Brier score for probability predictions"
def brier_score(probabilities: np.ndarray, outcomes: np.ndarray) -> float:
    return float(np.mean((probabilities - outcomes) ** 2))


# BAD: Hard to explain — "it processes... things... with conditions..."
def process_and_validate_then_transform_with_fallback(data, config, mode, flags):
    ...  # 80 lines that require 5 minutes to explain
```

### Code Review Checklist for PEP 20

During code review, verify:

- [ ] **Simplicity:** Functions have single, clear responsibilities (McCabe complexity ≤ 10)
- [ ] **Single purpose:** Each function does exactly one thing; max 50 lines, nesting ≤ 3
- [ ] **Explicitness:** No magic numbers, parameters have clear names, behavior is obvious
- [ ] **Readability:** Domain concepts use full words, not abbreviations
- [ ] **Flatness:** Nesting depth ≤ 3, early returns preferred
- [ ] **Consistency:** Follows existing project patterns (vectorization, type sharing, etc.)
- [ ] **Complexity vs complication:** Inherent complexity is managed behind simple interfaces
- [ ] **Sparseness:** One operation per line, no clever one-liners, named intermediates
- [ ] **Pragmatic exceptions:** Deviations from rules are documented and intentional
- [ ] **Error handling:** No silent exception swallowing; handlers log or re-raise
- [ ] **Explicit types:** No `Any`, no missing annotations, no implicit coercions
- [ ] **Explainability:** Every function describable in one sentence
- [ ] **No complexity-code overrides:** No `# noqa` for PLR0911/PLR0912/PLR0913/C901 without PO approval; each includes a rationale
- [ ] **Docstring detail:** Functions with 3+ operations have a detailed description explaining *how*

---

### 6.1 Complexity Gates (Ruff Configuration)

**Configured in:** `pyproject.toml` → `[tool.ruff.lint.mccabe]`

| Metric | Limit | Enforced By |
|---|---|---|
| **McCabe Cyclomatic Complexity** | 10 | Ruff `C901` (pre-commit) |
| **Max Function Length** | 50 lines | Manual review |
| **Max Nesting Depth** | 3 | Manual review |
| **Max Arguments** | 5 | Ruff `PLR0913` (pre-commit) |

See `pyproject.toml` for exact configuration.

---

### 6.2 Pure Functions vs Side Effects

> **Pure functions are easier to test, faster to run, and easier to reason about.**

**Design guideline:** Keep business logic pure, push side effects to the edges.

#### Pure Functions (Preferred)

**Definition:** Same input always produces same output, no side effects.

**Characteristics:**

- Deterministic (predictable)
- No external dependencies (no I/O, no database, no network, no time/randomness)
- No state mutation
- Easy to test (just input → output, no mocking)
- Perfect for property-based testing (Hypothesis)
- Fast (no I/O)

```python
# PURE: Always deterministic, easy to test
def calculate_win_probability(rating_diff: int) -> float:
    """Calculate win probability from rating difference."""
    return 1 / (1 + 10 ** (-rating_diff / 400))


# PURE: Data transformation, no side effects
def normalize_team_names(names: pd.Series) -> pd.Series:
    """Normalize team names to standard format (vectorized)."""
    return names.str.strip().str.title().str.replace("St.", "Saint")


# PURE: Mathematical calculation (vectorized)
def calculate_margins(home_scores: np.ndarray, away_scores: np.ndarray) -> np.ndarray:
    """Calculate point margins (vectorized, pure)."""
    return home_scores - away_scores
```

**Testing pure functions:**

```python
# Unit test: Simple input → output
def test_win_probability_equal_ratings():
    """Verify win probability is 50% for equal ratings."""
    assert calculate_win_probability(rating_diff=0) == 0.5


# Property test: Perfect for pure functions
@pytest.mark.property
@given(rating_diff=st.integers(-1000, 1000))
def test_win_probability_always_bounded(rating_diff):
    """Verify win probability always in [0, 1] (invariant)."""
    prob = calculate_win_probability(rating_diff)
    assert 0 <= prob <= 1
```

---

#### Side-Effect Functions (Push to Edges)
**Definition:** Functions that interact with the outside world or modify state.

**Characteristics:**

- Non-deterministic (may vary based on external state)
- External dependencies (files, database, network, time, randomness)
- Modifies state (mutates objects, writes files, updates database)
- Harder to test (requires mocks, stubs, fixtures)
- Requires integration tests

```python
# SIDE-EFFECT: Reads from the file system
def load_games(path: Path) -> pd.DataFrame:
    """Load games from CSV file (I/O operation)."""
    return pd.read_csv(path)


# SIDE-EFFECT: Depends on current time (non-deterministic)
def is_game_started(game_start: datetime) -> bool:
    """Check if game has started."""
    return datetime.now() > game_start  # Changes over time


# SIDE-EFFECT: Mutates external state
def update_team_rating(team_id: int, new_rating: int) -> None:
    """Update team rating in database."""
    db.execute("UPDATE teams SET rating = ? WHERE id = ?", new_rating, team_id)
```

**Testing side-effect functions:**

```python
# Integration test: Requires fixtures
@pytest.mark.integration
def test_load_games_returns_valid_dataframe(temp_data_dir):
    """Verify games can be loaded from CSV."""
    # Setup: Create test file
    test_file = temp_data_dir / "games.csv"
    test_file.write_text("game_id,home_team,away_team\n1,Duke,UNC\n")

    # Execute
    games = load_games(test_file)

    # Assert
    assert len(games) == 1
    assert "game_id" in games.columns
```

---

#### Good Separation: Pure Core + Side-Effect Shell

**Pattern:** Keep calculations pure, orchestrate I/O at the edges.
```python
# PURE: Core business logic (easy to test, vectorized)
def calculate_win_probabilities(
    home_ratings: np.ndarray,
    away_ratings: np.ndarray,
) -> np.ndarray:
    """Calculate win probabilities (pure, vectorized)."""
    rating_diff = home_ratings - away_ratings
    return 1 / (1 + 10 ** (-rating_diff / 400))


# SIDE-EFFECT: Orchestration at the edge
def simulate_tournament(games_path: Path, ratings_path: Path) -> pd.DataFrame:
    """Simulate tournament (orchestrates pure logic + I/O)."""
    # Side effects: Load data
    games = pd.read_csv(games_path)
    ratings = pd.read_csv(ratings_path, index_col="team")

    # Side effects: Data prep
    games = games.merge(ratings, left_on="home_team", right_index=True)
    games = games.merge(ratings, left_on="away_team", right_index=True, suffixes=("_home", "_away"))

    # PURE: Core calculation (vectorized, easy to test separately)
    games["win_prob"] = calculate_win_probabilities(
        games["rating_home"].values,
        games["rating_away"].values,
    )
    return games
```

**Why this is better:**

1. **Pure function (`calculate_win_probabilities`):**
   - Fast unit tests (no I/O)
   - Property-based tests (invariants)
   - Reusable in different contexts
   - Vectorized (meets NFR1)

2. **Side-effect function (`simulate_tournament`):**
   - Thin orchestration layer
   - Easy to see where I/O happens
   - Core logic testable independently

---

#### Bad: Mixing Pure Logic with Side Effects

```python
# BAD: Pure logic buried inside side effects
def simulate_tournament_bad() -> pd.DataFrame:
    """Simulate tournament (mixed design - hard to test)."""
    games = pd.read_csv("data/games.csv")  # Side effect

    results = []
    for _, game in games.iterrows():  # Side effect + non-vectorized!
        # Side effects: Database calls
        home_rating = db.query("SELECT rating FROM teams WHERE id = ?", game.home_team)
        away_rating = db.query("SELECT rating FROM teams WHERE id = ?", game.away_team)

        # Pure logic buried inside (can't test without database!)
        rating_diff = home_rating - away_rating
        prob = 1 / (1 + 10 ** (-rating_diff / 400))
        results.append(prob)

    return pd.DataFrame(results)
```

**Problems:**

- Can't test calculation logic without a database
- Can't use property-based tests
- Slow (I/O in loop)
- Violates vectorization requirement
- Hard to debug (mixed concerns)

---

#### Testing Strategy by Function Type

| Function Type | Test Type | Characteristics | Example |
|---------------|-----------|-----------------|---------|
| **Pure** | Unit test, Property-based | Fast, no mocking, deterministic | `calculate_win_probability()` |
| **Side-effect** | Integration test | Slower, requires fixtures/mocks | `load_games()`, `update_team_rating()` |
| **Mixed** | ❌ AVOID | Hard to test, refactor! | `simulate_tournament_bad()` |

---

#### Code Review Checklist for Pure vs Side-Effect

During code review, verify:

- [ ] **Pure logic separated:** Business calculations are pure functions
- [ ] **Side effects at edges:** I/O, database, network calls in orchestration layer
- [ ] **No mixing:** Pure functions don't contain I/O operations
- [ ] **Vectorization:** Pure functions use numpy/pandas operations (not loops)
- [ ] **Property tests:** Pure functions have property-based tests for invariants
- [ ] **Integration tests:** Side-effect functions have integration tests with fixtures

---

## 7. PR Checklist (Summary)

Every pull request must pass the following gates. The actual PR template is at `.github/pull_request_template.md`. For the philosophy behind this two-tier approach, see [`docs/TESTING_STRATEGY.md`](TESTING_STRATEGY.md) (Section: Quality Gates and Execution Tiers).
| Gate | Tool | Timing | |---|---|---| | Lint pass | Ruff | Pre-commit (fast) | | Type-check pass | Mypy (`--strict`) | Pre-commit (fast) | | Test pass | Pytest | PR / CI | | Docstring coverage | Manual review | PR review | | Docstring detail | Manual review | PR review | | No vectorization violations | Manual review | PR review | | Conventional commit messages | Commitizen | Pre-commit | | Function complexity | Manual review | PR review | | No complexity-code overrides | Manual review | PR review | | PEP 20 compliance | Manual review | PR review | | SOLID principles | Manual review | PR review | | Pure function design | Manual review | PR review | ### Review Criteria - Code follows naming conventions (Section 2). - Imports are ordered correctly (Section 3). - New public APIs have docstrings (Section 1). - Functions with 3+ operations include a detailed docstring description explaining *how* (not just *what*) — see Section 1. - No `for` loops over DataFrames for calculations (Section 5). - Type annotations are complete (Section 4). - Each function does one thing; manually verify single responsibility, max 50 lines, nesting ≤ 3. - No `# noqa` for PLR0911, PLR0912, PLR0913, or C901 without PO approval; each must include a rationale (Section 3, Lint Suppression Policy). - PEP 20 design principles respected (Section 6). - Pure functions used for business logic, side effects at edges (Section 6.2). - SOLID principles applied for testability (Section 10). - Data structures shared between Logic and UI use Pydantic models or TypedDicts. - Dashboard code never reads files directly — it calls `ncaa_eval` functions. --- ## 8. 
File & Module Organization ### Project Layout ``` src/ ncaa_eval/ __init__.py # Package root py.typed # PEP 561 marker cli/ # CLI entry points (main, train) ingest/ # Data source connectors transform/ # Feature engineering model/ # Model ABC and implementations (singular, not models/) evaluation/ # Metrics, CV, simulation utils/ # Shared helpers (logger.py) tests/ __init__.py conftest.py # Shared fixtures unit/ # Unit tests organized by domain integration/ # Integration tests fixtures/ # Test data fixtures dashboard/ # Streamlit UI (imports ncaa_eval — no direct IO) data/ # Local data store (git-ignored) ``` ### Rules 1. **One concept per module.** A module should do one thing. If it grows beyond ~300 lines, consider splitting. 2. **Mirror `src/` in `tests/`.** `src/ncaa_eval/model/elo.py` is tested by `tests/unit/test_elo.py` (or `tests/integration/test_elo_integration.py`). 3. **`__init__.py` re-exports.** Public symbols should be importable from the *submodule* level (e.g., `from ncaa_eval.model import Model, get_model`). The root `ncaa_eval` package does not re-export anything. Concrete model classes self-register via `@register_model` and are accessed through the registry (`get_model("elo")`) or direct import (`from ncaa_eval.model.elo import EloModel`). 4. **No circular imports.** If two modules need each other, extract shared types into a third module. 5. **Configuration lives in `pyproject.toml`.** Do not create separate config files for tools that support `pyproject.toml`. 6. **Line length is 110.** Not the default 88. Configured in `[tool.ruff] line-length = 110`. --- ## 9. Additional Architecture Rules These rules come from the project architecture and apply across all code: | Rule | Detail | |---|---| | **Type sharing** | All data structures shared between Logic and UI must use Pydantic models or TypedDicts. | | **No direct IO in UI** | The Streamlit dashboard must call `ncaa_eval` library functions — never read files directly. 
| **Commit messages** | Use conventional commits format (`feat:`, `fix:`, `docs:`, etc.) enforced by Commitizen. |
| **Python version** | `>=3.12,<4.0`. Use modern syntax (`match`, `type` statement, `X \| None`). |

---

## 10. SOLID Principles for Testability

> **SOLID principles make code testable.** Violating SOLID = hard-to-test code.

These five object-oriented design principles improve maintainability and testability. While not automatically enforced, they're checked during code review.

### S - Single Responsibility Principle (SRP)

> "A class should have only one reason to change."

**What it means:** Each class/function does ONE thing well.

**Already enforced by:** PEP 20 complexity checks (Section 6)

```python
# GOOD: Single responsibility
class GameLoader:
    """Loads games from CSV files."""

    def load(self, path: Path) -> pd.DataFrame:
        return pd.read_csv(path)


class GameValidator:
    """Validates game data."""

    def validate(self, games: pd.DataFrame) -> None:
        if games["home_score"].min() < 0:
            raise ValueError("Scores cannot be negative")


# BAD: Multiple responsibilities
class GameManager:
    """Does everything - loading, validating, processing, saving."""

    def do_everything(self, path: Path) -> None:
        # Too many reasons to change!
        pass
```

**Testing impact:** Easy to test (one thing = one test class)

---

### O - Open/Closed Principle (OCP)

> "Open for extension, closed for modification."

**What it means:** Add new features without changing existing code.
```python
# GOOD: Open for extension via inheritance
class RatingModel(ABC):
    @abstractmethod
    def predict(self, game: Game) -> float:
        pass


class EloModel(RatingModel):
    def predict(self, game: Game) -> float:
        return self._calculate_elo(game)


class GlickoModel(RatingModel):
    # New model, no changes to existing code
    def predict(self, game: Game) -> float:
        return self._calculate_glicko(game)


# BAD: Must modify existing code for each new model
def predict(game: Game, model_type: str) -> float:
    if model_type == "elo":
        return calculate_elo(game)
    elif model_type == "glicko":  # Must modify this function
        return calculate_glicko(game)
```

**Testing impact:** Easy to mock/stub new implementations

---

### L - Liskov Substitution Principle (LSP)

> "Subtypes must be substitutable for their base types."

**What it means:** If `Dog` inherits from `Animal`, you can use `Dog` anywhere you use `Animal` without breaking things.

**Already enforced by:** Property-based tests verify contracts (Section: Test Purpose Guide)

```python
# GOOD: Subclass honors parent contract
class RatingModel(ABC):
    @abstractmethod
    def predict(self, game: Game) -> float:
        """Return probability in [0, 1]."""
        pass


class EloModel(RatingModel):
    def predict(self, game: Game) -> float:
        # Always returns [0, 1] as promised
        return 1 / (1 + 10 ** ((opp_rating - rating) / 400))


# BAD: Subclass violates parent contract
class BrokenModel(RatingModel):
    def predict(self, game: Game) -> float:
        # Returns values > 1.0, breaking the contract!
        return rating_difference * 100
```

**Testing impact:** Property tests verify contracts hold across all implementations

---

### I - Interface Segregation Principle (ISP)

> "Many specific interfaces are better than one general-purpose interface."

**What it means:** Don't force classes to implement methods they don't need.
**Already enforced by:** MyPy strict mode with ABCs and Protocols (Section 4)

This project uses **ABCs as the primary pattern** for interface segregation (Model ABC, Repository ABC, Connector ABC) and **Protocols as a complement** for structural typing where duck typing is preferred (e.g., `ProbabilityProvider`, `ScoringRule` in the simulation engine). Choose based on context:

- **ABCs:** When implementations must be explicitly registered and share behavior (e.g., all models inherit from `Model` and get common infrastructure).
- **Protocols:** When you need structural typing — any object with matching methods qualifies, without inheriting from anything (e.g., `ProbabilityProvider` can be satisfied by any class with a `predict_proba` method).

```python
# PRIMARY: ABCs for explicit interface contracts
class Model(ABC):
    """Base class for all prediction models."""

    @abstractmethod
    def predict(self, game: Game) -> float: ...


class EloModel(Model):
    def predict(self, game: Game) -> float: ...


# COMPLEMENT: Protocols for structural typing (duck typing)
class ProbabilityProvider(Protocol):
    def predict_proba(self, matchup: Matchup) -> float: ...


# Any class with predict_proba() satisfies this — no inheritance needed
class EloProvider:
    def predict_proba(self, matchup: Matchup) -> float: ...


# BAD: Fat interface forces unused methods
class DoEverything(ABC):
    @abstractmethod
    def predict(self, game: Game) -> float: ...

    @abstractmethod
    def fit(self, games: pd.DataFrame) -> None: ...

    @abstractmethod
    def cross_validate(self, games: pd.DataFrame) -> dict: ...
    # Simple models forced to implement methods they don't need!
```

**Testing impact:** Less mocking needed (small interfaces)

---

### D - Dependency Inversion Principle (DIP)

> "Depend on abstractions, not concretions."

**What it means:** High-level code shouldn't depend on low-level details.
**Already enforced by:** Architecture rule "Type sharing: use Pydantic models or TypedDicts" (Section 9)

```python
# GOOD: Depends on abstraction (Protocol)
class Predictable(Protocol):
    # Any object with a matching predict() method qualifies
    def predict(self, game: Game) -> float: ...


class Simulator:
    def __init__(self, model: Predictable):  # Any model works
        self.model = model

    def simulate_tournament(self, games: list[Game]) -> pd.DataFrame:
        for game in games:
            pred = self.model.predict(game)  # Works with any Predictable
            ...


# BAD: Depends on concrete implementation
class Simulator:
    def __init__(self, elo_model: EloModel):  # Locked to EloModel
        self.model = elo_model

    def simulate_tournament(self, games: list[Game]) -> pd.DataFrame:
        # Can only use EloModel, not flexible
        pass
```

**Testing impact:** Easy to inject test doubles

---

### SOLID Review Checklist

During code review, verify:

- [ ] **SRP:** Classes/functions have single, clear responsibility (covered by PEP 20 complexity)
- [ ] **OCP:** New features added via extension (inheritance, composition), not modification
- [ ] **LSP:** Subtypes honor parent contracts (property tests verify this)
- [ ] **ISP:** Interfaces are small and focused (ABCs primary, Protocols for structural typing)
- [ ] **DIP:** Depends on abstractions (Protocols, TypedDicts), not concrete classes

### Summary: What's Already Covered

| SOLID Principle | Already Enforced By |
|-----------------|---------------------|
| **SRP** | PEP 20: "Simple is better than complex" (complexity ≤ 10) |
| **OCP** | Manual review (can't automate) |
| **LSP** | Property tests: "probabilities in [0, 1]" invariants |
| **ISP** | MyPy strict mode: ABCs primary (Model, Repository, Connector), Protocols complement (ProbabilityProvider, ScoringRule) |
| **DIP** | Architecture: "Type sharing via Pydantic/TypedDicts" |

**Result:** SOLID compliance is mostly automated or already covered by existing standards. Code review adds a final check for OCP and overall SOLID adherence.

---

## 11. Quality Gate Architecture

This project has two complementary quality gate systems.
**Pre-commit is canonical; nox is the local convenience runner.**

### Pre-commit (Canonical)

Pre-commit hooks (`.pre-commit-config.yaml`) run automatically on every commit and are enforced in CI. They are the single source of truth for code quality enforcement.

| Hook | Stage | What It Does |
|---|---|---|
| `ruff-lint` | commit | Ruff linting with `--fix` (auto-corrects violations) |
| `ruff-format` | commit | Ruff formatting (auto-formats code) |
| `mypy-strict` | commit | Mypy strict type checking (`src/`, `tests/`, `noxfile.py`, `sync.py`) |
| `pytest-smoke` | commit | Pytest smoke tests (`-m smoke`) |
| `commitizen` | commit-msg | Conventional commit message enforcement |
| `actionlint` | commit | GitHub Actions workflow linting |

Pre-commit also includes housekeeping hooks: `end-of-file-fixer`, `trailing-whitespace`, `check-yaml`, `check-toml`, `detect-private-key`, `no-commit-to-branch`, `check-merge-conflict`, `debug-statements`.

### Nox (Local Convenience Runner)

Nox (`noxfile.py`) provides a single command (`nox`) that runs the full quality pipeline locally. It is useful for manual full-suite runs before pushing but is NOT used in CI.

| Session | What It Does |
|---|---|
| `lint` | `ruff check . --fix` + `ruff format --check .` |
| `typecheck` | `mypy --strict` on `src/ncaa_eval`, `tests`, `noxfile.py`, `sync.py` |
| `tests` | Full `pytest` suite |
| `docs` | `sphinx-apidoc` + `sphinx-build` |

### Known Divergences (Intentional)

| Area | Pre-commit | Nox | Rationale |
|---|---|---|---|
| **ruff format** | Auto-fix (rewrites files) | Check-only (`--check`) | Pre-commit should fix on commit; nox should only report for manual inspection |
| **CI usage** | Runs in CI (`.github/workflows/`) | Not used in CI | Pre-commit + standalone pytest covers lint + typecheck + full tests in CI |

### CI Pipeline

Two CI workflows run in GitHub Actions:

**`.github/workflows/python-check.yaml`** (runs on every pull request):
1. **Pre-commit** — `pre-commit run --all-files` (all hooks: ruff, mypy, actionlint, etc.)
2. **Full test suite** — `pytest --cov=src/ncaa_eval` with coverage reporting

**`.github/workflows/main-updated.yaml`** (runs on every push to `main`):

1. **Commitizen** — version bump and changelog (skipped for `bump:` commits)
2. **Sphinx docs** — `sphinx-apidoc` + `sphinx-build` → GitHub Pages deployment

Quality gates are enforced twice: by the developer's local pre-commit hooks on every commit/push, and again by `python-check.yaml` on every pull request before merge.

---

## References

- `pyproject.toml` — Single source of truth for all tool configurations
- `specs/05-architecture-fullstack.md` Section 12 — Coding standards
- `specs/05-architecture-fullstack.md` Section 10 — Development workflow
- `specs/03-prd.md` Section 4 — Technical assumptions & constraints