# NCAA Eval Style Guide

This guide documents the coding standards and conventions for the `ncaa_eval` project. All rules here reflect decisions already configured in `pyproject.toml`. Tooling enforcement is handled by separate stories (1.4-1.6); this document captures the **what** and **why**.

---

## 1. Docstring Convention (Google Style)

**Choice:** Google-style docstrings.

**Configured in:** `[tool.ruff.lint.pydocstyle] convention = "google"`

### Rationale

- Google style reads naturally in source code and renders well in Sphinx via `napoleon`.
- With PEP 484 type annotations present everywhere (mypy `--strict`), repeating types inside docstrings is redundant. Google style keeps the prose concise.

### Rules

1. **Do not duplicate types** in `Args:` / `Returns:` sections. Type information lives in annotations only.
2. Include `Args:`, `Returns:` (unless `None`), and `Raises:` (if applicable).
3. Add an `Examples:` section for complex public functions.
4. Docstrings are **encouraged but not enforced** on every entity. When pydocstyle (`D`) rules are enabled (Story 1.4), `D1` (missing docstring) and `D415` (first-line punctuation) will be suppressed via the Ruff `ignore` list. **Note:** `D` rules are *not currently active* — the `convention = "google"` setting and ignore entries in `pyproject.toml` are preparatory config that takes effect once `D` is added to `extend-select` in Story 1.4. At minimum, every public module and every public class should have a docstring.
5. **Detailed description required** when a function performs 3 or more operations. The description paragraph (after the summary line) must explain *how* the function implements its purpose — not just restate the summary. This is enforced during code review (PR checklist).

### Example

```python
from __future__ import annotations

import pandas as pd


def rolling_efficiency(
    scores: pd.Series,
    window: int,
    min_periods: int = 1,
) -> pd.Series:
    """Compute rolling offensive efficiency over a sliding window.

    Applies an exponentially-weighted mean to per-game point totals,
    normalized by possessions.

    Args:
        scores: Raw per-game point totals.
        window: Number of games in the rolling window.
        min_periods: Minimum observations required for a valid result.

    Returns:
        Rolling efficiency values aligned to the input index.

    Raises:
        ValueError: If *window* is less than 1.

    Examples:
        >>> rolling_efficiency(pd.Series([70, 80, 90]), window=2)
        0     NaN
        1    75.0
        2    85.0
        dtype: float64
    """
```

---

## 2. Naming Conventions

| Entity | Convention | Example |
|---|---|---|
| **Modules / packages** | `snake_case` | `feature_pipeline.py`, `ncaa_eval/` |
| **Classes** | `PascalCase` | `EloModel`, `TeamStats` |
| **Functions / methods** | `snake_case` | `compute_margin`, `get_team_id` |
| **Variables** | `snake_case` | `home_score`, `season_df` |
| **Constants** | `UPPER_SNAKE_CASE` | `DEFAULT_K_FACTOR`, `MIN_GAMES` |
| **Type aliases** | `PascalCase` (use `type` statement) | `type TeamId = int` |
| **Private members** | Single leading underscore | `_cache`, `_validate_input()` |
| **Test files** | `test_*.py` | `test_elo_model.py` |
| **Test functions** | `test_*` | `test_margin_positive_when_home_wins` |

### Additional Rules

- **No abbreviations** in public APIs unless universally understood (`id`, `df`, `url`).
- **Boolean names** should read as predicates: `is_valid`, `has_played`, `should_normalize`.
- **DataFrame variables** should indicate their grain: `game_df`, `season_stats_df`, `team_features_df`.

---

## 3. Import Ordering

**Configured in:** `[tool.ruff.lint.isort]`

Imports are ordered in three groups separated by blank lines:

1. **Standard library** (`os`, `pathlib`, `typing`, ...)
2. **Third-party** (`pandas`, `numpy`, `xgboost`, ...)
3. **Local / first-party** (`ncaa_eval`, `tests`)

### Mandatory Rules

| Rule | Detail | Config Reference |
|---|---|---|
| Future annotations | Every file must start with `from __future__ import annotations` | `required-imports` |
| Combine as-imports | `from foo import bar, baz as B` on one line | `combine-as-imports = true` |
| First-party detection | `tests` is known first-party | `known-first-party = ["tests"]` |

### Example

```python
from __future__ import annotations

import logging
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

from ncaa_eval.model import Model
from ncaa_eval.transform import compute_rolling_stats
```

### Active Ruff Rules

Beyond isort, `pyproject.toml` extends the default Ruff rules (`E`, `F`) with:

| Rule | Category | What It Enforces |
|---|---|---|
| `I` | isort | Import ordering (see above) |
| `UP` | pyupgrade | Modern Python syntax — e.g., `list[int]` not `List[int]`, `X \| None` not `Optional[X]` |
| `PT` | flake8-pytest-style | Pytest best practices — e.g., `@pytest.fixture` conventions, parametrize style |
| `TID25` | tidy-imports | Import hygiene — bans relative imports from parent packages |
| `C90` | mccabe | McCabe cyclomatic complexity ≤ 10 (see Section 6.1) |
| `PLR0911` | pylint-refactor | Max 6 return statements per function |
| `PLR0912` | pylint-refactor | Max 12 branches per function |
| `PLR0913` | pylint-refactor | Max 5 function arguments |

### Suppressed Rules

| Rule | Why Suppressed |
|---|---|
| `E501` | Line length is enforced by the Ruff **formatter** (110 chars), not the linter. Suppressing `E501` avoids double-reporting. |
| `D1` | Missing-docstring warnings. Not yet active (see Section 1 note). |
| `D415` | First-line punctuation. Not yet active (see Section 1 note). |
| `PLR2004` | Magic value comparison — too aggressive for data science code with domain-specific constants (e.g., tournament field size `64`, coin-flip probability `0.5`). See PEP 20 "Practicality Beats Purity" in Section 6. |

### Lint Suppression Policy

When `# noqa` or `# type: ignore` is unavoidable, follow these rules:

1. **Always include the specific error code.** Never use bare `# noqa` or bare `# type: ignore` — always specify which rule is being suppressed.

   ```python
   # GOOD: Specific suppression with rationale
   import joblib  # type: ignore[import-untyped] — no type stubs available

   # BAD: Bare suppression — which rule? why?
   import joblib  # type: ignore
   ```

2. **Prefer refactoring over suppression.** The escalation path:
   - First: Refactor to eliminate the violation (extract function, split class, etc.)
   - Second: If refactoring is impractical, suppress with specific code and comment
   - Never: Suppress because "it's faster than fixing"

3. **Complexity-class codes require PO approval.** Inline `# noqa` for the following codes is allowed only with explicit PO approval:
   - `PLR0911` — Too many return statements (max 6)
   - `PLR0912` — Too many branches (max 12)
   - `PLR0913` — Too many arguments (max 5)
   - `C901` — McCabe cyclomatic complexity (max 10)

   When a suppression is approved, the inline comment must include a rationale:

   ```python
   # Approved — rationale
   def update_game(  # noqa: PLR0913 — game data has inherent dimensionality

   # Pending refactoring — tracked to a story
   def run_training(  # noqa: PLR0913, C901, PLR0912 — REFACTOR Story 8.1
   ```

4. **Acceptable suppression scenarios:**
   - `# type: ignore[import-untyped]` — Third-party libraries without type stubs
   - `# type: ignore[no-untyped-def]` — Callback signatures imposed by frameworks (e.g., Streamlit event handlers)
   - `# noqa: BLE001` — Broad exception handling with documented rationale
   - `# noqa: PLR0913` (etc.) — With PO approval and rationale (see rule 3)

5. **Unacceptable suppression scenarios:**
   - Any `# noqa` for `PLR0911`, `PLR0912`, `PLR0913`, or `C901` **without** PO approval
   - `# type: ignore` on return types — fix the return type annotation instead
   - Any suppression without the specific error code

---

## 4. Type Annotation Standards

**Configured in:** `[tool.mypy] strict = true`

### What `--strict` Means for Developers

Every function signature, variable, and return type must be annotated. The strict flag enables all of the following (as of mypy 1.x — the exact set may change with future versions; run `mypy --help` to see the current list):

- `--disallow-untyped-defs` — every function needs annotations
- `--disallow-any-generics` — no bare `list`, `dict`; use `list[int]`, `dict[str, float]`
- `--warn-return-any` — functions must not silently return `Any`
- `--no-implicit-reexport` — explicit `__all__` required for re-exports
- `--strict-bytes` — distinguishes `bytes` from `str`

### Practical Guidelines

1. Use `from __future__ import annotations` (already enforced) so all annotations are strings and forward references work.
2. Use modern syntax: `list[int]` not `List[int]`, `X | None` not `Optional[X]`.
3. Use the `type` statement for aliases: `type TeamId = int`.
4. For third-party libraries with incomplete stubs (numpy, pandas, xgboost), the config uses `follow_imports = "silent"` to suppress errors from untyped dependencies. Add `# type: ignore[<code>]` only when truly necessary, and always include the specific error code.
5. The `py.typed` marker at `src/ncaa_eval/py.typed` signals PEP 561 compliance.

### Pydantic Integration

**Configured in:** `[tool.mypy] plugins` and `[tool.pydantic-mypy]`

Pydantic is the project's primary data modeling library (shared types between Logic and UI layers).
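A shared model follows the standard Pydantic `BaseModel` pattern. A minimal sketch — the `GameResult` name and its fields are hypothetical illustrations, not the project's actual `Game` schema:

```python
from __future__ import annotations

import datetime

from pydantic import BaseModel


class GameResult(BaseModel):
    """Hypothetical shared type between Logic and UI layers (illustrative fields)."""

    game_id: int
    home_team: str
    away_team: str
    home_score: int
    away_score: int
    played_on: datetime.date

    @property
    def margin(self) -> int:
        """Point margin (positive = home win)."""
        return self.home_score - self.away_score
```

Because the dashboard and the library share this one definition, field renames are caught statically on both sides rather than surfacing as runtime `KeyError`s.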
The mypy plugin provides enhanced type checking for Pydantic models:

```toml
[tool.mypy]
plugins = ["pydantic.mypy"]

[tool.pydantic-mypy]
init_typed = true                      # __init__ params get field types
init_forbid_extra = true               # error on unknown fields in __init__
warn_required_dynamic_aliases = true   # warn if alias without default
```

**What this means for developers:**

- `init_typed = true` — mypy understands `Model(field=value)` constructor calls and validates argument types against field definitions.
- `init_forbid_extra = true` — passing an unknown field name to a model constructor is a type error, not a silent runtime discard.
- `warn_required_dynamic_aliases = true` — alerts when a field with `alias=` has no default value, which can cause confusing `ValidationError` messages.

---

## 5. Vectorization First

> **Reject PRs that use `for` loops over Pandas DataFrames for metric calculations.**
>
> — Architecture Section 12

This is a **hard rule**. Scientific computation in this project must use vectorized operations.

### Why

- Vectorized pandas/numpy operations are 10-100x faster than Python loops.
- They are easier to reason about for correctness.
- They compose cleanly with `.pipe()` chains.

### Correct Pattern

```python
# Vectorized — computes margin for every game in one shot
game_df["margin"] = game_df["home_score"] - game_df["away_score"]

# Vectorized rolling mean
game_df["rolling_ppg"] = game_df.groupby("team_id")["points"].transform(
    lambda s: s.rolling(window=5, min_periods=1).mean()
)
```

### Incorrect Pattern

```python
# WRONG — for loop over DataFrame rows
margins = []
for _, row in game_df.iterrows():
    margins.append(row["home_score"] - row["away_score"])
game_df["margin"] = margins
```

### Exceptions

Explicit `for` loops are acceptable only when:

1. Operating on a **small, fixed collection** (e.g., iterating over 5 model configs, not 10,000 game rows).
2. Performing **graph traversal** (e.g., NetworkX operations that have no vectorized equivalent).
3. The loop body involves **side effects** that cannot be vectorized (e.g., writing files per team).
4. **Ingest-layer one-time-per-sync operations** — `src/ncaa_eval/ingest/connectors/kaggle.py` uses `iterrows()` in `load_day_zeros`, `fetch_teams`, and `_parse_games_csv` for per-row data transformation: constructing Pydantic models (`Game`, `Team`) in `fetch_teams` and `_parse_games_csv`, and parsing dates with per-row error handling in `load_day_zeros`. These run exactly once per `sync` operation and are not in any hot path. Pandera schema validation (Story 9.14) guards the DataFrame before iteration begins. *(PO Decision C — Audit item 2.4, 2026-03-11)*

In these cases, add a brief comment explaining why the loop is necessary.

---

## 6. Design Philosophy (PEP 20 - The Zen of Python)

> "Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex."
>
> — PEP 20

While PEP 20 can't be fully automated, we enforce its core principles through code review and tooling where possible.

### Enforced Principles

#### Simple is Better Than Complex

- **Tooling:** Ruff complexity checks (see Section 6.1)
- **Review:** Max cyclomatic complexity = 10, max function length = 50 lines
- **Guideline:** If a function is hard to name, it's doing too much — split it

```python
# GOOD: Simple, single-purpose function
def calculate_margin(home_score: int, away_score: int) -> int:
    """Calculate point margin (positive = home win)."""
    return home_score - away_score


# BAD: Too complex, doing multiple things
def process_game(game_data: dict) -> dict:
    """Process game... but what does this actually do?"""
    # 80 lines of nested logic with unclear responsibilities
    ...
```

#### Explicit is Better Than Implicit

- **Tooling:** Mypy strict mode enforces explicit types
- **Review:** No magic numbers, no implicit state changes, clear function names
- **Guideline:** Code should read like prose — the reader shouldn't have to guess

```python
# GOOD: Explicit parameters and behavior
def calculate_elo_change(
    rating: int,
    opponent_rating: int,
    won: bool,
    k_factor: int = 32,
) -> int:
    """Calculate Elo rating change after a game."""
    expected = 1 / (1 + 10 ** ((opponent_rating - rating) / 400))
    actual = 1 if won else 0
    return int(k_factor * (actual - expected))


# BAD: Implicit behavior, magic numbers
def adjust_rating(r: int, o: int, w: bool) -> int:
    return int(32 * (w - 1 / (1 + 10 ** ((o - r) / 400))))
```

#### Readability Counts

- **Tooling:** Ruff formatting (110 char lines), naming conventions
- **Review:** Variable names reflect domain concepts, not abbreviations
- **Guideline:** Write for humans first, computers second

```python
# GOOD: Clear domain language
team_offensive_efficiency = total_points / total_possessions

# BAD: Abbreviations require mental translation
off_eff = pts / poss
```

#### Flat is Better Than Nested

- **Tooling:** Ruff detects excessive nesting
- **Review:** Max nesting depth = 3
- **Guideline:** Use early returns to reduce nesting

```python
# GOOD: Flat structure with early returns
def validate_game(game: Game) -> None:
    """Validate game data."""
    if game.home_score < 0:
        raise ValueError("Home score cannot be negative")
    if game.away_score < 0:
        raise ValueError("Away score cannot be negative")
    if game.date > datetime.now():
        raise ValueError("Game cannot be in the future")


# BAD: Nested structure
def validate_game(game: Game) -> None:
    if game.home_score >= 0:
        if game.away_score >= 0:
            if game.date <= datetime.now():
                return
            else:
                raise ValueError("Game cannot be in the future")
        else:
            raise ValueError("Away score cannot be negative")
    else:
        raise ValueError("Home score cannot be negative")
```

#### There Should Be One Obvious Way to Do It

- **Review:** Follow established project patterns (e.g., vectorization for calculations)
- **Guideline:** Check existing code before inventing new approaches

```python
# GOOD: Follows project pattern (vectorized operations)
game_df["margin"] = game_df["home_score"] - game_df["away_score"]

# BAD: Invents custom approach (loops)
for idx in range(len(game_df)):
    game_df.loc[idx, "margin"] = game_df.loc[idx, "home_score"] - game_df.loc[idx, "away_score"]
```

#### Complex is Better Than Complicated

- **Review:** Accept genuine complexity; reject unnecessary complication
- **Guideline:** Some systems are inherently complex (simulation engines, Elo rating chains, cross-validation orchestrators). The goal is *managed complexity* — complex internals behind simple interfaces — not *complication* from tangled logic.

```python
# GOOD: Complex but not complicated — clear interface, complex internals
class MonteCarloSimulator:
    """Simulate tournament outcomes via Monte Carlo sampling.

    The algorithm is inherently complex (bracket traversal, probability
    propagation, seeded randomness), but the interface is simple.
    """

    def simulate(self, bracket: Bracket, n_simulations: int = 10_000) -> SimulationResult:
        ...


# BAD: Complicated — unnecessary indirection, unclear purpose
def run_sim(data, opts, flags, ctx, mode="default", **kwargs):
    if mode == "default":
        ...  # 200 lines of nested conditionals
```

#### Sparse is Better Than Dense

- **Tooling:** 110-character line length limit, Ruff formatting
- **Review:** One operation per line, generous whitespace, no clever one-liners
- **Guideline:** When a line does too much, split it. When a function packs too many operations together, extract named intermediates.
```python
# GOOD: Sparse — each step is clear
expected_win_rate = 1 / (1 + 10 ** ((opponent_rating - team_rating) / 400))
actual_outcome = 1.0 if won else 0.0
rating_change = k_factor * (actual_outcome - expected_win_rate)

# BAD: Dense — everything crammed together
return int(32*(int(w)-1/(1+10**((o-r)/400))))
```

#### Special Cases Aren't Special Enough / Practicality Beats Purity

- **Review:** Allow pragmatic exceptions when properly documented
- **Guideline:** The vectorization-first rule (Section 5) has legitimate exceptions: graph traversal, small fixed collections, side-effect loops. Data science code has domain-specific constants (`0.5` for coin-flip probability, `64` for tournament teams) that PLR2004 would flag — we suppress PLR2004 because practicality beats purity for these cases. The key: every exception must be *documented and intentional*, never silent.

```python
# GOOD: Pragmatic exception with justification
# PLR2004 suppressed project-wide — domain constants like tournament size
# are clearer inline than as named constants
def build_bracket(teams: list[Team]) -> Bracket:
    if len(teams) != 64:  # NCAA tournament field size
        raise ValueError(f"Expected 64 teams, got {len(teams)}")
    ...


# BAD: Bare magic number with no context
def process(data):
    if len(data) > 42:  # What is 42?
        data = data[:42]
```

#### Errors Should Never Pass Silently / Unless Explicitly Silenced

- **Review:** Every exception handler must log or re-raise; no bare `except: pass`
- **Guideline:** This is the project's most-violated PEP 20 principle (see Pattern D from the codebase audit). The convention:
  1. **Default:** Log at WARNING level with structured context, then re-raise or return a sentinel that callers check.
  2. **Explicitly silenced:** Log at DEBUG level with a code comment explaining *why* silence is acceptable. Use `# noqa` or `# type: ignore` with specific codes.
  3. **Never:** Bare `except Exception: pass` or `except Exception` that substitutes a value without logging.
```python
# GOOD: Error surfaced with context
try:
    response = fetch_team_schedule(team_id)
except RequestError as exc:
    logger.warning("Failed to fetch schedule for team %d: %s", team_id, exc)
    raise

# GOOD: Explicitly silenced with rationale
try:
    optional_enrichment = fetch_extra_stats(team_id)
except RequestError:
    # Enrichment is non-critical; base stats are sufficient for predictions.
    logger.debug("Optional enrichment unavailable for team %d", team_id)
    optional_enrichment = None

# BAD: Silent swallowing — Pattern D violation
try:
    data = fetch_team_schedule(team_id)
except Exception:
    data = pd.DataFrame()  # Caller has no idea data is missing
```

#### Refuse the Temptation to Guess

- **Tooling:** `mypy --strict` — forces explicit types everywhere
- **Review:** No implicit type coercions, no unvalidated assumptions
- **Guideline:** This aphorism is the philosophical foundation for `mypy --strict`. When types are explicit, the compiler catches wrong assumptions before runtime. When types are `Any` or missing, the code *guesses* — and guesses wrong at the worst possible time.

```python
# GOOD: Explicit types — no guessing
def get_team_rating(team_id: int, ratings: dict[int, float]) -> float | None:
    return ratings.get(team_id)


# BAD: Implicit types — code guesses about structure
def get_team_rating(team_id, ratings):
    return ratings[team_id]  # KeyError? ratings is a list? Who knows.
```

#### If the Implementation Is Hard to Explain, It's a Bad Idea / If It's Easy to Explain, It May Be a Good Idea

- **Tooling:** McCabe complexity ≤ 10, max 50 lines per function
- **Review:** If you can't describe a function in one sentence, it needs splitting
- **Guideline:** This reinforces the complexity limits. If a code reviewer asks "what does this do?" and the answer requires a paragraph, the function is doing too much. The `run_training()` function in `cli/train.py` (70+ statements, 3 `noqa` suppressions) is the canonical example of code that is hard to explain.
```python
# GOOD: Easy to explain — "calculates Brier score for probability predictions"
def brier_score(probabilities: np.ndarray, outcomes: np.ndarray) -> float:
    return float(np.mean((probabilities - outcomes) ** 2))


# BAD: Hard to explain — "it processes... things... with conditions..."
def process_and_validate_then_transform_with_fallback(data, config, mode, flags):
    ...  # 80 lines that require 5 minutes to explain
```

### Code Review Checklist for PEP 20

During code review, verify:

- [ ] **Simplicity:** Functions have single, clear responsibilities (McCabe complexity ≤ 10)
- [ ] **Single purpose:** Each function does exactly one thing; max 50 lines, nesting ≤ 3
- [ ] **Explicitness:** No magic numbers, parameters have clear names, behavior is obvious
- [ ] **Readability:** Domain concepts use full words, not abbreviations
- [ ] **Flatness:** Nesting depth ≤ 3, early returns preferred
- [ ] **Consistency:** Follows existing project patterns (vectorization, type sharing, etc.)
- [ ] **Complexity vs complication:** Inherent complexity is managed behind simple interfaces
- [ ] **Sparseness:** One operation per line, no clever one-liners, named intermediates
- [ ] **Pragmatic exceptions:** Deviations from rules are documented and intentional
- [ ] **Error handling:** No silent exception swallowing; handlers log or re-raise
- [ ] **Explicit types:** No `Any`, no missing annotations, no implicit coercions
- [ ] **Explainability:** Every function describable in one sentence
- [ ] **No complexity-code overrides:** No `# noqa` for PLR0911/PLR0912/PLR0913/C901 without PO approval; each includes a rationale
- [ ] **Docstring detail:** Functions with 3+ operations have a detailed description explaining *how*

---

### 6.1 Complexity Gates (Ruff Configuration)

**Configured in:** `pyproject.toml` → `[tool.ruff.lint.mccabe]`

| Metric | Limit | Enforced By |
|---|---|---|
| **McCabe Cyclomatic Complexity** | 10 | Ruff `C901` (pre-commit) |
| **Max Function Length** | 50 lines | Manual review |
| **Max Nesting Depth** | 3 | Manual review |
| **Max Arguments** | 5 | Ruff `PLR0913` (pre-commit) |

See `pyproject.toml` for exact configuration.

---

### 6.2 Pure Functions vs Side Effects

> **Pure functions are easier to test, faster to run, and easier to reason about.**

**Design guideline:** Keep business logic pure, push side effects to the edges.

#### Pure Functions (Preferred)

**Definition:** Same input always produces same output, no side effects.

**Characteristics:**

- Deterministic (predictable)
- No external dependencies (no I/O, no database, no network, no time/randomness)
- No state mutation
- Easy to test (just input → output, no mocking)
- Perfect for property-based testing (Hypothesis)
- Fast (no I/O)

```python
# PURE: Always deterministic, easy to test
def calculate_win_probability(rating_diff: int) -> float:
    """Calculate win probability from rating difference."""
    return 1 / (1 + 10 ** (-rating_diff / 400))


# PURE: Data transformation, no side effects
def normalize_team_names(names: pd.Series) -> pd.Series:
    """Normalize team names to standard format (vectorized)."""
    return names.str.strip().str.title().str.replace("St.", "Saint")


# PURE: Mathematical calculation (vectorized)
def calculate_margins(home_scores: np.ndarray, away_scores: np.ndarray) -> np.ndarray:
    """Calculate point margins (vectorized, pure)."""
    return home_scores - away_scores
```

**Testing pure functions:**

```python
# Unit test: Simple input → output
def test_win_probability_equal_ratings():
    """Verify win probability is 50% for equal ratings."""
    assert calculate_win_probability(rating_diff=0) == 0.5


# Property test: Perfect for pure functions
@pytest.mark.property
@given(rating_diff=st.integers(-1000, 1000))
def test_win_probability_always_bounded(rating_diff):
    """Verify win probability always in [0, 1] (invariant)."""
    prob = calculate_win_probability(rating_diff)
    assert 0 <= prob <= 1
```

---

#### Side-Effect Functions (Push to Edges)
**Definition:** Functions that interact with the outside world or modify state.

**Characteristics:**

- Non-deterministic (may vary based on external state)
- External dependencies (files, database, network, time, randomness)
- Modifies state (mutates objects, writes files, updates database)
- Harder to test (requires mocks, stubs, fixtures)
- Requires integration tests

```python
# SIDE-EFFECT: Reads from the file system
def load_games(path: Path) -> pd.DataFrame:
    """Load games from CSV file (I/O operation)."""
    return pd.read_csv(path)


# SIDE-EFFECT: Depends on current time (non-deterministic)
def is_game_started(game_start: datetime) -> bool:
    """Check if game has started."""
    return datetime.now() > game_start  # Changes over time


# SIDE-EFFECT: Mutates external state
def update_team_rating(team_id: int, new_rating: int) -> None:
    """Update team rating in database."""
    db.execute("UPDATE teams SET rating = ? WHERE id = ?", new_rating, team_id)
```

**Testing side-effect functions:**

```python
# Integration test: Requires fixtures
@pytest.mark.integration
def test_load_games_returns_valid_dataframe(temp_data_dir):
    """Verify games can be loaded from CSV."""
    # Setup: Create test file
    test_file = temp_data_dir / "games.csv"
    test_file.write_text("game_id,home_team,away_team\n1,Duke,UNC\n")

    # Execute
    games = load_games(test_file)

    # Assert
    assert len(games) == 1
    assert "game_id" in games.columns
```

---

#### Good Separation: Pure Core + Side-Effect Shell

**Pattern:** Keep calculations pure, orchestrate I/O at the edges.
```python
# PURE: Core business logic (easy to test, vectorized)
def calculate_win_probabilities(
    home_ratings: np.ndarray,
    away_ratings: np.ndarray,
) -> np.ndarray:
    """Calculate win probabilities (pure, vectorized)."""
    rating_diff = home_ratings - away_ratings
    return 1 / (1 + 10 ** (-rating_diff / 400))


# SIDE-EFFECT: Orchestration at the edge
def simulate_tournament(games_path: Path, ratings_path: Path) -> pd.DataFrame:
    """Simulate tournament (orchestrates pure logic + I/O)."""
    # Side effects: Load data
    games = pd.read_csv(games_path)
    ratings = pd.read_csv(ratings_path, index_col="team")

    # Side effects: Data prep
    games = games.merge(ratings, left_on="home_team", right_index=True)
    games = games.merge(ratings, left_on="away_team", right_index=True, suffixes=("_home", "_away"))

    # PURE: Core calculation (vectorized, easy to test separately)
    games["win_prob"] = calculate_win_probabilities(
        games["rating_home"].values,
        games["rating_away"].values,
    )
    return games
```

**Why this is better:**

1. **Pure function (`calculate_win_probabilities`):**
   - Fast unit tests (no I/O)
   - Property-based tests (invariants)
   - Reusable in different contexts
   - Vectorized (meets NFR1)

2. **Side-effect function (`simulate_tournament`):**
   - Thin orchestration layer
   - Easy to see where I/O happens
   - Core logic testable independently

---

#### Bad: Mixing Pure Logic with Side Effects

```python
# BAD: Pure logic buried inside side effects
def simulate_tournament_bad() -> pd.DataFrame:
    """Simulate tournament (mixed design - hard to test)."""
    games = pd.read_csv("data/games.csv")  # Side effect

    results = []
    for _, game in games.iterrows():  # Side effect + non-vectorized!
        # Side effects: Database calls
        home_rating = db.query("SELECT rating FROM teams WHERE id = ?", game.home_team)
        away_rating = db.query("SELECT rating FROM teams WHERE id = ?", game.away_team)

        # Pure logic buried inside (can't test without database!)
        rating_diff = home_rating - away_rating
        prob = 1 / (1 + 10 ** (-rating_diff / 400))
        results.append(prob)

    return pd.DataFrame(results)
```

**Problems:**

- Can't test calculation logic without a database
- Can't use property-based tests
- Slow (I/O in loop)
- Violates vectorization requirement
- Hard to debug (mixed concerns)

---

#### Testing Strategy by Function Type

| Function Type | Test Type | Characteristics | Example |
|---------------|-----------|-----------------|---------|
| **Pure** | Unit test, Property-based | Fast, no mocking, deterministic | `calculate_win_probability()` |
| **Side-effect** | Integration test | Slower, requires fixtures/mocks | `load_games()`, `update_team_rating()` |
| **Mixed** | ❌ AVOID | Hard to test, refactor! | `simulate_tournament_bad()` |

---

#### Code Review Checklist for Pure vs Side-Effect

During code review, verify:

- [ ] **Pure logic separated:** Business calculations are pure functions
- [ ] **Side effects at edges:** I/O, database, network calls in orchestration layer
- [ ] **No mixing:** Pure functions don't contain I/O operations
- [ ] **Vectorization:** Pure functions use numpy/pandas operations (not loops)
- [ ] **Property tests:** Pure functions have property-based tests for invariants
- [ ] **Integration tests:** Side-effect functions have integration tests with fixtures

---

## 7. PR Checklist (Summary)

Every pull request must pass the following gates. The actual PR template is at `.github/pull_request_template.md`. For the philosophy behind this two-tier approach, see [`docs/TESTING_STRATEGY.md`](TESTING_STRATEGY.md) (Section: Quality Gates and Execution Tiers).
| Gate | Tool | Timing | |---|---|---| | Lint pass | Ruff | Pre-commit (fast) | | Type-check pass | Mypy (`--strict`) | Pre-commit (fast) | | Test pass | Pytest | PR / CI | | Docstring coverage | Manual review | PR review | | Docstring detail | Manual review | PR review | | No vectorization violations | Manual review | PR review | | Conventional commit messages | Commitizen | Pre-commit | | Function complexity | Manual review | PR review | | No complexity-code overrides | Manual review | PR review | | PEP 20 compliance | Manual review | PR review | | SOLID principles | Manual review | PR review | | Pure function design | Manual review | PR review | ### Review Criteria - Code follows naming conventions (Section 2). - Imports are ordered correctly (Section 3). - New public APIs have docstrings (Section 1). - Functions with 3+ operations include a detailed docstring description explaining *how* (not just *what*) — see Section 1. - No `for` loops over DataFrames for calculations (Section 5). - Type annotations are complete (Section 4). - Each function does one thing; manually verify single responsibility, max 50 lines, nesting ≤ 3. - No `# noqa` for PLR0911, PLR0912, PLR0913, or C901 without PO approval; each must include a rationale (Section 3, Lint Suppression Policy). - PEP 20 design principles respected (Section 6). - Pure functions used for business logic, side effects at edges (Section 6.2). - SOLID principles applied for testability (Section 10). - Data structures shared between Logic and UI use Pydantic models or TypedDicts. - Dashboard code never reads files directly — it calls `ncaa_eval` functions. --- ## 8. 
File & Module Organization ### Project Layout ``` src/ ncaa_eval/ __init__.py # Package root py.typed # PEP 561 marker cli/ # CLI entry points (main, train) ingest/ # Data source connectors transform/ # Feature engineering model/ # Model ABC and implementations (singular, not models/) evaluation/ # Metrics, CV, simulation utils/ # Shared helpers (logger.py) tests/ __init__.py conftest.py # Shared fixtures unit/ # Unit tests organized by domain integration/ # Integration tests fixtures/ # Test data fixtures dashboard/ # Streamlit UI (imports ncaa_eval — no direct IO) data/ # Local data store (git-ignored) ``` ### Rules 1. **One concept per module.** A module should do one thing. If it grows beyond ~300 lines, consider splitting. 2. **Mirror `src/` in `tests/`.** `src/ncaa_eval/model/elo.py` is tested by `tests/unit/test_elo.py` (or `tests/integration/test_elo_integration.py`). 3. **`__init__.py` re-exports.** Public symbols should be importable from the *submodule* level (e.g., `from ncaa_eval.model import Model, get_model`). The root `ncaa_eval` package does not re-export anything. Concrete model classes self-register via `@register_model` and are accessed through the registry (`get_model("elo")`) or direct import (`from ncaa_eval.model.elo import EloModel`). 4. **No circular imports.** If two modules need each other, extract shared types into a third module. 5. **Configuration lives in `pyproject.toml`.** Do not create separate config files for tools that support `pyproject.toml`. 6. **Line length is 110.** Not the default 88. Configured in `[tool.ruff] line-length = 110`. --- ## 9. Additional Architecture Rules These rules come from the project architecture and apply across all code: | Rule | Detail | |---|---| | **Type sharing** | All data structures shared between Logic and UI must use Pydantic models or TypedDicts. | | **No direct IO in UI** | The Streamlit dashboard must call `ncaa_eval` library functions — never read files directly. 
| **Commit messages** | Use conventional commits format (`feat:`, `fix:`, `docs:`, etc.) enforced by Commitizen. |
| **Python version** | `>=3.12,<4.0`. Use modern syntax (`match`, `type` statement, `X \| None`). |

---

## 10. SOLID Principles for Testability

> **SOLID principles make code testable.** Violating SOLID = hard-to-test code.

These five object-oriented design principles improve maintainability and testability. While not automatically enforced, they're checked during code review.

### S - Single Responsibility Principle (SRP)

> "A class should have only one reason to change."

**What it means:** Each class/function does ONE thing well.

**Already enforced by:** PEP 20 complexity checks (Section 6)

```python
# GOOD: Single responsibility
class GameLoader:
    """Loads games from CSV files."""

    def load(self, path: Path) -> pd.DataFrame:
        return pd.read_csv(path)


class GameValidator:
    """Validates game data."""

    def validate(self, games: pd.DataFrame) -> None:
        if games["home_score"].min() < 0:
            raise ValueError("Scores cannot be negative")


# BAD: Multiple responsibilities
class GameManager:
    """Does everything - loading, validating, processing, saving."""

    def do_everything(self, path: Path) -> None:
        # Too many reasons to change!
        pass
```

**Testing impact:** Easy to test (one thing = one test class)

---

### O - Open/Closed Principle (OCP)

> "Open for extension, closed for modification."

**What it means:** Add new features without changing existing code.
```python
# GOOD: Open for extension via inheritance
class RatingModel(ABC):
    @abstractmethod
    def predict(self, game: Game) -> float:
        pass


class EloModel(RatingModel):
    def predict(self, game: Game) -> float:
        return self._calculate_elo(game)


class GlickoModel(RatingModel):
    # New model, no changes to existing code
    def predict(self, game: Game) -> float:
        return self._calculate_glicko(game)


# BAD: Must modify existing code for each new model
def predict(game: Game, model_type: str) -> float:
    if model_type == "elo":
        return calculate_elo(game)
    elif model_type == "glicko":  # Must modify this function
        return calculate_glicko(game)
```

**Testing impact:** Easy to mock/stub new implementations

---

### L - Liskov Substitution Principle (LSP)

> "Subtypes must be substitutable for their base types."

**What it means:** If `Dog` inherits from `Animal`, you can use `Dog` anywhere you use `Animal` without breaking things.

**Already enforced by:** Property-based tests verify contracts (Section: Test Purpose Guide)

```python
# GOOD: Subclass honors parent contract
class RatingModel(ABC):
    @abstractmethod
    def predict(self, game: Game) -> float:
        """Return probability in [0, 1]."""
        pass


class EloModel(RatingModel):
    def predict(self, game: Game) -> float:
        # Always returns [0, 1] as promised
        return 1 / (1 + 10 ** ((opp_rating - rating) / 400))


# BAD: Subclass violates parent contract
class BrokenModel(RatingModel):
    def predict(self, game: Game) -> float:
        # Returns values > 1.0, breaking the contract!
        return rating_difference * 100
```

**Testing impact:** Property tests verify contracts hold across all implementations

---

### I - Interface Segregation Principle (ISP)

> "Many specific interfaces are better than one general-purpose interface."

**What it means:** Don't force classes to implement methods they don't need.
**Already enforced by:** MyPy strict mode with ABCs and Protocols (Section 4)

This project uses **ABCs as the primary pattern** for interface segregation (Model ABC, Repository ABC, Connector ABC) and **Protocols as a complement** for structural typing where duck typing is preferred (e.g., `ProbabilityProvider`, `ScoringRule` in the simulation engine). Choose based on context:

- **ABCs:** When implementations must be explicitly registered and share behavior (e.g., all models inherit from `Model` and get common infrastructure).
- **Protocols:** When you need structural typing — any object with matching methods qualifies, without inheriting from anything (e.g., `ProbabilityProvider` can be satisfied by any class with a `predict_proba` method).

```python
# PRIMARY: ABCs for explicit interface contracts
class Model(ABC):
    """Base class for all prediction models."""

    @abstractmethod
    def predict(self, game: Game) -> float: ...


class EloModel(Model):
    def predict(self, game: Game) -> float: ...


# COMPLEMENT: Protocols for structural typing (duck typing)
class ProbabilityProvider(Protocol):
    def predict_proba(self, matchup: Matchup) -> float: ...


# Any class with predict_proba() satisfies this — no inheritance needed
class EloProvider:
    def predict_proba(self, matchup: Matchup) -> float: ...


# BAD: Fat interface forces unused methods
class DoEverything(ABC):
    @abstractmethod
    def predict(self, game: Game) -> float: ...

    @abstractmethod
    def fit(self, games: pd.DataFrame) -> None: ...

    @abstractmethod
    def cross_validate(self, games: pd.DataFrame) -> dict: ...
    # Simple models forced to implement methods they don't need!
```

**Testing impact:** Less mocking needed (small interfaces)

---

### D - Dependency Inversion Principle (DIP)

> "Depend on abstractions, not concretions."

**What it means:** High-level code shouldn't depend on low-level details.
**Already enforced by:** Architecture rule "Type sharing: use Pydantic models or TypedDicts" (Section 9)

```python
# GOOD: Depends on abstraction (Protocol)
class Predictable(Protocol):
    # Any object with a matching predict() method qualifies
    def predict(self, game: Game) -> float: ...


class Simulator:
    def __init__(self, model: Predictable):  # Any model works
        self.model = model

    def simulate_tournament(self, games: list[Game]) -> pd.DataFrame:
        for game in games:
            pred = self.model.predict(game)  # Works with any Predictable
            ...


# BAD: Depends on concrete implementation
class Simulator:
    def __init__(self, elo_model: EloModel):  # Locked to EloModel
        self.model = elo_model

    def simulate_tournament(self, games: list[Game]) -> pd.DataFrame:
        # Can only use EloModel, not flexible
        pass
```

**Testing impact:** Easy to inject test doubles

---

### SOLID Review Checklist

During code review, verify:

- [ ] **SRP:** Classes/functions have single, clear responsibility (covered by PEP 20 complexity)
- [ ] **OCP:** New features added via extension (inheritance, composition), not modification
- [ ] **LSP:** Subtypes honor parent contracts (property tests verify this)
- [ ] **ISP:** Interfaces are small and focused (ABCs primary, Protocols for structural typing)
- [ ] **DIP:** Depends on abstractions (Protocols, TypedDicts), not concrete classes

### Summary: What's Already Covered

| SOLID Principle | Already Enforced By |
|-----------------|---------------------|
| **SRP** | PEP 20: "Simple is better than complex" (complexity ≤ 10) |
| **OCP** | Manual review (can't automate) |
| **LSP** | Property tests: "probabilities in [0, 1]" invariants |
| **ISP** | MyPy strict mode: ABCs primary (Model, Repository, Connector), Protocols complement (ProbabilityProvider, ScoringRule) |
| **DIP** | Architecture: "Type sharing via Pydantic/TypedDicts" |

**Result:** SOLID compliance is mostly automated or already covered by existing standards. Code review adds a final check for OCP and overall SOLID adherence.

---

## 11. Quality Gate Architecture

This project has two complementary quality gate systems.
**Pre-commit is canonical; nox is the local convenience runner.**

### Pre-commit (Canonical)

Pre-commit hooks (`.pre-commit-config.yaml`) run automatically on every commit and are enforced in CI. They are the single source of truth for code quality enforcement.

| Hook | Stage | What It Does |
|---|---|---|
| `ruff-lint` | commit | Ruff linting with `--fix` (auto-corrects violations) |
| `ruff-format` | commit | Ruff formatting (auto-formats code) |
| `mypy-strict` | commit | Mypy strict type checking (`src/`, `tests/`, `noxfile.py`, `sync.py`) |
| `pytest-smoke` | commit | Pytest smoke tests (`-m smoke`) |
| `commitizen` | commit-msg | Conventional commit message enforcement |
| `actionlint` | commit | GitHub Actions workflow linting |

Pre-commit also includes housekeeping hooks: `end-of-file-fixer`, `trailing-whitespace`, `check-yaml`, `check-toml`, `detect-private-key`, `no-commit-to-branch`, `check-merge-conflict`, `debug-statements`.

### Nox (Local Convenience Runner)

Nox (`noxfile.py`) provides a single command (`nox`) that runs the full quality pipeline locally. It is useful for manual full-suite runs before pushing but is NOT used in CI.

| Session | What It Does |
|---|---|
| `lint` | `ruff check . --fix` + `ruff format --check .` |
| `typecheck` | `mypy --strict` on `src/ncaa_eval`, `tests`, `noxfile.py`, `sync.py` |
| `tests` | Full `pytest` suite |
| `docs` | `sphinx-apidoc` + `sphinx-build` |

### Known Divergences (Intentional)

| Area | Pre-commit | Nox | Rationale |
|---|---|---|---|
| **ruff format** | Auto-fix (rewrites files) | Check-only (`--check`) | Pre-commit should fix on commit; nox should only report for manual inspection |
| **CI usage** | Runs in CI (`.github/workflows/`) | Not used in CI | Pre-commit + standalone pytest covers lint + typecheck + full tests in CI |

### CI Pipeline

Two CI workflows run in GitHub Actions:

**`.github/workflows/python-check.yaml`** (runs on every pull request):
1. **Pre-commit** — `pre-commit run --all-files` (all hooks: ruff, mypy, actionlint, etc.)
2. **Full test suite** — `pytest --cov=src/ncaa_eval` with coverage reporting

**`.github/workflows/main-updated.yaml`** (runs on every push to `main`):

1. **Commitizen** — version bump and changelog (skipped for `bump:` commits)
2. **Sphinx docs** — `sphinx-apidoc` + `sphinx-build` → GitHub Pages deployment

Quality gates are enforced twice: by the developer's local pre-commit hooks on every commit/push, and again by `python-check.yaml` on every pull request before merge.

---

## References

- `pyproject.toml` — Single source of truth for all tool configurations
- `specs/05-architecture-fullstack.md` Section 12 — Coding standards
- `specs/05-architecture-fullstack.md` Section 10 — Development workflow
- `specs/03-prd.md` Section 4 — Technical assumptions & constraints