ncaa_eval.transform.elo module

Game-by-game Elo rating engine for NCAA basketball feature engineering.

Computes Elo ratings as a feature building block — the resulting per-team ratings feed into models (XGBoost, etc.) as input features. This module does NOT implement model-level train/predict/save interfaces; those belong in Story 5.3.

Key design points:

  • update_game() returns the pre-game ("before") ratings and only then mutates internal state, so a game's features never include information from that game's result (walk-forward temporal safety).

  • Variable K-factor: early-season → regular-season → tournament.

  • Margin-of-victory scaling with diminishing returns (Silver/SBCB formula).

  • Home-court adjustment is subtracted from the effective rating before the expected outcome is computed.

  • Season mean-reversion toward conference mean (or global mean as fallback).
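The variable K-factor and the diminishing-returns idea behind margin scaling can be illustrated with a minimal sketch. This is not the module's code: the precedence of the tournament rule over the early-season rule, the "fewer than threshold games" test, and the capped power-law multiplier are all assumptions made for illustration, and the actual Silver/SBCB formula may include terms not shown here. The helper names select_k and capped_margin_multiplier are hypothetical.

```python
def select_k(
    games_played: int,
    is_tournament: bool,
    *,
    k_early: float = 56.0,
    k_regular: float = 38.0,
    k_tournament: float = 47.5,
    early_game_threshold: int = 20,
) -> float:
    """Pick a K-factor for one team in one game (illustrative only)."""
    # ASSUMPTION: tournament games take precedence over the early-season rule.
    if is_tournament:
        return k_tournament
    # ASSUMPTION: "early" means fewer than early_game_threshold games so far.
    if games_played < early_game_threshold:
        return k_early
    return k_regular


def capped_margin_multiplier(
    margin: int, *, exponent: float = 0.85, max_margin: int = 25
) -> float:
    """Concave function of a capped margin (illustrative only)."""
    # ASSUMPTION: cap first, then raise to a power below 1 so that each extra
    # point of margin adds less than the previous one.
    return float(min(abs(margin), max_margin)) ** exponent


print(select_k(3, is_tournament=False))    # early-season K
print(select_k(25, is_tournament=False))   # regular-season K
print(select_k(25, is_tournament=True))    # tournament K
```

With an exponent below 1, doubling the margin less than doubles the multiplier, and the cap means margins beyond max_margin add nothing further.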

class ncaa_eval.transform.elo.EloConfig(initial_rating: float = 1500.0, k_early: float = 56.0, k_regular: float = 38.0, k_tournament: float = 47.5, early_game_threshold: int = 20, margin_exponent: float = 0.85, max_margin: int = 25, home_advantage_elo: float = 3.5, mean_reversion_fraction: float = 0.25)

Bases: object

Frozen configuration for the Elo feature engine.

All K-factor, margin scaling, home-court, and mean-reversion parameters are configurable with sensible defaults matching the Silver/SBCB model.

early_game_threshold: int = 20
home_advantage_elo: float = 3.5
initial_rating: float = 1500.0
k_early: float = 56.0
k_regular: float = 38.0
k_tournament: float = 47.5
margin_exponent: float = 0.85
max_margin: int = 25
mean_reversion_fraction: float = 0.25
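
What "frozen" means in practice can be sketched with a stand-in class. The example below assumes the configuration behaves like a frozen dataclass, which the signature suggests but this page does not confirm. EloConfigSketch is a hypothetical name that mirrors the documented fields and defaults so the snippet runs without the package installed.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class EloConfigSketch:
    """Hypothetical stand-in mirroring the documented EloConfig defaults."""

    initial_rating: float = 1500.0
    k_early: float = 56.0
    k_regular: float = 38.0
    k_tournament: float = 47.5
    early_game_threshold: int = 20
    margin_exponent: float = 0.85
    max_margin: int = 25
    home_advantage_elo: float = 3.5
    mean_reversion_fraction: float = 0.25


default_cfg = EloConfigSketch()

# A frozen instance cannot be edited in place; derive a modified copy instead.
tuned_cfg = replace(default_cfg, k_regular=30.0, mean_reversion_fraction=0.5)

try:
    default_cfg.k_regular = 30.0  # attribute assignment is rejected
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Deriving copies rather than mutating keeps each experiment's parameters fixed for the lifetime of the engine that was built from them.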

class ncaa_eval.transform.elo.EloFeatureEngine(config: EloConfig, conference_lookup: ConferenceLookup | None = None)

Bases: object

Game-by-game Elo rating engine.

Parameters:
  • config – Frozen Elo configuration.

  • conference_lookup – Optional conference lookup for season mean-reversion. When None, mean-reversion falls back to global mean.

apply_season_mean_reversion(season: int) → None

Regress each team toward its conference mean (or global mean).

Groups all rated teams by conference via ConferenceLookup, computes each conference’s mean rating, then shifts every team’s rating a fraction mean_reversion_fraction of the way toward its conference mean. Teams with no conference entry fall back to the global mean; when no ConferenceLookup is provided, all teams use the global mean. This method is a no-op when no prior ratings exist.
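
The regression step described above can be sketched as a pure function. This is an illustration under stated assumptions, not the method's implementation: a plain dict (conference_of) stands in for ConferenceLookup, whose interface is not documented here, and the global mean is assumed to be taken over all rated teams before any rating is shifted. The function name and team IDs are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def revert_toward_means(
    ratings: dict[int, float],
    conference_of: dict[int, str],
    fraction: float = 0.25,
) -> dict[int, float]:
    """Shift each rating `fraction` of the way toward its group mean."""
    if not ratings:
        return {}  # nothing rated yet, so nothing to do

    # ASSUMPTION: the fallback mean covers every rated team, pre-reversion.
    global_mean = fmean(ratings.values())

    # Collect ratings per conference for teams that have a conference entry.
    by_conference: dict[str, list[float]] = defaultdict(list)
    for team_id, rating in ratings.items():
        conference = conference_of.get(team_id)
        if conference is not None:
            by_conference[conference].append(rating)
    conference_mean = {c: fmean(rs) for c, rs in by_conference.items()}

    reverted: dict[int, float] = {}
    for team_id, rating in ratings.items():
        # Teams without a conference entry use the global mean as the target.
        target = conference_mean.get(conference_of.get(team_id), global_mean)
        reverted[team_id] = rating + fraction * (target - rating)
    return reverted


# Teams 1 and 2 share a conference; team 3 has no entry.
print(revert_toward_means({1: 1600.0, 2: 1400.0, 3: 1560.0}, {1: "A", 2: "A"}))
```

In this example the conference mean is 1500, so teams 1 and 2 move to 1575 and 1425, while team 3 moves a quarter of the way toward the global mean of 1520, landing at 1550.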

static expected_score(rating_a: float, rating_b: float) → float

Logistic expected score for team A against team B.

expected = 1 / (1 + 10^((r_b - r_a) / 400))
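
As a quick numerical check of the formula, here is a standalone re-implementation (a sketch, not the module's source). The sample ratings are made up for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Logistic expected score for team A against team B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


even = expected_score(1500.0, 1500.0)       # equal ratings give 0.5
favorite = expected_score(1900.0, 1500.0)   # 400-point edge, about 0.909
underdog = expected_score(1500.0, 1900.0)   # mirror case, about 0.091

print(even, favorite, underdog)
```

The two sides of any matchup always sum to 1, and a 400-point rating gap corresponds to 10-to-1 odds for the higher-rated team.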

get_all_ratings() → dict[int, float]

Return a copy of the current ratings dict.

get_game_counts() → dict[int, int]

Return a copy of the current game-counts dict.

get_rating(team_id: int) → float

Return current Elo rating for team_id (initial_rating if unseen).

has_ratings() → bool

Return True if at least one team has a rating.

predict_matchup(team_a_id: int, team_b_id: int) → float

Return P(team_a wins) using the Elo expected-score formula.

process_season(games: list[Game], season: int) → pd.DataFrame

Process all games for a season, returning before-ratings per game.

Calls start_new_season(season) if prior ratings exist (i.e., this is not the very first season).

Parameters:
  • games – Games sorted in chronological order.

  • season – Season year.

Returns:

DataFrame with columns [game_id, elo_w_before, elo_l_before].

reset_game_counts() → None

Reset per-team game counts for a new season (affects variable K).

set_game_counts(counts: dict[int, int]) → None

Replace all game counts with counts.

set_ratings(ratings: dict[int, float]) → None

Replace all ratings with ratings.

start_new_season(season: int) → None

Orchestrate season transition: mean-reversion then reset counts.

update_game(w_team_id: int, l_team_id: int, w_score: int, l_score: int, loc: str, is_tournament: bool, *, num_ot: int = 0) → tuple[float, float]

Process one game and update ratings.

Snapshots before-ratings for feature use, applies home-court effective-rating adjustment to expected-score computation, computes the margin-of-victory multiplier and variable K-factor, then mutates internal rating state for both teams.

Parameters:
  • w_team_id – Winner team ID.

  • l_team_id – Loser team ID.

  • w_score – Winner final score (raw).

  • l_score – Loser final score (raw).

  • loc – "H" (winner home), "A" (winner away), "N" (neutral).

  • is_tournament – Whether this is a tournament game.

  • num_ot – Number of overtime periods (used for margin rescaling).

Returns:

Tuple of (elo_w_before, elo_l_before) — the winner’s and loser’s ratings before this game’s update, suitable for use as walk-forward feature values.
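
The snapshot-then-mutate ordering is what makes the returned values safe as features. The toy engine below illustrates only that ordering. It is a deliberately simplified sketch with a fixed K and no margin, home-court, or overtime handling, so it is not equivalent to update_game(). The class name WalkForwardElo and the team IDs are hypothetical.

```python
class WalkForwardElo:
    """Toy engine: fixed K, no margin, home-court, or overtime handling."""

    def __init__(self, initial_rating: float = 1500.0, k: float = 38.0) -> None:
        self.initial_rating = initial_rating
        self.k = k
        self.ratings: dict[int, float] = {}

    def get_rating(self, team_id: int) -> float:
        # Unseen teams start at the initial rating.
        return self.ratings.get(team_id, self.initial_rating)

    def update_game(self, w_team_id: int, l_team_id: int) -> tuple[float, float]:
        # 1. Snapshot ratings BEFORE the result is applied.
        w_before = self.get_rating(w_team_id)
        l_before = self.get_rating(l_team_id)

        # 2. Standard Elo update: the winner scored 1, expected `expected_w`.
        expected_w = 1.0 / (1.0 + 10.0 ** ((l_before - w_before) / 400.0))
        delta = self.k * (1.0 - expected_w)

        # 3. Mutate state only after the snapshot has been taken.
        self.ratings[w_team_id] = w_before + delta
        self.ratings[l_team_id] = l_before - delta

        # 4. Return the snapshot, which excludes this game's outcome.
        return w_before, l_before


engine = WalkForwardElo()
games = [(101, 202), (101, 303)]  # (winner, loser) in chronological order
features = [engine.update_game(w, l) for w, l in games]
print(features)  # [(1500.0, 1500.0), (1519.0, 1500.0)]
```

Team 101's first win only shows up in the features of its second game, which is the property a walk-forward feature pipeline relies on.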