ncaa_eval.transform.serving module
Chronological data serving layer for walk-forward model training.
Provides ChronologicalDataServer, which wraps a Repository and
streams game data in strict date order with temporal boundary enforcement.
Downstream consumers (walk-forward splitters, feature pipelines) use this
layer to ensure no data from future games leaks into model training.
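Walk-forward consumers rely on an expanding-window pattern. The sketch below builds train/test splits from date-ordered daily batches, which is the shape that iter_games_by_date() yields. It is a hypothetical helper and not part of ncaa_eval. ToyGame stands in for the real Game model, and walk_forward_splits is an assumed name. Real splitters may use a different window, for example one season at a time instead of one day.

```python
from datetime import date
from typing import Iterator, NamedTuple


class ToyGame(NamedTuple):
    """Hypothetical stand-in for ncaa_eval's Game; only the fields used here."""

    game_id: int
    day: date


def walk_forward_splits(
    daily_batches: list[list[ToyGame]],
) -> Iterator[tuple[list[ToyGame], list[ToyGame]]]:
    """Yield (train, test) pairs: test is one day, train is every earlier day.

    Assumes daily_batches is already in ascending date order, one date per batch.
    """
    history: list[ToyGame] = []
    for batch in daily_batches:
        if history:
            # Copy the history so later extends do not mutate earlier splits.
            yield list(history), batch
        # Only after the day has been used as a test set does it join training.
        history.extend(batch)
```

Each day's batch joins the history only after it has been yielded as a test set. As a result, no training set ever contains a game from its own test date or later.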
- class ncaa_eval.transform.serving.ChronologicalDataServer(repository: Repository)
Bases: object
Serves game data in strict chronological order for walk-forward modeling.
Wraps a Repository and enforces temporal boundaries so that callers cannot accidentally access future game data during walk-forward validation.
- Parameters:
repository – The data store from which games are retrieved.
Example:
    from pathlib import Path

    from ncaa_eval.ingest.repository import ParquetRepository
    from ncaa_eval.transform.serving import ChronologicalDataServer

    repo = ParquetRepository(Path("data/"))
    server = ChronologicalDataServer(repo)
    season = server.get_chronological_season(2023)
    for daily_batch in server.iter_games_by_date(2023):
        process(daily_batch)  # process() is a placeholder for your own per-day logic
- get_chronological_season(year: int, cutoff_date: date | None = None) → SeasonGames
Return all games for year sorted ascending by (date, game_id).
Applies an optional temporal cutoff so callers cannot retrieve games that had not yet been played as of a given date. This is the primary leakage-prevention mechanism for walk-forward model training.
- Parameters:
year – Season year (e.g., 2023 for the 2022-23 season).
cutoff_date – If provided, only games on or before this date are returned. Must not be in the future.
- Returns:
SeasonGames with games sorted by (date, game_id) and the has_tournament flag reflecting known tournament cancellations.
- Raises:
ValueError – If cutoff_date is strictly after today’s date.
- iter_games_by_date(year: int, cutoff_date: date | None = None) → Iterator[list[Game]]
Yield batches of games grouped by calendar date, in chronological order.
Each yielded list contains all games played on a single calendar date. Dates with no games are skipped. Applies the same cutoff_date semantics as get_chronological_season().
- Parameters:
year – Season year.
cutoff_date – Optional temporal cutoff (must not be in the future).
- Yields:
Non-empty list[Game] for each calendar date, in ascending order.
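Grouping an already-sorted game list into per-date batches maps naturally onto itertools.groupby. The sketch below illustrates the documented behavior, which is one non-empty list per date with empty dates skipped. It is not the module's source, and ToyGame and iter_by_date are assumed names.

```python
from datetime import date
from itertools import groupby
from typing import Iterator, NamedTuple


class ToyGame(NamedTuple):
    """Hypothetical stand-in for ncaa_eval's Game; only the fields used here."""

    game_id: int
    day: date


def iter_by_date(sorted_games: list[ToyGame]) -> Iterator[list[ToyGame]]:
    """Yield one non-empty list of games per calendar date.

    Assumes sorted_games is already ordered by (day, game_id). groupby only
    merges adjacent items, so unsorted input would split a date into several batches.
    """
    for _day, batch in groupby(sorted_games, key=lambda g: g.day):
        yield list(batch)
```

Dates with no games never appear in the input, so they are skipped without any special handling.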
- class ncaa_eval.transform.serving.SeasonGames(year: int, games: list[Game], has_tournament: bool)
Bases: object
Result of a chronological season query.
- year
Season year (e.g., 2023 for the 2022-23 season).
- Type:
int
- games
All qualifying games sorted ascending by (date, game_id).
- Type:
list[Game]
- has_tournament
False only for known no-tournament years (e.g., 2020 COVID cancellation). Signals to downstream walk-forward splitters that tournament evaluation should be skipped for this season.
- Type:
bool
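A downstream splitter might consume the has_tournament flag roughly as follows. ToySeasonGames is a hypothetical dataclass that mirrors the documented SeasonGames fields, with games typed loosely as list[object]. evaluation_years is an assumed helper name. This sketches the intended usage and is not code from ncaa_eval.

```python
from dataclasses import dataclass


@dataclass
class ToySeasonGames:
    """Hypothetical mirror of SeasonGames(year, games, has_tournament)."""

    year: int
    games: list[object]
    has_tournament: bool


def evaluation_years(seasons: list[ToySeasonGames]) -> list[int]:
    """Return the years whose tournament should be scored, skipping cancelled ones."""
    return [s.year for s in seasons if s.has_tournament]
```

In this sketch, a season with has_tournament=False is dropped only from evaluation, and its games stay available for training. This matches the flag's documented purpose of skipping tournament evaluation.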
- ncaa_eval.transform.serving.rescale_overtime(score: int, num_ot: int) → float
Rescale a game score to a 40-minute equivalent for OT normalization.
Overtime games inflate per-game scoring statistics because they involve more than 40 minutes of play. The standard correction (Edwards 2021) normalises every game to a 40-minute basis:
adjusted = raw_score × 40 / (40 + 5 × num_ot)
- Parameters:
score – Raw final score (not adjusted).
num_ot – Number of overtime periods played (0 for regulation).
- Returns:
Score normalised to a 40-minute equivalent.
Examples
>>> rescale_overtime(75, 0)  # Regulation: no change
75.0
>>> rescale_overtime(80, 1)  # 1 OT: 80 × 40 / 45 ≈ 71.11
71.11111111111111
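The formula translates directly into code. rescale_overtime_sketch is a hypothetical re-implementation for illustration and is not the module's source. As the formula states, it assumes a 40-minute regulation game and 5-minute overtime periods. Its rejection of negative inputs is an added assumption.

```python
def rescale_overtime_sketch(score: int, num_ot: int) -> float:
    """Scale a raw score to a 40-minute basis: score * 40 / (40 + 5 * num_ot).

    Hypothetical: the negative-input check is an added safeguard that the
    documented rescale_overtime() may not perform.
    """
    if score < 0 or num_ot < 0:
        raise ValueError("score and num_ot must be non-negative")
    minutes_played = 40 + 5 * num_ot  # regulation plus 5 minutes per OT period
    return score * 40 / minutes_played
```

Each extra period lowers the scaling factor 40 / (40 + 5 × num_ot). The factor is 1.0 in regulation, about 0.889 after one overtime, and 0.8 after two.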