ncaa_eval.evaluation.splitter module

Walk-forward cross-validation splitter with Leave-One-Tournament-Out folds.

Provides walk_forward_splits(), which partitions historical game data into train/test folds. Each fold uses one tournament year as the test set and all prior years as training data. The 2020 season, whose tournament was cancelled because of COVID-19, is a special case: its regular-season data is included in the training data of later folds, but no test fold is yielded for 2020.
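The fold logic described above can be sketched in plain Python. This is a minimal illustration, not the module's implementation: sketch_walk_forward, the toy rows, and the "season" / "is_tournament" keys are hypothetical, lists of dicts stand in for DataFrames, and treating the earliest season as training-only is an assumption inferred from the two-season minimum.

```python
from typing import Iterator, Sequence

# Hypothetical row shape (an assumption): {"season": int, "is_tournament": bool}
Game = dict


def sketch_walk_forward(
    games: list[Game], seasons: Sequence[int]
) -> Iterator[tuple[list[Game], list[Game], int]]:
    """Yield (train, test, year) triples; illustrative only."""
    ordered = sorted(seasons)
    included = set(ordered)
    # Assumption: the earliest season has no prior data, so it is never a test year.
    for year in ordered[1:]:
        test = [g for g in games if g["season"] == year and g["is_tournament"]]
        if not test:
            # No tournament that year (e.g. 2020): yield no fold, but the
            # season's games still appear in the training sets of later folds.
            continue
        train = [g for g in games if g["season"] in included and g["season"] < year]
        yield train, test, year


# Toy data: two regular-season games per season, plus one tournament game
# in every season except 2020.
toy_games: list[Game] = []
for s in (2019, 2020, 2021):
    toy_games += [{"season": s, "is_tournament": False} for _ in range(2)]
    if s != 2020:
        toy_games.append({"season": s, "is_tournament": True})

folds = list(sketch_walk_forward(toy_games, [2019, 2020, 2021]))
```

With this toy data the only fold produced is for 2021, and its training list contains the 2020 regular-season rows even though 2020 never appears as a test year.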

class ncaa_eval.evaluation.splitter.CVFold(train: DataFrame, test: DataFrame, year: int)

Bases: object

A single cross-validation fold.

train

All games from seasons strictly before the test year.

Type:

pandas.core.frame.DataFrame

test

Only the tournament games from the test year.

Type:

pandas.core.frame.DataFrame

year

The test season year.

Type:

int

test: DataFrame
train: DataFrame
year: int
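The attribute descriptions above imply two invariants that a fold should satisfy: no training row comes from the test year or later, and every test row is a tournament game from the test year. The sketch below checks them on a stand-in container. FoldSketch, check_fold, and the row keys are hypothetical; whether CVFold itself is a dataclass is not stated here, and lists of dicts are used in place of DataFrames to keep the example dependency-free.

```python
from dataclasses import dataclass


@dataclass
class FoldSketch:
    """Hypothetical stand-in for CVFold (lists of dicts instead of DataFrames)."""

    train: list
    test: list
    year: int


def check_fold(fold: FoldSketch) -> None:
    """Raise AssertionError if the fold breaks the documented contract."""
    # Training data must come from seasons strictly before the test year.
    assert all(g["season"] < fold.year for g in fold.train), "train leaks test-year data"
    # Test data must be tournament games from the test year only.
    assert all(
        g["season"] == fold.year and g["is_tournament"] for g in fold.test
    ), "test must hold only test-year tournament games"


good = FoldSketch(
    train=[{"season": 2018, "is_tournament": False}],
    test=[{"season": 2019, "is_tournament": True}],
    year=2019,
)
check_fold(good)  # passes silently
```

A check like this is cheap to run over every fold before training, since temporal leakage is the main failure a walk-forward split is meant to prevent.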
ncaa_eval.evaluation.splitter.walk_forward_splits(seasons: Sequence[int], feature_server: StatefulFeatureServer, *, mode: Literal['batch', 'stateful'] = 'batch') → Iterator[CVFold]

Generate walk-forward CV folds with Leave-One-Tournament-Out splits.

Parameters:
  • seasons – Ordered sequence of season years to include (e.g., range(2008, 2026)). Must contain at least 2 seasons.

  • feature_server – Configured StatefulFeatureServer for building feature matrices.

  • mode – Feature serving mode: "batch" (stateless models) or "stateful" (sequential-update models like Elo).

Yields:

CVFold – One fold per eligible test year; years without a tournament, such as 2020, are skipped. In each fold, train contains all games from seasons strictly before the test year, test contains only tournament games from the test year, and year is the test season year.

Raises:

ValueError – If seasons has fewer than 2 elements, or if mode is not "batch" or "stateful".
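The two error conditions above amount to a short argument check. The following is a sketch of what such validation could look like, not the library's code: validate_split_args, VALID_MODES, and the error messages are hypothetical.

```python
from typing import Sequence

# The two mode strings named in the signature above.
VALID_MODES = ("batch", "stateful")


def validate_split_args(seasons: Sequence[int], mode: str) -> None:
    """Hypothetical pre-flight check mirroring the documented ValueError cases."""
    if len(seasons) < 2:
        raise ValueError(
            f"seasons must contain at least 2 elements, got {len(seasons)}"
        )
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")


# A range is a valid Sequence[int], matching the example in the parameter list.
validate_split_args(range(2008, 2026), "batch")  # passes silently
```

Because walk_forward_splits returns an iterator, it is possible that its own validation only runs when the first fold is requested; checking arguments up front, as here, surfaces mistakes before any feature building starts.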