ncaa_eval.evaluation.backtest module

Parallel cross-validation backtest orchestrator.

Provides run_backtest(), which executes walk-forward cross-validation folds in parallel using joblib.Parallel. Each fold trains an independent deep-copied model instance, generates predictions on tournament games, and computes evaluation metrics. Results are aggregated into a BacktestResult containing per-fold details and a summary DataFrame.
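The flow described above can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: it swaps joblib.Parallel for concurrent.futures.ThreadPoolExecutor, assumes an expanding training window (every earlier season trains the model for the next one), and uses hypothetical names throughout (MeanModel, walk_forward, run_fold, toy_backtest).

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class MeanModel:
    """Hypothetical toy model: predicts the mean outcome seen in training."""

    def __init__(self):
        self.rate = 0.5

    def fit(self, outcomes):
        self.rate = sum(outcomes) / len(outcomes)

    def predict(self, n):
        return [self.rate] * n


def walk_forward(seasons):
    """Yield (train_seasons, test_season) pairs with an expanding window (assumed)."""
    ordered = sorted(seasons)
    for i in range(1, len(ordered)):
        yield ordered[:i], ordered[i]


def run_fold(model, data, train_seasons, test_season):
    """Train a private copy of the model and score it on one test season."""
    fold_model = copy.deepcopy(model)  # the caller's model is never mutated
    train = [y for season in train_seasons for y in data[season]]
    fold_model.fit(train)
    actuals = data[test_season]
    preds = fold_model.predict(len(actuals))
    brier = sum((p - y) ** 2 for p, y in zip(preds, actuals)) / len(actuals)
    return {"year": test_season, "brier_score": brier}


def toy_backtest(model, data, max_workers=2):
    """Run all folds concurrently and return results sorted by year."""
    folds = list(walk_forward(data))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda f: run_fold(model, data, *f), folds))
    return sorted(results, key=lambda r: r["year"])


# Made-up binary outcomes per season, for illustration only.
data = {2021: [1, 0, 1, 1], 2022: [0, 0, 1, 1], 2023: [1, 1, 1, 0]}
template = MeanModel()
results = toy_backtest(template, data)
```

Because each fold works on its own deep copy, the folds share no fitted state and can run in any order; sorting at the end restores the by-year ordering.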

class ncaa_eval.evaluation.backtest.BacktestResult(fold_results: tuple[FoldResult, ...], summary: DataFrame, elapsed_seconds: float)

Bases: object

Aggregated result of a full backtest across all folds.

fold_results

Per-fold evaluation results, sorted by year.

Type: tuple[ncaa_eval.evaluation.backtest.FoldResult, ...]

summary

DataFrame indexed by year, with one column per metric plus an elapsed_seconds column.

Type: pandas.core.frame.DataFrame

elapsed_seconds

Total wall-clock time for the entire backtest, in seconds.

Type: float
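As a rough picture of how per-fold results relate to the summary, the sketch below builds a year-keyed mapping with one entry per metric plus elapsed_seconds. It is an assumption-laden stand-in: the real summary is a pandas DataFrame, and ToyFold and summarize are hypothetical names used only here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping


@dataclass
class ToyFold:
    """Hypothetical, trimmed-down stand-in for a per-fold result."""

    year: int
    metrics: Mapping[str, float]
    elapsed_seconds: float


def summarize(folds: Iterable[ToyFold]) -> Dict[int, Dict[str, float]]:
    """Return {year: {metric columns..., 'elapsed_seconds': ...}}, ordered by year."""
    rows: Dict[int, Dict[str, float]] = {}
    for fold in sorted(folds, key=lambda f: f.year):
        rows[fold.year] = {**fold.metrics, "elapsed_seconds": fold.elapsed_seconds}
    return rows


# Made-up numbers, supplied out of order on purpose.
folds = [
    ToyFold(2023, {"brier_score": 0.20}, 1.5),
    ToyFold(2022, {"brier_score": 0.31}, 1.2),
]
summary = summarize(folds)
```

With pandas available, a mapping shaped like this could be turned into a year-indexed frame with pandas.DataFrame.from_dict(summary, orient="index").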

class ncaa_eval.evaluation.backtest.FoldResult(year: int, predictions: pandas.core.series.Series, actuals: pandas.core.series.Series, metrics: collections.abc.Mapping[str, float], elapsed_seconds: float, test_game_ids: pandas.core.series.Series = <factory>, test_team_a_ids: pandas.core.series.Series = <factory>, test_team_b_ids: pandas.core.series.Series = <factory>)

Bases: object

Result of evaluating a single cross-validation fold.

year

The test season year for this fold.

Type: int

predictions

Predicted probabilities for tournament games.

Type: pandas.core.series.Series

actuals

Actual binary outcomes for tournament games.

Type: pandas.core.series.Series

metrics

Mapping of metric name to computed value.

Type: collections.abc.Mapping[str, float]

elapsed_seconds

Wall-clock time for the fold evaluation.

Type: float

test_game_ids

Game IDs from the test fold (aligned to predictions).

Type: pandas.core.series.Series

test_team_a_ids

team_a IDs from the test fold.

Type: pandas.core.series.Series

test_team_b_ids

team_b IDs from the test fold.

Type: pandas.core.series.Series
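The `<factory>` defaults in the signature are how a dataclass field declared with default_factory is rendered. The sketch below mirrors that shape with plain lists instead of pandas Series and shows one way aligned fields might be zipped into per-game rows. ToyFoldResult and per_game_rows are hypothetical, and positional alignment is an assumption (the real Series may align by index).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Mapping


@dataclass
class ToyFoldResult:
    """Hypothetical list-based stand-in for a fold result."""

    year: int
    predictions: List[float]
    actuals: List[int]
    metrics: Mapping[str, float]
    elapsed_seconds: float
    # default_factory gives every instance its own empty list.
    test_game_ids: List[int] = field(default_factory=list)


def per_game_rows(fold: ToyFoldResult) -> List[Dict[str, float]]:
    """Zip positionally aligned fields into one record per game."""
    if not (len(fold.predictions) == len(fold.actuals) == len(fold.test_game_ids)):
        raise ValueError("predictions, actuals and test_game_ids differ in length")
    return [
        {"game_id": g, "prob": p, "actual": a}
        for g, p, a in zip(fold.test_game_ids, fold.predictions, fold.actuals)
    ]


# Made-up values for illustration.
fold = ToyFoldResult(2023, [0.7, 0.4], [1, 0], {"brier_score": 0.125}, 0.8, [101, 102])
rows = per_game_rows(fold)
empty = ToyFoldResult(2023, [], [], {}, 0.0)
```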

ncaa_eval.evaluation.backtest.default_metrics() → dict[str, Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float]]

Return all registered metric functions (built-in + user-registered).
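The return type says each metric is a callable that takes two float arrays and returns a single float. The sketch below shows functions of that shape using plain Python sequences instead of NumPy arrays. The argument order (outcomes first, probabilities second) is an assumption, and toy_default_metrics is a hypothetical stand-in for the real registry.

```python
import math
from typing import Callable, Dict, Sequence

MetricFn = Callable[[Sequence[float], Sequence[float]], float]


def brier_score(y_true: Sequence[float], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true: Sequence[float], y_prob: Sequence[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(y_true)


def toy_default_metrics() -> Dict[str, MetricFn]:
    """Hypothetical registry mapping metric name to metric function."""
    return {"log_loss": log_loss, "brier_score": brier_score}


# A coin-flip forecaster scored on four made-up games.
y_true = [1.0, 0.0, 1.0, 0.0]
y_prob = [0.5, 0.5, 0.5, 0.5]
scores = {name: fn(y_true, y_prob) for name, fn in toy_default_metrics().items()}
```

A mapping of this shape is what the metric_fns parameter of run_backtest() expects, so a custom metric only needs to follow the same two-arrays-in, float-out contract.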

ncaa_eval.evaluation.backtest.feature_cols(df: DataFrame) → list[str]

Return feature column names (everything not in METADATA_COLS).

Parameters:

df – DataFrame whose columns are inspected.

Returns:

List of column names that are not metadata.
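The filtering rule amounts to a set-membership test that preserves column order. The sketch below operates on a list of column names rather than a DataFrame. The contents of TOY_METADATA_COLS are invented for illustration (the real METADATA_COLS is not listed here), and toy_feature_cols is a hypothetical name.

```python
from typing import Iterable, List

# Assumed metadata column names; the real METADATA_COLS may differ.
TOY_METADATA_COLS = frozenset(
    {"game_id", "season", "team_a_id", "team_b_id", "team_a_won"}
)


def toy_feature_cols(columns: Iterable[str]) -> List[str]:
    """Keep every column name that is not metadata, in original order."""
    return [c for c in columns if c not in TOY_METADATA_COLS]


columns = ["game_id", "season", "elo_diff", "team_a_id", "seed_diff", "team_a_won"]
features = toy_feature_cols(columns)
```

With a DataFrame in hand, the same function could be called as toy_feature_cols(df.columns).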

ncaa_eval.evaluation.backtest.run_backtest(model: Model, feature_server: StatefulFeatureServer, *, seasons: Sequence[int], mode: Literal['batch', 'stateful'] = 'batch', n_jobs: int = -1, metric_fns: Mapping[str, Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float]] | None = None, console: Console | None = None, progress: bool = False) → BacktestResult

Run a parallelized walk-forward cross-validation backtest.

Parameters:

model – Model instance to evaluate (will be deep-copied per fold).
feature_server – Configured feature server for building CV folds.
seasons – Season years to include (passed to walk_forward_splits).
mode – Feature serving mode ("batch" or "stateful").
n_jobs – Number of parallel workers. -1 = all cores, 1 = sequential.
metric_fns – Metric functions to compute per fold. Defaults to {log_loss, brier_score, roc_auc, expected_calibration_error}.
console – Rich Console for progress output.
progress – Display a tqdm progress bar for fold evaluation. Most useful with n_jobs=1 (sequential execution).

Returns:

BacktestResult with per-fold results and summary DataFrame.

Raises:

ValueError – If mode is not "batch" or "stateful", or if seasons contains fewer than 2 elements (propagated from walk_forward_splits()).
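The two documented failure cases can be pictured as the checks below. This is only a sketch of the contract: in the real module the season-count error is said to propagate from walk_forward_splits(), and check_backtest_args is a hypothetical helper that does not exist in the package.

```python
from typing import Sequence

VALID_MODES = ("batch", "stateful")


def check_backtest_args(seasons: Sequence[int], mode: str = "batch") -> None:
    """Raise ValueError for the two documented invalid-argument cases."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if len(seasons) < 2:
        # Walk-forward validation needs an earlier season to train on
        # and a later season to test on.
        raise ValueError("seasons must contain at least 2 elements")


def is_valid(seasons: Sequence[int], mode: str = "batch") -> bool:
    """Convenience wrapper that reports validity instead of raising."""
    try:
        check_backtest_args(seasons, mode)
    except ValueError:
        return False
    return True


ok = is_valid([2021, 2022, 2023], "stateful")
too_few = is_valid([2023])
bad_mode = is_valid([2021, 2022], "online")
```

Callers who build the seasons list dynamically may want to perform a check like this up front, before any fold work is scheduled.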