ncaa_eval.evaluation.backtest module¶
Parallel cross-validation backtest orchestrator.
Provides run_backtest(), which executes walk-forward cross-validation
folds in parallel using joblib.Parallel. Each fold trains an independent
deep-copied model instance, generates predictions on tournament games, and
computes evaluation metrics. Results are aggregated into a
BacktestResult containing per-fold details and a summary DataFrame.
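The flow described above (walk-forward folds, a deep copy of the model per fold, per-fold metrics, results sorted by year) can be sketched with the standard library alone. This is an illustrative sketch, not the module's implementation: `MeanModel`, `walk_forward`, `evaluate_fold`, and `toy_backtest` are hypothetical names, the data is made up, and `concurrent.futures.ThreadPoolExecutor` stands in for `joblib.Parallel` so that the example has no third-party dependencies.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class MeanModel:
    """Toy model (hypothetical): predicts the mean outcome seen in training."""

    def __init__(self):
        self.mean = 0.5

    def fit(self, outcomes):
        self.mean = sum(outcomes) / len(outcomes)

    def predict(self, n):
        return [self.mean] * n


def walk_forward(seasons):
    """Yield (train_seasons, test_season); each fold trains on all earlier seasons."""
    ordered = sorted(seasons)
    for i in range(1, len(ordered)):
        yield ordered[:i], ordered[i]


def evaluate_fold(model, data, train_seasons, test_season):
    # Each fold works on its own deep copy, so folds cannot share fitted state.
    fold_model = copy.deepcopy(model)
    train = [y for season in train_seasons for y in data[season]]
    fold_model.fit(train)
    actuals = data[test_season]
    preds = fold_model.predict(len(actuals))
    brier = sum((p - y) ** 2 for p, y in zip(preds, actuals)) / len(actuals)
    return {"year": test_season, "brier_score": brier}


def toy_backtest(model, data, max_workers=2):
    folds = list(walk_forward(data.keys()))
    # Thread pool used here as a stand-in for joblib.Parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda f: evaluate_fold(model, data, *f), folds))
    return sorted(results, key=lambda r: r["year"])


# Made-up binary outcomes per season, for illustration only.
data = {2021: [1, 0, 1, 1], 2022: [0, 0, 1, 0], 2023: [1, 1, 0, 0]}
template = MeanModel()
results = toy_backtest(template, data)
```

Because every fold fits a copy, the `template` instance passed in is left unfitted, which is what makes it safe to evaluate folds concurrently.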
- class ncaa_eval.evaluation.backtest.BacktestResult(fold_results: tuple[FoldResult, ...], summary: DataFrame, elapsed_seconds: float)[source]¶
Bases:
object
Aggregated result of a full backtest across all folds.
- fold_results¶
Per-fold evaluation results, sorted by year.
- Type:
tuple[ncaa_eval.evaluation.backtest.FoldResult, …]
- summary¶
DataFrame indexed by year, with one column per metric plus an elapsed_seconds column.
- Type:
pandas.core.frame.DataFrame
- elapsed_seconds¶
Total wall-clock time for the entire backtest.
- Type:
float
- elapsed_seconds: float¶
- fold_results: tuple[FoldResult, ...]¶
- summary: DataFrame¶
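To make the shape of `summary` concrete, here is a minimal sketch of how per-fold results might be flattened into one row per year. It uses plain dictionaries instead of a DataFrame so that it runs without pandas; `build_summary` is a hypothetical helper, and the metric values are made up for illustration.

```python
def build_summary(fold_results):
    """Return {year: {metric columns..., 'elapsed_seconds': ...}}, ordered by year."""
    summary = {}
    for fold in sorted(fold_results, key=lambda f: f["year"]):
        row = dict(fold["metrics"])  # one column per metric
        row["elapsed_seconds"] = fold["elapsed_seconds"]
        summary[fold["year"]] = row
    return summary


# Illustrative values only; these are not real backtest results.
folds = [
    {"year": 2023, "metrics": {"log_loss": 0.61, "brier_score": 0.21}, "elapsed_seconds": 1.4},
    {"year": 2022, "metrics": {"log_loss": 0.58, "brier_score": 0.20}, "elapsed_seconds": 1.2},
]
summary = build_summary(folds)
```

With pandas available, `pandas.DataFrame.from_dict(summary, orient="index")` would turn this mapping into a year-indexed frame of the kind described above.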
- class ncaa_eval.evaluation.backtest.FoldResult(year: int, predictions: ~pandas.core.series.Series, actuals: ~pandas.core.series.Series, metrics: ~collections.abc.Mapping[str, float], elapsed_seconds: float, test_game_ids: ~pandas.core.series.Series = <factory>, test_team_a_ids: ~pandas.core.series.Series = <factory>, test_team_b_ids: ~pandas.core.series.Series = <factory>)[source]¶
Bases:
object
Result of evaluating a single cross-validation fold.
- year¶
The test season year for this fold.
- Type:
int
- predictions¶
Predicted probabilities for tournament games.
- Type:
pandas.core.series.Series
- actuals¶
Actual binary outcomes for tournament games.
- Type:
pandas.core.series.Series
- metrics¶
Mapping of metric name to computed value.
- Type:
collections.abc.Mapping[str, float]
- elapsed_seconds¶
Wall-clock time for the fold evaluation.
- Type:
float
- test_game_ids¶
Game IDs from the test fold (aligned to predictions).
- Type:
pandas.core.series.Series
- test_team_a_ids¶
team_a IDs from the test fold.
- Type:
pandas.core.series.Series
- test_team_b_ids¶
team_b IDs from the test fold.
- Type:
pandas.core.series.Series
- actuals: Series¶
- elapsed_seconds: float¶
- metrics: Mapping[str, float]¶
- predictions: Series¶
- test_game_ids: Series¶
- test_team_a_ids: Series¶
- test_team_b_ids: Series¶
- year: int¶
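The `<factory>` defaults in the signature indicate fields declared with a default factory, so each instance gets its own empty container when the test identifiers are omitted. A minimal sketch of that pattern follows, assuming a dataclass-style definition; `ToyFoldResult` is a hypothetical stand-in that uses lists where the real class uses pandas Series.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field


@dataclass
class ToyFoldResult:
    """Hypothetical mirror of FoldResult, with lists in place of pandas Series."""

    year: int
    predictions: list
    actuals: list
    metrics: Mapping
    elapsed_seconds: float
    # default_factory builds a fresh container per instance; it is shown as
    # "<factory>" in the generated signature.
    test_game_ids: list = field(default_factory=list)
    test_team_a_ids: list = field(default_factory=list)
    test_team_b_ids: list = field(default_factory=list)


# Illustrative values only.
fold_a = ToyFoldResult(2022, [0.7, 0.4], [1, 0], {"brier_score": 0.125}, 0.8)
fold_b = ToyFoldResult(2023, [0.6], [1], {"brier_score": 0.16}, 0.5, test_game_ids=[101])
```

The three optional fields can be left out when only predictions and metrics are needed, and supplied when predictions must be traced back to specific games and teams.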
- ncaa_eval.evaluation.backtest.default_metrics() -> dict[str, Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float]][source]¶
Return all registered metric functions (built-in + user-registered).
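Each metric is a callable that takes two float arrays and returns a single float. The sketch below shows functions of that shape and a name-to-function mapping like the one returned here. It makes two assumptions: the argument order is `(y_true, y_prob)`, which the signature does not confirm, and plain sequences are used instead of NumPy arrays so the example runs without dependencies. `toy_metrics` and `score_fold` are hypothetical names.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# Hypothetical registry: metric name -> function (assumed order: y_true, y_prob).
toy_metrics = {"brier_score": brier_score, "log_loss": log_loss}


def score_fold(metric_fns, y_true, y_prob):
    """Apply every metric function to one fold's outcomes and predictions."""
    return {name: fn(y_true, y_prob) for name, fn in metric_fns.items()}


scores = score_fold(toy_metrics, [1.0, 0.0, 1.0, 0.0], [0.9, 0.2, 0.6, 0.4])
```

A mapping of this form is what the `metric_fns` parameter of `run_backtest()` accepts, so a custom metric only needs to follow the same two-array signature.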
- ncaa_eval.evaluation.backtest.feature_cols(df: DataFrame) -> list[str][source]¶
Return feature column names (everything not in METADATA_COLS).
- Parameters:
df – DataFrame whose columns are inspected.
- Returns:
List of column names that are not metadata.
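The selection rule amounts to filtering column names against a fixed set of metadata names. A minimal sketch, assuming a made-up `METADATA_COLS` (the module's actual contents are not listed here) and operating on a list of column names rather than a DataFrame:

```python
# Hypothetical metadata set; the real METADATA_COLS may contain different names.
METADATA_COLS = frozenset(
    {"game_id", "season", "team_a_id", "team_b_id", "team_a_won"}
)


def toy_feature_cols(columns):
    """Return the column names that are not metadata, preserving input order."""
    return [col for col in columns if col not in METADATA_COLS]


columns = [
    "game_id", "season", "team_a_id", "team_b_id",
    "elo_diff", "seed_diff", "team_a_won",
]
features = toy_feature_cols(columns)
```

Defining features by exclusion means that newly added feature columns are picked up automatically, while any new metadata column has to be added to the metadata set to keep it out of training.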
- ncaa_eval.evaluation.backtest.run_backtest(model: Model, feature_server: StatefulFeatureServer, *, seasons: Sequence[int], mode: Literal['batch', 'stateful'] = 'batch', n_jobs: int = -1, metric_fns: Mapping[str, Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float]] | None = None, console: Console | None = None, progress: bool = False) -> BacktestResult[source]¶
Run parallelized walk-forward cross-validation backtest.
- Parameters:
model – Model instance to evaluate (will be deep-copied per fold).
feature_server – Configured feature server for building CV folds.
seasons – Season years to include (passed to walk_forward_splits).
mode – Feature serving mode ("batch" or "stateful").
n_jobs – Number of parallel workers. -1 = all cores, 1 = sequential.
metric_fns – Metric functions to compute per fold. Defaults to {log_loss, brier_score, roc_auc, expected_calibration_error}.
console – Rich Console for progress output.
progress – Display a tqdm progress bar for fold evaluation. Most useful with n_jobs=1 (sequential execution).
- Returns:
BacktestResult with per-fold results and summary DataFrame.
- Raises:
ValueError – If mode is not "batch" or "stateful", or if seasons contains fewer than 2 elements (propagated from walk_forward_splits()).
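The two error conditions can be illustrated with a small stand-alone check. This is a sketch of the documented behaviour only: `check_backtest_args` is a hypothetical helper, the error messages are invented, and in the real module the season-count check is raised from `walk_forward_splits()` rather than from a separate function.

```python
VALID_MODES = ("batch", "stateful")


def check_backtest_args(seasons, mode="batch"):
    """Raise ValueError for the two conditions documented above."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if len(seasons) < 2:
        # Walk-forward CV needs at least one training and one test season.
        raise ValueError("seasons must contain at least 2 elements")


def is_rejected(seasons, mode):
    """Return True if the arguments would raise ValueError."""
    try:
        check_backtest_args(seasons, mode)
    except ValueError:
        return True
    return False


single_season_rejected = is_rejected([2023], "batch")
bad_mode_rejected = is_rejected([2022, 2023], "streaming")
valid_call_rejected = is_rejected([2022, 2023], "stateful")
```

Callers that build the season list dynamically may want to check its length before starting a backtest, since a single season leaves no earlier data to train on.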