ncaa_eval.evaluation.backtest module

Parallel cross-validation backtest orchestrator.

Provides run_backtest(), which executes walk-forward cross-validation folds in parallel using joblib.Parallel. Each fold trains an independent deep-copied model instance, generates predictions on tournament games, and computes evaluation metrics. Results are aggregated into a BacktestResult containing per-fold details and a summary DataFrame.
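
The flow described above (an independent deep-copied model per fold, folds evaluated in parallel, results collected and sorted by year) can be sketched roughly as follows. This is a minimal illustration, not the module's implementation: `MeanRateModel`, `Fold`, `evaluate_fold`, and `run_folds` are hypothetical names, and the standard-library `ThreadPoolExecutor` stands in for `joblib.Parallel` so the sketch has no third-party dependencies.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from copy import deepcopy
from dataclasses import dataclass


@dataclass
class MeanRateModel:
    """Hypothetical toy model: always predicts the training-set win rate."""
    rate: float = 0.5

    def fit(self, outcomes: list[int]) -> None:
        self.rate = sum(outcomes) / len(outcomes)

    def predict(self, n: int) -> list[float]:
        return [self.rate] * n


@dataclass
class Fold:
    """Hypothetical fold: training outcomes plus test outcomes for one year."""
    year: int
    train: list[int]
    test: list[int]


def evaluate_fold(model: MeanRateModel, fold: Fold) -> dict:
    start = time.perf_counter()
    fold_model = deepcopy(model)  # each fold trains its own copy
    fold_model.fit(fold.train)
    preds = fold_model.predict(len(fold.test))
    brier = sum((p - y) ** 2 for p, y in zip(preds, fold.test)) / len(fold.test)
    return {
        "year": fold.year,
        "brier_score": brier,
        "elapsed_seconds": time.perf_counter() - start,
    }


def run_folds(model: MeanRateModel, folds: list[Fold], max_workers: int = 2) -> list[dict]:
    # ThreadPoolExecutor is a stand-in for joblib.Parallel in this sketch.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda f: evaluate_fold(model, f), folds))
    return sorted(results, key=lambda r: r["year"])  # per-fold results sorted by year


folds = [Fold(2021, [1, 1, 0, 0], [1, 0]), Fold(2020, [1, 0, 0, 0], [1, 1])]
template = MeanRateModel()
results = run_folds(template, folds)
```

Because every fold fits a deep copy, the `template` instance passed in is left untouched, which is what makes it safe to evaluate folds concurrently.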

class ncaa_eval.evaluation.backtest.BacktestResult(fold_results: tuple[FoldResult, ...], summary: DataFrame, elapsed_seconds: float)

Bases: object

Aggregated result of a full backtest across all folds.

fold_results

Per-fold evaluation results, sorted by year.

Type:

tuple[ncaa_eval.evaluation.backtest.FoldResult, …]

summary

DataFrame indexed by year, with one column per metric plus an elapsed_seconds column.

Type:

pandas.core.frame.DataFrame

elapsed_seconds

Total wall-clock time for the entire backtest.

Type:

float

elapsed_seconds: float
fold_results: tuple[FoldResult, ...]
summary: DataFrame
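
As a rough picture of how the summary relates to the per-fold results, the sketch below builds one row per year from the fold metrics and timing. It is an assumption-laden illustration: plain dicts stand in for `FoldResult` objects and for the pandas DataFrame, `build_summary` and `metric_mean` are hypothetical helpers, and the numbers are made up for the example.

```python
from statistics import mean

# Made-up per-fold records that mirror the FoldResult fields year, metrics,
# and elapsed_seconds. Real folds would come from BacktestResult.fold_results.
fold_records = [
    {"year": 2023, "metrics": {"log_loss": 0.58, "brier_score": 0.20}, "elapsed_seconds": 1.4},
    {"year": 2022, "metrics": {"log_loss": 0.62, "brier_score": 0.22}, "elapsed_seconds": 1.1},
]


def build_summary(records: list[dict]) -> dict[int, dict[str, float]]:
    """Return {year: {metric columns..., 'elapsed_seconds': ...}}, ordered by year."""
    summary: dict[int, dict[str, float]] = {}
    for rec in sorted(records, key=lambda r: r["year"]):
        row = dict(rec["metrics"])                    # one column per metric
        row["elapsed_seconds"] = rec["elapsed_seconds"]
        summary[rec["year"]] = row                    # year plays the role of the index
    return summary


def metric_mean(summary: dict[int, dict[str, float]], metric: str) -> float:
    """Average one metric column across all years."""
    return mean(row[metric] for row in summary.values())


summary = build_summary(fold_records)
avg_log_loss = metric_mean(summary, "log_loss")
```

With the real object, the equivalent of `metric_mean` would presumably be a column mean on `BacktestResult.summary`.
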
class ncaa_eval.evaluation.backtest.FoldResult(year: int, predictions: Series, actuals: Series, metrics: Mapping[str, float], elapsed_seconds: float, test_game_ids: Series = <factory>, test_team_a_ids: Series = <factory>, test_team_b_ids: Series = <factory>)

Bases: object

Result of evaluating a single cross-validation fold.

year

The test season year for this fold.

Type:

int

predictions

Predicted probabilities for tournament games.

Type:

pandas.core.series.Series

actuals

Actual binary outcomes for tournament games.

Type:

pandas.core.series.Series

metrics

Mapping of metric name to computed value.

Type:

collections.abc.Mapping[str, float]

elapsed_seconds

Wall-clock time for the fold evaluation.

Type:

float

test_game_ids

Game IDs from the test fold (aligned to predictions).

Type:

pandas.core.series.Series

test_team_a_ids

team_a IDs from the test fold.

Type:

pandas.core.series.Series

test_team_b_ids

team_b IDs from the test fold.

Type:

pandas.core.series.Series

actuals: Series
elapsed_seconds: float
metrics: Mapping[str, float]
predictions: Series
test_game_ids: Series
test_team_a_ids: Series
test_team_b_ids: Series
year: int
ncaa_eval.evaluation.backtest.default_metrics() → dict[str, Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float]]

Return all registered metric functions (built-in + user-registered).
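
The return type says each metric is a callable taking two float arrays and returning a float. The sketch below shows what a mapping of that shape might look like. It makes several assumptions: the argument order is taken to be (actual outcomes, predicted probabilities), plain Python sequences stand in for NumPy arrays, and the two functions are textbook definitions written for illustration, not the library's registered implementations.

```python
import math
from typing import Callable, Sequence

# Stand-in for Callable[[ndarray, ndarray], float] using plain sequences.
MetricFn = Callable[[Sequence[float], Sequence[float]], float]


def brier_score(y_true: Sequence[float], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true: Sequence[float], y_prob: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(y_true)


metric_fns: dict[str, MetricFn] = {"brier_score": brier_score, "log_loss": log_loss}

y_true = [1.0, 0.0]
y_prob = [0.8, 0.3]
scores = {name: fn(y_true, y_prob) for name, fn in metric_fns.items()}
```

A mapping like `metric_fns` has the same name-to-callable structure that `run_backtest()` accepts through its `metric_fns` parameter.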

ncaa_eval.evaluation.backtest.feature_cols(df: DataFrame) → list[str]

Return feature column names (everything not in METADATA_COLS).

Parameters:

df – DataFrame whose columns are inspected.

Returns:

List of column names that are not metadata.
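
The filtering rule amounts to "keep every column that is not metadata". A minimal sketch, assuming a made-up `METADATA_COLS` set (the real contents are not listed in this reference) and a plain list of column names standing in for `df.columns`:

```python
# Hypothetical metadata column names; the library's actual METADATA_COLS may differ.
METADATA_COLS = frozenset({"game_id", "season", "team_a_id", "team_b_id", "team_a_won"})


def feature_cols_sketch(columns: list[str]) -> list[str]:
    """Return the column names that are not metadata, preserving their order."""
    return [col for col in columns if col not in METADATA_COLS]


columns = ["game_id", "season", "elo_diff", "team_a_id", "seed_diff", "team_b_id", "team_a_won"]
features = feature_cols_sketch(columns)
```

With a real DataFrame, the same idea would apply to `list(df.columns)`.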

ncaa_eval.evaluation.backtest.run_backtest(model: Model, feature_server: StatefulFeatureServer, *, seasons: Sequence[int], mode: Literal['batch', 'stateful'] = 'batch', n_jobs: int = -1, metric_fns: Mapping[str, Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float]] | None = None, console: Console | None = None, progress: bool = False) → BacktestResult

Run a parallelized walk-forward cross-validation backtest.

Parameters:
  • model – Model instance to evaluate (will be deep-copied per fold).

  • feature_server – Configured feature server for building CV folds.

  • seasons – Season years to include (passed to walk_forward_splits).

  • mode – Feature serving mode ("batch" or "stateful").

  • n_jobs – Number of parallel workers. -1 uses all available cores; 1 runs folds sequentially.

  • metric_fns – Metric functions to compute per fold. Defaults to {log_loss, brier_score, roc_auc, expected_calibration_error}.

  • console – Rich Console for progress output.

  • progress – Display a tqdm progress bar for fold evaluation. Most useful with n_jobs=1 (sequential execution).

Returns:

BacktestResult with per-fold results and summary DataFrame.

Raises:

ValueError – If mode is not "batch" or "stateful", or if seasons contains fewer than 2 elements (propagated from walk_forward_splits()).
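
The two error conditions can be illustrated with a small sketch. The splitting scheme here is an assumption: each fold is taken to train on all earlier seasons and test on the next one, which would explain why at least two seasons are needed. `walk_forward_sketch` and `check_mode` are hypothetical helpers, not the library's `walk_forward_splits()`.

```python
from typing import Sequence

VALID_MODES = ("batch", "stateful")


def check_mode(mode: str) -> None:
    """Reject any mode other than "batch" or "stateful"."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")


def walk_forward_sketch(seasons: Sequence[int]) -> list[tuple[list[int], int]]:
    """Return (train_seasons, test_season) pairs in chronological order.

    Assumed scheme: train on every earlier season, test on the next one.
    """
    ordered = sorted(seasons)
    if len(ordered) < 2:
        raise ValueError("seasons must contain at least 2 elements")
    return [(ordered[:i], ordered[i]) for i in range(1, len(ordered))]


check_mode("batch")
splits = walk_forward_sketch([2021, 2019, 2020])
```

Under this assumed scheme, three seasons yield two folds, and a single season cannot form any train/test pair.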