ncaa_eval.evaluation package

Module contents

Evaluation metrics, cross-validation, and tournament simulation module.

class ncaa_eval.evaluation.BacktestResult(fold_results: tuple[FoldResult, ...], summary: DataFrame, elapsed_seconds: float)[source]

Bases: object

Aggregated result of a full backtest across all folds.

fold_results

Per-fold evaluation results, sorted by year.

Type:

tuple[ncaa_eval.evaluation.backtest.FoldResult, …]

summary

DataFrame with year as index, metric columns + elapsed_seconds.

Type:

pandas.core.frame.DataFrame

elapsed_seconds

Total wall-clock time for the entire backtest.

Type:

float

elapsed_seconds: float
fold_results: tuple[FoldResult, ...]
summary: DataFrame
class ncaa_eval.evaluation.BracketDistribution(scores: ndarray[tuple[Any, ...], dtype[float64]], percentiles: dict[int, float], mean: float, std: float, histogram_bins: ndarray[tuple[Any, ...], dtype[float64]], histogram_counts: ndarray[tuple[Any, ...], dtype[int64]])[source]

Bases: object

Score distribution statistics from Monte Carlo simulation.

scores

Raw per-simulation scores, shape (n_simulations,).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

percentiles

Mapping of percentile → value for keys 5, 25, 50, 75, 95.

Type:

dict[int, float]

mean

Mean score across simulations.

Type:

float

std

Standard deviation of scores.

Type:

float

histogram_bins

Histogram bin edges, shape (n_bins + 1,).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

histogram_counts

Histogram counts, shape (n_bins,).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.int64]]

histogram_bins: ndarray[tuple[Any, ...], dtype[float64]]
histogram_counts: ndarray[tuple[Any, ...], dtype[int64]]
mean: float
percentiles: dict[int, float]
scores: ndarray[tuple[Any, ...], dtype[float64]]
std: float
class ncaa_eval.evaluation.BracketNode(round_index: int, team_index: int = -1, left: BracketNode | None = None, right: BracketNode | None = None)[source]

Bases: object

Node in a tournament bracket tree.

A leaf node represents a single team; an internal node represents a game whose winner advances.

round_index

Round number (0-indexed). Leaves have round_index=-1.

Type:

int

team_index

Index into the bracket’s team_ids tuple for leaf nodes. -1 for internal nodes.

Type:

int

left

Left child (None for leaves).

Type:

ncaa_eval.evaluation.bracket.BracketNode | None

right

Right child (None for leaves).

Type:

ncaa_eval.evaluation.bracket.BracketNode | None

property is_leaf: bool

Return True if this is a leaf (team) node.

left: BracketNode | None = None
right: BracketNode | None = None
round_index: int
team_index: int = -1
class ncaa_eval.evaluation.BracketStructure(root: ~ncaa_eval.evaluation.bracket.BracketNode, team_ids: tuple[int, ...], team_index_map: dict[int, int], seed_map: dict[int, int] = <factory>)[source]

Bases: object

Immutable tournament bracket.

root

Root BracketNode of the bracket tree.

Type:

ncaa_eval.evaluation.bracket.BracketNode

team_ids

Tuple of team IDs in bracket-position order (leaf order).

Type:

tuple[int, …]

team_index_map

Mapping of team_id → index into team_ids.

Type:

dict[int, int]

seed_map

Mapping of team_id → seed_num for seed-aware scoring.

Type:

dict[int, int]

root: BracketNode
seed_map: dict[int, int]
team_ids: tuple[int, ...]
team_index_map: dict[int, int]
class ncaa_eval.evaluation.CVFold(train: DataFrame, test: DataFrame, year: int)[source]

Bases: object

A single cross-validation fold.

train

All games from seasons strictly before the test year.

Type:

pandas.core.frame.DataFrame

test

Tournament games only from the test year.

Type:

pandas.core.frame.DataFrame

year

The test season year.

Type:

int

test: DataFrame
train: DataFrame
year: int
class ncaa_eval.evaluation.CustomScoring(scoring_fn: Callable[[int], float], scoring_name: str)[source]

Bases: object

User-defined scoring rule wrapping a callable.

Parameters:
  • scoring_fn – Callable mapping round_idx → points.

  • scoring_name – Name for this custom rule.

property name: str

Return the custom rule name.

points_per_round(round_idx: int) float[source]

Return points from the wrapped callable.

class ncaa_eval.evaluation.DictScoring(points: dict[int, float], scoring_name: str)[source]

Bases: object

Scoring rule from a dict mapping round_idx to points.

Parameters:
  • points – Mapping of round_idx → points for rounds 0–5.

  • scoring_name – Name for this rule.

Raises:

ValueError – If points does not contain exactly 6 entries (rounds 0–5).

property name: str

Return the rule name.

points_per_round(round_idx: int) float[source]

Return points for round_idx.
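A standalone sketch of the six-entry contract (plain Python, independent of the package; the helper name is illustrative):

```python
# Minimal stand-in for the DictScoring contract: exactly six entries, rounds 0-5.
def make_points_fn(points: dict[int, float]):
    if set(points) != set(range(6)):
        raise ValueError("points must map exactly rounds 0-5")
    return points.__getitem__

points_per_round = make_points_fn({0: 1.0, 1: 2.0, 2: 4.0, 3: 8.0, 4: 16.0, 5: 32.0})

# A perfect 64-team bracket picks 32, 16, 8, 4, 2, 1 games per round:
total = sum(games * points_per_round(r)
            for r, games in enumerate([32, 16, 8, 4, 2, 1]))
```

With the standard 1-2-4-8-16-32 values each round contributes 32 points, which is where the 192-point perfect-bracket total quoted for StandardScoring comes from.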

class ncaa_eval.evaluation.EloProvider(model: Any)[source]

Bases: object

Wraps a StatefulModel as a ProbabilityProvider.

Uses the model’s predict_matchup method for probability computation.

Parameters:

model – Any StatefulModel instance with predict_matchup.

batch_matchup_probabilities(team_a_ids: Sequence[int], team_b_ids: Sequence[int], context: MatchupContext) ndarray[tuple[Any, ...], dtype[float64]][source]

Return batch probabilities by looping predict_matchup.

Iterates over team pairs, calling predict_matchup per matchup, and collects the results into a float64 array.

Elo is O(1) per pair so looping is acceptable.

matchup_probability(team_a_id: int, team_b_id: int, context: MatchupContext) float[source]

Return P(team_a beats team_b) via the model’s predict_matchup.

Delegates to the model’s predict_matchup method, which retrieves both teams’ current ratings and applies the Elo logistic expected-score formula.

class ncaa_eval.evaluation.EnsembleProvider(ensemble: StackedEnsemble, data_dir: Path, season: int)[source]

Bases: object

Wraps a StackedEnsemble as a ProbabilityProvider.

Calls ensemble.predict_bracket(data_dir, season) once on first use and caches the result as a MatrixProvider for subsequent lookups. This allows a StackedEnsemble to be passed to build_probability_matrix() and the Monte Carlo bracket simulator identically to single-model mode.

Parameters:
  • ensemble – A trained StackedEnsemble instance.

  • data_dir – Path to the local Parquet data store.

  • season – Target season year.

batch_matchup_probabilities(team_a_ids: Sequence[int], team_b_ids: Sequence[int], context: MatchupContext) ndarray[tuple[Any, ...], dtype[float64]][source]

Return batch probabilities from the cached ensemble matrix.

Triggers ensemble bracket prediction on first call; subsequent calls use the cached matrix.

matchup_probability(team_a_id: int, team_b_id: int, context: MatchupContext) float[source]

Return P(team_a beats team_b) from the ensemble probability matrix.

Triggers ensemble bracket prediction on first call; subsequent calls use the cached matrix.

class ncaa_eval.evaluation.FibonacciScoring[source]

Bases: object

Fibonacci-style scoring: 2-3-5-8-13-21 (231 total for perfect bracket).

property name: str

Return 'fibonacci'.

points_per_round(round_idx: int) float[source]

Return Fibonacci scoring points for round_idx.

class ncaa_eval.evaluation.FoldResult(year: int, predictions: ~pandas.core.series.Series, actuals: ~pandas.core.series.Series, metrics: ~collections.abc.Mapping[str, float], elapsed_seconds: float, test_game_ids: ~pandas.core.series.Series = <factory>, test_team_a_ids: ~pandas.core.series.Series = <factory>, test_team_b_ids: ~pandas.core.series.Series = <factory>)[source]

Bases: object

Result of evaluating a single cross-validation fold.

year

The test season year for this fold.

Type:

int

predictions

Predicted probabilities for tournament games.

Type:

pandas.core.series.Series

actuals

Actual binary outcomes for tournament games.

Type:

pandas.core.series.Series

metrics

Mapping of metric name to computed value.

Type:

collections.abc.Mapping[str, float]

elapsed_seconds

Wall-clock time for the fold evaluation.

Type:

float

test_game_ids

Game IDs from the test fold (aligned to predictions).

Type:

pandas.core.series.Series

test_team_a_ids

team_a IDs from the test fold.

Type:

pandas.core.series.Series

test_team_b_ids

team_b IDs from the test fold.

Type:

pandas.core.series.Series

actuals: Series
elapsed_seconds: float
metrics: Mapping[str, float]
predictions: Series
test_game_ids: Series
test_team_a_ids: Series
test_team_b_ids: Series
year: int
class ncaa_eval.evaluation.MatchupContext(season: int, day_num: int, is_neutral: bool)[source]

Bases: object

Context for a hypothetical matchup probability query.

Passed to ProbabilityProvider so that stateless models can construct the correct feature row for a hypothetical pairing. Stateful models (Elo) typically ignore context and use internal ratings.

season

Tournament season year (e.g. 2024).

Type:

int

day_num

Tournament day number (e.g. 136 for Round of 64).

Type:

int

is_neutral

True for all tournament games (neutral site).

Type:

bool

day_num: int
is_neutral: bool
season: int
class ncaa_eval.evaluation.MatrixProvider(prob_matrix: ndarray[tuple[Any, ...], dtype[float64]], team_ids: Sequence[int])[source]

Bases: object

Wraps a pre-computed probability matrix as a ProbabilityProvider.

Parameters:
  • prob_matrix – n×n pairwise probability matrix.

  • team_ids – Sequence of team IDs matching matrix indices.

batch_matchup_probabilities(team_a_ids: Sequence[int], team_b_ids: Sequence[int], context: MatchupContext) ndarray[tuple[Any, ...], dtype[float64]][source]

Return batch probabilities from the stored matrix.

Extracts row/column indices from the team pairs, vectorizes lookups into the probability matrix, and returns an array of win probabilities.

matchup_probability(team_a_id: int, team_b_id: int, context: MatchupContext) float[source]

Return P(team_a beats team_b) from the stored matrix.

Indexes into the pre-built probability matrix using the team-to-index mapping, returning P(team_i beats team_j) directly from the stored array.

exception ncaa_eval.evaluation.MetricNotFoundError[source]

Bases: KeyError

Raised when a requested metric name is not in the registry.

class ncaa_eval.evaluation.MostLikelyBracket(winners: tuple[int, ...], champion_team_id: int, log_likelihood: float)[source]

Bases: object

Maximum-likelihood bracket from greedy traversal.

winners

Tuple of team indices for each game’s predicted winner, in round-major order matching SimulationResult.sim_winners rows — all Round-of-64 games first (indices 0–31 for 64 teams), then Round-of-32 (32–47), through to the championship (index 62). 63 entries for a 64-team bracket. Pass directly to score_bracket_against_sims() as chosen_bracket.

Type:

tuple[int, …]

champion_team_id

Canonical team ID of the predicted champion (from BracketStructure.team_ids[champion_index]).

Type:

int

log_likelihood

Sum of log(max(P[left, right], P[right, left])) across all games.

Type:

float

champion_team_id: int
log_likelihood: float
winners: tuple[int, ...]
class ncaa_eval.evaluation.ProbabilityProvider(*args, **kwargs)[source]

Bases: Protocol

Protocol for pairwise win probability computation.

All implementations must satisfy the complementarity contract: P(A beats B) + P(B beats A) = 1 for every (A, B) pair.

batch_matchup_probabilities(team_a_ids: Sequence[int], team_b_ids: Sequence[int], context: MatchupContext) ndarray[tuple[Any, ...], dtype[float64]][source]

Return P(a_i beats b_i) for all pairs.

Parameters:
  • team_a_ids – Sequence of first-team IDs.

  • team_b_ids – Sequence of second-team IDs (same length).

  • context – Matchup context.

Returns:

1-D float64 array of shape (len(team_a_ids),).

matchup_probability(team_a_id: int, team_b_id: int, context: MatchupContext) float[source]

Return P(team_a beats team_b).

Parameters:
  • team_a_id – First team’s canonical ID.

  • team_b_id – Second team’s canonical ID.

  • context – Matchup context (season, day_num, neutral).

Returns:

Probability in [0, 1].
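A minimal illustrative implementation of the protocol (standalone; the Elo-style rating scheme and class name are assumptions, not part of the package):

```python
import numpy as np

class RatingProvider:
    """Toy provider: logistic win probability from a rating difference."""

    def __init__(self, ratings: dict[int, float]) -> None:
        self._ratings = ratings

    def matchup_probability(self, team_a_id, team_b_id, context=None) -> float:
        diff = self._ratings[team_a_id] - self._ratings[team_b_id]
        return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

    def batch_matchup_probabilities(self, team_a_ids, team_b_ids, context=None):
        # Loop per pair and collect into a 1-D float64 array.
        return np.array(
            [self.matchup_probability(a, b, context)
             for a, b in zip(team_a_ids, team_b_ids)],
            dtype=np.float64,
        )

provider = RatingProvider({1101: 1600.0, 1102: 1500.0})
p_ab = provider.matchup_probability(1101, 1102)
p_ba = provider.matchup_probability(1102, 1101)
# Complementarity contract: the two probabilities sum to 1.
```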

class ncaa_eval.evaluation.ReliabilityData(fraction_of_positives: ndarray[tuple[Any, ...], dtype[float64]], mean_predicted_value: ndarray[tuple[Any, ...], dtype[float64]], bin_counts: ndarray[tuple[Any, ...], dtype[int64]], bin_edges: ndarray[tuple[Any, ...], dtype[float64]], n_bins: int)[source]

Bases: object

Structured return type for reliability diagram data.

fraction_of_positives

Observed fraction of positives per bin (from calibration_curve).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

mean_predicted_value

Mean predicted probability per bin (from calibration_curve).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

bin_counts

Number of samples in each non-empty bin.

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.int64]]

bin_edges

Full bin edge array of shape (n_bins + 1,), i.e. np.linspace(0.0, 1.0, n_bins + 1). Includes both the lower (0.0) and upper (1.0) boundaries so callers do not need to recompute them.

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

n_bins

Requested number of bins.

Type:

int

bin_counts: ndarray[tuple[Any, ...], dtype[int64]]
bin_edges: ndarray[tuple[Any, ...], dtype[float64]]
fraction_of_positives: ndarray[tuple[Any, ...], dtype[float64]]
mean_predicted_value: ndarray[tuple[Any, ...], dtype[float64]]
n_bins: int
exception ncaa_eval.evaluation.ScoringNotFoundError[source]

Bases: KeyError

Raised when a requested scoring name is not in the registry.

class ncaa_eval.evaluation.ScoringRule(*args, **kwargs)[source]

Bases: Protocol

Protocol for tournament bracket scoring rules.

property name: str

Human-readable name of the scoring rule.

points_per_round(round_idx: int) float[source]

Return points awarded for a correct pick in round round_idx.

Parameters:

round_idx – Zero-indexed round number (0=R64 through 5=NCG).

Returns:

Points as a float.

class ncaa_eval.evaluation.SeedDiffBonusScoring(seed_map: dict[int, int])[source]

Bases: object

Base points + seed-difference bonus when lower seed wins.

Uses same base as StandardScoring (1-2-4-8-16-32). When the lower seed (higher seed number) wins, adds |seed_a - seed_b| bonus.

Note: This scoring rule’s points_per_round returns only the base points. Full EP computation for seed-diff scoring (which requires per-matchup seed information) is deferred to Story 6.6, which will add a dedicated compute_expected_points_seed_diff function.

Parameters:

seed_map – Mapping of team_id → seed_num.

property name: str

Return 'seed_diff_bonus'.

points_per_round(round_idx: int) float[source]

Return base points (excludes seed-diff bonus).

seed_diff_bonus(seed_a: int, seed_b: int) float[source]

Return bonus points when the lower seed wins.

Parameters:
  • seed_a – Winner’s seed number.

  • seed_b – Loser’s seed number.

Returns:

|seed_a - seed_b| if winner has higher seed number (lower seed = upset), else 0.

property seed_map: dict[int, int]

Return the seed lookup map.
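The bonus rule itself is simple arithmetic; a standalone sketch:

```python
# Upset bonus: |seed_a - seed_b|, awarded only when the winner's seed number
# is the larger one (e.g. a 12-seed beating a 5-seed).
def seed_diff_bonus(seed_a: int, seed_b: int) -> float:
    return float(abs(seed_a - seed_b)) if seed_a > seed_b else 0.0
```

A 12-seed beating a 5-seed earns a 7-point bonus; the 5-seed beating the 12-seed earns nothing.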

class ncaa_eval.evaluation.SimulationResult(season: int, advancement_probs: ndarray[tuple[Any, ...], dtype[float64]], expected_points: dict[str, ndarray[tuple[Any, ...], dtype[float64]]], method: str, n_simulations: int | None, confidence_intervals: dict[str, tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]]] | None, score_distribution: dict[str, ndarray[tuple[Any, ...], dtype[float64]]] | None, bracket_distributions: dict[str, BracketDistribution] | None = None, sim_winners: ndarray[tuple[Any, ...], dtype[int32]] | None = None)[source]

Bases: object

Result of tournament simulation for one season.

Both the analytical path and MC path produce a SimulationResult.

season

Tournament season year.

Type:

int

advancement_probs

Per-team advancement probabilities, shape (n_teams, n_rounds).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

expected_points

Mapping of scoring_rule_name → per-team EP, each of shape (n_teams,).

Type:

dict[str, numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]]

method

"analytical" or "monte_carlo".

Type:

str

n_simulations

None for analytical; N for MC.

Type:

int | None

confidence_intervals

Optional mapping of rule_name → (lower, upper) arrays.

Type:

dict[str, tuple[numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]], numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]]] | None

score_distribution

Optional mapping of rule_name → per-sim scores array, shape (n_simulations,).

Type:

dict[str, numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]] | None

bracket_distributions

Optional mapping of rule_name → BracketDistribution (MC only; None for analytical). Note: distributions are computed from the chalk-bracket score (how many pre-game favorites won). For pool scoring analysis (“how would my chosen bracket score across all simulations?”), use sim_winners with score_bracket_against_sims().

Type:

dict[str, ncaa_eval.evaluation.simulation.BracketDistribution] | None

sim_winners

Optional array of per-simulation game winners, shape (n_simulations, n_games) (MC only; None for analytical).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.int32]] | None

advancement_probs: ndarray[tuple[Any, ...], dtype[float64]]
bracket_distributions: dict[str, BracketDistribution] | None = None
confidence_intervals: dict[str, tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]]] | None
expected_points: dict[str, ndarray[tuple[Any, ...], dtype[float64]]]
method: str
n_simulations: int | None
score_distribution: dict[str, ndarray[tuple[Any, ...], dtype[float64]]] | None
season: int
sim_winners: ndarray[tuple[Any, ...], dtype[int32]] | None = None
class ncaa_eval.evaluation.StandardScoring[source]

Bases: object

ESPN-style scoring: 1-2-4-8-16-32 (192 total for perfect bracket).

property name: str

Return 'standard'.

points_per_round(round_idx: int) float[source]

Return standard scoring points for round_idx.

ncaa_eval.evaluation.brier_score(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]

Compute Brier Score for binary predictions.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

Returns:

Brier Score value (lower is better).

Raises:

ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].

ncaa_eval.evaluation.build_bracket(seeds: list[TourneySeed], season: int) BracketStructure[source]

Construct a 64-team bracket tree from tournament seeds.

Play-in teams (is_play_in=True) are excluded. Exactly 64 non-play-in seeds are required.

Parameters:
  • seeds – List of TourneySeed objects for the given season.

  • season – Season year to filter seeds.

Returns:

Fully constructed BracketStructure.

Raises:

ValueError – If the number of non-play-in seeds for season is not 64.

ncaa_eval.evaluation.build_probability_matrix(provider: ProbabilityProvider, team_ids: Sequence[int], context: MatchupContext) ndarray[tuple[Any, ...], dtype[float64]][source]

Build n×n pairwise win probability matrix.

Uses upper-triangle batch call, then fills P[j,i] = 1 - P[i,j] via the complementarity contract.

Parameters:
  • provider – Probability provider implementing the protocol.

  • team_ids – Team IDs in bracket order.

  • context – Matchup context.

Returns:

Float64 array of shape (n, n). Diagonal is zero.
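The fill strategy can be sketched with plain numpy (standalone; `fill_from_upper` is an illustrative helper, not a package function):

```python
import numpy as np

def fill_from_upper(upper_probs, n):
    """Place upper-triangle P(i beats j) values, then mirror via complementarity."""
    P = np.zeros((n, n), dtype=np.float64)
    iu, ju = np.triu_indices(n, k=1)      # all i < j pairs in row-major order
    P[iu, ju] = upper_probs
    P[ju, iu] = 1.0 - upper_probs         # P[j, i] = 1 - P[i, j]
    return P                              # diagonal stays zero

P = fill_from_upper(np.array([0.7, 0.6, 0.55]), 3)
```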

ncaa_eval.evaluation.build_seed_prior_matrix(seed_map: dict[int, int], team_ids: Sequence[int]) ndarray[tuple[Any, ...], dtype[float64]][source]

Build an (n × n) seed prior probability matrix.

S[i, j] = P(team_i beats team_j) based on historical seed win rates keyed by |seed_i - seed_j|.

Parameters:
  • seed_map – Maps team_id → seed_number (1–16).

  • team_ids – Ordered team IDs matching matrix indices.

Returns:

Float64 matrix of shape (n, n) with diagonal zeros and S[i,j] + S[j,i] = 1.

ncaa_eval.evaluation.compute_advancement_probs(bracket: BracketStructure, P: ndarray[tuple[Any, ...], dtype[float64]]) ndarray[tuple[Any, ...], dtype[float64]][source]

Compute exact advancement probabilities via the Phylourny algorithm.

Post-order traversal of the bracket tree computing Win Probability Vectors (WPVs) at each internal node using the formula:

R = V ⊙ (P^T · W) + W ⊙ (P^T · V), where ⊙ denotes the element-wise product.

Parameters:
  • bracket – Tournament bracket structure.

  • P – Pairwise win probability matrix, shape (n, n).

Returns:

Advancement probabilities, shape (n, n_rounds). adv_probs[i, r] = P(team i wins their game in round r).

Raises:

ValueError – If n is not a power of 2 or does not match the bracket’s team count.
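The recursion can be illustrated on a toy 4-team bracket (standalone sketch; note that with 1-D arrays `w @ P.T` equals `P @ w`, so the element-wise update below matches the WPV formula under the convention P[i, j] = P(team i beats team j)):

```python
import numpy as np

def wpv_merge(v, w, P):
    # Winner distribution of the game between the subtree-V and subtree-W winners.
    return v * (P @ w) + w * (P @ v)

P = np.array([[0.0, 0.6, 0.7, 0.8],
              [0.4, 0.0, 0.5, 0.6],
              [0.3, 0.5, 0.0, 0.5],
              [0.2, 0.4, 0.5, 0.0]])
leaves = np.eye(4)                            # each leaf WPV is one-hot
semi_a = wpv_merge(leaves[0], leaves[1], P)   # game: team 0 vs team 1
semi_b = wpv_merge(leaves[2], leaves[3], P)   # game: team 2 vs team 3
champ = wpv_merge(semi_a, semi_b, P)          # championship WPV
```

Each merged vector is a valid probability distribution over the teams in its subtree.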

ncaa_eval.evaluation.compute_bracket_distribution(scores: ndarray[tuple[Any, ...], dtype[float64]], n_bins: int = 50) BracketDistribution[source]

Compute score distribution statistics from raw MC scores.

Computes the 5th/25th/50th/75th/95th percentiles via np.percentile, builds an n_bins-bucket histogram via np.histogram, and wraps all statistics into a BracketDistribution.

Parameters:
  • scores – Raw per-simulation scores, shape (n_simulations,).

  • n_bins – Number of histogram bins (default 50).

Returns:

BracketDistribution with percentiles, mean, std, and histogram.

ncaa_eval.evaluation.compute_expected_points(adv_probs: ndarray[tuple[Any, ...], dtype[float64]], scoring_rule: ScoringRule) ndarray[tuple[Any, ...], dtype[float64]][source]

Compute Expected Points per team via matrix-vector multiply.

Parameters:
  • adv_probs – Advancement probabilities, shape (n, n_rounds).

  • scoring_rule – Scoring rule providing per-round point values.

Returns:

Expected Points per team, shape (n,).
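The computation is a single matrix-vector product; a standalone numeric sketch (the values are illustrative):

```python
import numpy as np

# adv_probs[i, r] = P(team i wins its round-r game); points[r] = value of round r.
adv_probs = np.array([[0.9, 0.7],
                      [0.6, 0.2],
                      [0.4, 0.1],
                      [0.1, 0.0]])
points = np.array([1.0, 2.0])
ep = adv_probs @ points   # ep[i] = sum_r adv_probs[i, r] * points[r]
```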

ncaa_eval.evaluation.compute_expected_points_seed_diff(adv_probs: ndarray[tuple[Any, ...], dtype[float64]], bracket: BracketStructure, P: ndarray[tuple[Any, ...], dtype[float64]], seed_map: dict[int, int]) ndarray[tuple[Any, ...], dtype[float64]][source]

Compute Expected Points with seed-difference upset bonus.

Extends standard EP by adding per-matchup seed-diff bonus. For each internal bracket node at round r, the bonus contribution for team i beating opponent j is:

P(i reaches node) * P(i beats j) * P(j reaches node) * bonus(seed_i, seed_j)

where bonus = |seed_i - seed_j| when seed_i > seed_j (upset), else 0.

Uses SeedDiffBonusScoring base points for standard round points and a post-order traversal of the bracket tree (reusing WPVs from compute_advancement_probs() logic) for bonus computation.

Parameters:
  • adv_probs – Advancement probabilities, shape (n, n_rounds).

  • bracket – Tournament bracket structure (for tree traversal).

  • P – Pairwise win probability matrix, shape (n, n).

  • seed_map – Mapping of team_id → seed_num.

Returns:

Expected Points per team, shape (n,), including base + bonus.

ncaa_eval.evaluation.compute_most_likely_bracket(bracket: BracketStructure, P: ndarray[tuple[Any, ...], dtype[float64]]) MostLikelyBracket[source]

Compute the maximum-likelihood bracket via greedy traversal.

At each internal node, picks the team with the higher win probability (argmax(P[left, right])). Returns the full bracket of winners and the log-likelihood of the chosen bracket.

The winners array is in round-major order — the same order as SimulationResult.sim_winners rows — so it can be passed directly to score_bracket_against_sims(): all Round-of-64 games first (indices 0–31), then Round-of-32 (32–47), through to the championship game (index 62).

Parameters:
  • bracket – Tournament bracket structure.

  • P – Pairwise win probability matrix, shape (n, n).

Returns:

MostLikelyBracket with winners, champion, and log-likelihood.

ncaa_eval.evaluation.default_metrics() dict[str, Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float]][source]

Return all registered metric functions (built-in + user-registered).

ncaa_eval.evaluation.expected_calibration_error(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]], *, n_bins: int = 10) float[source]

Compute Expected Calibration Error (ECE) using vectorized numpy.

ECE measures how well predicted probabilities match observed frequencies. Predictions are binned into n_bins equal-width bins on [0, 1], and ECE is the weighted average of per-bin |accuracy - confidence| gaps.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

  • n_bins – Number of equal-width bins (default 10).

Returns:

ECE value in [0, 1] (lower is better).

Raises:

ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].
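A standalone sketch of the computation (looping over bins for clarity; the package's implementation is vectorized):

```python
import numpy as np

def ece_sketch(y_true, y_prob, n_bins=10):
    # Assign each prediction to an equal-width bin on [0, 1].
    bin_idx = np.clip((y_prob * n_bins).astype(int), 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            accuracy = y_true[mask].mean()      # observed frequency in the bin
            confidence = y_prob[mask].mean()    # mean prediction in the bin
            total += mask.mean() * abs(accuracy - confidence)
    return total

y_true = np.array([1.0, 1.0, 0.0, 0.0])
y_prob = np.array([1.0, 1.0, 0.0, 0.0])   # perfectly calibrated and confident
```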

ncaa_eval.evaluation.feature_cols(df: DataFrame) list[str][source]

Return feature column names (everything not in METADATA_COLS).

Parameters:

df – DataFrame whose columns are inspected.

Returns:

List of column names that are not metadata.

ncaa_eval.evaluation.format_kaggle_submission(season: int, team_ids: Sequence[int], prob_matrix: ndarray[tuple[Any, ...], dtype[float64]]) str[source]

Format a probability matrix as a Kaggle submission CSV string.

Parameters:
  • season – Tournament season year (e.g. 2025).

  • team_ids – Team IDs corresponding to matrix rows/columns.

  • prob_matrix – n×n pairwise probability matrix where P[i,j] is P(team_ids[i] beats team_ids[j]).

Returns:

CSV string with header ID,Pred and C(n,2) data rows.

Raises:

ValueError – If the matrix shape doesn’t match the team count.
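A standalone sketch of the row layout, assuming the conventional Kaggle ID format `{season}_{lowID}_{highID}` with the lower team ID listed first (the helper name is illustrative):

```python
import itertools
import numpy as np

def sketch_submission(season, team_ids, P):
    idx = {t: i for i, t in enumerate(team_ids)}
    lines = ["ID,Pred"]
    for a, b in itertools.combinations(sorted(team_ids), 2):  # lower ID first
        # Pred = P(lower-ID team beats higher-ID team)
        lines.append(f"{season}_{a}_{b},{P[idx[a], idx[b]]:.6f}")
    return "\n".join(lines)

P = np.array([[0.0, 0.7],
              [0.3, 0.0]])
csv = sketch_submission(2025, [1101, 1102], P)
```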

ncaa_eval.evaluation.get_metric(name: str) Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float][source]

Return the metric function registered under name.

Raises:

MetricNotFoundError – If name is not registered.

ncaa_eval.evaluation.get_scoring(name: str) type[source]

Return the scoring class registered under name.

Raises:

ScoringNotFoundError – If name is not registered.

ncaa_eval.evaluation.list_metrics() list[str][source]

Return all registered metric names (sorted).

ncaa_eval.evaluation.list_scoring_display_names() dict[str, str][source]

Return a mapping of registry keys to display names.

Returns:

Dict mapping scoring name → human-readable display name.

ncaa_eval.evaluation.list_scorings() list[str][source]

Return all registered scoring names (sorted).

ncaa_eval.evaluation.log_loss(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]

Compute Log Loss (cross-entropy loss) for binary predictions.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

Returns:

Log Loss value.

Raises:

ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].

ncaa_eval.evaluation.perturb_probability_matrix(P: ndarray[tuple[Any, ...], dtype[float64]], seed_map: dict[int, int], team_ids: Sequence[int], temperature: float = 1.0, seed_weight: float = 0.0) ndarray[tuple[Any, ...], dtype[float64]][source]

Apply game-theory slider perturbation to a pairwise probability matrix.

Applies two independent transformations in sequence:

  1. Temperature scaling: p' = p^(1/T) / (p^(1/T) + (1-p)^(1/T))

  2. Seed blend: p'' = (1-w)*p' + w*p_seed_prior

Parameters:
  • P – Square probability matrix where P[i,j] is the probability that team i beats team j. Must satisfy P[i,j] + P[j,i] = 1 and P[i,i] = 0.

  • seed_map – Maps team_id → seed_number (1–16).

  • team_ids – Ordered team IDs matching matrix indices.

  • temperature – Controls upset/chalk spectrum. T > 1 = more upsets, T < 1 = more chalk, T = 1 = neutral.

  • seed_weight – Controls model/seed blend. 0 = pure model, 1 = pure seed prior.

Returns:

Perturbed matrix of same shape satisfying complementarity.

Raises:

ValueError – If temperature ≤ 0 or seed_weight not in [0, 1].

ncaa_eval.evaluation.plot_advancement_heatmap(result: SimulationResult, team_labels: Mapping[int, str] | None = None) Figure[source]

Heatmap of per-team advancement probabilities by round.

Parameters:
  • result – Simulation result with advancement_probs array.

  • team_labels – Optional mapping of team index (0..n-1, bracket position order) to display name. When None, team indices are shown as-is. Note: keys are bracket indices, not canonical team IDs — use BracketStructure.team_index_map to translate from team IDs to indices before passing this argument.

Returns:

Interactive Plotly Figure showing a heatmap with teams on y-axis and rounds on x-axis.

ncaa_eval.evaluation.plot_backtest_summary(result: BacktestResult, *, metrics: Sequence[str] | None = None) Figure[source]

Per-year metric values from a backtest result.

Parameters:
  • result – Backtest result containing the summary DataFrame.

  • metrics – Metric column names to include. Defaults to all metric columns (excludes elapsed_seconds).

Returns:

Interactive Plotly Figure with one line per metric, x=year.

ncaa_eval.evaluation.plot_metric_comparison(results: Mapping[str, BacktestResult], metric: str) Figure[source]

Multi-model overlay: one line per model for a given metric across years.

Parameters:
  • results – Mapping of model name to BacktestResult.

  • metric – Metric column name to compare.

Returns:

Interactive Plotly Figure with one line per model.

ncaa_eval.evaluation.plot_reliability_diagram(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]], *, n_bins: int = 10, title: str | None = None) Figure[source]

Reliability diagram: predicted vs. actual probability with bin counts.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

  • n_bins – Number of calibration bins (default 10).

  • title – Optional figure title.

Returns:

Interactive Plotly Figure with calibration curve, diagonal reference, and bar overlay of per-bin sample counts.

ncaa_eval.evaluation.plot_score_distribution(dist: BracketDistribution, *, title: str | None = None) Figure[source]

Histogram of bracket score distribution with percentile markers.

Parameters:
  • dist – Bracket distribution with pre-computed histogram data and percentile values.

  • title – Optional figure title.

Returns:

Interactive Plotly Figure with histogram bars and vertical percentile lines at 5th, 25th, 50th, 75th, and 95th.

ncaa_eval.evaluation.power_transform(P: ndarray[tuple[Any, ...], dtype[float64]], temperature: float) ndarray[tuple[Any, ...], dtype[float64]][source]

Apply power/temperature scaling to a probability matrix.

Computes p' = p^(1/T) / (p^(1/T) + (1-p)^(1/T)) element-wise.

Properties:
  • T=1.0: identity (p’ = p).

  • T>1: compresses probabilities toward 0.5 (more upsets).

  • T<1: sharpens probabilities away from 0.5 (more chalk).

  • Preserves p=0, p=1, p=0.5 as fixed points.

  • Preserves diagonal zeros.

  • Preserves complementarity: p'[i,j] + p'[j,i] = 1.

Parameters:
  • P – Square probability matrix with diagonal zeros.

  • temperature – Temperature T > 0.

Returns:

Transformed matrix with same shape and dtype.

Raises:

ValueError – If temperature is not positive.
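A standalone element-wise sketch (diagonal zeros map to zero automatically, since 0^(1/T) / (0^(1/T) + 1^(1/T)) = 0):

```python
import numpy as np

def power_transform_sketch(P, temperature):
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    a = P ** (1.0 / temperature)
    b = (1.0 - P) ** (1.0 / temperature)
    return a / (a + b)   # complementarity is preserved: f(p) + f(1-p) = 1

P = np.array([[0.0, 0.7],
              [0.3, 0.0]])
warm = power_transform_sketch(P, 2.0)   # T > 1: compressed toward 0.5
```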

ncaa_eval.evaluation.register_metric(name: str) Callable[[_MF], _MF][source]

Function decorator that registers a metric function.

Parameters:

name – Registry key for the metric.

Returns:

Decorator that registers the function and returns it unchanged.

Raises:

ValueError – If name is already registered.

ncaa_eval.evaluation.register_scoring(name: str, *, display_name: str | None = None) Callable[[_ST], _ST][source]

Class decorator that registers a scoring rule class.

Parameters:
  • name – Registry key for the scoring rule.

  • display_name – Optional human-readable label for UI display. Falls back to name if not provided.

Returns:

Decorator that registers the class and returns it unchanged.

Raises:

ValueError – If name is already registered.

ncaa_eval.evaluation.reliability_diagram_data(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]], *, n_bins: int = 10) ReliabilityData[source]

Generate reliability diagram data for calibration visualization.

Uses sklearn.calibration.calibration_curve for bin statistics and augments with per-bin sample counts.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

  • n_bins – Number of bins (default 10).

Returns:

Structured data containing fraction of positives, mean predicted values, bin counts, bin edges, and requested number of bins.

Raises:

ValueError – If inputs are empty, mismatched, n_bins < 1, or probabilities are outside [0, 1].
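The per-bin statistics can be sketched in pure NumPy (the real function delegates to sklearn.calibration.calibration_curve; this illustrative version assumes uniform bins over [0, 1]):

```python
import numpy as np

def reliability_bins_sketch(y_true, y_prob, n_bins: int = 10):
    """Uniform-bin calibration statistics: fraction of positives, mean
    predicted probability, and sample count per bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index per prediction; clip so p == 1.0 falls in the last bin.
    idx = np.clip(np.digitize(y_prob, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    frac_pos = np.full(n_bins, np.nan)
    mean_pred = np.full(n_bins, np.nan)
    nonempty = counts > 0
    frac_pos[nonempty] = (
        np.bincount(idx, weights=y_true, minlength=n_bins)[nonempty] / counts[nonempty]
    )
    mean_pred[nonempty] = (
        np.bincount(idx, weights=y_prob, minlength=n_bins)[nonempty] / counts[nonempty]
    )
    return frac_pos, mean_pred, counts, edges
```

A well-calibrated model has frac_pos ≈ mean_pred in every populated bin.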

ncaa_eval.evaluation.roc_auc(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]

Compute ROC-AUC for binary predictions.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

Returns:

ROC-AUC value.

Raises:

ValueError – If inputs are empty, mismatched, probabilities are outside [0, 1], or y_true contains only one class (AUC is undefined).

ncaa_eval.evaluation.run_backtest(model: Model, feature_server: StatefulFeatureServer, *, seasons: Sequence[int], mode: Literal['batch', 'stateful'] = 'batch', n_jobs: int = -1, metric_fns: Mapping[str, Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float]] | None = None, console: Console | None = None, progress: bool = False) BacktestResult[source]

Run parallelized walk-forward cross-validation backtest.

Parameters:
  • model – Model instance to evaluate (will be deep-copied per fold).

  • feature_server – Configured feature server for building CV folds.

  • seasons – Season years to include (passed to walk_forward_splits).

  • mode – Feature serving mode ("batch" or "stateful").

  • n_jobs – Number of parallel workers. -1 = all cores, 1 = sequential.

  • metric_fns – Metric functions to compute per fold. Defaults to {log_loss, brier_score, roc_auc, expected_calibration_error}.

  • console – Rich Console for progress output.

  • progress – Display a tqdm progress bar for fold evaluation. Most useful with n_jobs=1 (sequential execution).

Returns:

BacktestResult with per-fold results and summary DataFrame.

Raises:

ValueError – If mode is not "batch" or "stateful", or if seasons contains fewer than 2 elements (propagated from walk_forward_splits()).

ncaa_eval.evaluation.score_bracket_against_sims(chosen_bracket: ndarray[tuple[Any, ...], dtype[int32]], sim_winners: ndarray[tuple[Any, ...], dtype[int32]], scoring_rules: Sequence[ScoringRule]) dict[str, ndarray[tuple[Any, ...], dtype[float64]]][source]

Score a chosen bracket against each simulated tournament outcome.

Broadcasts chosen_bracket across all simulations to build a boolean match matrix (sim_winners == chosen_bracket[None, :]). For each scoring rule, constructs a per-game point vector by iterating rounds with a running game_offset, then computes per-sim scores as (matches * game_points).sum(axis=1) — one vectorized dot product per rule, no Python loop over simulations.

Parameters:
  • chosen_bracket – Game winners for the chosen bracket, shape (n_games,).

  • sim_winners – Per-simulation game winners, shape (n_simulations, n_games).

  • scoring_rules – Scoring rules to score against.

Returns:

Mapping of rule_name → per-sim scores, each of shape (n_simulations,).
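The broadcast-and-dot-product structure described above can be sketched with a hypothetical 4-team bracket (three games, illustrative point values):

```python
import numpy as np

# Hypothetical 4-team bracket: games 0-1 are the semifinals, game 2 the final.
chosen = np.array([0, 2, 0], dtype=np.int32)   # chosen winner per game
sims = np.array([[0, 2, 2],                    # per-sim winners, shape (n_sims, n_games)
                 [0, 3, 0],
                 [1, 2, 2]], dtype=np.int32)
# Illustrative per-game point vector: semifinals worth 1, final worth 2.
game_points = np.array([1.0, 1.0, 2.0])

matches = sims == chosen[None, :]              # broadcast match matrix, (n_sims, n_games)
scores = (matches * game_points).sum(axis=1)   # one vectorized reduction per rule
```

Each scoring rule only changes game_points, so the per-sim loop never materializes in Python.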

ncaa_eval.evaluation.scoring_from_config(config: dict[str, Any]) ScoringRule[source]

Create a scoring rule from a configuration dict.

Dispatches on config["type"].

Parameters:

config – Configuration dict with at least a "type" key.

Returns:

Instantiated scoring rule.

Raises:

ValueError – If type is unknown or required keys are missing.

ncaa_eval.evaluation.simulate_tournament(bracket: BracketStructure, probability_provider: ProbabilityProvider, context: MatchupContext, scoring_rules: Sequence[ScoringRule] | None = None, method: str = 'analytical', n_simulations: int = 10000, rng: Generator | None = None, progress: bool = False, progress_callback: Callable[[int, int], None] | None = None) SimulationResult[source]

High-level tournament simulation orchestrator.

Dispatches to analytical (Phylourny) or Monte Carlo path based on method.

Parameters:
  • bracket – Tournament bracket structure.

  • probability_provider – Provider for pairwise win probabilities.

  • context – Matchup context (season, day_num, neutral).

  • scoring_rules – Scoring rules for EP computation. Defaults to StandardScoring only.

  • method – "analytical" (default) or "monte_carlo".

  • n_simulations – Number of MC simulations (ignored for analytical).

  • rng – NumPy random generator (MC only).

  • progress – Display a tqdm progress bar for MC simulation rounds. Ignored when method="analytical".

  • progress_callback – Optional callback invoked after each MC round with (round_completed, total_rounds). Ignored when method="analytical".

Returns:

SimulationResult.

Raises:

ValueError – If method is not "analytical" or "monte_carlo", or if MC is requested with n_simulations < 100.

ncaa_eval.evaluation.simulate_tournament_mc(bracket: BracketStructure, P: ndarray[tuple[Any, ...], dtype[float64]], scoring_rules: Sequence[ScoringRule], season: int, n_simulations: int = 10000, rng: Generator | None = None, progress: bool = False, progress_callback: Callable[[int, int], None] | None = None) SimulationResult[source]

Vectorized Monte Carlo tournament simulation.

All N simulations run in parallel per round (no per-sim Python loops). Pre-generates random numbers and uses fancy indexing for batch outcome determination.

Parameters:
  • bracket – Tournament bracket structure (64 teams).

  • P – Pairwise win probability matrix, shape (n, n).

  • scoring_rules – Scoring rules to compute scores for.

  • season – Tournament season year.

  • n_simulations – Number of simulations (default 10,000).

  • rng – NumPy random generator for reproducibility.

  • progress – Display a tqdm progress bar for simulation rounds.

  • progress_callback – Optional callback invoked after each round with (round_completed, total_rounds). UI-agnostic hook for external progress reporting (e.g. Streamlit st.progress).

Returns:

SimulationResult with MC-derived advancement probs, expected points, and score distributions.

Raises:

ValueError – If n_simulations < 100.
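The per-round vectorization can be sketched for a single round, using a made-up 4-team probability matrix (illustrative only; the library's internal pairing logic is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 100_000
# Hypothetical pairwise win-probability matrix: P[i, j] = P(team i beats team j).
P = np.array([[0.0, 0.8, 0.6, 0.9],
              [0.2, 0.0, 0.4, 0.7],
              [0.4, 0.6, 0.0, 0.5],
              [0.1, 0.3, 0.5, 0.0]])

# Round-1 pairings (team 0 vs 1, team 2 vs 3), replicated across all sims.
a = np.tile(np.array([0, 2]), (n_sims, 1))
b = np.tile(np.array([1, 3]), (n_sims, 1))
u = rng.random(a.shape)                  # pre-generated uniforms, one per (sim, game)
winners = np.where(u < P[a, b], a, b)    # fancy indexing resolves every game at once
```

Winners from each round become the a/b arrays of the next, so the only Python loop is over the six rounds.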

ncaa_eval.evaluation.slider_to_temperature(slider_value: int) float[source]

Map an integer slider value to a temperature parameter.

Parameters:

slider_value – Integer in [-5, +5].

Returns:

T = 2^(slider_value / 3). T=1.0 at slider_value=0 (neutral).

Raises:

ValueError – If slider_value is outside [-5, +5].
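The mapping is a one-liner; a sketch matching the documented formula:

```python
def slider_to_temperature_sketch(slider_value: int) -> float:
    """T = 2^(slider_value / 3); slider 0 maps to the neutral T = 1.0."""
    if not -5 <= slider_value <= 5:
        raise ValueError("slider_value must be in [-5, +5]")
    return 2.0 ** (slider_value / 3)
```

The base-2, one-third-step scale means every three slider clicks halves or doubles T, so the extremes are roughly T ≈ 0.31 and T ≈ 3.17.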

ncaa_eval.evaluation.walk_forward_splits(seasons: Sequence[int], feature_server: StatefulFeatureServer, *, mode: Literal['batch', 'stateful'] = 'batch') Iterator[CVFold][source]

Generate walk-forward CV folds with Leave-One-Tournament-Out splits.

Parameters:
  • seasons – Ordered sequence of season years to include (e.g., range(2008, 2026)). Must contain at least 2 seasons.

  • feature_server – Configured StatefulFeatureServer for building feature matrices.

  • mode – Feature serving mode: "batch" (stateless models) or "stateful" (sequential-update models like Elo).

Yields:

CVFold – For each eligible test year (skipping no-tournament years like 2020): train contains all games from seasons strictly before the test year; test contains only tournament games from the test year; year is the test season year.

Raises:

ValueError – If seasons has fewer than 2 elements, or if mode is not "batch" or "stateful".
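The year-splitting logic (independent of feature serving) can be sketched as follows; `walk_forward_years` is a hypothetical helper illustrating the train/test boundaries, not the library's generator:

```python
from typing import Iterator, List, Sequence, Tuple

def walk_forward_years(seasons: Sequence[int]) -> Iterator[Tuple[List[int], int]]:
    """Yield (train_years, test_year) pairs: train is strictly before test,
    and 2020 (no tournament) is skipped as a test year."""
    ordered = sorted(seasons)
    if len(ordered) < 2:
        raise ValueError("seasons must contain at least 2 elements")
    for i, test_year in enumerate(ordered[1:], start=1):
        if test_year == 2020:  # tournament cancelled; still usable as training data
            continue
        yield ordered[:i], test_year
```

Note that 2020 is skipped only as a test year; its games remain in the training window of later folds.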