ncaa_eval.evaluation package

Module contents

Evaluation metrics, cross-validation, and tournament simulation module.

class ncaa_eval.evaluation.BacktestResult(fold_results: tuple[FoldResult, ...], summary: DataFrame, elapsed_seconds: float)[source]

Bases: object

Aggregated result of a full backtest across all folds.

fold_results

Per-fold evaluation results, sorted by year.

Type:

tuple[ncaa_eval.evaluation.backtest.FoldResult, …]

summary

DataFrame with year as index, metric columns + elapsed_seconds.

Type:

pandas.core.frame.DataFrame

elapsed_seconds

Total wall-clock time for the entire backtest.

Type:

float

elapsed_seconds: float
fold_results: tuple[FoldResult, ...]
summary: DataFrame
class ncaa_eval.evaluation.BracketDistribution(scores: ndarray[tuple[Any, ...], dtype[float64]], percentiles: dict[int, float], mean: float, std: float, histogram_bins: ndarray[tuple[Any, ...], dtype[float64]], histogram_counts: ndarray[tuple[Any, ...], dtype[int64]])[source]

Bases: object

Score distribution statistics from Monte Carlo simulation.

scores

Raw per-simulation scores, shape (n_simulations,).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

percentiles

Mapping of percentile → value for keys 5, 25, 50, 75, 95.

Type:

dict[int, float]

mean

Mean score across simulations.

Type:

float

std

Standard deviation of scores.

Type:

float

histogram_bins

Histogram bin edges, shape (n_bins + 1,).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

histogram_counts

Histogram counts, shape (n_bins,).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.int64]]

histogram_bins: ndarray[tuple[Any, ...], dtype[float64]]
histogram_counts: ndarray[tuple[Any, ...], dtype[int64]]
mean: float
percentiles: dict[int, float]
scores: ndarray[tuple[Any, ...], dtype[float64]]
std: float
class ncaa_eval.evaluation.BracketNode(round_index: int, team_index: int = -1, left: BracketNode | None = None, right: BracketNode | None = None)[source]

Bases: object

Node in a tournament bracket tree.

A leaf node represents a single team; an internal node represents a game whose winner advances.

round_index

Round number (0-indexed). Leaves have round_index=-1.

Type:

int

team_index

Index into the bracket’s team_ids tuple for leaf nodes. -1 for internal nodes.

Type:

int

left

Left child (None for leaves).

Type:

ncaa_eval.evaluation.bracket.BracketNode | None

right

Right child (None for leaves).

Type:

ncaa_eval.evaluation.bracket.BracketNode | None

property is_leaf: bool

Return True if this is a leaf (team) node.

left: BracketNode | None = None
right: BracketNode | None = None
round_index: int
team_index: int = -1
class ncaa_eval.evaluation.BracketStructure(root: ~ncaa_eval.evaluation.bracket.BracketNode, team_ids: tuple[int, ...], team_index_map: dict[int, int], seed_map: dict[int, int] = <factory>)[source]

Bases: object

Immutable tournament bracket.

root

Root BracketNode of the bracket tree.

Type:

ncaa_eval.evaluation.bracket.BracketNode

team_ids

Tuple of team IDs in bracket-position order (leaf order).

Type:

tuple[int, …]

team_index_map

Mapping of team_id → index into team_ids.

Type:

dict[int, int]

seed_map

Mapping of team_id → seed_num for seed-aware scoring.

Type:

dict[int, int]

root: BracketNode
seed_map: dict[int, int]
team_ids: tuple[int, ...]
team_index_map: dict[int, int]
class ncaa_eval.evaluation.CVFold(train: DataFrame, test: DataFrame, year: int)[source]

Bases: object

A single cross-validation fold.

train

All games from seasons strictly before the test year.

Type:

pandas.core.frame.DataFrame

test

Tournament games only from the test year.

Type:

pandas.core.frame.DataFrame

year

The test season year.

Type:

int

test: DataFrame
train: DataFrame
year: int
class ncaa_eval.evaluation.CustomScoring(scoring_fn: Callable[[int], float], scoring_name: str)[source]

Bases: object

User-defined scoring rule wrapping a callable.

Parameters:
  • scoring_fn – Callable mapping round_idx → points.

  • scoring_name – Name for this custom rule.

property name: str

Return the custom rule name.

points_per_round(round_idx: int) float[source]

Return points from the wrapped callable.

class ncaa_eval.evaluation.DictScoring(points: dict[int, float], scoring_name: str)[source]

Bases: object

Scoring rule from a dict mapping round_idx to points.

Parameters:
  • points – Mapping of round_idx → points for rounds 0–5.

  • scoring_name – Name for this rule.

Raises:

ValueError – If points does not contain exactly 6 entries (rounds 0–5).

property name: str

Return the rule name.

points_per_round(round_idx: int) float[source]

Return points for round_idx.
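A standalone sketch of the six-entry contract (plain Python, independent of the package; the helper name is illustrative):

```python
# Minimal stand-in for the DictScoring contract: exactly six entries, rounds 0-5.
def make_points_fn(points: dict[int, float]):
    if set(points) != set(range(6)):
        raise ValueError("points must map exactly rounds 0-5")
    return points.__getitem__

points_per_round = make_points_fn({0: 1.0, 1: 2.0, 2: 4.0, 3: 8.0, 4: 16.0, 5: 32.0})

# A perfect 64-team bracket picks 32, 16, 8, 4, 2, 1 games per round:
total = sum(games * points_per_round(r)
            for r, games in enumerate([32, 16, 8, 4, 2, 1]))
```

With the standard 1-2-4-8-16-32 values each round contributes 32 points, which is where the 192-point perfect-bracket total quoted for StandardScoring comes from.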

class ncaa_eval.evaluation.EloProvider(model: Any)[source]

Bases: object

Wraps a StatefulModel as a ProbabilityProvider.

Uses the model’s predict_matchup method for probability computation.

Parameters:

model – Any StatefulModel instance with predict_matchup.

batch_matchup_probabilities(team_a_ids: Sequence[int], team_b_ids: Sequence[int], context: MatchupContext) ndarray[tuple[Any, ...], dtype[float64]][source]

Return batch probabilities by looping predict_matchup.

Iterates over team pairs, calling predict_matchup per matchup, and collects the results into a float64 array.

Elo is O(1) per pair so looping is acceptable.

matchup_probability(team_a_id: int, team_b_id: int, context: MatchupContext) float[source]

Return P(team_a beats team_b) via the model’s predict_matchup.

Delegates to the model’s predict_matchup method, which retrieves both teams’ current ratings and applies the Elo logistic expected-score formula.

class ncaa_eval.evaluation.EnsembleProvider(ensemble: StackedEnsemble, data_dir: Path, season: int)[source]

Bases: object

Wraps a StackedEnsemble as a ProbabilityProvider.

Calls ensemble.predict_bracket(data_dir, season) once on first use and caches the result as a MatrixProvider for subsequent lookups. This allows a StackedEnsemble to be passed to build_probability_matrix() and the Monte Carlo bracket simulator identically to single-model mode.

Parameters:
  • ensemble – A trained StackedEnsemble instance.

  • data_dir – Path to the local Parquet data store.

  • season – Target season year.

batch_matchup_probabilities(team_a_ids: Sequence[int], team_b_ids: Sequence[int], context: MatchupContext) ndarray[tuple[Any, ...], dtype[float64]][source]

Return batch probabilities from the cached ensemble matrix.

Triggers ensemble bracket prediction on first call; subsequent calls use the cached matrix.

matchup_probability(team_a_id: int, team_b_id: int, context: MatchupContext) float[source]

Return P(team_a beats team_b) from the ensemble probability matrix.

Triggers ensemble bracket prediction on first call; subsequent calls use the cached matrix.

class ncaa_eval.evaluation.FibonacciScoring[source]

Bases: object

Fibonacci-style scoring: 2-3-5-8-13-21 (231 total for perfect bracket).

property name: str

Return 'fibonacci'.

points_per_round(round_idx: int) float[source]

Return Fibonacci scoring points for round_idx.

class ncaa_eval.evaluation.FoldResult(year: int, predictions: ~pandas.core.series.Series, actuals: ~pandas.core.series.Series, metrics: ~collections.abc.Mapping[str, float], elapsed_seconds: float, test_game_ids: ~pandas.core.series.Series = <factory>, test_team_a_ids: ~pandas.core.series.Series = <factory>, test_team_b_ids: ~pandas.core.series.Series = <factory>)[source]

Bases: object

Result of evaluating a single cross-validation fold.

year

The test season year for this fold.

Type:

int

predictions

Predicted probabilities for tournament games.

Type:

pandas.core.series.Series

actuals

Actual binary outcomes for tournament games.

Type:

pandas.core.series.Series

metrics

Mapping of metric name to computed value.

Type:

collections.abc.Mapping[str, float]

elapsed_seconds

Wall-clock time for the fold evaluation.

Type:

float

test_game_ids

Game IDs from the test fold (aligned to predictions).

Type:

pandas.core.series.Series

test_team_a_ids

team_a IDs from the test fold.

Type:

pandas.core.series.Series

test_team_b_ids

team_b IDs from the test fold.

Type:

pandas.core.series.Series

actuals: Series
elapsed_seconds: float
metrics: Mapping[str, float]
predictions: Series
test_game_ids: Series
test_team_a_ids: Series
test_team_b_ids: Series
year: int
class ncaa_eval.evaluation.MatchupContext(season: int, day_num: int, is_neutral: bool)[source]

Bases: object

Context for a hypothetical matchup probability query.

Passed to ProbabilityProvider so that stateless models can construct the correct feature row for a hypothetical pairing. Stateful models (Elo) typically ignore context and use internal ratings.

season

Tournament season year (e.g. 2024).

Type:

int

day_num

Tournament day number (e.g. 136 for Round of 64).

Type:

int

is_neutral

True for all tournament games (neutral site).

Type:

bool

day_num: int
is_neutral: bool
season: int
class ncaa_eval.evaluation.MatrixProvider(prob_matrix: ndarray[tuple[Any, ...], dtype[float64]], team_ids: Sequence[int])[source]

Bases: object

Wraps a pre-computed probability matrix as a ProbabilityProvider.

Parameters:
  • prob_matrix – n×n pairwise probability matrix.

  • team_ids – Sequence of team IDs matching matrix indices.

batch_matchup_probabilities(team_a_ids: Sequence[int], team_b_ids: Sequence[int], context: MatchupContext) ndarray[tuple[Any, ...], dtype[float64]][source]

Return batch probabilities from the stored matrix.

Extracts row/column indices from the team pairs, vectorizes lookups into the probability matrix, and returns an array of win probabilities.

matchup_probability(team_a_id: int, team_b_id: int, context: MatchupContext) float[source]

Return P(team_a beats team_b) from the stored matrix.

Indexes into the pre-built probability matrix using the team-to-index mapping, returning P(team_i beats team_j) directly from the stored array.

exception ncaa_eval.evaluation.MetricNotFoundError[source]

Bases: KeyError

Raised when a requested metric name is not in the registry.

class ncaa_eval.evaluation.MostLikelyBracket(winners: tuple[int, ...], champion_team_id: int, log_likelihood: float)[source]

Bases: object

Maximum-likelihood bracket from greedy traversal.

winners

Tuple of team indices for each game’s predicted winner, in round-major order matching SimulationResult.sim_winners rows — all Round-of-64 games first (indices 0–31 for 64 teams), then Round-of-32 (32–47), through to the championship (index 62). 63 entries for a 64-team bracket. Pass directly to score_bracket_against_sims() as chosen_bracket.

Type:

tuple[int, …]

champion_team_id

Canonical team ID of the predicted champion (from BracketStructure.team_ids[champion_index]).

Type:

int

log_likelihood

Sum of log(max(P[left, right], P[right, left])) across all games.

Type:

float

champion_team_id: int
log_likelihood: float
winners: tuple[int, ...]
class ncaa_eval.evaluation.ProbabilityProvider(*args, **kwargs)[source]

Bases: Protocol

Protocol for pairwise win probability computation.

All implementations must satisfy the complementarity contract: P(A beats B) + P(B beats A) = 1 for every (A, B) pair.

batch_matchup_probabilities(team_a_ids: Sequence[int], team_b_ids: Sequence[int], context: MatchupContext) ndarray[tuple[Any, ...], dtype[float64]][source]

Return P(a_i beats b_i) for all pairs.

Parameters:
  • team_a_ids – Sequence of first-team IDs.

  • team_b_ids – Sequence of second-team IDs (same length).

  • context – Matchup context.

Returns:

1-D float64 array of shape (len(team_a_ids),).

matchup_probability(team_a_id: int, team_b_id: int, context: MatchupContext) float[source]

Return P(team_a beats team_b).

Parameters:
  • team_a_id – First team’s canonical ID.

  • team_b_id – Second team’s canonical ID.

  • context – Matchup context (season, day_num, neutral).

Returns:

Probability in [0, 1].
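A minimal illustrative implementation of the protocol (standalone; the Elo-style rating scheme and class name are assumptions, not part of the package):

```python
import numpy as np

class RatingProvider:
    """Toy provider: logistic win probability from a rating difference."""

    def __init__(self, ratings: dict[int, float]) -> None:
        self._ratings = ratings

    def matchup_probability(self, team_a_id, team_b_id, context=None) -> float:
        diff = self._ratings[team_a_id] - self._ratings[team_b_id]
        return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

    def batch_matchup_probabilities(self, team_a_ids, team_b_ids, context=None):
        # Loop per pair and collect into a 1-D float64 array.
        return np.array(
            [self.matchup_probability(a, b, context)
             for a, b in zip(team_a_ids, team_b_ids)],
            dtype=np.float64,
        )

provider = RatingProvider({1101: 1600.0, 1102: 1500.0})
p_ab = provider.matchup_probability(1101, 1102)
p_ba = provider.matchup_probability(1102, 1101)
# Complementarity contract: the two probabilities sum to 1.
```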

class ncaa_eval.evaluation.ReliabilityData(fraction_of_positives: ndarray[tuple[Any, ...], dtype[float64]], mean_predicted_value: ndarray[tuple[Any, ...], dtype[float64]], bin_counts: ndarray[tuple[Any, ...], dtype[int64]], bin_edges: ndarray[tuple[Any, ...], dtype[float64]], n_bins: int)[source]

Bases: object

Structured return type for reliability diagram data.

fraction_of_positives

Observed fraction of positives per bin (from calibration_curve).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

mean_predicted_value

Mean predicted probability per bin (from calibration_curve).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

bin_counts

Number of samples in each non-empty bin.

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.int64]]

bin_edges

Full bin edge array of shape (n_bins + 1,), i.e. np.linspace(0.0, 1.0, n_bins + 1). Includes both the lower (0.0) and upper (1.0) boundaries so callers do not need to recompute them.

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

n_bins

Requested number of bins.

Type:

int

bin_counts: ndarray[tuple[Any, ...], dtype[int64]]
bin_edges: ndarray[tuple[Any, ...], dtype[float64]]
fraction_of_positives: ndarray[tuple[Any, ...], dtype[float64]]
mean_predicted_value: ndarray[tuple[Any, ...], dtype[float64]]
n_bins: int
exception ncaa_eval.evaluation.ScoringNotFoundError[source]

Bases: KeyError

Raised when a requested scoring name is not in the registry.

class ncaa_eval.evaluation.ScoringRule(*args, **kwargs)[source]

Bases: Protocol

Protocol for tournament bracket scoring rules.

property name: str

Human-readable name of the scoring rule.

points_per_round(round_idx: int) float[source]

Return points awarded for a correct pick in round round_idx.

Parameters:

round_idx – Zero-indexed round number (0=R64 through 5=NCG).

Returns:

Points as a float.

class ncaa_eval.evaluation.SeedDiffBonusScoring(seed_map: dict[int, int])[source]

Bases: object

Base points + seed-difference bonus when lower seed wins.

Uses same base as StandardScoring (1-2-4-8-16-32). When the lower seed (higher seed number) wins, adds |seed_a - seed_b| bonus.

Note: This scoring rule’s points_per_round returns only the base points. Full EP computation for seed-diff scoring (which requires per-matchup seed information) is deferred to Story 6.6, which will add a dedicated compute_expected_points_seed_diff function.

Parameters:

seed_map – Mapping of team_id → seed_num.

property name: str

Return 'seed_diff_bonus'.

points_per_round(round_idx: int) float[source]

Return base points (excludes seed-diff bonus).

seed_diff_bonus(seed_a: int, seed_b: int) float[source]

Return bonus points when the lower seed wins.

Parameters:
  • seed_a – Winner’s seed number.

  • seed_b – Loser’s seed number.

Returns:

|seed_a - seed_b| if winner has higher seed number (lower seed = upset), else 0.

property seed_map: dict[int, int]

Return the seed lookup map.
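The bonus rule itself is simple arithmetic; a standalone sketch:

```python
# Upset bonus: |seed_a - seed_b|, awarded only when the winner's seed number
# is the larger one (e.g. a 12-seed beating a 5-seed).
def seed_diff_bonus(seed_a: int, seed_b: int) -> float:
    return float(abs(seed_a - seed_b)) if seed_a > seed_b else 0.0
```

A 12-seed beating a 5-seed earns a 7-point bonus; the 5-seed beating the 12-seed earns nothing.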

class ncaa_eval.evaluation.SimulationResult(season: int, advancement_probs: ndarray[tuple[Any, ...], dtype[float64]], expected_points: dict[str, ndarray[tuple[Any, ...], dtype[float64]]], method: str, n_simulations: int | None, confidence_intervals: dict[str, tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]]] | None, score_distribution: dict[str, ndarray[tuple[Any, ...], dtype[float64]]] | None, bracket_distributions: dict[str, BracketDistribution] | None = None, sim_winners: ndarray[tuple[Any, ...], dtype[int32]] | None = None)[source]

Bases: object

Result of tournament simulation for one season.

Both the analytical path and MC path produce a SimulationResult.

season

Tournament season year.

Type:

int

advancement_probs

Per-team advancement probabilities, shape (n_teams, n_rounds).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

expected_points

Mapping of scoring_rule_name → per-team EP, each of shape (n_teams,).

Type:

dict[str, numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]]

method

"analytical" or "monte_carlo".

Type:

str

n_simulations

None for analytical; N for MC.

Type:

int | None

confidence_intervals

Optional mapping of rule_name → (lower, upper) arrays.

Type:

dict[str, tuple[numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]], numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]]] | None

score_distribution

Optional mapping of rule_name → per-sim scores array, shape (n_simulations,).

Type:

dict[str, numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]] | None

bracket_distributions

Optional mapping of rule_name → BracketDistribution (MC only; None for analytical). Note: distributions are computed from the chalk-bracket score (how many pre-game favorites won). For pool scoring analysis (“how would my chosen bracket score across all simulations?”), use sim_winners with score_bracket_against_sims().

Type:

dict[str, ncaa_eval.evaluation.simulation.BracketDistribution] | None

sim_winners

Optional array of per-simulation game winners, shape (n_simulations, n_games) (MC only; None for analytical).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.int32]] | None

advancement_probs: ndarray[tuple[Any, ...], dtype[float64]]
bracket_distributions: dict[str, BracketDistribution] | None = None
confidence_intervals: dict[str, tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]]] | None
expected_points: dict[str, ndarray[tuple[Any, ...], dtype[float64]]]
method: str
n_simulations: int | None
score_distribution: dict[str, ndarray[tuple[Any, ...], dtype[float64]]] | None
season: int
sim_winners: ndarray[tuple[Any, ...], dtype[int32]] | None = None
class ncaa_eval.evaluation.StandardScoring[source]

Bases: object

ESPN-style scoring: 1-2-4-8-16-32 (192 total for perfect bracket).

property name: str

Return 'standard'.

points_per_round(round_idx: int) float[source]

Return standard scoring points for round_idx.

ncaa_eval.evaluation.brier_score(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]

Compute Brier Score for binary predictions.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

Returns:

Brier Score value (lower is better).

Raises:

ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].

ncaa_eval.evaluation.build_bracket(seeds: list[TourneySeed], season: int) BracketStructure[source]

Construct a 64-team bracket tree from tournament seeds.

Play-in teams (is_play_in=True) are excluded. Exactly 64 non-play-in seeds are required.

Parameters:
  • seeds – List of TourneySeed objects for the given season.

  • season – Season year to filter seeds.

Returns:

Fully constructed BracketStructure.

Raises:

ValueError – If the number of non-play-in seeds for season is not 64.

ncaa_eval.evaluation.build_probability_matrix(provider: ProbabilityProvider, team_ids: Sequence[int], context: MatchupContext) ndarray[tuple[Any, ...], dtype[float64]][source]

Build n×n pairwise win probability matrix.

Uses upper-triangle batch call, then fills P[j,i] = 1 - P[i,j] via the complementarity contract.

Parameters:
  • provider – Probability provider implementing the protocol.

  • team_ids – Team IDs in bracket order.

  • context – Matchup context.

Returns:

Float64 array of shape (n, n). Diagonal is zero.
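The fill strategy can be sketched with plain numpy (standalone; `fill_from_upper` is an illustrative helper, not a package function):

```python
import numpy as np

def fill_from_upper(upper_probs, n):
    """Place upper-triangle P(i beats j) values, then mirror via complementarity."""
    P = np.zeros((n, n), dtype=np.float64)
    iu, ju = np.triu_indices(n, k=1)      # all i < j pairs in row-major order
    P[iu, ju] = upper_probs
    P[ju, iu] = 1.0 - upper_probs         # P[j, i] = 1 - P[i, j]
    return P                              # diagonal stays zero

P = fill_from_upper(np.array([0.7, 0.6, 0.55]), 3)
```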

ncaa_eval.evaluation.build_seed_prior_matrix(seed_map: dict[int, int], team_ids: Sequence[int]) ndarray[tuple[Any, ...], dtype[float64]][source]

Build an (n × n) seed prior probability matrix.

S[i, j] = P(team_i beats team_j) based on historical seed win rates keyed by |seed_i - seed_j|.

Parameters:
  • seed_map – Maps team_id → seed_number (1–16).

  • team_ids – Ordered team IDs matching matrix indices.

Returns:

Float64 matrix of shape (n, n) with diagonal zeros and S[i,j] + S[j,i] = 1.

ncaa_eval.evaluation.compute_advancement_probs(bracket: BracketStructure, P: ndarray[tuple[Any, ...], dtype[float64]]) ndarray[tuple[Any, ...], dtype[float64]][source]

Compute exact advancement probabilities via the Phylourny algorithm.

Post-order traversal of the bracket tree computing Win Probability Vectors (WPVs) at each internal node using the formula:

R = V ⊙ (P^T · W) + W ⊙ (P^T · V), where ⊙ denotes the element-wise product.

Parameters:
  • bracket – Tournament bracket structure.

  • P – Pairwise win probability matrix, shape (n, n).

Returns:

Advancement probabilities, shape (n, n_rounds). adv_probs[i, r] = P(team i wins their game in round r).

Raises:

ValueError – If n is not a power of 2 or does not match the bracket’s team count.
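The recursion can be illustrated on a toy 4-team bracket (standalone sketch; note that with 1-D arrays `w @ P.T` equals `P @ w`, so the element-wise update below matches the WPV formula under the convention P[i, j] = P(team i beats team j)):

```python
import numpy as np

def wpv_merge(v, w, P):
    # Winner distribution of the game between the subtree-V and subtree-W winners.
    return v * (P @ w) + w * (P @ v)

P = np.array([[0.0, 0.6, 0.7, 0.8],
              [0.4, 0.0, 0.5, 0.6],
              [0.3, 0.5, 0.0, 0.5],
              [0.2, 0.4, 0.5, 0.0]])
leaves = np.eye(4)                            # each leaf WPV is one-hot
semi_a = wpv_merge(leaves[0], leaves[1], P)   # game: team 0 vs team 1
semi_b = wpv_merge(leaves[2], leaves[3], P)   # game: team 2 vs team 3
champ = wpv_merge(semi_a, semi_b, P)          # championship WPV
```

Each merged vector is a valid probability distribution over the teams in its subtree.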

ncaa_eval.evaluation.compute_bracket_distribution(scores: ndarray[tuple[Any, ...], dtype[float64]], n_bins: int = 50) BracketDistribution[source]

Compute score distribution statistics from raw MC scores.

Computes the 5th/25th/50th/75th/95th percentiles via np.percentile, builds an n_bins-bucket histogram via np.histogram, and wraps all statistics into a BracketDistribution.

Parameters:
  • scores – Raw per-simulation scores, shape (n_simulations,).

  • n_bins – Number of histogram bins (default 50).

Returns:

BracketDistribution with percentiles, mean, std, and histogram.

ncaa_eval.evaluation.compute_expected_points(adv_probs: ndarray[tuple[Any, ...], dtype[float64]], scoring_rule: ScoringRule) ndarray[tuple[Any, ...], dtype[float64]][source]

Compute Expected Points per team via matrix-vector multiply.

Parameters:
  • adv_probs – Advancement probabilities, shape (n, n_rounds).

  • scoring_rule – Scoring rule providing per-round point values.

Returns:

Expected Points per team, shape (n,).
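The computation is a single matrix-vector product; a standalone numeric sketch (the values are illustrative):

```python
import numpy as np

# adv_probs[i, r] = P(team i wins its round-r game); points[r] = value of round r.
adv_probs = np.array([[0.9, 0.7],
                      [0.6, 0.2],
                      [0.4, 0.1],
                      [0.1, 0.0]])
points = np.array([1.0, 2.0])
ep = adv_probs @ points   # ep[i] = sum_r adv_probs[i, r] * points[r]
```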

ncaa_eval.evaluation.compute_expected_points_seed_diff(adv_probs: ndarray[tuple[Any, ...], dtype[float64]], bracket: BracketStructure, P: ndarray[tuple[Any, ...], dtype[float64]], seed_map: dict[int, int]) ndarray[tuple[Any, ...], dtype[float64]][source]

Compute Expected Points with seed-difference upset bonus.

Extends standard EP by adding per-matchup seed-diff bonus. For each internal bracket node at round r, the bonus contribution for team i beating opponent j is:

P(i reaches node) * P(i beats j) * P(j reaches node) * bonus(seed_i, seed_j)

where bonus = |seed_i - seed_j| when seed_i > seed_j (upset), else 0.

Uses SeedDiffBonusScoring base points for standard round points and a post-order traversal of the bracket tree (reusing WPVs from compute_advancement_probs() logic) for bonus computation.

Parameters:
  • adv_probs – Advancement probabilities, shape (n, n_rounds).

  • bracket – Tournament bracket structure (for tree traversal).

  • P – Pairwise win probability matrix, shape (n, n).

  • seed_map – Mapping of team_id → seed_num.

Returns:

Expected Points per team, shape (n,), including base + bonus.

ncaa_eval.evaluation.compute_most_likely_bracket(bracket: BracketStructure, P: ndarray[tuple[Any, ...], dtype[float64]]) MostLikelyBracket[source]

Compute the maximum-likelihood bracket via greedy traversal.

At each internal node, picks the team with the higher win probability (argmax(P[left, right])). Returns the full bracket of winners and the log-likelihood of the chosen bracket.

The winners array is in round-major order — the same order as SimulationResult.sim_winners rows — so it can be passed directly to score_bracket_against_sims(): all Round-of-64 games first (indices 0–31), then Round-of-32 (32–47), through to the championship game (index 62).

Parameters:
  • bracket – Tournament bracket structure.

  • P – Pairwise win probability matrix, shape (n, n).

Returns:

MostLikelyBracket with winners, champion, and log-likelihood.

ncaa_eval.evaluation.default_metrics() dict[str, Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float]][source]

Return all registered metric functions (built-in + user-registered).

ncaa_eval.evaluation.expected_calibration_error(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]], *, n_bins: int = 10) float[source]

Compute Expected Calibration Error (ECE) using vectorized numpy.

ECE measures how well predicted probabilities match observed frequencies. Predictions are binned into n_bins equal-width bins on [0, 1], and ECE is the weighted average of per-bin |accuracy - confidence| gaps.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

  • n_bins – Number of equal-width bins (default 10).

Returns:

ECE value in [0, 1] (lower is better).

Raises:

ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].
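A standalone sketch of the computation (looping over bins for clarity; the package's implementation is vectorized):

```python
import numpy as np

def ece_sketch(y_true, y_prob, n_bins=10):
    # Assign each prediction to an equal-width bin on [0, 1].
    bin_idx = np.clip((y_prob * n_bins).astype(int), 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            accuracy = y_true[mask].mean()      # observed frequency in the bin
            confidence = y_prob[mask].mean()    # mean prediction in the bin
            total += mask.mean() * abs(accuracy - confidence)
    return total

y_true = np.array([1.0, 1.0, 0.0, 0.0])
y_prob = np.array([1.0, 1.0, 0.0, 0.0])   # perfectly calibrated and confident
```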

ncaa_eval.evaluation.feature_cols(df: DataFrame) list[str][source]

Return feature column names (everything not in METADATA_COLS).

Parameters:

df – DataFrame whose columns are inspected.

Returns:

List of column names that are not metadata.

ncaa_eval.evaluation.format_kaggle_submission(season: int, team_ids: Sequence[int], prob_matrix: ndarray[tuple[Any, ...], dtype[float64]]) str[source]

Format a probability matrix as a Kaggle submission CSV string.

Parameters:
  • season – Tournament season year (e.g. 2025).

  • team_ids – Team IDs corresponding to matrix rows/columns.

  • prob_matrix – n×n pairwise probability matrix where P[i,j] is P(team_ids[i] beats team_ids[j]).

Returns:

CSV string with header ID,Pred and C(n,2) data rows.

Raises:

ValueError – If the matrix shape doesn’t match the team count.
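A standalone sketch of the row layout, assuming the conventional Kaggle ID format `{season}_{lowID}_{highID}` with the lower team ID listed first (the helper name is illustrative):

```python
import itertools
import numpy as np

def sketch_submission(season, team_ids, P):
    idx = {t: i for i, t in enumerate(team_ids)}
    lines = ["ID,Pred"]
    for a, b in itertools.combinations(sorted(team_ids), 2):  # lower ID first
        # Pred = P(lower-ID team beats higher-ID team)
        lines.append(f"{season}_{a}_{b},{P[idx[a], idx[b]]:.6f}")
    return "\n".join(lines)

P = np.array([[0.0, 0.7],
              [0.3, 0.0]])
csv = sketch_submission(2025, [1101, 1102], P)
```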

ncaa_eval.evaluation.get_metric(name: str) Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float][source]

Return the metric function registered under name.

Raises:

MetricNotFoundError – If name is not registered.

ncaa_eval.evaluation.get_scoring(name: str) type[source]

Return the scoring class registered under name.

Raises:

ScoringNotFoundError – If name is not registered.

ncaa_eval.evaluation.list_metrics() list[str][source]

Return all registered metric names (sorted).

ncaa_eval.evaluation.list_scoring_display_names() dict[str, str][source]

Return a mapping of registry keys to display names.

Returns:

Dict mapping scoring name → human-readable display name.

ncaa_eval.evaluation.list_scorings() list[str][source]

Return all registered scoring names (sorted).

ncaa_eval.evaluation.log_loss(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]

Compute Log Loss (cross-entropy loss) for binary predictions.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

Returns:

Log Loss value.

Raises:

ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].

ncaa_eval.evaluation.perturb_probability_matrix(P: ndarray[tuple[Any, ...], dtype[float64]], seed_map: dict[int, int], team_ids: Sequence[int], temperature: float = 1.0, seed_weight: float = 0.0) ndarray[tuple[Any, ...], dtype[float64]][source]

Apply game-theory slider perturbation to a pairwise probability matrix.

Applies two independent transformations in sequence:

  1. Temperature scaling: p' = p^(1/T) / (p^(1/T) + (1-p)^(1/T))

  2. Seed blend: p'' = (1-w)*p' + w*p_seed_prior

Parameters:
  • P – Square probability matrix where P[i,j] is the probability that team i beats team j. Must satisfy P[i,j] + P[j,i] = 1 and P[i,i] = 0.

  • seed_map – Maps team_id → seed_number (1–16).

  • team_ids – Ordered team IDs matching matrix indices.

  • temperature – Controls upset/chalk spectrum. T > 1 = more upsets, T < 1 = more chalk, T = 1 = neutral.

  • seed_weight – Controls model/seed blend. 0 = pure model, 1 = pure seed prior.

Returns:

Perturbed matrix of same shape satisfying complementarity.

Raises:

ValueError – If temperature ≤ 0 or seed_weight not in [0, 1].

ncaa_eval.evaluation.plot_advancement_heatmap(result: SimulationResult, team_labels: Mapping[int, str] | None = None) Figure[source]

Heatmap of per-team advancement probabilities by round.

Parameters:
  • result – Simulation result with advancement_probs array.

  • team_labels – Optional mapping of team index (0..n-1, bracket position order) to display name. When None, team indices are shown as-is. Note: keys are bracket indices, not canonical team IDs — use BracketStructure.team_index_map to translate from team IDs to indices before passing this argument.

Returns:

Interactive Plotly Figure showing a heatmap with teams on y-axis and rounds on x-axis.

ncaa_eval.evaluation.plot_backtest_summary(result: BacktestResult, *, metrics: Sequence[str] | None = None) Figure[source]

Per-year metric values from a backtest result.

Parameters:
  • result – Backtest result containing the summary DataFrame.

  • metrics – Metric column names to include. Defaults to all metric columns (excludes elapsed_seconds).

Returns:

Interactive Plotly Figure with one line per metric, x=year.

ncaa_eval.evaluation.plot_metric_comparison(results: Mapping[str, BacktestResult], metric: str) Figure[source]

Multi-model overlay: one line per model for a given metric across years.

Parameters:
  • results – Mapping of model name to BacktestResult.

  • metric – Metric column name to compare.

Returns:

Interactive Plotly Figure with one line per model.

ncaa_eval.evaluation.plot_reliability_diagram(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]], *, n_bins: int = 10, title: str | None = None) Figure[source]

Reliability diagram: predicted vs. actual probability with bin counts.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

  • n_bins – Number of calibration bins (default 10).

  • title – Optional figure title.

Returns:

Interactive Plotly Figure with calibration curve, diagonal reference, and bar overlay of per-bin sample counts.

ncaa_eval.evaluation.plot_score_distribution(dist: BracketDistribution, *, title: str | None = None) Figure[source]

Histogram of bracket score distribution with percentile markers.

Parameters:
  • dist – Bracket distribution with pre-computed histogram data and percentile values.

  • title – Optional figure title.

Returns:

Interactive Plotly Figure with histogram bars and vertical percentile lines at 5th, 25th, 50th, 75th, and 95th.

ncaa_eval.evaluation.power_transform(P: ndarray[tuple[Any, ...], dtype[float64]], temperature: float) ndarray[tuple[Any, ...], dtype[float64]][source]

Apply power/temperature scaling to a probability matrix.

Computes p' = p^(1/T) / (p^(1/T) + (1-p)^(1/T)) element-wise.

Properties:
  • T=1.0: identity (p’ = p).

  • T>1: compresses probabilities toward 0.5 (more upsets).

  • T<1: sharpens probabilities away from 0.5 (more chalk).

  • Preserves p=0, p=1, p=0.5 as fixed points.

  • Preserves diagonal zeros.

  • Preserves complementarity: p'[i,j] + p'[j,i] = 1.

Parameters:
  • P – Square probability matrix with diagonal zeros.

  • temperature – Temperature T > 0.

Returns:

Transformed matrix with same shape and dtype.

Raises:

ValueError – If temperature is not positive.
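A standalone element-wise sketch (diagonal zeros map to zero automatically, since 0^(1/T) / (0^(1/T) + 1^(1/T)) = 0):

```python
import numpy as np

def power_transform_sketch(P, temperature):
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    a = P ** (1.0 / temperature)
    b = (1.0 - P) ** (1.0 / temperature)
    return a / (a + b)   # complementarity is preserved: f(p) + f(1-p) = 1

P = np.array([[0.0, 0.7],
              [0.3, 0.0]])
warm = power_transform_sketch(P, 2.0)   # T > 1: compressed toward 0.5
```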

ncaa_eval.evaluation.register_metric(name: str) Callable[[_MF], _MF][source]

Function decorator that registers a metric function.

Parameters:

name – Registry key for the metric.

Returns:

Decorator that registers the function and returns it unchanged.

Raises:

ValueError – If name is already registered.

ncaa_eval.evaluation.register_scoring(name: str, *, display_name: str | None = None) Callable[[_ST], _ST][source]

Class decorator that registers a scoring rule class.

Parameters:
  • name – Registry key for the scoring rule.

  • display_name – Optional human-readable label for UI display. Falls back to name if not provided.

Returns:

Decorator that registers the class and returns it unchanged.

Raises:

ValueError – If name is already registered.

ncaa_eval.evaluation.reliability_diagram_data(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]], *, n_bins: int = 10) ReliabilityData[source]

Generate reliability diagram data for calibration visualization.

Uses sklearn.calibration.calibration_curve for bin statistics and augments with per-bin sample counts.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

  • n_bins – Number of bins (default 10).

Returns:

Structured data containing fraction of positives, mean predicted values, bin counts, bin edges, and requested number of bins.

Raises:

ValueError – If inputs are empty, mismatched, n_bins < 1, or probabilities are outside [0, 1].
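The per-bin statistics can be sketched in pure NumPy (the real function delegates to sklearn.calibration.calibration_curve; this illustrative version assumes uniform bins over [0, 1]):

```python
import numpy as np

def reliability_bins_sketch(y_true, y_prob, n_bins: int = 10):
    """Uniform-bin calibration statistics: fraction of positives, mean
    predicted probability, and sample count per bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index per prediction; clip so p == 1.0 falls in the last bin.
    idx = np.clip(np.digitize(y_prob, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    frac_pos = np.full(n_bins, np.nan)
    mean_pred = np.full(n_bins, np.nan)
    nonempty = counts > 0
    frac_pos[nonempty] = (
        np.bincount(idx, weights=y_true, minlength=n_bins)[nonempty] / counts[nonempty]
    )
    mean_pred[nonempty] = (
        np.bincount(idx, weights=y_prob, minlength=n_bins)[nonempty] / counts[nonempty]
    )
    return frac_pos, mean_pred, counts, edges
```

A well-calibrated model has frac_pos ≈ mean_pred in every populated bin.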

ncaa_eval.evaluation.roc_auc(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]

Compute ROC-AUC for binary predictions.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

Returns:

ROC-AUC value.

Raises:

ValueError – If inputs are empty, mismatched, probabilities are outside [0, 1], or y_true contains only one class (AUC is undefined).

ncaa_eval.evaluation.run_backtest(model: Model, feature_server: StatefulFeatureServer, *, seasons: Sequence[int], mode: Literal['batch', 'stateful'] = 'batch', n_jobs: int = -1, metric_fns: Mapping[str, Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float]] | None = None, console: Console | None = None, progress: bool = False) BacktestResult[source]

Run parallelized walk-forward cross-validation backtest.

Parameters:
  • model – Model instance to evaluate (will be deep-copied per fold).

  • feature_server – Configured feature server for building CV folds.

  • seasons – Season years to include (passed to walk_forward_splits).

  • mode – Feature serving mode ("batch" or "stateful").

  • n_jobs – Number of parallel workers. -1 = all cores, 1 = sequential.

  • metric_fns – Metric functions to compute per fold. Defaults to {log_loss, brier_score, roc_auc, expected_calibration_error}.

  • console – Rich Console for progress output.

  • progress – Display a tqdm progress bar for fold evaluation. Most useful with n_jobs=1 (sequential execution).

Returns:

BacktestResult with per-fold results and summary DataFrame.

Raises:

ValueError – If mode is not "batch" or "stateful", or if seasons contains fewer than 2 elements (propagated from walk_forward_splits()).

ncaa_eval.evaluation.score_bracket_against_sims(chosen_bracket: ndarray[tuple[Any, ...], dtype[int32]], sim_winners: ndarray[tuple[Any, ...], dtype[int32]], scoring_rules: Sequence[ScoringRule]) dict[str, ndarray[tuple[Any, ...], dtype[float64]]][source]

Score a chosen bracket against each simulated tournament outcome.

Broadcasts chosen_bracket across all simulations to build a boolean match matrix (sim_winners == chosen_bracket[None, :]). For each scoring rule, constructs a per-game point vector by iterating rounds with a running game_offset, then computes per-sim scores as (matches * game_points).sum(axis=1) — one vectorized dot product per rule, no Python loop over simulations.

Parameters:
  • chosen_bracket – Game winners for the chosen bracket, shape (n_games,).

  • sim_winners – Per-simulation game winners, shape (n_simulations, n_games).

  • scoring_rules – Scoring rules to score against.

Returns:

Mapping of rule_name → per-sim scores, each of shape (n_simulations,).
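The broadcast-and-dot-product structure described above can be sketched with a hypothetical 4-team bracket (three games, illustrative point values):

```python
import numpy as np

# Hypothetical 4-team bracket: games 0-1 are the semifinals, game 2 the final.
chosen = np.array([0, 2, 0], dtype=np.int32)   # chosen winner per game
sims = np.array([[0, 2, 2],                    # per-sim winners, shape (n_sims, n_games)
                 [0, 3, 0],
                 [1, 2, 2]], dtype=np.int32)
# Illustrative per-game point vector: semifinals worth 1, final worth 2.
game_points = np.array([1.0, 1.0, 2.0])

matches = sims == chosen[None, :]              # broadcast match matrix, (n_sims, n_games)
scores = (matches * game_points).sum(axis=1)   # one vectorized reduction per rule
```

Each scoring rule only changes game_points, so the per-sim loop never materializes in Python.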

ncaa_eval.evaluation.scoring_from_config(config: dict[str, Any]) ScoringRule[source]

Create a scoring rule from a configuration dict.

Dispatches on config["type"].

Parameters:

config – Configuration dict with at least a "type" key.

Returns:

Instantiated scoring rule.

Raises:

ValueError – If type is unknown or required keys are missing.

ncaa_eval.evaluation.simulate_tournament(bracket: BracketStructure, probability_provider: ProbabilityProvider, context: MatchupContext, scoring_rules: Sequence[ScoringRule] | None = None, method: str = 'analytical', n_simulations: int = 10000, rng: Generator | None = None, progress: bool = False, progress_callback: Callable[[int, int], None] | None = None) SimulationResult[source]

High-level tournament simulation orchestrator.

Dispatches to analytical (Phylourny) or Monte Carlo path based on method.

Parameters:
  • bracket – Tournament bracket structure.

  • probability_provider – Provider for pairwise win probabilities.

  • context – Matchup context (season, day_num, neutral).

  • scoring_rules – Scoring rules for EP computation. Defaults to StandardScoring only.

  • method – "analytical" (default) or "monte_carlo".

  • n_simulations – Number of MC simulations (ignored for analytical).

  • rng – NumPy random generator (MC only).

  • progress – Display a tqdm progress bar for MC simulation rounds. Ignored when method="analytical".

  • progress_callback – Optional callback invoked after each MC round with (round_completed, total_rounds). Ignored when method="analytical".

Returns:

SimulationResult.

Raises:

ValueError – If method is not "analytical" or "monte_carlo", or if MC is requested with n_simulations < 100.

ncaa_eval.evaluation.simulate_tournament_mc(bracket: BracketStructure, P: ndarray[tuple[Any, ...], dtype[float64]], scoring_rules: Sequence[ScoringRule], season: int, n_simulations: int = 10000, rng: Generator | None = None, progress: bool = False, progress_callback: Callable[[int, int], None] | None = None) SimulationResult[source]

Vectorized Monte Carlo tournament simulation.

All N simulations run in parallel per round (no per-sim Python loops). Pre-generates random numbers and uses fancy indexing for batch outcome determination.

Parameters:
  • bracket – Tournament bracket structure (64 teams).

  • P – Pairwise win probability matrix, shape (n, n).

  • scoring_rules – Scoring rules to compute scores for.

  • season – Tournament season year.

  • n_simulations – Number of simulations (default 10,000).

  • rng – NumPy random generator for reproducibility.

  • progress – Display a tqdm progress bar for simulation rounds.

  • progress_callback – Optional callback invoked after each round with (round_completed, total_rounds). UI-agnostic hook for external progress reporting (e.g. Streamlit st.progress).

Returns:

SimulationResult with MC-derived advancement probs, expected points, and score distributions.

Raises:

ValueError – If n_simulations < 100.
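The per-round vectorization can be sketched for a single round, using a made-up 4-team probability matrix (illustrative only; the library's internal pairing logic is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 100_000
# Hypothetical pairwise win-probability matrix: P[i, j] = P(team i beats team j).
P = np.array([[0.0, 0.8, 0.6, 0.9],
              [0.2, 0.0, 0.4, 0.7],
              [0.4, 0.6, 0.0, 0.5],
              [0.1, 0.3, 0.5, 0.0]])

# Round-1 pairings (team 0 vs 1, team 2 vs 3), replicated across all sims.
a = np.tile(np.array([0, 2]), (n_sims, 1))
b = np.tile(np.array([1, 3]), (n_sims, 1))
u = rng.random(a.shape)                  # pre-generated uniforms, one per (sim, game)
winners = np.where(u < P[a, b], a, b)    # fancy indexing resolves every game at once
```

Winners from each round become the a/b arrays of the next, so the only Python loop is over the six rounds.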

ncaa_eval.evaluation.slider_to_temperature(slider_value: int) float[source]

Map an integer slider value to a temperature parameter.

Parameters:

slider_value – Integer in [-5, +5].

Returns:

T = 2^(slider_value / 3). T=1.0 at slider_value=0 (neutral).

Raises:

ValueError – If slider_value is outside [-5, +5].
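The mapping is a one-liner; a sketch matching the documented formula:

```python
def slider_to_temperature_sketch(slider_value: int) -> float:
    """T = 2^(slider_value / 3); slider 0 maps to the neutral T = 1.0."""
    if not -5 <= slider_value <= 5:
        raise ValueError("slider_value must be in [-5, +5]")
    return 2.0 ** (slider_value / 3)
```

The base-2, one-third-step scale means every three slider clicks halves or doubles T, so the extremes are roughly T ≈ 0.31 and T ≈ 3.17.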

ncaa_eval.evaluation.walk_forward_splits(seasons: Sequence[int], feature_server: StatefulFeatureServer, *, mode: Literal['batch', 'stateful'] = 'batch') Iterator[CVFold][source]

Generate walk-forward CV folds with Leave-One-Tournament-Out splits.

Parameters:
  • seasons – Ordered sequence of season years to include (e.g., range(2008, 2026)). Must contain at least 2 seasons.

  • feature_server – Configured StatefulFeatureServer for building feature matrices.

  • mode – Feature serving mode: "batch" (stateless models) or "stateful" (sequential-update models like Elo).

Yields:

CVFold – For each eligible test year (skipping no-tournament years like 2020): train contains all games from seasons strictly before the test year; test contains only tournament games from the test year; year is the test season year.

Raises:

ValueError – If seasons has fewer than 2 elements, or if mode is not "batch" or "stateful".
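The year-splitting logic (independent of feature serving) can be sketched as follows; `walk_forward_years` is a hypothetical helper illustrating the train/test boundaries, not the library's generator:

```python
from typing import Iterator, List, Sequence, Tuple

def walk_forward_years(seasons: Sequence[int]) -> Iterator[Tuple[List[int], int]]:
    """Yield (train_years, test_year) pairs: train is strictly before test,
    and 2020 (no tournament) is skipped as a test year."""
    ordered = sorted(seasons)
    if len(ordered) < 2:
        raise ValueError("seasons must contain at least 2 elements")
    for i, test_year in enumerate(ordered[1:], start=1):
        if test_year == 2020:  # tournament cancelled; still usable as training data
            continue
        yield ordered[:i], test_year
```

Note that 2020 is skipped only as a test year; its games remain in the training window of later folds.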