ncaa_eval.evaluation package¶
Submodules¶
- ncaa_eval.evaluation.backtest module
BacktestResult, FoldResult (year, predictions, actuals, metrics, elapsed_seconds, test_game_ids, test_team_a_ids, test_team_b_ids)
default_metrics(), feature_cols(), run_backtest()
- ncaa_eval.evaluation.bracket module
- ncaa_eval.evaluation.kaggle_export module
- ncaa_eval.evaluation.metrics module
- Metric Registry
MetricFn, MetricNotFoundError, ReliabilityData (fraction_of_positives, mean_predicted_value, bin_counts, bin_edges, n_bins)
brier_score(), expected_calibration_error(), get_metric(), list_metrics(), log_loss(), register_metric(), reliability_diagram_data(), roc_auc()
- ncaa_eval.evaluation.perturbation module
- ncaa_eval.evaluation.plotting module
- ncaa_eval.evaluation.providers module
- ncaa_eval.evaluation.scoring module
- ncaa_eval.evaluation.simulation module
BracketDistribution (scores, percentiles, mean, std, histogram_bins, histogram_counts)
BracketNode, BracketStructure, CustomScoring, DictScoring, EloProvider, FibonacciScoring, MatchupContext, MatrixProvider, MostLikelyBracket, ProbabilityProvider, ScoringNotFoundError, ScoringRule, SeedDiffBonusScoring, SimulationResult (season, advancement_probs, expected_points, method, n_simulations, confidence_intervals, score_distribution, bracket_distributions, sim_winners), StandardScoring
build_bracket(), build_probability_matrix(), compute_advancement_probs(), compute_bracket_distribution(), compute_expected_points(), compute_expected_points_seed_diff(), compute_most_likely_bracket(), get_scoring(), list_scorings(), register_scoring(), score_bracket_against_sims(), scoring_from_config(), simulate_tournament(), simulate_tournament_mc()
- ncaa_eval.evaluation.splitter module
Module contents¶
Evaluation metrics, cross-validation, and tournament simulation module.
- class ncaa_eval.evaluation.BacktestResult(fold_results: tuple[FoldResult, ...], summary: DataFrame, elapsed_seconds: float)[source]¶
Bases: object
Aggregated result of a full backtest across all folds.
- fold_results¶
Per-fold evaluation results, sorted by year.
- Type:
tuple[ncaa_eval.evaluation.backtest.FoldResult, …]
- summary¶
DataFrame with year as index, metric columns + elapsed_seconds.
- Type:
pandas.core.frame.DataFrame
- elapsed_seconds¶
Total wall-clock time for the entire backtest.
- Type:
float
- elapsed_seconds: float¶
- fold_results: tuple[FoldResult, ...]¶
- summary: DataFrame¶
- class ncaa_eval.evaluation.BracketDistribution(scores: ndarray[tuple[Any, ...], dtype[float64]], percentiles: dict[int, float], mean: float, std: float, histogram_bins: ndarray[tuple[Any, ...], dtype[float64]], histogram_counts: ndarray[tuple[Any, ...], dtype[int64]])[source]¶
Bases: object
Score distribution statistics from Monte Carlo simulation.
- scores¶
Raw per-simulation scores, shape (n_simulations,).
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]
- percentiles¶
Mapping of percentile → value for keys 5, 25, 50, 75, 95.
- Type:
dict[int, float]
- mean¶
Mean score across simulations.
- Type:
float
- std¶
Standard deviation of scores.
- Type:
float
- histogram_bins¶
Histogram bin edges, shape (n_bins + 1,).
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]
- histogram_counts¶
Histogram counts, shape (n_bins,).
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.int64]]
- histogram_bins: ndarray[tuple[Any, ...], dtype[float64]]¶
- histogram_counts: ndarray[tuple[Any, ...], dtype[int64]]¶
- mean: float¶
- percentiles: dict[int, float]¶
- scores: ndarray[tuple[Any, ...], dtype[float64]]¶
- std: float¶
- class ncaa_eval.evaluation.BracketNode(round_index: int, team_index: int = -1, left: BracketNode | None = None, right: BracketNode | None = None)[source]¶
Bases: object
Node in a tournament bracket tree.
A leaf node represents a single team; an internal node represents a game whose winner advances.
- round_index¶
Round number (0-indexed). Leaves have round_index = -1.
- Type:
int
- team_index¶
Index into the bracket's team_ids tuple for leaf nodes; -1 for internal nodes.
- Type:
int
- left¶
Left child (None for leaves).
- Type:
BracketNode | None
- right¶
Right child (None for leaves).
- Type:
BracketNode | None
- property is_leaf: bool¶
Return True if this is a leaf (team) node.
- left: BracketNode | None = None¶
- right: BracketNode | None = None¶
- round_index: int¶
- team_index: int = -1¶
- class ncaa_eval.evaluation.BracketStructure(root: BracketNode, team_ids: tuple[int, ...], team_index_map: dict[int, int], seed_map: dict[int, int] = <factory>)[source]¶
Bases: object
Immutable tournament bracket.
- root¶
Root BracketNode of the bracket tree.
- team_ids¶
Tuple of team IDs in bracket-position order (leaf order).
- Type:
tuple[int, …]
- team_index_map¶
Mapping of team_id → index into team_ids.
- Type:
dict[int, int]
- seed_map¶
Mapping of team_id → seed_num for seed-aware scoring.
- Type:
dict[int, int]
- root: BracketNode¶
- seed_map: dict[int, int]¶
- team_ids: tuple[int, ...]¶
- team_index_map: dict[int, int]¶
- class ncaa_eval.evaluation.CVFold(train: DataFrame, test: DataFrame, year: int)[source]¶
Bases: object
A single cross-validation fold.
- train¶
All games from seasons strictly before the test year.
- Type:
pandas.core.frame.DataFrame
- test¶
Tournament games only from the test year.
- Type:
pandas.core.frame.DataFrame
- year¶
The test season year.
- Type:
int
- test: DataFrame¶
- train: DataFrame¶
- year: int¶
- class ncaa_eval.evaluation.CustomScoring(scoring_fn: Callable[[int], float], scoring_name: str)[source]¶
Bases: object
User-defined scoring rule wrapping a callable.
- Parameters:
scoring_fn – Callable mapping round_idx → points.
scoring_name – Name for this custom rule.
- property name: str¶
Return the custom rule name.
- class ncaa_eval.evaluation.DictScoring(points: dict[int, float], scoring_name: str)[source]¶
Bases: object
Scoring rule from a dict mapping round_idx to points.
- Parameters:
points – Mapping of round_idx → points for rounds 0–5.
scoring_name – Name for this rule.
- Raises:
ValueError – If points does not contain exactly 6 entries (rounds 0–5).
- property name: str¶
Return the rule name.
- class ncaa_eval.evaluation.EloProvider(model: Any)[source]¶
Bases: object
Wraps a StatefulModel as a ProbabilityProvider.
Uses the model's predict_matchup method for probability computation.
- Parameters:
model – Any StatefulModel instance with predict_matchup.
- batch_matchup_probabilities(team_a_ids: Sequence[int], team_b_ids: Sequence[int], context: MatchupContext) ndarray[tuple[Any, ...], dtype[float64]][source]¶
Return batch probabilities by looping predict_matchup.
Iterates team pairs, calling predict_matchup per matchup, and collects results into a list.
Elo is O(1) per pair so looping is acceptable.
- matchup_probability(team_a_id: int, team_b_id: int, context: MatchupContext) float[source]¶
Return P(team_a beats team_b) via the model's predict_matchup.
Delegates to the model's predict_matchup method, which retrieves both teams' current ratings and applies the Elo logistic expected-score formula.
- class ncaa_eval.evaluation.EnsembleProvider(ensemble: StackedEnsemble, data_dir: Path, season: int)[source]¶
Bases: object
Wraps a StackedEnsemble as a ProbabilityProvider.
Calls ensemble.predict_bracket(data_dir, season) once on first use and caches the result as a MatrixProvider for subsequent lookups. This allows a StackedEnsemble to be passed to build_probability_matrix() and the Monte Carlo bracket simulator identically to single-model mode.
- Parameters:
ensemble – A trained StackedEnsemble instance.
data_dir – Path to the local Parquet data store.
season – Target season year.
- batch_matchup_probabilities(team_a_ids: Sequence[int], team_b_ids: Sequence[int], context: MatchupContext) ndarray[tuple[Any, ...], dtype[float64]][source]¶
Return batch probabilities from the cached ensemble matrix.
Triggers ensemble bracket prediction on first call; subsequent calls use the cached matrix.
- matchup_probability(team_a_id: int, team_b_id: int, context: MatchupContext) float[source]¶
Return P(team_a beats team_b) from the ensemble probability matrix.
Triggers ensemble bracket prediction on first call; subsequent calls use the cached matrix.
- class ncaa_eval.evaluation.FibonacciScoring[source]¶
Bases: object
Fibonacci-style scoring: 2-3-5-8-13-21 (231 total for a perfect bracket).
- property name: str¶
Return 'fibonacci'.
- class ncaa_eval.evaluation.FoldResult(year: int, predictions: Series, actuals: Series, metrics: Mapping[str, float], elapsed_seconds: float, test_game_ids: Series = <factory>, test_team_a_ids: Series = <factory>, test_team_b_ids: Series = <factory>)[source]¶
Bases: object
Result of evaluating a single cross-validation fold.
- year¶
The test season year for this fold.
- Type:
int
- predictions¶
Predicted probabilities for tournament games.
- Type:
pandas.core.series.Series
- actuals¶
Actual binary outcomes for tournament games.
- Type:
pandas.core.series.Series
- metrics¶
Mapping of metric name to computed value.
- Type:
collections.abc.Mapping[str, float]
- elapsed_seconds¶
Wall-clock time for the fold evaluation.
- Type:
float
- test_game_ids¶
Game IDs from the test fold (aligned to predictions).
- Type:
pandas.core.series.Series
- test_team_a_ids¶
team_a IDs from the test fold.
- Type:
pandas.core.series.Series
- test_team_b_ids¶
team_b IDs from the test fold.
- Type:
pandas.core.series.Series
- actuals: Series¶
- elapsed_seconds: float¶
- metrics: Mapping[str, float]¶
- predictions: Series¶
- test_game_ids: Series¶
- test_team_a_ids: Series¶
- test_team_b_ids: Series¶
- year: int¶
- class ncaa_eval.evaluation.MatchupContext(season: int, day_num: int, is_neutral: bool)[source]¶
Bases: object
Context for a hypothetical matchup probability query.
Passed to ProbabilityProvider so that stateless models can construct the correct feature row for a hypothetical pairing. Stateful models (Elo) typically ignore context and use internal ratings.
- season¶
Tournament season year (e.g. 2024).
- Type:
int
- day_num¶
Tournament day number (e.g. 136 for Round of 64).
- Type:
int
- is_neutral¶
True for all tournament games (neutral site).
- Type:
bool
- day_num: int¶
- is_neutral: bool¶
- season: int¶
- class ncaa_eval.evaluation.MatrixProvider(prob_matrix: ndarray[tuple[Any, ...], dtype[float64]], team_ids: Sequence[int])[source]¶
Bases: object
Wraps a pre-computed probability matrix as a ProbabilityProvider.
- Parameters:
prob_matrix – n×n pairwise probability matrix.
team_ids – Sequence of team IDs matching matrix indices.
- batch_matchup_probabilities(team_a_ids: Sequence[int], team_b_ids: Sequence[int], context: MatchupContext) ndarray[tuple[Any, ...], dtype[float64]][source]¶
Return batch probabilities from the stored matrix.
Extracts row/column indices from the team pairs, vectorizes lookups into the probability matrix, and returns an array of win probabilities.
- matchup_probability(team_a_id: int, team_b_id: int, context: MatchupContext) float[source]¶
Return P(team_a beats team_b) from the stored matrix.
Indexes into the pre-built probability matrix using the team-to-index mapping, returning P(team_i beats team_j) directly from the stored array.
- exception ncaa_eval.evaluation.MetricNotFoundError[source]¶
Bases: KeyError
Raised when a requested metric name is not in the registry.
- class ncaa_eval.evaluation.MostLikelyBracket(winners: tuple[int, ...], champion_team_id: int, log_likelihood: float)[source]¶
Bases: object
Maximum-likelihood bracket from greedy traversal.
- winners¶
Tuple of team indices for each game's predicted winner, in round-major order matching SimulationResult.sim_winners rows — all Round-of-64 games first (indices 0–31 for 64 teams), then Round-of-32 (32–47), through to the championship (index 62). 63 entries for a 64-team bracket. Pass directly to score_bracket_against_sims() as chosen_bracket.
- Type:
tuple[int, …]
- champion_team_id¶
Canonical team ID of the predicted champion (from BracketStructure.team_ids[champion_index]).
- Type:
int
- log_likelihood¶
Sum of log(max(P[left, right], P[right, left])) across all games.
- Type:
float
- champion_team_id: int¶
- log_likelihood: float¶
- winners: tuple[int, ...]¶
- class ncaa_eval.evaluation.ProbabilityProvider(*args, **kwargs)[source]¶
Bases: Protocol
Protocol for pairwise win probability computation.
All implementations must satisfy the complementarity contract: P(A beats B) + P(B beats A) = 1 for every (A, B) pair.
- batch_matchup_probabilities(team_a_ids: Sequence[int], team_b_ids: Sequence[int], context: MatchupContext) ndarray[tuple[Any, ...], dtype[float64]][source]¶
Return P(a_i beats b_i) for all pairs.
- Parameters:
team_a_ids – Sequence of first-team IDs.
team_b_ids – Sequence of second-team IDs (same length).
context – Matchup context.
- Returns:
1-D float64 array of shape (len(team_a_ids),).
- matchup_probability(team_a_id: int, team_b_id: int, context: MatchupContext) float[source]¶
Return P(team_a beats team_b).
- Parameters:
team_a_id – First team’s canonical ID.
team_b_id – Second team’s canonical ID.
context – Matchup context (season, day_num, neutral).
- Returns:
Probability in [0, 1].
- class ncaa_eval.evaluation.ReliabilityData(fraction_of_positives: ndarray[tuple[Any, ...], dtype[float64]], mean_predicted_value: ndarray[tuple[Any, ...], dtype[float64]], bin_counts: ndarray[tuple[Any, ...], dtype[int64]], bin_edges: ndarray[tuple[Any, ...], dtype[float64]], n_bins: int)[source]¶
Bases: object
Structured return type for reliability diagram data.
- fraction_of_positives¶
Observed fraction of positives per bin (from calibration_curve).
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]
- mean_predicted_value¶
Mean predicted probability per bin (from calibration_curve).
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]
- bin_counts¶
Number of samples in each non-empty bin.
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.int64]]
- bin_edges¶
Full bin edge array of shape (n_bins + 1,), i.e. np.linspace(0.0, 1.0, n_bins + 1). Includes both the lower (0.0) and upper (1.0) boundaries so callers do not need to recompute them.
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]
- n_bins¶
Requested number of bins.
- Type:
int
- bin_counts: ndarray[tuple[Any, ...], dtype[int64]]¶
- bin_edges: ndarray[tuple[Any, ...], dtype[float64]]¶
- fraction_of_positives: ndarray[tuple[Any, ...], dtype[float64]]¶
- mean_predicted_value: ndarray[tuple[Any, ...], dtype[float64]]¶
- n_bins: int¶
- exception ncaa_eval.evaluation.ScoringNotFoundError[source]¶
Bases: KeyError
Raised when a requested scoring name is not in the registry.
- class ncaa_eval.evaluation.ScoringRule(*args, **kwargs)[source]¶
Bases: Protocol
Protocol for tournament bracket scoring rules.
- property name: str¶
Human-readable name of the scoring rule.
- class ncaa_eval.evaluation.SeedDiffBonusScoring(seed_map: dict[int, int])[source]¶
Bases: object
Base points + seed-difference bonus when the lower seed wins.
Uses the same base as StandardScoring (1-2-4-8-16-32). When the lower seed (higher seed number) wins, adds a |seed_a - seed_b| bonus.
Note: This scoring rule's points_per_round returns only the base points. Full EP computation for seed-diff scoring (which requires per-matchup seed information) is provided by the dedicated compute_expected_points_seed_diff() function.
- Parameters:
seed_map – Mapping of team_id → seed_num.
- property name: str¶
Return 'seed_diff_bonus'.
- seed_diff_bonus(seed_a: int, seed_b: int) float[source]¶
Return bonus points when the lower seed wins.
- Parameters:
seed_a – Winner’s seed number.
seed_b – Loser’s seed number.
- Returns:
|seed_a - seed_b| if the winner has the higher seed number (an upset), else 0.
- property seed_map: dict[int, int]¶
Return the seed lookup map.
- class ncaa_eval.evaluation.SimulationResult(season: int, advancement_probs: ndarray[tuple[Any, ...], dtype[float64]], expected_points: dict[str, ndarray[tuple[Any, ...], dtype[float64]]], method: str, n_simulations: int | None, confidence_intervals: dict[str, tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]]] | None, score_distribution: dict[str, ndarray[tuple[Any, ...], dtype[float64]]] | None, bracket_distributions: dict[str, BracketDistribution] | None = None, sim_winners: ndarray[tuple[Any, ...], dtype[int32]] | None = None)[source]¶
Bases: object
Result of tournament simulation for one season.
Both the analytical path and the MC path produce a SimulationResult.
Tournament season year.
- Type:
int
- advancement_probs¶
Per-team advancement probabilities, shape (n_teams, n_rounds).
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]
- expected_points¶
Mapping of scoring_rule_name → per-team EP, each of shape (n_teams,).
- Type:
dict[str, numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]]
- method¶
"analytical" or "monte_carlo".
- Type:
str
- n_simulations¶
None for analytical; N for MC.
- Type:
int | None
- confidence_intervals¶
Optional mapping of rule_name → (lower, upper) arrays.
- Type:
dict[str, tuple[numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]], numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]]] | None
- score_distribution¶
Optional mapping of rule_name → per-sim scores array, shape (n_simulations,).
- Type:
dict[str, numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]] | None
- bracket_distributions¶
Optional mapping of rule_name → BracketDistribution (MC only; None for analytical). Note: distributions are computed from the chalk-bracket score (how many pre-game favorites won). For pool scoring analysis ("how would my chosen bracket score across all simulations?"), use sim_winners with score_bracket_against_sims().
- Type:
dict[str, ncaa_eval.evaluation.simulation.BracketDistribution] | None
- sim_winners¶
Optional array of per-simulation game winners, shape (n_simulations, n_games) (MC only; None for analytical).
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.int32]] | None
- advancement_probs: ndarray[tuple[Any, ...], dtype[float64]]¶
- bracket_distributions: dict[str, BracketDistribution] | None = None¶
- confidence_intervals: dict[str, tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]]] | None¶
- expected_points: dict[str, ndarray[tuple[Any, ...], dtype[float64]]]¶
- method: str¶
- n_simulations: int | None¶
- score_distribution: dict[str, ndarray[tuple[Any, ...], dtype[float64]]] | None¶
- season: int¶
- sim_winners: ndarray[tuple[Any, ...], dtype[int32]] | None = None¶
- class ncaa_eval.evaluation.StandardScoring[source]¶
Bases: object
ESPN-style scoring: 1-2-4-8-16-32 (192 total for a perfect bracket).
- property name: str¶
Return 'standard'.
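The perfect-bracket totals quoted for StandardScoring (192) and FibonacciScoring (231) follow directly from the games-per-round counts of a 64-team bracket; a quick sanity check:

```python
# Rounds 0..5 of a 64-team bracket contain 32, 16, 8, 4, 2, 1 games.
games_per_round = [32, 16, 8, 4, 2, 1]

standard = [1, 2, 4, 8, 16, 32]   # ESPN-style doubling per round
fibonacci = [2, 3, 5, 8, 13, 21]  # Fibonacci-style per round

# A perfect bracket picks every game right, so the total is the
# dot product of games-per-round with points-per-round.
perfect_standard = sum(g * p for g, p in zip(games_per_round, standard))
perfect_fibonacci = sum(g * p for g, p in zip(games_per_round, fibonacci))
```

With standard scoring every round is worth exactly 32 points in total (32×1, 16×2, …, 1×32), which is why the doubling scheme sums to 6 × 32 = 192.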
- ncaa_eval.evaluation.brier_score(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]¶
Compute Brier Score for binary predictions.
- Parameters:
y_true – Binary labels (0 or 1).
y_prob – Predicted probabilities for the positive class.
- Returns:
Brier Score value (lower is better).
- Raises:
ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].
- ncaa_eval.evaluation.build_bracket(seeds: list[TourneySeed], season: int) BracketStructure[source]¶
Construct a 64-team bracket tree from tournament seeds.
Play-in teams (is_play_in=True) are excluded. Exactly 64 non-play-in seeds are required.
- Parameters:
seeds – List of TourneySeed objects for the given season.
season – Season year to filter seeds.
- Returns:
Fully constructed BracketStructure.
- Raises:
ValueError – If the number of non-play-in seeds for season is not 64.
- ncaa_eval.evaluation.build_probability_matrix(provider: ProbabilityProvider, team_ids: Sequence[int], context: MatchupContext) ndarray[tuple[Any, ...], dtype[float64]][source]¶
Build n×n pairwise win probability matrix.
Uses an upper-triangle batch call, then fills P[j,i] = 1 - P[i,j] via the complementarity contract.
- Parameters:
provider – Probability provider implementing the protocol.
team_ids – Team IDs in bracket order.
context – Matchup context.
- Returns:
Float64 array of shape (n, n). Diagonal is zero.
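The complementarity fill described above can be sketched in plain NumPy. The helper name is hypothetical — the library performs this step internally — but it shows how the lower triangle is derived from the upper one:

```python
import numpy as np


def fill_by_complementarity(upper: np.ndarray) -> np.ndarray:
    """Given upper-triangle probabilities P[i, j] for i < j, fill the
    lower triangle as P[j, i] = 1 - P[i, j] and zero the diagonal."""
    n = upper.shape[0]
    P = np.triu(upper, k=1).astype(np.float64)  # copy, keep strict upper
    iu = np.triu_indices(n, k=1)
    # Mirror each upper entry to its transposed position as 1 - p.
    P[iu[1], iu[0]] = 1.0 - P[iu]
    np.fill_diagonal(P, 0.0)
    return P
```

Only C(n, 2) probabilities need to be queried from the provider; the other half of the matrix is implied by the contract.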
- ncaa_eval.evaluation.build_seed_prior_matrix(seed_map: dict[int, int], team_ids: Sequence[int]) ndarray[tuple[Any, ...], dtype[float64]][source]¶
Build an (n × n) seed prior probability matrix.
S[i, j] = P(team_i beats team_j) based on historical seed win rates keyed by |seed_i − seed_j|.
- Parameters:
seed_map – Maps team_id → seed_number (1–16).
team_ids – Ordered team IDs matching matrix indices.
- Returns:
Float64 matrix of shape (n, n) with diagonal zeros and S[i,j] + S[j,i] = 1.
- ncaa_eval.evaluation.compute_advancement_probs(bracket: BracketStructure, P: ndarray[tuple[Any, ...], dtype[float64]]) ndarray[tuple[Any, ...], dtype[float64]][source]¶
Compute exact advancement probabilities via the Phylourny algorithm.
Post-order traversal of the bracket tree computing Win Probability Vectors (WPVs) at each internal node using the formula:
R = V ⊙ (P^T · W) + W ⊙ (P^T · V)
- Parameters:
bracket – Tournament bracket structure.
P – Pairwise win probability matrix, shape (n, n).
- Returns:
Advancement probabilities, shape (n, n_rounds). adv_probs[i, r] = P(team i wins their game in round r).
- Raises:
ValueError – If n is not a power of 2 or does not match the bracket's team count.
- ncaa_eval.evaluation.compute_bracket_distribution(scores: ndarray[tuple[Any, ...], dtype[float64]], n_bins: int = 50) BracketDistribution[source]¶
Compute score distribution statistics from raw MC scores.
Computes the 5th/25th/50th/75th/95th percentiles via np.percentile, builds an n_bins-bucket histogram via np.histogram, and wraps all statistics into a BracketDistribution.
- Parameters:
scores – Raw per-simulation scores, shape (n_simulations,).
n_bins – Number of histogram bins (default 50).
- Returns:
BracketDistribution with percentiles, mean, std, and histogram.
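The statistics gathered here can be reproduced with plain NumPy; a sketch on synthetic scores (the distribution parameters are arbitrary):

```python
import numpy as np

# Synthetic per-simulation scores standing in for real MC output.
rng = np.random.default_rng(0)
scores = rng.normal(loc=100.0, scale=15.0, size=10_000)

# The same summaries compute_bracket_distribution wraps up:
percentiles = {p: float(np.percentile(scores, p)) for p in (5, 25, 50, 75, 95)}
counts, edges = np.histogram(scores, bins=50)   # 50 buckets -> 51 edges
mean, std = float(scores.mean()), float(scores.std())
```

Note that np.histogram returns one more edge than bucket, which is why BracketDistribution.histogram_bins has shape (n_bins + 1,).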
- ncaa_eval.evaluation.compute_expected_points(adv_probs: ndarray[tuple[Any, ...], dtype[float64]], scoring_rule: ScoringRule) ndarray[tuple[Any, ...], dtype[float64]][source]¶
Compute Expected Points per team via matrix-vector multiply.
- Parameters:
adv_probs – Advancement probabilities, shape (n, n_rounds).
scoring_rule – Scoring rule providing per-round point values.
- Returns:
Expected Points per team, shape (n,).
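The matrix-vector multiply is simply EP = adv_probs @ points; a toy example with hypothetical numbers for 4 teams over 2 rounds:

```python
import numpy as np

# Hypothetical advancement probabilities: adv_probs[i, r] is the
# probability team i wins its round-r game.
adv_probs = np.array([
    [0.8, 0.5],
    [0.2, 0.1],
    [0.6, 0.3],
    [0.4, 0.1],
])
points = np.array([1.0, 2.0])  # points per round, standard-style doubling

# EP_i = sum_r adv_probs[i, r] * points[r]
ep = adv_probs @ points  # shape (n,)
```

Team 0's EP is 0.8 × 1 + 0.5 × 2 = 1.8, and so on down the column.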
- ncaa_eval.evaluation.compute_expected_points_seed_diff(adv_probs: ndarray[tuple[Any, ...], dtype[float64]], bracket: BracketStructure, P: ndarray[tuple[Any, ...], dtype[float64]], seed_map: dict[int, int]) ndarray[tuple[Any, ...], dtype[float64]][source]¶
Compute Expected Points with seed-difference upset bonus.
Extends standard EP by adding per-matchup seed-diff bonus. For each internal bracket node at round r, the bonus contribution for team i beating opponent j is:
P(i reaches node) * P(i beats j) * P(j reaches node) * bonus(seed_i, seed_j)
where bonus = |seed_i - seed_j| when seed_i > seed_j (upset), else 0.
Uses SeedDiffBonusScoring base points for standard round points and a post-order traversal of the bracket tree (reusing WPVs from compute_advancement_probs() logic) for bonus computation.
- Parameters:
adv_probs – Advancement probabilities, shape (n, n_rounds).
bracket – Tournament bracket structure (for tree traversal).
P – Pairwise win probability matrix, shape (n, n).
seed_map – Mapping of team_id → seed_num.
- Returns:
Expected Points per team, shape (n,), including base + bonus.
- ncaa_eval.evaluation.compute_most_likely_bracket(bracket: BracketStructure, P: ndarray[tuple[Any, ...], dtype[float64]]) MostLikelyBracket[source]¶
Compute the maximum-likelihood bracket via greedy traversal.
At each internal node, picks the team with the higher win probability (argmax(P[left, right])). Returns the full bracket of winners and the log-likelihood of the chosen bracket.
The winners array is in round-major order — the same order as SimulationResult.sim_winners rows — so it can be passed directly to score_bracket_against_sims(): all Round-of-64 games first (indices 0–31), then Round-of-32 (32–47), through to the championship game (index 62).
bracket – Tournament bracket structure.
P – Pairwise win probability matrix, shape (n, n).
- Returns:
MostLikelyBracket with winners, champion, and log-likelihood.
- ncaa_eval.evaluation.default_metrics() dict[str, Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float]][source]¶
Return all registered metric functions (built-in + user-registered).
- ncaa_eval.evaluation.expected_calibration_error(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]], *, n_bins: int = 10) float[source]¶
Compute Expected Calibration Error (ECE) using vectorized numpy.
ECE measures how well predicted probabilities match observed frequencies. Predictions are binned into n_bins equal-width bins on [0, 1], and ECE is the weighted average of per-bin |accuracy - confidence| gaps.
- Parameters:
y_true – Binary labels (0 or 1).
y_prob – Predicted probabilities for the positive class.
n_bins – Number of equal-width bins (default 10).
- Returns:
ECE value in [0, 1] (lower is better).
- Raises:
ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].
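A compact NumPy sketch of the binned computation described above (a hypothetical helper, not the library function itself):

```python
import numpy as np


def ece(y_true: np.ndarray, y_prob: np.ndarray, n_bins: int = 10) -> float:
    """Weighted average of per-bin |accuracy - confidence| gaps."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to an equal-width bin on [0, 1]; interior
    # edges give bin ids 0..n_bins-1, and clipping keeps p = 1.0 in the
    # last bin.
    bin_ids = np.clip(np.digitize(y_prob, edges[1:-1]), 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        acc = y_true[mask].mean()    # observed frequency in the bin
        conf = y_prob[mask].mean()   # mean predicted probability
        total += mask.mean() * abs(acc - conf)  # weight by bin occupancy
    return total
```

A perfectly calibrated bin (e.g. predictions of 0.5 that come true half the time) contributes zero; a bin of 0.9 predictions that never come true contributes its full 0.9 gap times its weight.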
- ncaa_eval.evaluation.feature_cols(df: DataFrame) list[str][source]¶
Return feature column names (everything not in METADATA_COLS).
- Parameters:
df – DataFrame whose columns are inspected.
- Returns:
List of column names that are not metadata.
- ncaa_eval.evaluation.format_kaggle_submission(season: int, team_ids: Sequence[int], prob_matrix: ndarray[tuple[Any, ...], dtype[float64]]) str[source]¶
Format a probability matrix as a Kaggle submission CSV string.
- Parameters:
season – Tournament season year (e.g. 2025).
team_ids – Team IDs corresponding to matrix rows/columns.
prob_matrix – n×n pairwise probability matrix where P[i,j] is P(team_ids[i] beats team_ids[j]).
- Returns:
CSV string with header ID,Pred and C(n,2) data rows.
- Raises:
ValueError – If the matrix shape doesn’t match the team count.
- ncaa_eval.evaluation.get_metric(name: str) Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float][source]¶
Return the metric function registered under name.
- Raises:
MetricNotFoundError – If name is not registered.
- ncaa_eval.evaluation.get_scoring(name: str) type[source]¶
Return the scoring class registered under name.
- Raises:
ScoringNotFoundError – If name is not registered.
- ncaa_eval.evaluation.list_scoring_display_names() dict[str, str][source]¶
Return a mapping of registry keys to display names.
- Returns:
Dict mapping scoring name → human-readable display name.
- ncaa_eval.evaluation.list_scorings() list[str][source]¶
Return all registered scoring names (sorted).
- ncaa_eval.evaluation.log_loss(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]¶
Compute Log Loss (cross-entropy loss) for binary predictions.
- Parameters:
y_true – Binary labels (0 or 1).
y_prob – Predicted probabilities for the positive class.
- Returns:
Log Loss value.
- Raises:
ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].
- ncaa_eval.evaluation.perturb_probability_matrix(P: ndarray[tuple[Any, ...], dtype[float64]], seed_map: dict[int, int], team_ids: Sequence[int], temperature: float = 1.0, seed_weight: float = 0.0) ndarray[tuple[Any, ...], dtype[float64]][source]¶
Apply game-theory slider perturbation to a pairwise probability matrix.
Applies two independent transformations in sequence:
- Temperature scaling: p' = p^(1/T) / (p^(1/T) + (1-p)^(1/T))
- Seed blend: p'' = (1-w)*p' + w*p_seed_prior
- Parameters:
P – Square probability matrix where P[i,j] is the probability that team i beats team j. Must satisfy P[i,j] + P[j,i] = 1 and P[i,i] = 0.
seed_map – Maps team_id → seed_number (1–16).
team_ids – Ordered team IDs matching matrix indices.
temperature – Controls upset/chalk spectrum. T > 1 = more upsets, T < 1 = more chalk, T = 1 = neutral.
seed_weight – Controls model/seed blend. 0 = pure model, 1 = pure seed prior.
- Returns:
Perturbed matrix of same shape satisfying complementarity.
- Raises:
ValueError – If temperature ≤ 0 or seed_weight not in [0, 1].
- ncaa_eval.evaluation.plot_advancement_heatmap(result: SimulationResult, team_labels: Mapping[int, str] | None = None) Figure[source]¶
Heatmap of per-team advancement probabilities by round.
- Parameters:
result – Simulation result with advancement_probs array.
team_labels – Optional mapping of team index (0..n-1, bracket-position order) to display name. When None, team indices are shown as-is. Note: keys are bracket indices, not canonical team IDs — use BracketStructure.team_index_map to translate from team IDs to indices before passing this argument.
- Returns:
Interactive Plotly Figure showing a heatmap with teams on y-axis and rounds on x-axis.
- ncaa_eval.evaluation.plot_backtest_summary(result: BacktestResult, *, metrics: Sequence[str] | None = None) Figure[source]¶
Per-year metric values from a backtest result.
- Parameters:
result – Backtest result containing the summary DataFrame.
metrics – Metric column names to include. Defaults to all metric columns (excludes elapsed_seconds).
- Returns:
Interactive Plotly Figure with one line per metric, x=year.
- ncaa_eval.evaluation.plot_metric_comparison(results: Mapping[str, BacktestResult], metric: str) Figure[source]¶
Multi-model overlay: one line per model for a given metric across years.
- Parameters:
results – Mapping of model name to BacktestResult.
metric – Metric column name to compare.
- Returns:
Interactive Plotly Figure with one line per model.
- ncaa_eval.evaluation.plot_reliability_diagram(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]], *, n_bins: int = 10, title: str | None = None) Figure[source]¶
Reliability diagram: predicted vs. actual probability with bin counts.
- Parameters:
y_true – Binary labels (0 or 1).
y_prob – Predicted probabilities for the positive class.
n_bins – Number of calibration bins (default 10).
title – Optional figure title.
- Returns:
Interactive Plotly Figure with calibration curve, diagonal reference, and bar overlay of per-bin sample counts.
- ncaa_eval.evaluation.plot_score_distribution(dist: BracketDistribution, *, title: str | None = None) Figure[source]¶
Histogram of bracket score distribution with percentile markers.
- Parameters:
dist – Bracket distribution with pre-computed histogram data and percentile values.
title – Optional figure title.
- Returns:
Interactive Plotly Figure with histogram bars and vertical percentile lines at 5th, 25th, 50th, 75th, and 95th.
- ncaa_eval.evaluation.power_transform(P: ndarray[tuple[Any, ...], dtype[float64]], temperature: float) ndarray[tuple[Any, ...], dtype[float64]][source]¶
Apply power/temperature scaling to a probability matrix.
Computes p' = p^(1/T) / (p^(1/T) + (1-p)^(1/T)) element-wise.
- Properties:
T=1.0: identity (p' = p).
T>1: compresses probabilities toward 0.5 (more upsets).
T<1: sharpens probabilities away from 0.5 (more chalk).
Preserves p=0, p=1, and p=0.5 as fixed points.
Preserves diagonal zeros.
Preserves complementarity: p'[i,j] + p'[j,i] = 1.
- Parameters:
P – Square probability matrix with diagonal zeros.
temperature – Temperature T > 0.
- Returns:
Transformed matrix with same shape and dtype.
- Raises:
ValueError – If temperature is not positive.
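The transform above can be sketched directly from its formula. This is a hypothetical re-implementation (the name power_transform_sketch is not part of the package), shown only to make the fixed-point and complementarity properties concrete:

```python
import numpy as np

def power_transform_sketch(P: np.ndarray, temperature: float) -> np.ndarray:
    """Temperature-scale a pairwise probability matrix (illustrative sketch)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    inv_t = 1.0 / temperature
    num = P ** inv_t
    den = num + (1.0 - P) ** inv_t  # never zero for P in [0, 1] and T > 0
    return num / den
```

At T=1 the exponent is 1 and the matrix is returned unchanged; p=0 and p=1 map to themselves because one of the two terms in the denominator vanishes, and complementary entries still sum to 1 because swapping p and 1-p swaps numerator terms within the same denominator.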
- ncaa_eval.evaluation.register_metric(name: str) Callable[[_MF], _MF][source]¶
Function decorator that registers a metric function.
- Parameters:
name – Registry key for the metric.
- Returns:
Decorator that registers the function and returns it unchanged.
- Raises:
ValueError – If name is already registered.
- ncaa_eval.evaluation.register_scoring(name: str, *, display_name: str | None = None) Callable[[_ST], _ST][source]¶
Class decorator that registers a scoring rule class.
- Parameters:
name – Registry key for the scoring rule.
display_name – Optional human-readable label for UI display. Falls back to name if not provided.
- Returns:
Decorator that registers the class and returns it unchanged.
- Raises:
ValueError – If name is already registered.
- ncaa_eval.evaluation.reliability_diagram_data(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]], *, n_bins: int = 10) ReliabilityData[source]¶
Generate reliability diagram data for calibration visualization.
Uses sklearn.calibration.calibration_curve for bin statistics and augments it with per-bin sample counts.
- Parameters:
y_true – Binary labels (0 or 1).
y_prob – Predicted probabilities for the positive class.
n_bins – Number of bins (default 10).
- Returns:
Structured data containing fraction of positives, mean predicted values, bin counts, bin edges, and requested number of bins.
- Raises:
ValueError – If inputs are empty, mismatched,
n_bins < 1, or probabilities are outside [0, 1].
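The bin statistics this function returns can be sketched with NumPy alone (the real implementation delegates to sklearn.calibration.calibration_curve; this stand-in, with the hypothetical name reliability_data_sketch, uses uniform bins and shows the validation and per-bin-count augmentation):

```python
import numpy as np

def reliability_data_sketch(y_true, y_prob, n_bins=10):
    """Uniform-bin calibration stats: fraction of positives, mean prediction,
    per-bin counts, and bin edges (numpy-only illustrative sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    if y_true.size == 0 or y_true.shape != y_prob.shape:
        raise ValueError("inputs must be non-empty and the same shape")
    if n_bins < 1:
        raise ValueError("n_bins must be >= 1")
    if np.any((y_prob < 0) | (y_prob > 1)):
        raise ValueError("probabilities must lie in [0, 1]")
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Map each probability to a bin index in [0, n_bins - 1].
    idx = np.clip(np.digitize(y_prob, edges[1:-1]), 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    nonempty = counts > 0
    frac_pos = np.bincount(idx, weights=y_true, minlength=n_bins)[nonempty] / counts[nonempty]
    mean_pred = np.bincount(idx, weights=y_prob, minlength=n_bins)[nonempty] / counts[nonempty]
    return frac_pos, mean_pred, counts, edges
```

Empty bins are dropped from the curve arrays (matching calibration_curve's behavior) but still appear as zeros in the counts, which is what a reliability diagram's bar overlay needs.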
- ncaa_eval.evaluation.roc_auc(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]¶
Compute ROC-AUC for binary predictions.
- Parameters:
y_true – Binary labels (0 or 1).
y_prob – Predicted probabilities for the positive class.
- Returns:
ROC-AUC value.
- Raises:
ValueError – If inputs are empty, mismatched, probabilities are outside [0, 1], or y_true contains only one class (AUC is undefined).
- ncaa_eval.evaluation.run_backtest(model: Model, feature_server: StatefulFeatureServer, *, seasons: Sequence[int], mode: Literal['batch', 'stateful'] = 'batch', n_jobs: int = -1, metric_fns: Mapping[str, Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float]] | None = None, console: Console | None = None, progress: bool = False) BacktestResult[source]¶
Run parallelized walk-forward cross-validation backtest.
- Parameters:
model – Model instance to evaluate (will be deep-copied per fold).
feature_server – Configured feature server for building CV folds.
seasons – Season years to include (passed to walk_forward_splits).
mode – Feature serving mode ("batch" or "stateful").
n_jobs – Number of parallel workers. -1 = all cores, 1 = sequential.
metric_fns – Metric functions to compute per fold. Defaults to {log_loss, brier_score, roc_auc, expected_calibration_error}.
console – Rich Console for progress output.
progress – Display a tqdm progress bar for fold evaluation. Most useful with n_jobs=1 (sequential execution).
- Returns:
BacktestResult with per-fold results and summary DataFrame.
- Raises:
ValueError – If mode is not "batch" or "stateful", or if seasons contains fewer than 2 elements (propagated from walk_forward_splits()).
- ncaa_eval.evaluation.score_bracket_against_sims(chosen_bracket: ndarray[tuple[Any, ...], dtype[int32]], sim_winners: ndarray[tuple[Any, ...], dtype[int32]], scoring_rules: Sequence[ScoringRule]) dict[str, ndarray[tuple[Any, ...], dtype[float64]]][source]¶
Score a chosen bracket against each simulated tournament outcome.
Broadcasts chosen_bracket across all simulations to build a boolean match matrix (sim_winners == chosen_bracket[None, :]). For each scoring rule, constructs a per-game point vector by iterating rounds with a running game_offset, then computes per-sim scores as (matches * game_points).sum(axis=1): one vectorized dot product per rule, with no Python loop over simulations.
- Parameters:
chosen_bracket – Game winners for the chosen bracket, shape (n_games,).
sim_winners – Per-simulation game winners, shape (n_simulations, n_games).
scoring_rules – Scoring rules to score against.
- Returns:
Mapping of rule_name → per-sim scores, each of shape (n_simulations,).
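The broadcast-and-sum described above reduces to two NumPy operations per rule. A minimal sketch (score_bracket_sketch is a hypothetical name; game_points stands in for the per-game point vector a scoring rule would produce):

```python
import numpy as np

def score_bracket_sketch(chosen, sim_winners, game_points):
    """Score one chosen bracket against every simulated outcome at once."""
    # (n_sims, n_games) boolean: did the chosen winner match this sim's winner?
    matches = sim_winners == chosen[None, :]
    # Weight each correct pick by its game's point value and sum per sim.
    return (matches * game_points).sum(axis=1)
```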
- ncaa_eval.evaluation.scoring_from_config(config: dict[str, Any]) ScoringRule[source]¶
Create a scoring rule from a configuration dict.
Dispatches on config["type"]:
"standard" → StandardScoring
"fibonacci" → FibonacciScoring
"seed_diff_bonus" → SeedDiffBonusScoring (requires seed_map)
"dict" → DictScoring (requires points and name)
"custom" → CustomScoring (requires callable and name)
- Parameters:
config – Configuration dict with at least a "type" key.
- Returns:
Instantiated scoring rule.
- Raises:
ValueError – If type is unknown or required keys are missing.
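The dispatch table above can be sketched as a dict of builders with KeyError-to-ValueError translation. Since the real scoring classes aren't shown here, this illustrative stand-in returns (class_name, kwargs) tuples instead of instances; the function name is hypothetical:

```python
def scoring_from_config_sketch(config):
    """Dispatch on config['type'], validating required keys per type."""
    builders = {
        "standard": lambda c: ("StandardScoring", {}),
        "fibonacci": lambda c: ("FibonacciScoring", {}),
        "seed_diff_bonus": lambda c: ("SeedDiffBonusScoring", {"seed_map": c["seed_map"]}),
        "dict": lambda c: ("DictScoring", {"points": c["points"], "name": c["name"]}),
        "custom": lambda c: ("CustomScoring", {"callable": c["callable"], "name": c["name"]}),
    }
    try:
        builder = builders[config["type"]]
    except KeyError as exc:
        raise ValueError(f"unknown or missing scoring type: {config.get('type')!r}") from exc
    try:
        return builder(config)
    except KeyError as exc:
        raise ValueError(f"missing required key {exc} for type {config['type']!r}") from exc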
- ncaa_eval.evaluation.simulate_tournament(bracket: BracketStructure, probability_provider: ProbabilityProvider, context: MatchupContext, scoring_rules: Sequence[ScoringRule] | None = None, method: str = 'analytical', n_simulations: int = 10000, rng: Generator | None = None, progress: bool = False, progress_callback: Callable[[int, int], None] | None = None) SimulationResult[source]¶
High-level tournament simulation orchestrator.
Dispatches to analytical (Phylourny) or Monte Carlo path based on method.
- Parameters:
bracket – Tournament bracket structure.
probability_provider – Provider for pairwise win probabilities.
context – Matchup context (season, day_num, neutral).
scoring_rules – Scoring rules for EP computation. Defaults to StandardScoring only.
method – "analytical" (default) or "monte_carlo".
n_simulations – Number of MC simulations (ignored for analytical).
rng – NumPy random generator (MC only).
progress – Display a tqdm progress bar for MC simulation rounds. Ignored when method="analytical".
progress_callback – Optional callback invoked after each MC round with (round_completed, total_rounds). Ignored when method="analytical".
- Returns:
SimulationResult for the selected method.
- Raises:
ValueError – If method is not "analytical" or "monte_carlo", or if MC is requested with n_simulations < 100.
- ncaa_eval.evaluation.simulate_tournament_mc(bracket: BracketStructure, P: ndarray[tuple[Any, ...], dtype[float64]], scoring_rules: Sequence[ScoringRule], season: int, n_simulations: int = 10000, rng: Generator | None = None, progress: bool = False, progress_callback: Callable[[int, int], None] | None = None) SimulationResult[source]¶
Vectorized Monte Carlo tournament simulation.
All N simulations run in parallel per round (no per-sim Python loops). Pre-generates random numbers and uses fancy indexing for batch outcome determination.
- Parameters:
bracket – Tournament bracket structure (64 teams).
P – Pairwise win probability matrix, shape (n, n).
scoring_rules – Scoring rules to compute scores for.
season – Tournament season year.
n_simulations – Number of simulations (default 10,000).
rng – NumPy random generator for reproducibility.
progress – Display a tqdm progress bar for simulation rounds.
progress_callback – Optional callback invoked after each round with (round_completed, total_rounds). UI-agnostic hook for external progress reporting (e.g. Streamlit st.progress).
- Returns:
SimulationResult with MC-derived advancement probs, expected points, and score distributions.
- Raises:
ValueError – If n_simulations < 100.
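The "fancy indexing for batch outcome determination" mentioned above can be sketched for a single round: index the probability matrix by the two (n_sims, n_games) team arrays, compare against pre-drawn uniforms, and select winners with np.where. The function name and argument layout are illustrative, not the package's internal API:

```python
import numpy as np

def simulate_round_sketch(P, team_a, team_b, rng):
    """One tournament round for all simulations at once.

    team_a, team_b: int arrays of shape (n_sims, n_games) giving each
    matchup's participants; returns the winners with the same shape.
    """
    p_a = P[team_a, team_b]          # fancy indexing: team_a's win prob per matchup
    u = rng.random(team_a.shape)     # one uniform draw per (sim, game)
    return np.where(u < p_a, team_a, team_b)
```

The winners array feeds the next round's team_a/team_b pairings, so the only Python loop is over rounds (6 for a 64-team bracket), never over simulations.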
- ncaa_eval.evaluation.slider_to_temperature(slider_value: int) float[source]¶
Map an integer slider value to a temperature parameter.
- Parameters:
slider_value – Integer in [-5, +5].
- Returns:
T = 2^(slider_value / 3). T=1.0 at slider_value=0 (neutral).
- Raises:
ValueError – If slider_value is outside [-5, +5].
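The mapping is a one-liner plus range validation; a sketch under the documented formula (the function name here is hypothetical):

```python
def slider_to_temperature_sketch(slider_value: int) -> float:
    """Map an integer slider in [-5, +5] to T = 2 ** (slider_value / 3)."""
    if not -5 <= slider_value <= 5:
        raise ValueError("slider_value must be in [-5, +5]")
    return 2.0 ** (slider_value / 3)
```

With base 2 and divisor 3, the slider spans T from 2^(-5/3) ≈ 0.31 to 2^(5/3) ≈ 3.17, and three slider steps halve or double the temperature.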
- ncaa_eval.evaluation.walk_forward_splits(seasons: Sequence[int], feature_server: StatefulFeatureServer, *, mode: Literal['batch', 'stateful'] = 'batch') Iterator[CVFold][source]¶
Generate walk-forward CV folds with Leave-One-Tournament-Out splits.
- Parameters:
seasons – Ordered sequence of season years to include (e.g., range(2008, 2026)). Must contain at least 2 seasons.
feature_server – Configured StatefulFeatureServer for building feature matrices.
mode – Feature serving mode: "batch" (stateless models) or "stateful" (sequential-update models like Elo).
- Yields:
CVFold – For each eligible test year (skipping no-tournament years like 2020): train contains all games from seasons strictly before the test year; test contains only tournament games from the test year; year is the test season year.
- Raises:
ValueError – If seasons has fewer than 2 elements, or if mode is not "batch" or "stateful".
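The year-splitting logic described above (train on everything strictly earlier, never test on a no-tournament year) can be sketched without the feature server. This is an illustrative stand-in yielding (train_years, test_year) pairs, not the real CVFold generator:

```python
def walk_forward_years_sketch(seasons, skip_years=(2020,)):
    """Yield walk-forward (train_years, test_year) pairs.

    Each test year trains on all strictly earlier seasons; years in
    skip_years (e.g. 2020, no tournament) are excluded as test years
    but their regular-season games stay in later training windows.
    """
    seasons = sorted(seasons)
    if len(seasons) < 2:
        raise ValueError("seasons must contain at least 2 elements")
    for i, year in enumerate(seasons[1:], start=1):
        if year in skip_years:
            continue
        yield seasons[:i], year
```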