ncaa_eval.evaluation.metrics module

Evaluation metrics for NCAA basketball model predictions.

Provides metric functions for evaluating probabilistic predictions, including Brier score, log loss, ROC-AUC, expected calibration error, and reliability-diagram data.

All functions accept npt.NDArray[np.float64] inputs and return float scalars or structured data (ReliabilityData).

Metric Registry

ncaa_eval.evaluation.metrics.MetricFn

Signature for metric functions: (y_true, y_prob) -> float.

alias of Callable[[ndarray[tuple[Any, …], dtype[float64]], ndarray[tuple[Any, …], dtype[float64]]], float]
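For illustration, any function matching this signature can serve as a metric. The sketch below is hypothetical (not part of the module) and shows the expected shape of a conforming function:

```python
import numpy as np
import numpy.typing as npt


def mean_absolute_error(
    y_true: npt.NDArray[np.float64], y_prob: npt.NDArray[np.float64]
) -> float:
    """A hypothetical metric conforming to MetricFn: (y_true, y_prob) -> float."""
    return float(np.mean(np.abs(y_true - y_prob)))
```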

exception ncaa_eval.evaluation.metrics.MetricNotFoundError[source]

Bases: KeyError

Raised when a requested metric name is not in the registry.

class ncaa_eval.evaluation.metrics.ReliabilityData(fraction_of_positives: ndarray[tuple[Any, ...], dtype[float64]], mean_predicted_value: ndarray[tuple[Any, ...], dtype[float64]], bin_counts: ndarray[tuple[Any, ...], dtype[int64]], bin_edges: ndarray[tuple[Any, ...], dtype[float64]], n_bins: int)[source]

Bases: object

Structured return type for reliability diagram data.

fraction_of_positives

Observed fraction of positives per bin (from calibration_curve).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

mean_predicted_value

Mean predicted probability per bin (from calibration_curve).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

bin_counts

Number of samples in each non-empty bin.

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.int64]]

bin_edges

Full bin edge array of shape (n_bins + 1,), i.e. np.linspace(0.0, 1.0, n_bins + 1). Includes both the lower (0.0) and upper (1.0) boundaries so callers do not need to recompute them.

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

n_bins

Requested number of bins.

Type:

int

bin_counts: ndarray[tuple[Any, ...], dtype[int64]]
bin_edges: ndarray[tuple[Any, ...], dtype[float64]]
fraction_of_positives: ndarray[tuple[Any, ...], dtype[float64]]
mean_predicted_value: ndarray[tuple[Any, ...], dtype[float64]]
n_bins: int
ncaa_eval.evaluation.metrics.brier_score(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]

Compute Brier Score for binary predictions.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

Returns:

Brier Score value (lower is better).

Raises:

ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].
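A minimal sketch of the Brier score computation (the mean squared difference between predicted probabilities and outcomes), with the validation errors described above; this approximates the behavior documented here, not the module's exact implementation:

```python
import numpy as np


def brier_score_sketch(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    # Validate inputs as the docstring describes.
    if y_true.size == 0 or y_true.shape != y_prob.shape:
        raise ValueError("inputs must be non-empty and the same shape")
    if np.any((y_prob < 0.0) | (y_prob > 1.0)):
        raise ValueError("probabilities must lie in [0, 1]")
    # Mean squared difference between predicted probability and outcome.
    return float(np.mean((y_prob - y_true) ** 2))
```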

ncaa_eval.evaluation.metrics.expected_calibration_error(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]], *, n_bins: int = 10) float[source]

Compute Expected Calibration Error (ECE) using vectorized numpy.

ECE measures how well predicted probabilities match observed frequencies. Predictions are binned into n_bins equal-width bins on [0, 1], and ECE is the weighted average of per-bin |accuracy - confidence| gaps.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

  • n_bins – Number of equal-width bins (default 10).

Returns:

ECE value in [0, 1] (lower is better).

Raises:

ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].
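The binning described above can be sketched in plain numpy as follows. This is a simplified illustration of the ECE definition (input validation omitted), not the module's vectorized implementation:

```python
import numpy as np


def ece_sketch(y_true: np.ndarray, y_prob: np.ndarray, n_bins: int = 10) -> float:
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to an equal-width bin on [0, 1]; interior
    # edges only, so 1.0 falls into the last bin.
    bin_idx = np.digitize(y_prob, edges[1:-1])
    n = y_true.size
    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if not mask.any():
            continue
        confidence = y_prob[mask].mean()  # mean predicted probability in bin
        accuracy = y_true[mask].mean()    # observed fraction of positives
        ece += (mask.sum() / n) * abs(accuracy - confidence)
    return float(ece)
```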

ncaa_eval.evaluation.metrics.get_metric(name: str) Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float][source]

Return the metric function registered under name.

Raises:

MetricNotFoundError – If name is not registered.

ncaa_eval.evaluation.metrics.list_metrics() list[str][source]

Return all registered metric names (sorted).

ncaa_eval.evaluation.metrics.log_loss(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]

Compute Log Loss (cross-entropy loss) for binary predictions.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

Returns:

Log Loss value.

Raises:

ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].
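The cross-entropy computation can be sketched as below. Clipping probabilities away from 0 and 1 before taking logs is a common guard against infinite loss; whether this module clips, and with what epsilon, is an assumption here (validation also omitted):

```python
import numpy as np


def log_loss_sketch(y_true: np.ndarray, y_prob: np.ndarray, eps: float = 1e-15) -> float:
    # Clip to avoid log(0); the exact guard used by the module is not documented.
    p = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))
```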

ncaa_eval.evaluation.metrics.register_metric(name: str) Callable[[_MF], _MF][source]

Function decorator that registers a metric function.

Parameters:

name – Registry key for the metric.

Returns:

Decorator that registers the function and returns it unchanged.

Raises:

ValueError – If name is already registered.
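The registry pattern behind register_metric, get_metric, and list_metrics can be sketched as a simplified stand-in (not the module's actual implementation) that matches the documented behaviors: duplicate names raise ValueError, unknown lookups raise MetricNotFoundError, and names are returned sorted:

```python
from typing import Callable, Dict

MetricFn = Callable[..., float]
_REGISTRY: Dict[str, MetricFn] = {}


class MetricNotFoundError(KeyError):
    """Raised when a requested metric name is not in the registry."""


def register_metric(name: str) -> Callable[[MetricFn], MetricFn]:
    def decorator(fn: MetricFn) -> MetricFn:
        if name in _REGISTRY:
            raise ValueError(f"metric {name!r} is already registered")
        _REGISTRY[name] = fn
        return fn  # returned unchanged, per the docstring
    return decorator


def get_metric(name: str) -> MetricFn:
    try:
        return _REGISTRY[name]
    except KeyError as exc:
        raise MetricNotFoundError(name) from exc


def list_metrics() -> list[str]:
    return sorted(_REGISTRY)


# Hypothetical metric used only to demonstrate registration.
@register_metric("always_zero")
def always_zero(y_true, y_prob) -> float:
    return 0.0
```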

ncaa_eval.evaluation.metrics.reliability_diagram_data(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]], *, n_bins: int = 10) ReliabilityData[source]

Generate reliability diagram data for calibration visualization.

Uses sklearn.calibration.calibration_curve for bin statistics and augments with per-bin sample counts.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

  • n_bins – Number of bins (default 10).

Returns:

Structured data containing fraction of positives, mean predicted values, bin counts, bin edges, and requested number of bins.

Raises:

ValueError – If inputs are empty, mismatched, n_bins < 1, or probabilities are outside [0, 1].
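The module delegates bin statistics to sklearn.calibration.calibration_curve; the pure-numpy sketch below approximates the same quantities so the return fields are concrete. Like calibration_curve, it reports statistics only for non-empty bins, while bin_edges always has shape (n_bins + 1,). It is an illustration, not the module's implementation:

```python
import numpy as np


def reliability_sketch(y_true: np.ndarray, y_prob: np.ndarray, n_bins: int = 10):
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.digitize(y_prob, edges[1:-1])
    counts = np.bincount(bin_idx, minlength=n_bins)
    nonempty = [b for b in range(n_bins) if counts[b] > 0]
    # Per non-empty bin: observed fraction of positives and mean prediction.
    fraction_of_positives = np.array([y_true[bin_idx == b].mean() for b in nonempty])
    mean_predicted_value = np.array([y_prob[bin_idx == b].mean() for b in nonempty])
    return fraction_of_positives, mean_predicted_value, counts[counts > 0], edges
```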

ncaa_eval.evaluation.metrics.roc_auc(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]

Compute ROC-AUC for binary predictions.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

Returns:

ROC-AUC value.

Raises:

ValueError – If inputs are empty, mismatched, probabilities are outside [0, 1], or y_true contains only one class (AUC is undefined).
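ROC-AUC equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (ties counted half), which also explains why a single-class y_true makes it undefined. A minimal sketch using that rank-statistic view, quadratic in sample count and not the module's implementation:

```python
import numpy as np


def roc_auc_sketch(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC is undefined when y_true contains only one class")
    # Pairwise comparison: fraction of (positive, negative) pairs where the
    # positive outranks the negative, counting ties as half.
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size))
```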