ncaa_eval.evaluation.metrics module

Evaluation metrics for NCAA basketball model predictions.

Provides metric functions for evaluating probabilistic predictions, including Brier score, log loss, ROC-AUC, expected calibration error, and reliability-diagram data.

All functions accept npt.NDArray[np.float64] inputs and return float scalars or structured data (ReliabilityData).

Metric Registry

ncaa_eval.evaluation.metrics.MetricFn

Signature for metric functions: (y_true, y_prob) -> float.

alias of Callable[[ndarray[tuple[Any, …], dtype[float64]], ndarray[tuple[Any, …], dtype[float64]]], float]
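For illustration, any function matching this signature can serve as a metric. The sketch below is hypothetical (not part of the module) and shows the expected shape of a conforming function:

```python
import numpy as np
import numpy.typing as npt


def mean_absolute_error(
    y_true: npt.NDArray[np.float64], y_prob: npt.NDArray[np.float64]
) -> float:
    """A hypothetical metric conforming to MetricFn: (y_true, y_prob) -> float."""
    return float(np.mean(np.abs(y_true - y_prob)))
```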

exception ncaa_eval.evaluation.metrics.MetricNotFoundError[source]

Bases: KeyError

Raised when a requested metric name is not in the registry.

class ncaa_eval.evaluation.metrics.ReliabilityData(fraction_of_positives: ndarray[tuple[Any, ...], dtype[float64]], mean_predicted_value: ndarray[tuple[Any, ...], dtype[float64]], bin_counts: ndarray[tuple[Any, ...], dtype[int64]], bin_edges: ndarray[tuple[Any, ...], dtype[float64]], n_bins: int)[source]

Bases: object

Structured return type for reliability diagram data.

fraction_of_positives

Observed fraction of positives per bin (from calibration_curve).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

mean_predicted_value

Mean predicted probability per bin (from calibration_curve).

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

bin_counts

Number of samples in each non-empty bin.

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.int64]]

bin_edges

Full bin edge array of shape (n_bins + 1,), i.e. np.linspace(0.0, 1.0, n_bins + 1). Includes both the lower (0.0) and upper (1.0) boundaries so callers do not need to recompute them.

Type:

numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]

n_bins

Requested number of bins.

Type:

int

bin_counts: ndarray[tuple[Any, ...], dtype[int64]]
bin_edges: ndarray[tuple[Any, ...], dtype[float64]]
fraction_of_positives: ndarray[tuple[Any, ...], dtype[float64]]
mean_predicted_value: ndarray[tuple[Any, ...], dtype[float64]]
n_bins: int
ncaa_eval.evaluation.metrics.brier_score(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]

Compute Brier Score for binary predictions.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

Returns:

Brier Score value (lower is better).

Raises:

ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].
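A minimal sketch of the Brier score computation (the mean squared difference between predicted probabilities and outcomes), with the validation errors described above; this approximates the behavior documented here, not the module's exact implementation:

```python
import numpy as np


def brier_score_sketch(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    # Validate inputs as the docstring describes.
    if y_true.size == 0 or y_true.shape != y_prob.shape:
        raise ValueError("inputs must be non-empty and the same shape")
    if np.any((y_prob < 0.0) | (y_prob > 1.0)):
        raise ValueError("probabilities must lie in [0, 1]")
    # Mean squared difference between predicted probability and outcome.
    return float(np.mean((y_prob - y_true) ** 2))
```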

ncaa_eval.evaluation.metrics.expected_calibration_error(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]], *, n_bins: int = 10) float[source]

Compute Expected Calibration Error (ECE) using vectorized numpy.

ECE measures how well predicted probabilities match observed frequencies. Predictions are binned into n_bins equal-width bins on [0, 1], and ECE is the weighted average of per-bin |accuracy - confidence| gaps.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

  • n_bins – Number of equal-width bins (default 10).

Returns:

ECE value in [0, 1] (lower is better).

Raises:

ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].
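The binning described above can be sketched in plain numpy as follows. This is a simplified illustration of the ECE definition (input validation omitted), not the module's vectorized implementation:

```python
import numpy as np


def ece_sketch(y_true: np.ndarray, y_prob: np.ndarray, n_bins: int = 10) -> float:
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to an equal-width bin on [0, 1]; interior
    # edges only, so 1.0 falls into the last bin.
    bin_idx = np.digitize(y_prob, edges[1:-1])
    n = y_true.size
    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if not mask.any():
            continue
        confidence = y_prob[mask].mean()  # mean predicted probability in bin
        accuracy = y_true[mask].mean()    # observed fraction of positives
        ece += (mask.sum() / n) * abs(accuracy - confidence)
    return float(ece)
```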

ncaa_eval.evaluation.metrics.get_metric(name: str) Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float][source]

Return the metric function registered under name.

Raises:

MetricNotFoundError – If name is not registered.

ncaa_eval.evaluation.metrics.list_metrics() list[str][source]

Return all registered metric names (sorted).

ncaa_eval.evaluation.metrics.log_loss(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]

Compute Log Loss (cross-entropy loss) for binary predictions.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

Returns:

Log Loss value.

Raises:

ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].
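The cross-entropy computation can be sketched as below. Clipping probabilities away from 0 and 1 before taking logs is a common guard against infinite loss; whether this module clips, and with what epsilon, is an assumption here (validation also omitted):

```python
import numpy as np


def log_loss_sketch(y_true: np.ndarray, y_prob: np.ndarray, eps: float = 1e-15) -> float:
    # Clip to avoid log(0); the exact guard used by the module is not documented.
    p = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))
```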

ncaa_eval.evaluation.metrics.register_metric(name: str) Callable[[_MF], _MF][source]

Function decorator that registers a metric function.

Parameters:

name – Registry key for the metric.

Returns:

Decorator that registers the function and returns it unchanged.

Raises:

ValueError – If name is already registered.
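The registry pattern behind register_metric, get_metric, and list_metrics can be sketched as a simplified stand-in (not the module's actual implementation) that matches the documented behaviors: duplicate names raise ValueError, unknown lookups raise MetricNotFoundError, and names are returned sorted:

```python
from typing import Callable, Dict

MetricFn = Callable[..., float]
_REGISTRY: Dict[str, MetricFn] = {}


class MetricNotFoundError(KeyError):
    """Raised when a requested metric name is not in the registry."""


def register_metric(name: str) -> Callable[[MetricFn], MetricFn]:
    def decorator(fn: MetricFn) -> MetricFn:
        if name in _REGISTRY:
            raise ValueError(f"metric {name!r} is already registered")
        _REGISTRY[name] = fn
        return fn  # returned unchanged, per the docstring
    return decorator


def get_metric(name: str) -> MetricFn:
    try:
        return _REGISTRY[name]
    except KeyError as exc:
        raise MetricNotFoundError(name) from exc


def list_metrics() -> list[str]:
    return sorted(_REGISTRY)


# Hypothetical metric used only to demonstrate registration.
@register_metric("always_zero")
def always_zero(y_true, y_prob) -> float:
    return 0.0
```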

ncaa_eval.evaluation.metrics.reliability_diagram_data(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]], *, n_bins: int = 10) ReliabilityData[source]

Generate reliability diagram data for calibration visualization.

Uses sklearn.calibration.calibration_curve for bin statistics and augments with per-bin sample counts.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

  • n_bins – Number of bins (default 10).

Returns:

Structured data containing fraction of positives, mean predicted values, bin counts, bin edges, and requested number of bins.

Raises:

ValueError – If inputs are empty, mismatched, n_bins < 1, or probabilities are outside [0, 1].
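The module delegates bin statistics to sklearn.calibration.calibration_curve; the pure-numpy sketch below approximates the same quantities so the return fields are concrete. Like calibration_curve, it reports statistics only for non-empty bins, while bin_edges always has shape (n_bins + 1,). It is an illustration, not the module's implementation:

```python
import numpy as np


def reliability_sketch(y_true: np.ndarray, y_prob: np.ndarray, n_bins: int = 10):
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.digitize(y_prob, edges[1:-1])
    counts = np.bincount(bin_idx, minlength=n_bins)
    nonempty = [b for b in range(n_bins) if counts[b] > 0]
    # Per non-empty bin: observed fraction of positives and mean prediction.
    fraction_of_positives = np.array([y_true[bin_idx == b].mean() for b in nonempty])
    mean_predicted_value = np.array([y_prob[bin_idx == b].mean() for b in nonempty])
    return fraction_of_positives, mean_predicted_value, counts[counts > 0], edges
```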

ncaa_eval.evaluation.metrics.roc_auc(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]

Compute ROC-AUC for binary predictions.

Parameters:
  • y_true – Binary labels (0 or 1).

  • y_prob – Predicted probabilities for the positive class.

Returns:

ROC-AUC value.

Raises:

ValueError – If inputs are empty, mismatched, probabilities are outside [0, 1], or y_true contains only one class (AUC is undefined).
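ROC-AUC equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (ties counted half), which also explains why a single-class y_true makes it undefined. A minimal sketch using that rank-statistic view, quadratic in sample count and not the module's implementation:

```python
import numpy as np


def roc_auc_sketch(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC is undefined when y_true contains only one class")
    # Pairwise comparison: fraction of (positive, negative) pairs where the
    # positive outranks the negative, counting ties as half.
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size))
```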