ncaa_eval.evaluation.metrics module¶
Evaluation metrics for NCAA basketball model predictions.
Provides metric functions for evaluating probabilistic predictions:
- log_loss() — Log Loss via sklearn.metrics.log_loss
- brier_score() — Brier Score via sklearn.metrics.brier_score_loss
- roc_auc() — ROC-AUC via sklearn.metrics.roc_auc_score
- expected_calibration_error() — ECE via vectorized numpy binning
- reliability_diagram_data() — Reliability diagram bin data via sklearn.calibration.calibration_curve
All functions accept npt.NDArray[np.float64] inputs and return float
scalars or structured data (ReliabilityData).
Metric Registry¶
- register_metric() — decorator to register a metric function
- get_metric() — look up a metric by name
- list_metrics() — list all registered metric names
- MetricNotFoundError — raised for unknown metric names
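The registry pieces listed above can be sketched as a small decorator-based lookup table. This is an illustrative re-implementation built only from the documented behavior (duplicate names raise ValueError, unknown names raise MetricNotFoundError, names are returned sorted), not the module's actual source:

```python
from typing import Callable

import numpy as np
import numpy.typing as npt

# Mirrors the documented MetricFn alias: (y_true, y_prob) -> float
MetricFn = Callable[[npt.NDArray[np.float64], npt.NDArray[np.float64]], float]

_REGISTRY: dict[str, MetricFn] = {}


class MetricNotFoundError(KeyError):
    """Raised when a requested metric name is not in the registry."""


def register_metric(name: str) -> Callable[[MetricFn], MetricFn]:
    """Decorator that registers a metric under ``name``, rejecting duplicates."""
    def decorator(fn: MetricFn) -> MetricFn:
        if name in _REGISTRY:
            raise ValueError(f"metric {name!r} is already registered")
        _REGISTRY[name] = fn
        return fn  # returned unchanged, as documented
    return decorator


def get_metric(name: str) -> MetricFn:
    """Look up a metric, converting the KeyError into the documented subclass."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise MetricNotFoundError(name) from None


def list_metrics() -> list[str]:
    """All registered names, sorted."""
    return sorted(_REGISTRY)


@register_metric("brier")
def brier(y_true: npt.NDArray[np.float64], y_prob: npt.NDArray[np.float64]) -> float:
    return float(np.mean((y_true - y_prob) ** 2))
```

Because MetricNotFoundError subclasses KeyError, callers that only catch KeyError keep working.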
- ncaa_eval.evaluation.metrics.MetricFn¶
Signature for metric functions: (y_true, y_prob) -> float.
alias of Callable[[ndarray[tuple[Any, …], dtype[float64]], ndarray[tuple[Any, …], dtype[float64]]], float]
- exception ncaa_eval.evaluation.metrics.MetricNotFoundError[source]¶
Bases: KeyError
Raised when a requested metric name is not in the registry.
- class ncaa_eval.evaluation.metrics.ReliabilityData(fraction_of_positives: ndarray[tuple[Any, ...], dtype[float64]], mean_predicted_value: ndarray[tuple[Any, ...], dtype[float64]], bin_counts: ndarray[tuple[Any, ...], dtype[int64]], bin_edges: ndarray[tuple[Any, ...], dtype[float64]], n_bins: int)[source]¶
Bases: object
Structured return type for reliability diagram data.
- fraction_of_positives¶
Observed fraction of positives per bin (from calibration_curve).
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]
- mean_predicted_value¶
Mean predicted probability per bin (from calibration_curve).
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]
- bin_counts¶
Number of samples in each non-empty bin.
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.int64]]
- bin_edges¶
Full bin edge array of shape (n_bins + 1,), i.e. np.linspace(0.0, 1.0, n_bins + 1). Includes both the lower (0.0) and upper (1.0) boundaries so callers do not need to recompute them.
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.float64]]
- n_bins¶
Requested number of bins.
- Type:
int
- ncaa_eval.evaluation.metrics.brier_score(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]¶
Compute Brier Score for binary predictions.
- Parameters:
y_true – Binary labels (0 or 1).
y_prob – Predicted probabilities for the positive class.
- Returns:
Brier Score value (lower is better).
- Raises:
ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].
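Since the module summary states that brier_score() wraps sklearn.metrics.brier_score_loss, the underlying computation can be illustrated directly (the values here are made up for demonstration):

```python
import numpy as np
from sklearn.metrics import brier_score_loss

y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_prob = np.array([0.9, 0.2, 0.7, 0.4])

# Brier Score is the mean squared error between labels and probabilities:
# ((0.1)^2 + (0.2)^2 + (0.3)^2 + (0.4)^2) / 4 = 0.075
score = float(brier_score_loss(y_true, y_prob))
```

Lower is better: a perfect predictor scores 0.0, and always predicting 0.5 scores 0.25.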
- ncaa_eval.evaluation.metrics.expected_calibration_error(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]], *, n_bins: int = 10) float[source]¶
Compute Expected Calibration Error (ECE) using vectorized numpy.
ECE measures how well predicted probabilities match observed frequencies. Predictions are binned into n_bins equal-width bins on [0, 1], and ECE is the weighted average of per-bin |accuracy - confidence| gaps.
- Parameters:
y_true – Binary labels (0 or 1).
y_prob – Predicted probabilities for the positive class.
n_bins – Number of equal-width bins (default 10).
- Returns:
ECE value in [0, 1] (lower is better).
- Raises:
ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].
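The binned computation described above can be sketched with vectorized numpy as follows. This is an assumption-laden illustration of the formula (the real implementation may handle bin edges or empty bins differently):

```python
import numpy as np


def ece_sketch(y_true: np.ndarray, y_prob: np.ndarray, n_bins: int = 10) -> float:
    """Weighted average of per-bin |accuracy - confidence| gaps."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to an equal-width bin. Using searchsorted on the
    # interior edges is one possible edge-handling convention.
    idx = np.searchsorted(edges[1:-1], y_prob)
    counts = np.bincount(idx, minlength=n_bins)
    sum_prob = np.bincount(idx, weights=y_prob, minlength=n_bins)
    sum_true = np.bincount(idx, weights=y_true, minlength=n_bins)
    nonempty = counts > 0
    confidence = sum_prob[nonempty] / counts[nonempty]  # mean predicted prob per bin
    accuracy = sum_true[nonempty] / counts[nonempty]    # observed frequency per bin
    weights = counts[nonempty] / y_prob.size            # fraction of samples per bin
    return float(np.sum(weights * np.abs(accuracy - confidence)))
```

A perfectly calibrated predictor (e.g. 80% of samples predicted at 0.8 are positive) yields ECE = 0; a predictor that says 0.9 for events that never happen yields ECE = 0.9.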
- ncaa_eval.evaluation.metrics.get_metric(name: str) Callable[[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]], float][source]¶
Return the metric function registered under name.
- Raises:
MetricNotFoundError – If name is not registered.
- ncaa_eval.evaluation.metrics.list_metrics() list[str][source]¶
Return all registered metric names (sorted).
- ncaa_eval.evaluation.metrics.log_loss(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]¶
Compute Log Loss (cross-entropy loss) for binary predictions.
- Parameters:
y_true – Binary labels (0 or 1).
y_prob – Predicted probabilities for the positive class.
- Returns:
Log Loss value.
- Raises:
ValueError – If inputs are empty, mismatched, or probabilities are outside [0, 1].
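Per the module summary, log_loss() wraps sklearn.metrics.log_loss; the binary cross-entropy it computes can be checked against the formula by hand (illustrative values):

```python
import numpy as np
from sklearn.metrics import log_loss as sk_log_loss

y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_prob = np.array([0.9, 0.1, 0.8, 0.3])

# Binary cross-entropy: -mean(y * log(p) + (1 - y) * log(1 - p))
loss = float(sk_log_loss(y_true, y_prob))
manual = float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))
```

Log Loss penalizes confident wrong predictions heavily: a single prediction of 0.999 for a negative outcome contributes about 6.9 on its own, which is why inputs are validated to stay inside [0, 1].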
- ncaa_eval.evaluation.metrics.register_metric(name: str) Callable[[_MF], _MF][source]¶
Function decorator that registers a metric function.
- Parameters:
name – Registry key for the metric.
- Returns:
Decorator that registers the function and returns it unchanged.
- Raises:
ValueError – If name is already registered.
- ncaa_eval.evaluation.metrics.reliability_diagram_data(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]], *, n_bins: int = 10) ReliabilityData[source]¶
Generate reliability diagram data for calibration visualization.
Uses
sklearn.calibration.calibration_curvefor bin statistics and augments with per-bin sample counts.- Parameters:
y_true – Binary labels (0 or 1).
y_prob – Predicted probabilities for the positive class.
n_bins – Number of bins (default 10).
- Returns:
Structured data containing fraction of positives, mean predicted values, bin counts, bin edges, and requested number of bins.
- Raises:
ValueError – If inputs are empty, mismatched, n_bins < 1, or probabilities are outside [0, 1].
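The documented combination of calibration_curve output, bin edges, and per-bin counts can be sketched as follows. The counting step is an assumption about how the non-empty bins are aligned; only the calibration_curve call and the bin_edges formula come straight from the docs above:

```python
import numpy as np
from sklearn.calibration import calibration_curve

y_true = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0])
y_prob = np.array([0.1, 0.25, 0.85, 0.9, 0.7, 0.3, 0.65, 0.95])
n_bins = 5

# calibration_curve returns statistics for non-empty bins only
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=n_bins)

# Full edge array as documented for bin_edges: np.linspace(0.0, 1.0, n_bins + 1)
bin_edges = np.linspace(0.0, 1.0, n_bins + 1)

# Per-bin sample counts, dropping empty bins so they line up with frac_pos
idx = np.searchsorted(bin_edges[1:-1], y_prob)
all_counts = np.bincount(idx, minlength=n_bins)
bin_counts = all_counts[all_counts > 0]
```

Plotting mean_pred against frac_pos (with the diagonal y = x as the perfectly calibrated reference) gives the reliability diagram; bin_counts is useful for sizing the markers by how much data each bin holds.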
- ncaa_eval.evaluation.metrics.roc_auc(y_true: ndarray[tuple[Any, ...], dtype[float64]], y_prob: ndarray[tuple[Any, ...], dtype[float64]]) float[source]¶
Compute ROC-AUC for binary predictions.
- Parameters:
y_true – Binary labels (0 or 1).
y_prob – Predicted probabilities for the positive class.
- Returns:
ROC-AUC value.
- Raises:
ValueError – If inputs are empty, mismatched, probabilities are outside [0, 1], or y_true contains only one class (AUC is undefined).
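As the module summary notes, roc_auc() wraps sklearn.metrics.roc_auc_score. AUC equals the fraction of (positive, negative) pairs the model ranks correctly, which is easy to verify on a small illustrative example:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

# Of the 4 (positive, negative) pairs, 3 are ranked correctly
# (only 0.35 vs 0.4 is inverted), so AUC = 3/4 = 0.75.
auc = float(roc_auc_score(y_true, y_prob))
```

Note that roc_auc_score itself raises ValueError when y_true contains a single class, which matches the undefined-AUC case documented above.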