# How to Add a Custom Metric

This tutorial shows how to extend NCAA_eval's evaluation engine with custom metrics and custom tournament scoring rules.

## Prerequisites

- Project installed (`poetry install`)
- Data synced and at least one model trained (see the [Getting Started Tutorial](getting-started.md))

## Part 1: Custom Metric Function

The evaluation engine uses metric functions with a simple contract:

```python
def my_metric(y_true: np.ndarray, y_prob: np.ndarray) -> float
```

- `y_true` — binary labels (0 or 1), shape `(n_games,)`
- `y_prob` — predicted probabilities in [0, 1], shape `(n_games,)`
- Returns a single `float` score

### Step 1: Write the Metric

Here is an example that computes the average absolute error (a simpler alternative to Brier Score):

```python
import numpy as np
import numpy.typing as npt


def mean_absolute_error(
    y_true: npt.NDArray[np.float64],
    y_prob: npt.NDArray[np.float64],
) -> float:
    """Mean Absolute Error between predictions and outcomes.

    Lower is better. Random baseline (predicting 0.5): 0.5.
    """
    return float(np.mean(np.abs(y_prob - y_true)))
```

Another example — a "surprise" metric that measures how often the model was confidently wrong:

```python
def surprise_rate(
    y_true: npt.NDArray[np.float64],
    y_prob: npt.NDArray[np.float64],
    threshold: float = 0.75,
) -> float:
    """Fraction of games where model was confident but wrong.

    A game is a "surprise" if model predicted > threshold for one team
    but the other team won. Lower is better.
    """
    confident_team_a = y_prob >= threshold
    confident_team_b = y_prob <= (1.0 - threshold)
    confident = confident_team_a | confident_team_b
    if not np.any(confident):
        return 0.0
    confident_wrong = (
        (confident_team_a & (y_true == 0))
        | (confident_team_b & (y_true == 1))
    )
    return float(np.sum(confident_wrong) / np.sum(confident))
```

```{note}
Metric functions with extra parameters (like `threshold` above) need a wrapper to match the `(y_true, y_prob) -> float` signature when passed to the backtest. See Step 2 below.
```
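For instance, `functools.partial` from the standard library can freeze the extra argument so the wrapped function matches the two-argument contract. This is the same pattern Step 2 uses; a minimal sketch, assuming the `surprise_rate` function defined above:

```python
from functools import partial

# Freeze the threshold so the resulting callable takes only (y_true, y_prob).
surprise_90 = partial(surprise_rate, threshold=0.90)
```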
### Step 2: Use in a Backtest

Pass your custom metrics to `run_backtest` via the `metric_fns` parameter:

```python
from functools import partial
from pathlib import Path

import numpy as np
import numpy.typing as npt

from ncaa_eval.evaluation.backtest import run_backtest
from ncaa_eval.evaluation.metrics import log_loss, brier_score
from ncaa_eval.ingest import ParquetRepository
from ncaa_eval.model import get_model
from ncaa_eval.transform.feature_serving import FeatureConfig, StatefulFeatureServer
from ncaa_eval.transform.serving import ChronologicalDataServer


def mean_absolute_error(
    y_true: npt.NDArray[np.float64],
    y_prob: npt.NDArray[np.float64],
) -> float:
    return float(np.mean(np.abs(y_prob - y_true)))


def surprise_rate(
    y_true: npt.NDArray[np.float64],
    y_prob: npt.NDArray[np.float64],
    threshold: float = 0.75,
) -> float:
    confident_a = y_prob >= threshold
    confident_b = y_prob <= (1.0 - threshold)
    confident = confident_a | confident_b
    if not np.any(confident):
        return 0.0
    wrong = (confident_a & (y_true == 0)) | (confident_b & (y_true == 1))
    return float(np.sum(wrong) / np.sum(confident))


# Build metric dictionary — include built-in metrics plus custom ones
my_metrics = {
    "log_loss": log_loss,
    "brier_score": brier_score,
    "mae": mean_absolute_error,
    "surprise_75": partial(surprise_rate, threshold=0.75),
    "surprise_90": partial(surprise_rate, threshold=0.90),
}

# Set up model and feature server
model_cls = get_model("elo")
model = model_cls()
repo = ParquetRepository(base_path=Path("data/"))
data_server = ChronologicalDataServer(repo)
server = StatefulFeatureServer(config=FeatureConfig(), data_server=data_server)

# Run backtest with custom metrics
result = run_backtest(
    model=model,
    feature_server=server,
    seasons=list(range(2015, 2026)),
    mode="stateful",
    metric_fns=my_metrics,
)

# View per-year results
print(result.summary)
```

Expected output:

```text
      log_loss  brier_score    mae  surprise_75  surprise_90  elapsed_seconds
2016    0.5601       0.2082  0.412        0.182        0.091             1.23
2017    0.5483       0.2041  0.405        0.175        0.085             1.18
...
```

````{tip}
The `metric_fns=` dict approach is useful for ad-hoc metrics that should not be globally registered. For metrics you want available everywhere (backtests, dashboard, leaderboard), use the registry approach in Step 3 below.
````
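Before moving on to the registry approach, note that the per-season results can also be aggregated programmatically. A small sketch, assuming `result.summary` is a pandas DataFrame indexed by season with one column per metric, as the printed output above and the `.columns.tolist()` call in Step 3 suggest:

```python
# Average the custom metrics across all backtested seasons.
# Assumes result.summary is a pandas DataFrame shaped like the output above.
print(result.summary[["mae", "surprise_75", "surprise_90"]].mean())
```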
### Step 3: Register for Global Use (Recommended)

Use the `@register_metric` decorator to make your metric available automatically in all backtests and the dashboard — no need to pass `metric_fns=` every time:

```python
import numpy as np
import numpy.typing as npt

from ncaa_eval.evaluation import register_metric


@register_metric("my_mae")
def mean_absolute_error(
    y_true: npt.NDArray[np.float64],
    y_prob: npt.NDArray[np.float64],
) -> float:
    """Mean Absolute Error between predictions and outcomes."""
    return float(np.mean(np.abs(y_prob - y_true)))
```

Once registered, the metric is automatically included when you run a backtest with the default metrics (i.e., without passing `metric_fns=`):

```python
from ncaa_eval.evaluation.backtest import run_backtest

# No metric_fns= needed — all registered metrics (built-in + custom) are used
result = run_backtest(
    model=model,
    feature_server=server,
    seasons=list(range(2015, 2026)),
    mode="stateful",
)

# my_mae appears alongside the built-in metrics
print(result.summary.columns.tolist())
# ['brier_score', 'ece', 'log_loss', 'my_mae', 'roc_auc', 'elapsed_seconds']
```

Verify the registry contents with `list_metrics()`:

```python
from ncaa_eval.evaluation import list_metrics

print(list_metrics())
# ['brier_score', 'ece', 'log_loss', 'my_mae', 'roc_auc']
```

```{note}
**Custom metrics vs. custom scoring rules** are different extension mechanisms. Metrics evaluate model prediction quality (input: predicted probabilities → output: a score like log loss). Scoring rules define how bracket points are awarded per tournament round (used in simulation). See Part 2 below for custom scoring rules.
```

### Step 4: Verify Your Metric

Test your metric with known inputs to make sure it behaves correctly:

```python
import numpy as np

# Perfect predictions
y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_prob = np.array([1.0, 0.0, 1.0, 0.0])
assert mean_absolute_error(y_true, y_prob) == 0.0

# Random predictions
y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_prob = np.array([0.5, 0.5, 0.5, 0.5])
assert mean_absolute_error(y_true, y_prob) == 0.5

# Worst-case predictions
y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_prob = np.array([0.0, 1.0, 0.0, 1.0])
assert mean_absolute_error(y_true, y_prob) == 1.0
```

## Part 2: Custom Tournament Scoring Rule

Scoring rules define how bracket points are awarded per round. The platform uses a `ScoringRule` protocol:

```python
from typing import Protocol


class ScoringRule(Protocol):
    @property
    def name(self) -> str: ...

    def points_per_round(self, round_idx: int) -> float: ...
```

`round_idx` maps to tournament rounds:

| `round_idx` | Round |
|:-----------:|-------|
| 0 | Round of 64 |
| 1 | Round of 32 |
| 2 | Sweet 16 |
| 3 | Elite Eight |
| 4 | Final Four |
| 5 | Championship |

### Step 1: Create a Scoring Rule

Here is an example scoring rule where each round is worth 10x the previous:

```python
class ExponentialScoring:
    """Exponential scoring: 1-10-100-1000-10000-100000."""

    _POINTS = (1.0, 10.0, 100.0, 1000.0, 10_000.0, 100_000.0)

    @property
    def name(self) -> str:
        return "exponential"

    def points_per_round(self, round_idx: int) -> float:
        return self._POINTS[round_idx]
```

Alternatively, use the built-in `DictScoring` helper:

```python
from ncaa_eval.evaluation.scoring import DictScoring

my_pool_scoring = DictScoring(
    points={0: 2, 1: 3, 2: 5, 3: 10, 4: 15, 5: 25},
    scoring_name="my_pool",
)
```

### Step 2: Use in Simulation

Pass your scoring rule to the tournament simulator:

```python
from pathlib import Path

from ncaa_eval.evaluation.bracket import MatchupContext, build_bracket
from ncaa_eval.evaluation.providers import EloProvider
from ncaa_eval.evaluation.scoring import StandardScoring
from ncaa_eval.evaluation.simulation import simulate_tournament
from ncaa_eval.model.tracking import RunStore
from ncaa_eval.transform.normalization import TourneySeedTable

# Replace with the run ID printed when you trained the model
run_id = ""

# Load tournament seeds from the Kaggle CSV in your data directory
seed_table = TourneySeedTable.from_csv(Path("data/kaggle/MNCAATourneySeeds.csv"))
seeds = seed_table.all_seeds(season=2024)  # list[TourneySeed]

# Build the 64-team bracket tree (play-in teams excluded automatically)
bracket = build_bracket(seeds, season=2024)

# Load the trained model via RunStore
store = RunStore(Path("data/"))
model = store.load_model(run_id)
assert model is not None, f"No model artifacts found for run_id={run_id!r}"

# Create probability provider (wraps the model's _predict_one method)
# NOTE: EloProvider requires a StatefulModel (one with _predict_one).
# Use the run_id of an Elo or other stateful model training run.
provider = EloProvider(model)

context = MatchupContext(season=2024, day_num=136, is_neutral=True)

# Simulate with both built-in and custom scoring
result = simulate_tournament(
    bracket=bracket,
    probability_provider=provider,
    context=context,
    scoring_rules=[StandardScoring(), ExponentialScoring()],
    method="monte_carlo",
    n_simulations=10_000,
)

# result.expected_points maps rule name → per-team expected-points array.
# bracket.team_ids[i] gives the team ID for bracket position i.
for rule_name, ep_array in result.expected_points.items():
    print(f"\n{rule_name}:")
    for i, ep in enumerate(ep_array):
        team_id = bracket.team_ids[i]
        print(f"  Team {team_id}: {ep:.1f} EP")
```

### Step 3: Register for CLI Use (Optional)

To make your scoring rule available in the dashboard and CLI, register it:

```python
from ncaa_eval.evaluation.scoring import register_scoring


@register_scoring("exponential")
class ExponentialScoring:
    """Exponential scoring: 1-10-100-1000-10000-100000."""

    _POINTS = (1.0, 10.0, 100.0, 1000.0, 10_000.0, 100_000.0)

    @property
    def name(self) -> str:
        return "exponential"

    def points_per_round(self, round_idx: int) -> float:
        return self._POINTS[round_idx]
```

Verify registration:

```python
from ncaa_eval.evaluation.scoring import list_scorings

print(list_scorings())
# ['fibonacci', 'seed_diff_bonus', 'standard', 'exponential']
```

## Summary

| Extension Point | Contract | Where to Use |
|----------------|----------|--------------|
| Custom metric (registered) | `@register_metric("name")` on `(y_true, y_prob) -> float` | Automatic in all backtests and dashboard |
| Custom metric (ad-hoc) | `(y_true, y_prob) -> float` | `run_backtest(metric_fns=...)` |
| Custom scoring rule | `ScoringRule` protocol (`.name`, `.points_per_round()`) | `simulate_tournament(scoring_rules=...)` |
| Dict-based scoring | `DictScoring(points={0: ..., 5: ...})` | Quick custom point schedules |

## Next Steps

- **Build a custom model** — See the [Custom Model Tutorial](custom-model.md)
- **Explore the built-in metrics** — See `src/ncaa_eval/evaluation/metrics.py` and the [User Guide — Evaluation Metrics](../user-guide.md#evaluation-metrics)
- **Tournament scoring details** — See the [User Guide — Tournament Scoring](../user-guide.md#tournament-scoring)