# How to Create a Custom Model

This tutorial walks you through building and registering a custom prediction model. By the end, you will have a working model that integrates with the CLI, evaluation engine, and dashboard.

NCAA_eval supports two model paradigms:

- **Stateless** — batch-trained classifiers (like XGBoost or Logistic Regression)
- **Stateful** — sequential-update models (like Elo) that maintain per-team ratings updated game-by-game

This tutorial covers both.

## Prerequisites

- Project installed (`poetry install`)
- Data synced (`python sync.py --source all --dest data/`)
- At least one model trained (see the [Getting Started Tutorial](getting-started.md))

## Part 1: Stateless Model (Feature-Based)

A stateless model receives a feature matrix `X` and binary labels `y`, and produces win probabilities. This is the simpler paradigm — if you have a standard ML classifier, wrap it here.

### Step 1: Define the Config

Every model needs a Pydantic config class that extends `ModelConfig`:

```python
# my_model.py
from __future__ import annotations

from pathlib import Path
from typing import Literal, Self

import numpy as np
import pandas as pd
from pydantic import Field

from ncaa_eval.model.base import Model, ModelConfig
from ncaa_eval.model.registry import register_model


class WeightedAverageConfig(ModelConfig):
    """Hyperparameters for the weighted-average model."""

    model_name: Literal["weighted_avg"] = "weighted_avg"
    home_weight: float = Field(default=0.6, ge=0.0, le=1.0)
    recency_decay: float = Field(default=0.95, ge=0.0, le=1.0)
```

The `model_name` field must match the name you will use with `@register_model`.

### Step 2: Implement the Model ABC

Subclass `Model` and implement all five abstract methods:

```python
@register_model("weighted_avg")
class WeightedAverageModel(Model):
    """A simple model that predicts based on weighted feature averages."""

    def __init__(self, config: WeightedAverageConfig | None = None) -> None:
        self._config = config or WeightedAverageConfig()
        self._weights: np.ndarray | None = None

    def fit(self, X: pd.DataFrame, y: pd.Series) -> None:
        """Learn feature weights from training data."""
        # Simple example: correlation between each feature and outcome
        correlations = X.corrwith(y).fillna(0.0)
        self._weights = correlations.values

    def predict_proba(self, X: pd.DataFrame) -> pd.Series:
        """Return P(team_a wins) for each row."""
        if self._weights is None:
            msg = "Model must be fit() before predict_proba()"
            raise RuntimeError(msg)
        # Weighted sum → sigmoid → probability
        raw = X.values @ self._weights
        probs = 1.0 / (1.0 + np.exp(-raw))
        return pd.Series(probs, index=X.index)

    def save(self, path: Path) -> None:
        """Persist model to directory."""
        path.mkdir(parents=True, exist_ok=True)
        (path / "config.json").write_text(self._config.model_dump_json())
        if self._weights is not None:
            np.save(path / "weights.npy", self._weights)

    @classmethod
    def load(cls, path: Path) -> Self:
        """Restore model from directory."""
        config = WeightedAverageConfig.model_validate_json(
            (path / "config.json").read_text()
        )
        instance = cls(config)
        weights_path = path / "weights.npy"
        if weights_path.exists():
            instance._weights = np.load(weights_path)
        return instance

    def get_config(self) -> WeightedAverageConfig:
        """Return the model's configuration."""
        return self._config
```

**Key contract:**

- `fit(X, y)` — `X` is a pandas DataFrame of numeric features, `y` is a binary Series (1 = team_a won, 0 = team_b won)
- `predict_proba(X)` — returns a Series of probabilities in [0, 1]
- `save(path)` / `load(path)` — persist to and restore from a directory
- `get_config()` — return the Pydantic config instance
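
As a quick sanity check of this contract, the sketch below fits `WeightedAverageModel` on a tiny hand-made frame and verifies the output shape. The feature names and values are hypothetical — any numeric columns will do:

```python
import pandas as pd

# Hypothetical toy features; any numeric columns satisfy the contract.
X = pd.DataFrame(
    {"seed_diff": [1.0, -3.0, 5.0, -2.0], "win_pct_diff": [0.1, -0.2, 0.3, -0.1]}
)
y = pd.Series([1, 0, 1, 0])  # 1 = team_a won

model = WeightedAverageModel()
model.fit(X, y)
probs = model.predict_proba(X)

assert ((probs >= 0.0) & (probs <= 1.0)).all()  # probabilities in [0, 1]
assert probs.index.equals(X.index)              # aligned to input rows
```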

```{note}
The backtest pipeline automatically strips metadata columns (`game_id`, `season`, `day_num`, etc.) before passing `X` to stateless models. Your `fit()` and `predict_proba()` only see numeric feature columns.
```

### Step 3: Register and Use

The `@register_model("weighted_avg")` decorator handles registration. To use your model, ensure the module is imported before the CLI runs.

**Option A:** Place `my_model.py` in `src/ncaa_eval/model/` and add an import in `src/ncaa_eval/model/__init__.py`:

```python
# In src/ncaa_eval/model/__init__.py, add:
import ncaa_eval.model.my_model  # noqa: F401
```

**Option B:** Import it in a script:

```python
import ncaa_eval.model.my_model  # registers "weighted_avg"

from ncaa_eval.model import list_models

print(list_models())
# ['elo', 'logistic_regression', 'weighted_avg', 'xgboost']
```

Then train via the CLI:

```bash
python -m ncaa_eval.cli train --model weighted_avg
```

### Step 4: Save and Load

The training pipeline calls `save()` automatically. To manually save and load:

```python
from pathlib import Path

model = WeightedAverageModel()
# ... fit the model ...
model.save(Path("data/runs/my_run/model"))

# Later, restore it:
restored = WeightedAverageModel.load(Path("data/runs/my_run/model"))
```

```{tip}
Look at `src/ncaa_eval/model/logistic_regression.py` for a minimal (~30 lines) reference implementation of a stateless model.
```

## Part 2: Stateful Model (Rating-Based)

A stateful model processes games sequentially and maintains internal state (e.g., per-team ratings). The `StatefulModel` base class provides concrete `fit()` and `predict_proba()` implementations — you implement five hooks.

### Step 1: Define Config and Model

```python
# my_stateful_model.py
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Literal, Self

from ncaa_eval.ingest.schema import Game
from ncaa_eval.model.base import ModelConfig, StatefulModel
from ncaa_eval.model.registry import register_model


class SimpleRatingConfig(ModelConfig):
    """Hyperparameters for a simple win-percentage rating model."""

    model_name: Literal["simple_rating"] = "simple_rating"
    initial_rating: float = 0.5
    learning_rate: float = 0.1
    mean_reversion: float = 0.3


@register_model("simple_rating")
class SimpleRatingModel(StatefulModel):
    """A minimal rating model: tracks team win percentages with smoothing."""

    def __init__(self, config: SimpleRatingConfig | None = None) -> None:
        self._config = config or SimpleRatingConfig()
        self._ratings: dict[int, float] = {}

    def start_season(self, season: int) -> None:
        """Mean-revert ratings at the start of each season."""
        mean = self._config.initial_rating
        frac = self._config.mean_reversion
        self._ratings = {
            tid: mean * frac + rating * (1 - frac)
            for tid, rating in self._ratings.items()
        }

    def update(self, game: Game) -> None:
        """Update ratings based on game outcome."""
        lr = self._config.learning_rate
        init = self._config.initial_rating
        w_rating = self._ratings.get(game.w_team_id, init)
        l_rating = self._ratings.get(game.l_team_id, init)
        # Winner's rating increases, loser's decreases
        self._ratings[game.w_team_id] = w_rating + lr * (1.0 - w_rating)
        self._ratings[game.l_team_id] = l_rating + lr * (0.0 - l_rating)

    def _predict_one(self, team_a_id: int, team_b_id: int) -> float:
        """Return P(team_a wins) based on rating difference."""
        init = self._config.initial_rating
        a = self._ratings.get(team_a_id, init)
        b = self._ratings.get(team_b_id, init)
        # Simple sigmoid of rating difference
        diff = a - b
        return 1.0 / (1.0 + 10.0 ** (-diff / 0.2))

    def get_state(self) -> dict[str, Any]:
        """Snapshot current ratings for serialization."""
        return {"ratings": dict(self._ratings)}

    def set_state(self, state: dict[str, Any]) -> None:
        """Restore ratings from a snapshot.

        JSON round-trips dict keys as strings, so coerce team IDs back to int.
        """
        self._ratings = {int(tid): r for tid, r in state["ratings"].items()}

    def save(self, path: Path) -> None:
        """Persist model config and state."""
        path.mkdir(parents=True, exist_ok=True)
        (path / "config.json").write_text(self._config.model_dump_json())
        (path / "state.json").write_text(json.dumps(self.get_state()))

    @classmethod
    def load(cls, path: Path) -> Self:
        """Restore model from saved files."""
        config = SimpleRatingConfig.model_validate_json(
            (path / "config.json").read_text()
        )
        instance = cls(config)
        state = json.loads((path / "state.json").read_text())
        instance.set_state(state)
        return instance

    def get_config(self) -> SimpleRatingConfig:
        return self._config
```
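
You can drive the hooks by hand to see the arithmetic before wiring anything up. The team IDs and ratings below are made up for illustration:

```python
model = SimpleRatingModel()
model.set_state({"ratings": {1101: 0.8, 1102: 0.4}})

# With initial_rating=0.5 and mean_reversion=0.3, each rating moves 30% of
# the way back toward the mean:
#   0.3 * 0.5 + 0.7 * 0.8 = 0.71
#   0.3 * 0.5 + 0.7 * 0.4 = 0.43
model.start_season(2025)

# diff = 0.71 - 0.43 = 0.28, so P = 1 / (1 + 10 ** (-0.28 / 0.2)) ≈ 0.96
print(model._predict_one(1101, 1102))
```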

### Step 2: Understand the Hooks

The `StatefulModel` base class calls your hooks in this order:

1. **`start_season(season)`** — Called before the first game of each season. Use this to mean-revert ratings or reset accumulators.
2. **`update(game)`** — Called once per game, in chronological order. The `Game` object contains:
   - `w_team_id` / `l_team_id` — winner and loser team IDs
   - `w_score` / `l_score` — final scores
   - `loc` — "H" (home), "A" (away), or "N" (neutral)
   - `num_ot` — number of overtime periods
   - `is_tournament` — True for NCAA tournament games
3. **`_predict_one(team_a_id, team_b_id)`** — Return P(team_a wins) using current internal ratings. The base class calls this for each game in the test set.
4. **`get_state()` / `set_state(state)`** — Serialize and restore internal state. Used for model persistence and for the evaluation engine to snapshot state between folds.

A pseudocode sketch of this call order follows the warning below.

```{warning}
The `fit()` and `predict_proba()` methods are provided by `StatefulModel` — do **not** override them. They handle the game reconstruction, season iteration, and per-row prediction logic automatically.
```
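
This sketch is illustrative pseudocode of the call sequence, not the actual `StatefulModel` source; the helpers `games_in()` and `test_rows` are hypothetical stand-ins:

```python
# Roughly what StatefulModel.fit() does on your behalf:
for season in sorted(train_seasons):
    model.start_season(season)        # hook 1: once per season
    for game in games_in(season):     # chronological order within the season
        model.update(game)            # hook 2: once per game

# ...and what predict_proba() does per test row:
for row in test_rows:
    p = model._predict_one(row.team_a_id, row.team_b_id)  # hook 3
```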
### Step 3: Train and Evaluate

Register and train just like a stateless model:

```bash
python -m ncaa_eval.cli train --model simple_rating
```

```{tip}
See `src/ncaa_eval/model/elo.py` for the full reference implementation of a stateful model with margin-of-victory adjustments, variable K-factors, and home-court advantage.
```

## Running Evaluation with a Custom Model

Once trained, your model's run artifacts appear in a run directory under `data/runs/`. The dashboard automatically picks them up:

```bash
streamlit run dashboard/app.py
```

Select your model run in the sidebar to see its metrics on the Leaderboard and Deep Dive pages.

To run a backtest programmatically:

```python
from pathlib import Path

from ncaa_eval.evaluation.backtest import run_backtest
from ncaa_eval.ingest import ParquetRepository
from ncaa_eval.transform.feature_serving import FeatureConfig, StatefulFeatureServer
from ncaa_eval.transform.serving import ChronologicalDataServer

# Instantiate your custom model (SimpleRatingModel from Part 2 above)
model = SimpleRatingModel()

# Create feature server
repo = ParquetRepository(base_path=Path("data/"))
data_server = ChronologicalDataServer(repo)
config = FeatureConfig()
server = StatefulFeatureServer(config=config, data_server=data_server)

# Run backtest — use "stateful" for StatefulModel, "batch" for stateless Model
result = run_backtest(
    model=model,
    feature_server=server,
    seasons=list(range(2015, 2026)),
    mode="stateful",  # use "batch" for stateless Model subclasses
)

# Print per-year metrics
print(result.summary)
```

## Summary

| Step | Stateless (`Model`) | Stateful (`StatefulModel`) |
|------|---------------------|----------------------------|
| Config | Extend `ModelConfig` | Extend `ModelConfig` |
| Core methods | `fit`, `predict_proba` | `update`, `_predict_one`, `start_season` |
| State mgmt | N/A | `get_state`, `set_state` |
| Persistence | `save`, `load` | `save`, `load` |
| Config access | `get_config` | `get_config` |
| Register | `@register_model("name")` | `@register_model("name")` |
| Train | `python -m ncaa_eval.cli train --model name` | Same |

## Next Steps

- **Add a custom metric** — See the [Custom Metric Tutorial](custom-metric.md)
- **Compare models** — Train multiple models and use the Leaderboard to compare
- **Explore the reference implementations:**
  - `src/ncaa_eval/model/logistic_regression.py` — minimal stateless model
  - `src/ncaa_eval/model/elo.py` — full stateful model
  - `src/ncaa_eval/model/xgboost_model.py` — production stateless model