How to Create a Custom Model

This tutorial walks you through building and registering a custom prediction model. By the end, you will have a working model that integrates with the CLI, evaluation engine, and dashboard.

NCAA_eval supports two model paradigms:

  • Stateless — batch-trained classifiers (like XGBoost or Logistic Regression)

  • Stateful — sequential-update models (like Elo) that maintain per-team ratings updated game-by-game

This tutorial covers both.

Prerequisites

  • Project installed (poetry install)

  • Data synced (python sync.py --source all --dest data/)

  • At least one model trained (see the Getting Started Tutorial)

Part 1: Stateless Model (Feature-Based)

A stateless model receives a feature matrix X and binary labels y, and produces win probabilities. This is the simpler paradigm — if you have a standard ML classifier, wrap it here.

Step 1: Define the Config

Every model needs a Pydantic config class that extends ModelConfig:

# my_model.py
from __future__ import annotations

from pathlib import Path
from typing import Literal, Self

import numpy as np
import pandas as pd
from pydantic import Field

from ncaa_eval.model.base import Model, ModelConfig
from ncaa_eval.model.registry import register_model


class WeightedAverageConfig(ModelConfig):
    """Hyperparameters for the weighted-average model."""

    model_name: Literal["weighted_avg"] = "weighted_avg"
    home_weight: float = Field(default=0.6, ge=0.0, le=1.0)
    recency_decay: float = Field(default=0.95, ge=0.0, le=1.0)

The model_name field must match the name you will use with @register_model.
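Because the config extends a Pydantic model, the Field bounds are checked at construction time. A minimal sketch (the hyperparameter values below are arbitrary, chosen only for illustration):

from pydantic import ValidationError

config = WeightedAverageConfig(home_weight=0.7, recency_decay=0.9)
print(config.model_dump_json())

# Out-of-range values are rejected by the Field(ge=..., le=...) constraints
try:
    WeightedAverageConfig(home_weight=1.5)  # violates le=1.0
except ValidationError as exc:
    print(exc)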

Step 2: Implement the Model ABC

Subclass Model and implement all five abstract methods:

@register_model("weighted_avg")
class WeightedAverageModel(Model):
    """A simple model that predicts based on weighted feature averages."""

    def __init__(self, config: WeightedAverageConfig | None = None) -> None:
        self._config = config or WeightedAverageConfig()
        self._weights: np.ndarray | None = None

    def fit(self, X: pd.DataFrame, y: pd.Series) -> None:
        """Learn feature weights from training data."""
        # Simple example: correlation between each feature and outcome
        correlations = X.corrwith(y).fillna(0.0)
        self._weights = correlations.values

    def predict_proba(self, X: pd.DataFrame) -> pd.Series:
        """Return P(team_a wins) for each row."""
        if self._weights is None:
            msg = "Model must be fit() before predict_proba()"
            raise RuntimeError(msg)
        # Weighted sum → sigmoid → probability
        raw = X.values @ self._weights
        probs = 1.0 / (1.0 + np.exp(-raw))
        return pd.Series(probs, index=X.index)

    def save(self, path: Path) -> None:
        """Persist model to directory."""
        path.mkdir(parents=True, exist_ok=True)
        (path / "config.json").write_text(self._config.model_dump_json())
        if self._weights is not None:
            np.save(path / "weights.npy", self._weights)

    @classmethod
    def load(cls, path: Path) -> Self:
        """Restore model from directory."""
        config = WeightedAverageConfig.model_validate_json(
            (path / "config.json").read_text()
        )
        instance = cls(config)
        weights_path = path / "weights.npy"
        if weights_path.exists():
            instance._weights = np.load(weights_path)
        return instance

    def get_config(self) -> WeightedAverageConfig:
        """Return the model's configuration."""
        return self._config

Key contract:

  • fit(X, y) — X is a pandas DataFrame of numeric features; y is a binary Series (1 = team_a won, 0 = team_b won)

  • predict_proba(X) — returns a Series of probabilities in [0, 1]

  • save(path) / load(path) — persist to and restore from a directory

  • get_config() — return the Pydantic config instance

Note

The backtest pipeline automatically strips metadata columns (game_id, season, day_num, etc.) before passing X to stateless models. Your fit() and predict_proba() only see numeric feature columns.
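To sanity-check this contract outside the pipeline, you can fit and predict on a tiny hand-built frame. The feature names and values below are made up for illustration; in a real run the feature server supplies X:

import pandas as pd

from my_model import WeightedAverageModel  # assumes my_model.py from Steps 1-2 is importable

# Hypothetical feature matrix: two numeric difference features across four games
X = pd.DataFrame(
    {
        "seed_diff": [-3.0, 5.0, 1.0, -8.0],
        "eff_margin_diff": [4.2, -1.5, 0.3, 6.0],
    }
)
y = pd.Series([1, 0, 1, 1])  # 1 = team_a won

model = WeightedAverageModel()
model.fit(X, y)
probs = model.predict_proba(X)
print(probs)  # Series of P(team_a wins), aligned to X.index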

Step 3: Register and Use

The @register_model("weighted_avg") decorator handles registration. To use your model, ensure the module is imported before the CLI runs.

Option A: Place my_model.py in src/ncaa_eval/model/ and add an import in src/ncaa_eval/model/__init__.py:

# In src/ncaa_eval/model/__init__.py, add:
import ncaa_eval.model.my_model  # noqa: F401

Option B: Import it in a script:

import ncaa_eval.model.my_model  # registers "weighted_avg"
from ncaa_eval.model import list_models

print(list_models())
# ['elo', 'logistic_regression', 'weighted_avg', 'xgboost']

Then train via the CLI:

python -m ncaa_eval.cli train --model weighted_avg

Step 4: Save and Load

The training pipeline calls save() automatically. To manually save and load:

from pathlib import Path

model = WeightedAverageModel()
# ... fit the model ...
model.save(Path("data/runs/my_run/model"))

# Later, restore it:
restored = WeightedAverageModel.load(Path("data/runs/my_run/model"))

Tip

Look at src/ncaa_eval/model/logistic_regression.py for a minimal (~30 lines) reference implementation of a stateless model.

Part 2: Stateful Model (Rating-Based)

A stateful model processes games sequentially and maintains internal state (e.g., per-team ratings). The StatefulModel base class provides concrete fit() and predict_proba() implementations — you implement five hooks.

Step 1: Define Config and Model

# my_stateful_model.py
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Literal, Self

from ncaa_eval.ingest.schema import Game
from ncaa_eval.model.base import ModelConfig, StatefulModel
from ncaa_eval.model.registry import register_model


class SimpleRatingConfig(ModelConfig):
    """Hyperparameters for a simple win-percentage rating model."""

    model_name: Literal["simple_rating"] = "simple_rating"
    initial_rating: float = 0.5
    learning_rate: float = 0.1
    mean_reversion: float = 0.3


@register_model("simple_rating")
class SimpleRatingModel(StatefulModel):
    """A minimal rating model: tracks team win percentages with smoothing."""

    def __init__(self, config: SimpleRatingConfig | None = None) -> None:
        self._config = config or SimpleRatingConfig()
        self._ratings: dict[int, float] = {}

    def start_season(self, season: int) -> None:
        """Mean-revert ratings at the start of each season."""
        mean = self._config.initial_rating
        frac = self._config.mean_reversion
        self._ratings = {
            tid: mean * frac + rating * (1 - frac)
            for tid, rating in self._ratings.items()
        }

    def update(self, game: Game) -> None:
        """Update ratings based on game outcome."""
        lr = self._config.learning_rate
        init = self._config.initial_rating

        w_rating = self._ratings.get(game.w_team_id, init)
        l_rating = self._ratings.get(game.l_team_id, init)

        # Winner's rating increases, loser's decreases
        self._ratings[game.w_team_id] = w_rating + lr * (1.0 - w_rating)
        self._ratings[game.l_team_id] = l_rating + lr * (0.0 - l_rating)

    def _predict_one(self, team_a_id: int, team_b_id: int) -> float:
        """Return P(team_a wins) based on rating difference."""
        init = self._config.initial_rating
        a = self._ratings.get(team_a_id, init)
        b = self._ratings.get(team_b_id, init)
        # Simple sigmoid of rating difference
        diff = a - b
        return 1.0 / (1.0 + 10.0 ** (-diff / 0.2))

    def get_state(self) -> dict[str, Any]:
        """Snapshot current ratings for serialization."""
        return {"ratings": dict(self._ratings)}

    def set_state(self, state: dict[str, Any]) -> None:
        """Restore ratings from a snapshot."""
        self._ratings = dict(state["ratings"])

    def save(self, path: Path) -> None:
        """Persist model config and state."""
        path.mkdir(parents=True, exist_ok=True)
        (path / "config.json").write_text(self._config.model_dump_json())
        (path / "state.json").write_text(
            json.dumps(self.get_state())
        )

    @classmethod
    def load(cls, path: Path) -> Self:
        """Restore model from saved files."""
        config = SimpleRatingConfig.model_validate_json(
            (path / "config.json").read_text()
        )
        instance = cls(config)
        state = json.loads((path / "state.json").read_text())
        instance.set_state(state)
        return instance

    def get_config(self) -> SimpleRatingConfig:
        return self._config

Step 2: Understand the Hooks

The StatefulModel base class calls your hooks in this order (a short sketch exercising them by hand follows the list):

  1. start_season(season) — Called before the first game of each season. Use this to mean-revert ratings or reset accumulators.

  2. update(game) — Called once per game, in chronological order. The Game object contains:

    • w_team_id / l_team_id — winner and loser team IDs

    • w_score / l_score — final scores

    • loc — “H” (home), “A” (away), or “N” (neutral)

    • num_ot — number of overtime periods

    • is_tournament — True for NCAA tournament games

  3. _predict_one(team_a_id, team_b_id) — Return P(team_a wins) using current internal ratings. The base class calls this for each game in the test set.

  4. get_state() / set_state(state) — Serialize and restore internal state. Used for model persistence and for the evaluation engine to snapshot state between folds.
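You normally never call these hooks yourself; the base class drives them. Still, exercising them directly is a quick way to check your rating logic. A sketch using the SimpleRatingModel from Step 1 (the team IDs and seed ratings are made up):

from my_stateful_model import SimpleRatingModel  # assumes my_stateful_model.py from Step 1 is importable

model = SimpleRatingModel()

# Seed some ratings via set_state (team IDs 1101 and 1202 are arbitrary examples)
model.set_state({"ratings": {1101: 0.8, 1202: 0.4}})

# start_season() pulls both ratings partway back toward initial_rating (0.5)
model.start_season(2024)
print(model.get_state())

# _predict_one() converts the rating gap into P(team_a wins)
print(model._predict_one(1101, 1202))  # > 0.5, since team 1101 is rated higher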

Warning

The fit() and predict_proba() methods are provided by StatefulModel — do not override them. They handle the game reconstruction, season iteration, and per-row prediction logic automatically.

Step 3: Train and Evaluate

Register and train just like a stateless model:

python -m ncaa_eval.cli train --model simple_rating

Tip

See src/ncaa_eval/model/elo.py for the full reference implementation of a stateful model with margin-of-victory adjustments, variable K-factors, and home-court advantage.

Running Evaluation with a Custom Model

Once trained, your model’s run artifacts appear in data/runs/<run_id>/. The dashboard automatically picks them up:

streamlit run dashboard/app.py

Select your model run in the sidebar to see its metrics on the Leaderboard and Deep Dive pages.

To run a backtest programmatically:

from pathlib import Path

from ncaa_eval.evaluation.backtest import run_backtest
from ncaa_eval.ingest import ParquetRepository
from ncaa_eval.transform.feature_serving import FeatureConfig, StatefulFeatureServer
from ncaa_eval.transform.serving import ChronologicalDataServer

# Instantiate your custom model (SimpleRatingModel from Part 2 above)
model = SimpleRatingModel()

# Create feature server
repo = ParquetRepository(base_path=Path("data/"))
data_server = ChronologicalDataServer(repo)
config = FeatureConfig()
server = StatefulFeatureServer(config=config, data_server=data_server)

# Run backtest — use "stateful" for StatefulModel, "batch" for stateless Model
result = run_backtest(
    model=model,
    feature_server=server,
    seasons=list(range(2015, 2026)),
    mode="stateful",   # use "batch" for stateless Model subclasses
)

# Print per-year metrics
print(result.summary)

Summary

Step           | Stateless (Model)                           | Stateful (StatefulModel)
---------------|---------------------------------------------|------------------------------------
Config         | Extend ModelConfig                          | Extend ModelConfig
Core methods   | fit, predict_proba                          | update, _predict_one, start_season
State mgmt     | N/A                                         | get_state, set_state
Persistence    | save, load                                  | save, load
Config access  | get_config                                  | get_config
Register       | @register_model("name")                     | @register_model("name")
Train          | python -m ncaa_eval.cli train --model name  | Same

Next Steps

  • Add a custom metric — See the Custom Metric Tutorial

  • Compare models — Train multiple models and use the Leaderboard to compare

  • Explore the reference implementations:

    • src/ncaa_eval/model/logistic_regression.py — minimal stateless model

    • src/ncaa_eval/model/elo.py — full stateful model

    • src/ncaa_eval/model/xgboost_model.py — production stateless model