ncaa_eval.model package

Submodules

Module contents

Model implementations module.

class ncaa_eval.model.Model

Bases: ABC

Abstract base class for all NCAA prediction models.

Every model, stateful or stateless, must implement the five abstract methods fit, predict_proba, save, load, and get_config so that the training CLI, evaluation engine, and persistence layer can treat all models uniformly.

feature_config

Declarative specification of which feature blocks the model expects. Set by the subclass's __init__.

Type:

ncaa_eval.transform.feature_serving.FeatureConfig

feature_config: FeatureConfig
abstractmethod fit(X: DataFrame, y: Series) → None

Train the model on feature matrix X and labels y.

abstractmethod get_config() → ModelConfig

Return the Pydantic-validated configuration for this model.

get_feature_importances() → list[tuple[str, float]] | None

Return feature name/importance pairs, or None if unavailable.

The default returns None. Models that support feature importances (e.g. XGBoost) should override this method.

abstractmethod classmethod load(path: Path) → Self

Load a previously-saved model from path.

abstractmethod predict_proba(X: DataFrame) → Series

Return P(team_a wins) in [0, 1] for each row of X.

abstractmethod save(path: Path) → None

Persist the trained model to path.
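The contract above can be illustrated with a sketch that needs no dependencies. It is not the package's code, and it makes several substitutions. ToyModel and BaseRateModel are hypothetical names. Plain lists stand in for pandas DataFrame/Series. get_config returns a dict rather than a ModelConfig. load is annotated with the class name instead of Self so the sketch also runs on Python versions before 3.11.

```python
from __future__ import annotations

import json
from abc import ABC, abstractmethod
from pathlib import Path


class ToyModel(ABC):
    """Hypothetical stand-in for ncaa_eval.model.Model (lists instead of pandas)."""

    @abstractmethod
    def fit(self, X: list[dict[str, float]], y: list[int]) -> None: ...

    @abstractmethod
    def predict_proba(self, X: list[dict[str, float]]) -> list[float]: ...

    @abstractmethod
    def save(self, path: Path) -> None: ...

    @classmethod
    @abstractmethod
    def load(cls, path: Path) -> ToyModel: ...

    @abstractmethod
    def get_config(self) -> dict[str, object]: ...

    def get_feature_importances(self) -> list[tuple[str, float]] | None:
        # Mirrors the documented default: no importances available.
        return None


class BaseRateModel(ToyModel):
    """Predicts team_a's training win rate for every row, ignoring features."""

    def __init__(self) -> None:
        self.rate = 0.5

    def fit(self, X: list[dict[str, float]], y: list[int]) -> None:
        self.rate = sum(y) / len(y) if y else 0.5

    def predict_proba(self, X: list[dict[str, float]]) -> list[float]:
        return [self.rate for _ in X]

    def save(self, path: Path) -> None:
        path.mkdir(parents=True, exist_ok=True)
        (path / "model.json").write_text(json.dumps({"rate": self.rate}))

    @classmethod
    def load(cls, path: Path) -> BaseRateModel:
        model = cls()
        model.rate = json.loads((path / "model.json").read_text())["rate"]
        return model

    def get_config(self) -> dict[str, object]:
        return {"model_name": "base_rate", "calibration_method": None}
```

Because all five methods are abstract, instantiating ToyModel directly raises TypeError. A subclass becomes usable only once it implements the whole set.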

class ncaa_eval.model.ModelConfig(*, model_name: str, calibration_method: Literal['isotonic', 'sigmoid'] | None = None)

Bases: BaseModel

Base configuration shared by all model implementations.

Subclasses add model-specific hyperparameters as additional fields.

calibration_method: CalibrationMethod | None
model_config = {}

Pydantic class-level configuration inherited from BaseModel; should be a dictionary conforming to pydantic.config.ConfigDict. It does not hold the model's hyperparameters.

model_name: str
exception ncaa_eval.model.ModelNotFoundError

Bases: KeyError

Raised when a requested model name is not in the registry.

class ncaa_eval.model.StackedEnsemble(base_models: list[Model], meta_learner: Model, contextual_features: list[str] = <factory>, meta_column_order: list[str] = <factory>)

Bases: object

Stacked generalisation ensemble.

Holds a list of base Model instances and a stateless meta-learner. The ensemble’s feature_config is the union of all base models’ configs.

base_models

Two or more trained (or to-be-trained) base models.

Type:

list[ncaa_eval.model.base.Model]

meta_learner

A stateless Model that learns to combine base model predictions with contextual features.

Type:

ncaa_eval.model.base.Model

contextual_features

Column names appended to the out-of-fold (OOF) base-model predictions before meta-learner training (e.g. seed_diff).

Type:

list[str]

base_models: list[Model]
contextual_features: list[str]
property feature_config: FeatureConfig

Return the union of all base model feature configs.
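This page does not document the fields of the real FeatureConfig. The sketch below therefore assumes, purely for illustration, that a config reduces to a set of feature-block names; FeatureConfigSketch is hypothetical. Under that assumption, the ensemble's config is a fold of pairwise set unions over the base models:

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class FeatureConfigSketch:
    """Hypothetical stand-in: a FeatureConfig reduced to a set of block names."""

    blocks: frozenset[str]

    def union(self, other: "FeatureConfigSketch") -> "FeatureConfigSketch":
        return FeatureConfigSketch(self.blocks | other.blocks)


def ensemble_feature_config(configs: list[FeatureConfigSketch]) -> FeatureConfigSketch:
    # Fold pairwise unions so every block any base model needs appears once.
    return reduce(FeatureConfigSketch.union, configs, FeatureConfigSketch(frozenset()))
```

Taking the union means the feature-serving layer builds each shared block only once, even when several base models request it.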

get_config() → StackedEnsembleConfig

Return a serialisable configuration record.

classmethod load(path: Path) → StackedEnsemble

Reconstruct a StackedEnsemble from a saved directory.

meta_column_order: list[str]
meta_learner: Model
predict_bracket(data_dir: Path, season: int) → DataFrame

Generate an n×n pairwise probability matrix for bracket prediction.

Discovers tournament-eligible teams, generates per-base-model pairwise predictions, assembles meta-input for all C(n,2) matchups, and returns a probability matrix suitable for the Monte Carlo bracket simulator.

Parameters:
  • data_dir – Path to the local Parquet data store.

  • season – Target season year.

Returns:

DataFrame of shape (n, n) indexed by team_id on both rows and columns. P[a, b] is the ensemble probability that team a beats team b. The diagonal is zero, and P[a, b] + P[b, a] ≈ 1.

Raises:
  • FileNotFoundError – If no season data exists for season.

  • ValueError – If any column in meta_column_order is missing from the assembled meta-input, or if a contextual feature array has an unexpected length.
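One way to fill such a matrix from the C(n,2) pairwise scores is sketched below. The sketch makes three substitutions. It assumes, rather than confirms, that the lower triangle is filled as the complement 1 - P[a, b]. pair_prob is a hypothetical stand-in for the meta-learner's output. Nested dicts replace the DataFrame.

```python
from itertools import combinations
from typing import Callable


def pairwise_matrix(
    team_ids: list[int], pair_prob: Callable[[int, int], float]
) -> dict[int, dict[int, float]]:
    """Score each of the C(n, 2) pairs once; the mirrored cell is the complement."""
    P = {a: {b: 0.0 for b in team_ids} for a in team_ids}  # diagonal stays 0.0
    for a, b in combinations(team_ids, 2):
        p = pair_prob(a, b)  # hypothetical scorer: P(a beats b)
        P[a][b] = p
        P[b][a] = 1.0 - p
    return P
```

To get a DataFrame like the documented return value, pandas.DataFrame.from_dict(P, orient="index") keeps team a on the row index. Plain pandas.DataFrame(P) would use the outer keys as columns and transpose the matrix.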

predict_proba(X: DataFrame) → Series

Route features through base models and meta-learner.

Generates a prediction from each base model: stateless models receive only X[base_model.feature_names_], while stateful models receive the full X. Assembles the base predictions and contextual features into a meta-input DataFrame whose columns follow self.meta_column_order, then calls the meta-learner on it.

Parameters:

X – Feature DataFrame with at least the columns required by each base model and all contextual features.

Returns:

Series of ensemble win probabilities, indexed like X.

Raises:

ValueError – If any column in meta_column_order is missing from the assembled meta-input.
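The column routing and ordering described above can be sketched with plain dicts. select_inputs and assemble_meta_input are hypothetical helpers, not package functions. A feature_names value of None stands in for "stateful model", and base predictions are assumed to be keyed by the column names used in meta_column_order.

```python
from __future__ import annotations

Row = dict[str, float]


def select_inputs(X: list[Row], feature_names: list[str] | None) -> list[Row]:
    # Stateless models (feature_names given) see only their own columns;
    # stateful models (None) receive the full rows unchanged.
    if feature_names is None:
        return X
    return [{k: row[k] for k in feature_names} for row in X]


def assemble_meta_input(
    X: list[Row],
    base_preds: dict[str, list[float]],
    contextual_features: list[str],
    meta_column_order: list[str],
) -> list[list[float]]:
    """Build meta-learner rows in a fixed column order (hypothetical helper)."""
    columns: dict[str, list[float]] = dict(base_preds)
    for name in contextual_features:
        columns[name] = [row[name] for row in X]
    missing = [c for c in meta_column_order if c not in columns]
    if missing:
        # Mirrors the documented ValueError for columns absent from the meta-input.
        raise ValueError(f"meta-input is missing columns: {missing}")
    return [[columns[c][i] for c in meta_column_order] for i in range(len(X))]
```

Pinning the order to a stored list keeps the meta-learner's inputs aligned between training and prediction, even if base models are listed in a different order later.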

save(path: Path) → None

Save the ensemble to path.

Layout:

path/
  manifest.json
  feature_config.json
  base_models/
    base_0/  …  (Model.save)
    base_1/  …
  meta_learner/  …  (Model.save)
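A sketch of writing that layout follows. save_ensemble and the manifest and feature-config dictionaries are hypothetical, since this page does not document what manifest.json contains. Each component is only assumed to expose a save(path) method like Model.save, and the sketch creates each subdirectory before delegating to it.

```python
import json
from pathlib import Path
from typing import Protocol


class Saveable(Protocol):
    def save(self, path: Path) -> None: ...


def save_ensemble(
    path: Path,
    base_models: list[Saveable],
    meta_learner: Saveable,
    manifest: dict[str, object],
    feature_config: dict[str, object],
) -> None:
    """Write the documented directory layout; JSON contents are hypothetical."""
    path.mkdir(parents=True, exist_ok=True)
    (path / "manifest.json").write_text(json.dumps(manifest, indent=2))
    (path / "feature_config.json").write_text(json.dumps(feature_config, indent=2))
    for i, model in enumerate(base_models):
        sub = path / "base_models" / f"base_{i}"
        sub.mkdir(parents=True, exist_ok=True)
        model.save(sub)  # each base model owns its own subdirectory
    meta_dir = path / "meta_learner"
    meta_dir.mkdir(parents=True, exist_ok=True)
    meta_learner.save(meta_dir)
```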
class ncaa_eval.model.StatefulModel

Bases: Model

Template base for models that process games sequentially.

Concrete methods fit and predict_proba are provided as template methods. Subclasses implement the abstract hooks:

  • update(game) — absorb a single game result

  • _predict_one(team_a_id, team_b_id) — return P(team_a wins)

  • start_season(season) — reset / prepare for a new season

  • get_state() / set_state(state) — snapshot / restore ratings

fit(X: DataFrame, y: Series) → None

Reconstruct games from X/y and update sequentially.

Reconstructs Game objects from the feature matrix and labels, then iterates chronologically, calling start_season() on season boundaries and update() per game.
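The season-boundary loop can be shown in isolation. This page does not document Game, so GameSketch's fields (season, day_num, and so on) and the (season, day_num) sort key are assumptions. RecordingModel simply logs its hook calls so the ordering is visible.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class GameSketch:
    """Hypothetical stand-in for ncaa_eval's Game; field names are assumptions."""

    season: int
    day_num: int
    team_a_id: int
    team_b_id: int
    team_a_won: bool


class RecordingModel:
    """Records hook calls so the template loop's ordering can be inspected."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, int]] = []

    def start_season(self, season: int) -> None:
        self.calls.append(("start_season", season))

    def update(self, game: GameSketch) -> None:
        self.calls.append(("update", game.day_num))


def fit_sequentially(model: RecordingModel, games: Iterable[GameSketch]) -> None:
    current_season: Optional[int] = None
    for game in sorted(games, key=lambda g: (g.season, g.day_num)):
        if game.season != current_season:  # season boundary reached
            model.start_season(game.season)
            current_season = game.season
        model.update(game)
```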

abstractmethod get_state() → dict[str, Any]

Return a serialisable snapshot of internal ratings.

predict_matchup(team_a_id: int, team_b_id: int) → float

Return P(team_a wins) for a single matchup.

Delegates to the _predict_one abstract hook.

predict_proba(X: DataFrame) → Series

Call _predict_one for each row of X, iterating with DataFrame.itertuples.

abstractmethod set_state(state: dict[str, Any]) → None

Restore internal ratings from a snapshot.

abstractmethod start_season(season: int) → None

Called before the first game of each season.

abstractmethod update(game: Game) → None

Absorb the result of a single game.
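The hooks fit together in a toy Elo-style rating model. The sketch below does not inherit from StatefulModel, so it runs standalone. EloSketch, Result, the K-factor of 20, and the 0.75 season carry-over are illustrative assumptions, not the package's Elo implementation. get_state stringifies team ids so that the snapshot survives a JSON round trip.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Result:
    """Hypothetical minimal game record for this sketch."""

    winner_id: int
    loser_id: int


class EloSketch:
    """Toy Elo model exposing the StatefulModel-style hooks (illustrative only)."""

    def __init__(self, k: float = 20.0, base: float = 1500.0, carryover: float = 0.75) -> None:
        self.k, self.base, self.carryover = k, base, carryover
        self.ratings: dict[int, float] = {}

    def _rating(self, team_id: int) -> float:
        return self.ratings.get(team_id, self.base)

    def _predict_one(self, team_a_id: int, team_b_id: int) -> float:
        diff = self._rating(team_b_id) - self._rating(team_a_id)
        return 1.0 / (1.0 + 10.0 ** (diff / 400.0))  # logistic Elo expectation

    def update(self, game: Result) -> None:
        p_win = self._predict_one(game.winner_id, game.loser_id)
        delta = self.k * (1.0 - p_win)  # winner gains exactly what loser loses
        self.ratings[game.winner_id] = self._rating(game.winner_id) + delta
        self.ratings[game.loser_id] = self._rating(game.loser_id) - delta

    def start_season(self, season: int) -> None:
        # Hypothetical choice: pull every rating part-way back toward the base.
        self.ratings = {
            t: self.base + self.carryover * (r - self.base) for t, r in self.ratings.items()
        }

    def get_state(self) -> dict[str, Any]:
        return {"ratings": {str(t): r for t, r in self.ratings.items()}}

    def set_state(self, state: dict[str, Any]) -> None:
        self.ratings = {int(t): r for t, r in state["ratings"].items()}
```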

ncaa_eval.model.get_model(name: str) → type[Model]

Return the model class registered under name.

Raises ModelNotFoundError if no model is registered under name.

ncaa_eval.model.list_models() → list[str]

Return all registered model names (sorted).

ncaa_eval.model.register_model(name: str) → Callable[[_T], _T]

Class decorator that registers a Model subclass.

Usage:

from ncaa_eval.model import StatefulModel, register_model

@register_model("elo")
class EloModel(StatefulModel):
    ...
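A minimal registry with the documented behaviour might look like the sketch below. It runs without the package. The module-level _REGISTRY dict and the silent overwrite when a name is registered twice are assumptions, since this page does not show the package's actual storage.

```python
from typing import Callable, TypeVar

_T = TypeVar("_T", bound=type)

_REGISTRY: dict[str, type] = {}  # hypothetical storage: name -> class


class ModelNotFoundError(KeyError):
    """Raised when a requested model name is not in the registry."""


def register_model(name: str) -> Callable[[_T], _T]:
    def decorator(cls: _T) -> _T:
        _REGISTRY[name] = cls
        return cls  # return the class unchanged so it stays usable as-is
    return decorator


def get_model(name: str) -> type:
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ModelNotFoundError(name) from None


def list_models() -> list[str]:
    return sorted(_REGISTRY)
```

Because ModelNotFoundError subclasses KeyError, callers that already catch KeyError keep working unchanged.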