ncaa_eval.model package

Module contents

Model implementations module.

class ncaa_eval.model.Model

Bases: ABC

Abstract base class for all NCAA prediction models.

Every model, stateful or stateless, must implement the five abstract methods (fit, predict_proba, save, load, and get_config) so that the training CLI, evaluation engine, and persistence layer can treat all models uniformly.

feature_config

Declarative specification of which feature blocks the model expects. Set in the subclass's __init__.

Type: ncaa_eval.transform.feature_serving.FeatureConfig

abstractmethod fit(X: DataFrame, y: Series) → None: Train the model on feature matrix X and labels y.

abstractmethod get_config() → ModelConfig: Return the Pydantic-validated configuration for this model.

get_feature_importances() → list[tuple[str, float]] | None

Return feature name/importance pairs, or None if unavailable.

The default returns None. Models that support feature importances (e.g. XGBoost) should override this method.

abstractmethod classmethod load(path: Path) → Self: Load a previously saved model from path.

abstractmethod predict_proba(X: DataFrame) → Series: Return P(team_a wins), a value in [0, 1], for each row of X.

abstractmethod save(path: Path) → None: Persist the trained model to path.
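To make the five-method contract concrete, here is a minimal sketch under stated assumptions. SketchModel is a hypothetical stand-in for Model (not the package's class), and plain lists of dicts replace DataFrame/Series so the example runs on the standard library alone. BaseRateModel is an illustrative baseline that predicts the training-set rate at which team_a won.

```python
from __future__ import annotations

import json
from abc import ABC, abstractmethod
from pathlib import Path


class SketchModel(ABC):
    """Hypothetical stand-in mirroring the five abstract methods of Model.

    Lists replace DataFrame/Series so the sketch needs only the stdlib.
    """

    @abstractmethod
    def fit(self, X: list[dict[str, float]], y: list[int]) -> None: ...

    @abstractmethod
    def predict_proba(self, X: list[dict[str, float]]) -> list[float]: ...

    @abstractmethod
    def save(self, path: Path) -> None: ...

    @classmethod
    @abstractmethod
    def load(cls, path: Path) -> SketchModel: ...

    @abstractmethod
    def get_config(self) -> dict[str, object]: ...

    def get_feature_importances(self) -> list[tuple[str, float]] | None:
        # Mirrors the documented default: no importances unless overridden.
        return None


class BaseRateModel(SketchModel):
    """Hypothetical baseline: the training-set win rate of team_a for every row."""

    def __init__(self) -> None:
        self.rate = 0.5

    def fit(self, X, y):
        # Fraction of training rows in which team_a won (0.5 if no data).
        self.rate = sum(y) / len(y) if y else 0.5

    def predict_proba(self, X):
        return [self.rate for _ in X]

    def save(self, path):
        path.mkdir(parents=True, exist_ok=True)
        (path / "model.json").write_text(json.dumps({"rate": self.rate}))

    @classmethod
    def load(cls, path):
        model = cls()
        model.rate = json.loads((path / "model.json").read_text())["rate"]
        return model

    def get_config(self):
        return {"model_name": "base_rate"}
```

Because load is a classmethod, it can build and return a fresh instance from a path alone, which is what allows a persistence layer to restore a model without first constructing one.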

class ncaa_eval.model.ModelConfig(*, model_name: str, calibration_method: Literal['isotonic', 'sigmoid'] | None = None)

Bases: BaseModel

Base configuration shared by all model implementations.

Subclasses add model-specific hyperparameters as additional fields.

calibration_method: CalibrationMethod | None

Either 'isotonic' or 'sigmoid', or None (the default).

model_config = {}: Pydantic's class-level configuration, inherited from pydantic.BaseModel; it should be a dictionary conforming to pydantic.config.ConfigDict.

model_name: str

exception ncaa_eval.model.ModelNotFoundError

Bases: KeyError

Raised when a requested model name is not in the registry.

class ncaa_eval.model.StackedEnsemble(base_models: list[ncaa_eval.model.base.Model], meta_learner: ncaa_eval.model.base.Model, contextual_features: list[str] = <factory>, meta_column_order: list[str] = <factory>)

Bases: object

Stacked generalisation ensemble.

Holds a list of base Model instances and a stateless meta-learner. The ensemble’s feature_config is the union of all base models’ configs.

base_models

Two or more trained (or to-be-trained) base models.

Type: list[ncaa_eval.model.base.Model]

meta_learner

A stateless Model that learns to combine base model predictions with contextual features.

Type: ncaa_eval.model.base.Model

contextual_features

Column names appended to the out-of-fold (OOF) base model predictions before meta-learner training (e.g. seed_diff).

Type: list[str]

property feature_config: FeatureConfig: Return the union of all base model feature configs.

get_config() → StackedEnsembleConfig: Return a serialisable configuration record.

classmethod load(path: Path) → StackedEnsemble: Reconstruct a StackedEnsemble from a directory written by save().

meta_column_order: list[str]

Column order of the meta-input (base model predictions plus contextual features) passed to the meta-learner.

predict_bracket(data_dir: Path, season: int) → DataFrame

Generate an n×n pairwise win-probability matrix over the n tournament-eligible teams of season, for bracket prediction.

Discovers the tournament-eligible teams, generates pairwise predictions from each base model, assembles the meta-input for all C(n, 2) = n(n−1)/2 matchups, and returns a probability matrix suitable for the Monte Carlo bracket simulator.

Parameters:

data_dir – Path to the local Parquet data store.
season – Target season year.

Returns:

DataFrame of shape (n, n) whose index and columns are both team_id. P[a, b] is the ensemble probability that team a beats team b. The diagonal is zero, and P[a, b] + P[b, a] ≈ 1 for a ≠ b.

Raises:

FileNotFoundError – If no season data exists for season.
ValueError – If any column in meta_column_order is missing from the assembled meta-input, or if a contextual feature array has an unexpected length.
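The shape guarantees above can be illustrated with a small sketch. It assumes one probability per unordered pair, keyed (a, b) in team_ids order (a hypothetical input format), and fills the lower triangle with complements; the real method may instead score both orderings, which would be one reason the documented sum is only approximately 1. Nested dicts stand in for the DataFrame.

```python
from __future__ import annotations

from itertools import combinations


def pairwise_matrix(
    team_ids: list[int], prob: dict[tuple[int, int], float]
) -> dict[int, dict[int, float]]:
    """Fill an n x n matrix from C(n, 2) upper-triangle probabilities.

    prob[(a, b)] is assumed to hold P(a beats b) for each pair with a
    before b in team_ids; the mirrored entry gets the complement.
    """
    matrix = {a: {b: 0.0 for b in team_ids} for a in team_ids}  # diagonal stays 0
    for a, b in combinations(team_ids, 2):
        p = prob[(a, b)]
        matrix[a][b] = p
        matrix[b][a] = 1.0 - p
    return matrix
```

For n = 64 teams this needs 64 × 63 / 2 = 2016 pairwise probabilities.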

predict_proba(X: DataFrame) → Series

Route features through the base models and the meta-learner.

For each base model, generates predictions by passing stateless models only the columns X[base_model.feature_names_] and stateful models the full X. Assembles the base predictions and contextual features into a meta-input DataFrame with columns in self.meta_column_order, then calls the meta-learner.

Parameters:

X – Feature DataFrame containing at least the columns required by each base model and all contextual features.

Returns:

Series of ensemble win probabilities, indexed like X.

Raises:

ValueError – If any column in meta_column_order is missing from the assembled meta-input.
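The routing just described can be sketched with the standard library as follows. BaseSpec, route, and the convention of naming each base model's prediction column after the model are hypothetical; the real method works on DataFrames and reads base_model.feature_names_ rather than a feature_names field.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

Row = dict[str, float]


@dataclass
class BaseSpec:
    """Hypothetical stand-in for a base model."""
    name: str                # also used as its prediction column (assumption)
    stateful: bool
    feature_names: list[str]
    predict: Callable[[list[Row]], list[float]]


def route(
    X: list[Row],
    base_models: list[BaseSpec],
    contextual_features: list[str],
    meta_column_order: list[str],
    meta_predict: Callable[[list[Row]], list[float]],
) -> list[float]:
    # 1. Base predictions: stateless models see only their own columns,
    #    stateful models see the full rows.
    base_preds: dict[str, list[float]] = {}
    for m in base_models:
        if m.stateful:
            inputs = X
        else:
            inputs = [{k: row[k] for k in m.feature_names} for row in X]
        base_preds[m.name] = m.predict(inputs)

    # 2. Assemble meta-input rows: base predictions plus contextual features.
    meta_rows: list[Row] = []
    for i, row in enumerate(X):
        meta = {name: preds[i] for name, preds in base_preds.items()}
        meta.update({c: row[c] for c in contextual_features})
        meta_rows.append(meta)

    # 3. Reorder to meta_column_order and fail loudly on missing columns.
    if meta_rows:
        missing = [c for c in meta_column_order if c not in meta_rows[0]]
        if missing:
            raise ValueError(f"meta-input is missing columns: {missing}")
    ordered = [{c: r[c] for c in meta_column_order} for r in meta_rows]
    return meta_predict(ordered)
```

Enforcing meta_column_order presumably keeps the meta-learner's inputs aligned with the columns it was trained on, which is why a missing column raises instead of being filled.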

save(path: Path) → None

Save the ensemble to path.

Layout:

path/
  manifest.json
  feature_config.json
  base_models/
    base_0/  …  (Model.save)
    base_1/  …
  meta_learner/  …  (Model.save)
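A minimal sketch of writing this layout with pathlib and json follows. The save_model callback stands in for Model.save, and the manifest fields are hypothetical; the real manifest.json schema is not documented here.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable


def save_ensemble_layout(
    path: Path,
    n_base_models: int,
    save_model: Callable[[str, Path], None],
    feature_config: dict[str, object],
) -> None:
    """Write the directory layout documented for StackedEnsemble.save.

    save_model(role, directory) is a hypothetical stand-in for Model.save.
    """
    path.mkdir(parents=True, exist_ok=True)
    manifest = {"n_base_models": n_base_models}  # illustrative fields only
    (path / "manifest.json").write_text(json.dumps(manifest, indent=2))
    (path / "feature_config.json").write_text(json.dumps(feature_config, indent=2))

    # One numbered subdirectory per base model, in list order.
    for i in range(n_base_models):
        base_dir = path / "base_models" / f"base_{i}"
        base_dir.mkdir(parents=True, exist_ok=True)
        save_model(f"base_{i}", base_dir)

    meta_dir = path / "meta_learner"
    meta_dir.mkdir(parents=True, exist_ok=True)
    save_model("meta_learner", meta_dir)
```

Numbering base directories by position lets a loader rebuild base_models in the same order, which matters if the meta-input columns follow that order.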

class ncaa_eval.model.StatefulModel

Bases: Model

Template base for models that process games sequentially.

Concrete methods fit and predict_proba are provided as template methods. Subclasses implement the abstract hooks:

update(game) — absorb a single game result
_predict_one(team_a_id, team_b_id) — return P(team_a wins)
start_season(season) — reset / prepare for a new season
get_state() / set_state(state) — snapshot / restore ratings

fit(X: DataFrame, y: Series) → None

Reconstruct games from X/y and update sequentially.

Reconstructs Game objects from the feature matrix and labels, then iterates chronologically, calling start_season() on season boundaries and update() per game.
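The template-method pattern behind fit can be sketched as follows. Game, SketchStatefulModel, and ToyElo are hypothetical stand-ins: the sketch skips reconstructing Game objects from X and y and takes them directly, and ToyElo is a textbook Elo update with an assumed K-factor and an assumed between-season regression, not the package's EloModel.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class Game:
    """Hypothetical minimal game record (the real Game type may differ)."""
    season: int
    day_num: int
    team_a_id: int
    team_b_id: int
    team_a_won: bool


class SketchStatefulModel(ABC):
    """Stand-in for StatefulModel showing the fit() template method."""

    def fit_games(self, games: list[Game]) -> None:
        current_season = None
        # Chronological order; call the season hook at each boundary.
        for game in sorted(games, key=lambda g: (g.season, g.day_num)):
            if game.season != current_season:
                self.start_season(game.season)
                current_season = game.season
            self.update(game)

    def predict_matchup(self, team_a_id: int, team_b_id: int) -> float:
        return self._predict_one(team_a_id, team_b_id)

    @abstractmethod
    def update(self, game: Game) -> None: ...

    @abstractmethod
    def _predict_one(self, team_a_id: int, team_b_id: int) -> float: ...

    @abstractmethod
    def start_season(self, season: int) -> None: ...

    @abstractmethod
    def get_state(self) -> dict[str, Any]: ...

    @abstractmethod
    def set_state(self, state: dict[str, Any]) -> None: ...


class ToyElo(SketchStatefulModel):
    """Textbook Elo with a fixed K-factor (assumed values)."""

    def __init__(self, k: float = 20.0, base: float = 1500.0) -> None:
        self.k, self.base = k, base
        self.ratings: dict[int, float] = {}

    def _predict_one(self, team_a_id, team_b_id):
        ra = self.ratings.get(team_a_id, self.base)
        rb = self.ratings.get(team_b_id, self.base)
        return 1.0 / (1.0 + 10.0 ** ((rb - ra) / 400.0))

    def update(self, game):
        expected = self._predict_one(game.team_a_id, game.team_b_id)
        delta = self.k * (float(game.team_a_won) - expected)
        self.ratings[game.team_a_id] = self.ratings.get(game.team_a_id, self.base) + delta
        self.ratings[game.team_b_id] = self.ratings.get(game.team_b_id, self.base) - delta

    def start_season(self, season):
        # Assumed policy: regress each rating one third of the way to the base.
        self.ratings = {t: r - (r - self.base) / 3.0 for t, r in self.ratings.items()}

    def get_state(self):
        return {"ratings": dict(self.ratings)}

    def set_state(self, state):
        self.ratings = dict(state["ratings"])
```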

abstractmethod get_state() → dict[str, Any]: Return a serialisable snapshot of internal ratings.

predict_matchup(team_a_id: int, team_b_id: int) → float

Return P(team_a wins) for a single matchup.

Delegates to the _predict_one abstract hook.

predict_proba(X: DataFrame) → Series: Call _predict_one for each row of X, iterating with DataFrame.itertuples.

abstractmethod set_state(state: dict[str, Any]) → None: Restore internal ratings from a snapshot.

abstractmethod start_season(season: int) → None: Called before the first game of each season.

abstractmethod update(game: Game) → None: Absorb the result of a single game.

ncaa_eval.model.get_model(name: str) → type[Model]

Return the model class registered under name.

Raises ModelNotFoundError if no model is registered under name.

ncaa_eval.model.list_models() → list[str]: Return all registered model names, sorted.

ncaa_eval.model.register_model(name: str) → Callable[[_T], _T]

Class decorator that registers a Model subclass.

Usage:

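from ncaa_eval.model import StatefulModel, register_model

@register_model("elo")
class EloModel(StatefulModel):
    ...  # implement update, _predict_one, start_season, get_state, set_state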
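A registry with this interface can be sketched in a few lines. This is an illustrative implementation, not the package's source; in particular, rejecting duplicate names is an assumed policy.

```python
from __future__ import annotations

from typing import Callable, TypeVar

_T = TypeVar("_T", bound=type)

_REGISTRY: dict[str, type] = {}


class ModelNotFoundError(KeyError):
    """Raised when a requested model name is not in the registry."""


def register_model(name: str) -> Callable[[_T], _T]:
    """Class decorator: store cls under name and return it unchanged."""
    def decorator(cls: _T) -> _T:
        if name in _REGISTRY:
            # Assumed policy: refuse to silently overwrite an existing name.
            raise ValueError(f"model name {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls
    return decorator


def get_model(name: str) -> type:
    try:
        return _REGISTRY[name]
    except KeyError:
        # Subclassing KeyError keeps existing `except KeyError` handlers working.
        raise ModelNotFoundError(name) from None


def list_models() -> list[str]:
    return sorted(_REGISTRY)
```

Returning the class unchanged from the decorator means registration has no effect on the class itself; it only records a name-to-class mapping that the CLI can look up.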