ncaa_eval.model.ensemble module

Stacked ensemble model — orchestrates base models and a meta-learner.

StackedEnsemble is a standalone @dataclass (not a Model subclass) that holds a list of base Model instances and a stateless meta-learner. The training pipeline in cli/train.py dispatches on isinstance(model, StackedEnsemble) to invoke ensemble-specific training.
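The type-based dispatch described above might look roughly like the following. This is a minimal sketch, not the code in cli/train.py: the Model and StackedEnsemble classes here are hypothetical stand-ins, and the train function with its placeholder return strings is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


class Model:
    """Hypothetical stand-in for ncaa_eval.model.base.Model."""


@dataclass
class StackedEnsemble:
    """Hypothetical stand-in: a plain dataclass, deliberately not a Model subclass."""

    base_models: list = field(default_factory=list)
    meta_learner: Optional[Model] = None


def train(model: object) -> str:
    """Choose a training path by type; the returned labels are placeholders."""
    if isinstance(model, StackedEnsemble):
        return "ensemble-training"
    return "single-model-training"


single = Model()
ensemble = StackedEnsemble(base_models=[Model(), Model()], meta_learner=Model())
print(train(single), train(ensemble))
```

Because the ensemble is not a Model subclass, an isinstance check against StackedEnsemble cleanly separates the two paths without any risk of an ensemble being treated as an ordinary model.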

class ncaa_eval.model.ensemble.StackedEnsemble(base_models: list[ncaa_eval.model.base.Model], meta_learner: ncaa_eval.model.base.Model, contextual_features: list[str] = <factory>, meta_column_order: list[str] = <factory>)

Bases: object

Stacked generalisation ensemble.

Holds a list of base Model instances and a stateless meta-learner. The ensemble’s feature_config is the union of all base models’ configs.

base_models

Two or more trained (or to-be-trained) base models.

Type: list[ncaa_eval.model.base.Model]

meta_learner

A stateless Model that learns to combine base model predictions with contextual features.

Type: ncaa_eval.model.base.Model

contextual_features

Column names appended to out-of-fold (OOF) predictions before meta-learner training (e.g. seed_diff).

Type: list[str]

base_models: list[Model]

contextual_features: list[str]

property feature_config: FeatureConfig

Return the union of all base model feature configs.

get_config() → StackedEnsembleConfig

Return a serialisable configuration record.

classmethod load(path: Path) → StackedEnsemble

Reconstruct a StackedEnsemble from a saved directory.

meta_column_order: list[str]

meta_learner: Model

predict_bracket(data_dir: Path, season: int) → DataFrame

Generate an n×n pairwise probability matrix for bracket prediction.

Discovers tournament-eligible teams, generates per-base-model pairwise predictions, assembles meta-input for all C(n,2) matchups, and returns a probability matrix suitable for the Monte Carlo bracket simulator.

Parameters:

data_dir – Path to the local Parquet data store.
season – Target season year.

Returns:

DataFrame of shape (n, n) with team_id index and columns. P[a, b] is the ensemble probability that team a beats team b. Diagonal is zero; P[a,b] + P[b,a] ≈ 1.

Raises:

FileNotFoundError – If no season data exists for season.
ValueError – If any column in meta_column_order is missing from the assembled meta-input, or if a context feature array has unexpected length.
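To illustrate the shape of the return value, the sketch below fills an n×n matrix from one probability per unordered matchup. It is an assumption-laden simplification: the pairwise_matrix helper, the team ids, and the probabilities are all made up, and the mirror entry is set to exactly 1 − p, whereas the description above only promises P[a,b] + P[b,a] ≈ 1.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def pairwise_matrix(team_ids, prob_a_beats_b):
    """Build an n x n win-probability matrix (hypothetical helper).

    prob_a_beats_b maps (a, b), with a listed before b in team_ids,
    to the probability that a beats b.
    """
    n = len(team_ids)
    P = np.zeros((n, n))  # the diagonal stays zero
    for i, j in combinations(range(n), 2):  # all C(n, 2) matchups
        p = prob_a_beats_b[(team_ids[i], team_ids[j])]
        P[i, j] = p
        P[j, i] = 1.0 - p  # assumption: exact complement
    return pd.DataFrame(P, index=team_ids, columns=team_ids)


teams = [1101, 1202, 1303]  # made-up team ids
probs = {(1101, 1202): 0.7, (1101, 1303): 0.6, (1202, 1303): 0.45}
matrix = pairwise_matrix(teams, probs)
print(matrix)
```

A downstream simulator can then look up matrix.loc[a, b] for any pairing without caring which ordering was used at prediction time.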

predict_proba(X: DataFrame) → Series

Route features through base models and meta-learner.

Generates predictions from each base model: stateless models receive X[base_model.feature_names_], while stateful models receive the full X. The base predictions and contextual features are then assembled into a meta-input DataFrame whose columns follow self.meta_column_order, and that DataFrame is passed to the meta-learner.

Parameters:

X – Feature DataFrame with at least the columns required by each base model and all contextual features.

Returns:

Series of ensemble win probabilities, indexed like X.

Raises:

ValueError – If any column in meta_column_order is missing from the assembled meta-input.
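The assembly step can be sketched as follows. This is a minimal illustration, not the library's implementation: the assemble_meta_input helper and the pred_0 / pred_1 column names are hypothetical, and the base predictions are hard-coded rather than produced by real models.

```python
import pandas as pd


def assemble_meta_input(base_preds, X, contextual_features, meta_column_order):
    """Combine base predictions and contextual columns in a fixed order.

    base_preds is a dict of column name -> Series of probabilities
    (the column names are an assumption made for this sketch).
    """
    meta = pd.DataFrame(base_preds, index=X.index)
    for col in contextual_features:
        meta[col] = X[col]
    missing = [c for c in meta_column_order if c not in meta.columns]
    if missing:
        raise ValueError(f"meta-input is missing columns: {missing}")
    # Reorder so the meta-learner sees the layout it was trained on.
    return meta[meta_column_order]


X = pd.DataFrame({"seed_diff": [-3, 5]}, index=[10, 11])
base_preds = {
    "pred_0": pd.Series([0.6, 0.4], index=X.index),
    "pred_1": pd.Series([0.7, 0.3], index=X.index),
}
meta = assemble_meta_input(
    base_preds, X, ["seed_diff"], ["pred_0", "pred_1", "seed_diff"]
)
print(meta)
```

Fixing the column order explicitly guards against a meta-learner silently receiving features in a different order than it saw during training.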

save(path: Path) → None

Save the ensemble to path.

Layout:

path/
  manifest.json
  feature_config.json
  base_models/
    base_0/  …  (Model.save)
    base_1/  …
  meta_learner/  …  (Model.save)
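For orientation, the directory skeleton above could be produced with pathlib along these lines. This is only a sketch of the layout: the write_layout helper is hypothetical, the JSON contents are placeholders (the real manifest and feature_config schemas are not shown here), and the per-model directories are left empty rather than populated by Model.save.

```python
import json
import tempfile
from pathlib import Path


def write_layout(path: Path, n_base: int) -> None:
    """Create the directory skeleton only (hypothetical helper)."""
    path.mkdir(parents=True, exist_ok=True)
    # Placeholder payloads; the actual file schemas are an unknown here.
    (path / "manifest.json").write_text(json.dumps({"n_base_models": n_base}))
    (path / "feature_config.json").write_text(json.dumps({}))
    for i in range(n_base):
        # Each base model gets its own numbered subdirectory.
        (path / "base_models" / f"base_{i}").mkdir(parents=True, exist_ok=True)
    (path / "meta_learner").mkdir(exist_ok=True)


root = Path(tempfile.mkdtemp()) / "ensemble"
write_layout(root, n_base=2)
entries = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
print(entries)
```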

class ncaa_eval.model.ensemble.StackedEnsembleConfig(*, model_name: str = 'ensemble', calibration_method: Literal['isotonic', 'sigmoid'] | None = None, base_model_types: list[str] = <factory>, contextual_features: list[str] = <factory>)

Bases: ModelConfig

Configuration record for a stacked ensemble.

Stores base model types and contextual feature names for serialisation and run-tracking purposes.

base_model_types: list[str]

contextual_features: list[str]

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_name: str