ncaa_eval.model.ensemble module

Stacked ensemble model — orchestrates base models and a meta-learner.

StackedEnsemble is a standalone @dataclass (not a Model subclass) that holds a list of base Model instances and a stateless meta-learner. The training pipeline in cli/train.py dispatches on isinstance(model, StackedEnsemble) to invoke ensemble-specific training.

class ncaa_eval.model.ensemble.StackedEnsemble(base_models: list[ncaa_eval.model.base.Model], meta_learner: ncaa_eval.model.base.Model, contextual_features: list[str] = <factory>, meta_column_order: list[str] = <factory>)

Bases: object

Stacked generalisation ensemble.

Holds a list of base Model instances and a stateless meta-learner. The ensemble’s feature_config is the union of all base models’ configs.

base_models

Two or more trained (or to-be-trained) base models.

Type:

list[ncaa_eval.model.base.Model]

meta_learner

A stateless Model that learns to combine base model predictions with contextual features.

Type:

ncaa_eval.model.base.Model

contextual_features

Column names appended to OOF predictions before meta-learner training (e.g. seed_diff).

Type:

list[str]

base_models: list[Model]
contextual_features: list[str]
property feature_config: FeatureConfig

Return the union of all base model feature configs.

get_config() → StackedEnsembleConfig

Return a serialisable configuration record.

classmethod load(path: Path) → StackedEnsemble

Reconstruct a StackedEnsemble from a saved directory.

meta_column_order: list[str]
meta_learner: Model
predict_bracket(data_dir: Path, season: int) → DataFrame

Generate an n×n pairwise probability matrix for bracket prediction.

Discovers tournament-eligible teams, generates per-base-model pairwise predictions, assembles meta-input for all C(n,2) matchups, and returns a probability matrix suitable for the Monte Carlo bracket simulator.

Parameters:
  • data_dir – Path to the local Parquet data store.

  • season – Target season year.

Returns:

DataFrame of shape (n, n) with team_id as both index and columns. P[a, b] is the ensemble probability that team a beats team b. The diagonal is zero, and P[a, b] + P[b, a] = 1 for a ≠ b.

Raises:
  • FileNotFoundError – If no season data exists for season.

  • ValueError – If any column in meta_column_order is missing from the assembled meta-input, or if a context feature array has unexpected length.
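The shape of the returned matrix can be illustrated with a minimal sketch. It uses plain nested dicts in place of a DataFrame so that it runs with the standard library alone, and `pairwise_matrix` and `toy_win_prob` are hypothetical names that are not part of ncaa_eval. It assumes each of the C(n,2) matchups is scored once and the mirrored cell is filled with the complement; how the real method fills the mirrored cell is not stated here.

```python
from itertools import combinations
from typing import Callable


def pairwise_matrix(
    team_ids: list[int],
    win_prob: Callable[[int, int], float],
) -> dict[int, dict[int, float]]:
    """Build an n x n matrix P where P[a][b] is the probability that a beats b."""
    # Start from zeros everywhere, so the diagonal stays at zero.
    matrix = {a: {b: 0.0 for b in team_ids} for a in team_ids}
    # Visit each of the C(n, 2) unordered matchups exactly once.
    for a, b in combinations(team_ids, 2):
        p = win_prob(a, b)
        matrix[a][b] = p
        matrix[b][a] = 1.0 - p  # assumption: mirrored cell is the complement
    return matrix


def toy_win_prob(a: int, b: int) -> float:
    """Hypothetical stand-in for the ensemble: favour the lower team id."""
    return 0.75 if a < b else 0.25


P = pairwise_matrix([101, 102, 103], toy_win_prob)
```

With this construction the two documented invariants hold by design: every diagonal entry is zero and each off-diagonal pair sums to one.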

predict_proba(X: DataFrame) → Series

Route features through base models and meta-learner.

Generates predictions from each base model: stateless models receive only X[base_model.feature_names_], while stateful models receive the full X. The base predictions and contextual features are then assembled into a meta-input DataFrame ordered by self.meta_column_order and passed to the meta-learner.

Parameters:

X – Feature DataFrame with at least the columns required by each base model and all contextual features.

Returns:

Series of ensemble win probabilities, indexed like X.

Raises:

ValueError – If any column in meta_column_order is missing from the assembled meta-input.
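The routing described above can be sketched roughly as follows. This is an illustration only, not the library's implementation: rows are plain dicts rather than a DataFrame, `BaseModelStub` and `stacked_predict` are hypothetical names, and the choice to name each base prediction column after the model is an assumption. A stub with `feature_names=None` stands in for a stateful model.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Row = dict[str, float]


@dataclass
class BaseModelStub:
    """Hypothetical stand-in for a base Model."""

    name: str
    predict: Callable[[Row], float]
    feature_names: Optional[list[str]] = None  # None marks a stateful model


def stacked_predict(
    rows: list[Row],
    base_models: list[BaseModelStub],
    contextual_features: list[str],
    meta_column_order: list[str],
    meta_predict: Callable[[list[float]], float],
) -> list[float]:
    """Route each row through the base models, then through the meta-learner."""
    out = []
    for row in rows:
        meta_input: Row = {}
        for model in base_models:
            if model.feature_names is None:
                view = row  # stateful model: sees the full row
            else:
                view = {k: row[k] for k in model.feature_names}  # stateless: sliced
            meta_input[model.name] = model.predict(view)
        # Append whichever contextual features are present in the row.
        meta_input.update({f: row[f] for f in contextual_features if f in row})
        missing = [c for c in meta_column_order if c not in meta_input]
        if missing:
            raise ValueError(f"meta-input is missing columns: {missing}")
        # Present values to the meta-learner in the fixed column order.
        out.append(meta_predict([meta_input[c] for c in meta_column_order]))
    return out


def logit_stub(row: Row) -> float:
    return 0.5


def elo_stub(row: Row) -> float:
    return 0.75


def mean_of_base_predictions(values: list[float]) -> float:
    return (values[0] + values[1]) / 2


models = [
    BaseModelStub("logit", logit_stub, feature_names=["seed_diff"]),
    BaseModelStub("elo", elo_stub),
]
rows = [{"seed_diff": -3.0, "elo_diff": 40.0}]
probs = stacked_predict(
    rows, models, ["seed_diff"], ["logit", "elo", "seed_diff"], mean_of_base_predictions
)
```

Fixing the column order before calling the meta-learner matters because the meta-learner sees positions, not names; the missing-column check mirrors the documented ValueError.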

save(path: Path) → None

Save the ensemble to path.

Layout:

path/
  manifest.json
  feature_config.json
  base_models/
    base_0/  …  (Model.save)
    base_1/  …
  meta_learner/  …  (Model.save)
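
As a rough illustration of the layout above, the sketch below creates the same directory skeleton in a temporary directory. `write_layout` is a hypothetical helper, the JSON contents are placeholders (the real manifest and feature-config schemas are not documented here), and the per-model directories are left empty where Model.save would write its files.

```python
import json
import tempfile
from pathlib import Path


def write_layout(path: Path, n_base_models: int) -> None:
    """Create the directory skeleton only; no model weights are written."""
    path.mkdir(parents=True, exist_ok=True)
    # Placeholder contents: the real file schemas are assumptions here.
    (path / "manifest.json").write_text(json.dumps({"n_base_models": n_base_models}))
    (path / "feature_config.json").write_text(json.dumps({}))
    for i in range(n_base_models):
        # One sub-directory per base model, in list order.
        (path / "base_models" / f"base_{i}").mkdir(parents=True, exist_ok=True)
    (path / "meta_learner").mkdir(exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "ensemble"
    write_layout(root, n_base_models=2)
    # Collect the created entries relative to the ensemble root.
    created = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
```
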
class ncaa_eval.model.ensemble.StackedEnsembleConfig(*, model_name: str = 'ensemble', calibration_method: Literal['isotonic', 'sigmoid'] | None = None, base_model_types: list[str] = <factory>, contextual_features: list[str] = <factory>)

Bases: ModelConfig

Configuration record for a stacked ensemble.

Stores base model types and contextual feature names for serialisation and run-tracking purposes.

base_model_types: list[str]
contextual_features: list[str]
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_name: str