ncaa_eval.model.ensemble module
Stacked ensemble model — orchestrates base models and a meta-learner.
StackedEnsemble is a standalone @dataclass (not a Model subclass)
that holds a list of base Model instances and a stateless meta-learner.
The training pipeline in cli/train.py dispatches on
isinstance(model, StackedEnsemble) to invoke ensemble-specific training.
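The dispatch described above might look roughly like the following sketch. The Model and StackedEnsemble classes here are minimal stand-ins, and train_single_model, train_ensemble, and run_training are hypothetical names chosen for illustration; the actual code in cli/train.py is not shown in this reference.

```python
from dataclasses import dataclass, field


class Model:
    """Stand-in for ncaa_eval.model.base.Model (illustrative only)."""


@dataclass
class StackedEnsemble:
    """Stand-in: a plain dataclass, deliberately not a Model subclass."""

    base_models: list[Model]
    meta_learner: Model
    contextual_features: list[str] = field(default_factory=list)


def train_single_model(model: Model) -> str:
    # Hypothetical placeholder for the ordinary single-model training path.
    return "single"


def train_ensemble(ensemble: StackedEnsemble) -> str:
    # Hypothetical placeholder for the ensemble-specific training path.
    return f"ensemble({len(ensemble.base_models)} base models)"


def run_training(model: object) -> str:
    # StackedEnsemble is not a Model, so no Model instance can match
    # the first check and the two branches never overlap.
    if isinstance(model, StackedEnsemble):
        return train_ensemble(model)
    if isinstance(model, Model):
        return train_single_model(model)
    raise TypeError(f"Unsupported model type: {type(model).__name__}")
```

Keeping the ensemble outside the Model hierarchy means the check is unambiguous, at the cost of one explicit branch in the pipeline.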
- class ncaa_eval.model.ensemble.StackedEnsemble(base_models: list[ncaa_eval.model.base.Model], meta_learner: ncaa_eval.model.base.Model, contextual_features: list[str] = <factory>, meta_column_order: list[str] = <factory>)
Bases: object
Stacked generalisation ensemble.
Holds a list of base Model instances and a stateless meta-learner. The ensemble’s feature_config is the union of all base models’ configs.
- base_models
Two or more trained (or to-be-trained) base models.
- Type:
list[Model]
- meta_learner
A stateless Model that learns to combine base model predictions with contextual features.
- contextual_features
Column names appended to the out-of-fold (OOF) predictions before meta-learner training (e.g. seed_diff).
- Type:
list[str]
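To illustrate how contextual features join the base-model predictions, the sketch below builds meta-learner training columns from out-of-fold predictions using plain Python lists. The fold-mean "base model", the helper names oof_predictions and build_meta_input, and the toy data are all assumptions for illustration; the library's own training code is not shown in this reference.

```python
def oof_predictions(y: list[float], n_folds: int) -> list[float]:
    """Out-of-fold predictions from a toy base model.

    The toy model predicts the mean label of its training rows, so each
    row's prediction comes only from folds that did not contain that row.
    """
    n = len(y)
    preds = [0.0] * n
    for fold in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        fold_mean = sum(y[i] for i in train) / len(train)
        for i in held_out:
            preds[i] = fold_mean
    return preds


def build_meta_input(
    oof: dict[str, list[float]],
    features: dict[str, list[float]],
    contextual_features: list[str],
) -> dict[str, list[float]]:
    """Append the named contextual columns to the OOF prediction columns."""
    meta = dict(oof)
    for name in contextual_features:
        meta[name] = features[name]
    return meta


# Toy data: six games with a binary outcome and one contextual column.
labels = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
features = {"seed_diff": [-3.0, 5.0, -1.0, -8.0, 2.0, 4.0]}

oof = {"toy_model": oof_predictions(labels, n_folds=3)}
meta_input = build_meta_input(oof, features, ["seed_diff"])
```

Because every prediction is made by a model that never saw that row, the meta-learner is trained on inputs that resemble what it will receive at inference time.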
- contextual_features: list[str]
- property feature_config: FeatureConfig
Return the union of all base model feature configs.
- get_config() → StackedEnsembleConfig
Return a serialisable configuration record.
- classmethod load(path: Path) → StackedEnsemble
Reconstruct a StackedEnsemble from a saved directory.
- meta_column_order: list[str]
- predict_bracket(data_dir: Path, season: int) → DataFrame
Generate an n×n pairwise probability matrix for bracket prediction.
Discovers tournament-eligible teams, generates per-base-model pairwise predictions, assembles meta-input for all C(n,2) matchups, and returns a probability matrix suitable for the Monte Carlo bracket simulator.
- Parameters:
data_dir – Path to the local Parquet data store.
season – Target season year.
- Returns:
DataFrame of shape (n, n) with team_id index and columns. P[a, b] is the ensemble probability that team a beats team b. Diagonal is zero; P[a, b] + P[b, a] ≈ 1.
- Raises:
FileNotFoundError – If no season data exists for season.
ValueError – If any column in meta_column_order is missing from the assembled meta-input, or if a context feature array has unexpected length.
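The shape and symmetry properties of the returned matrix can be sketched with a toy matchup function. Everything here is an assumption for illustration: toy_win_prob, the ratings, the team ids, and the nested-dict matrix stand in for the ensemble's predictions and the returned DataFrame.

```python
import math
from itertools import combinations


def toy_win_prob(rating_a: float, rating_b: float) -> float:
    """Hypothetical stand-in for the ensemble: logistic in rating difference."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b)))


def pairwise_matrix(ratings: dict[int, float]) -> dict[int, dict[int, float]]:
    """Build an n x n matrix P where P[a][b] is the probability a beats b."""
    teams = sorted(ratings)
    # Start from zeros so the diagonal stays zero.
    P = {a: {b: 0.0 for b in teams} for a in teams}
    # Evaluate each of the C(n, 2) unordered matchups once.
    for a, b in combinations(teams, 2):
        p = toy_win_prob(ratings[a], ratings[b])
        P[a][b] = p
        P[b][a] = 1.0 - p  # the complement fills the mirrored cell
    return P


P = pairwise_matrix({1101: 0.8, 1102: 0.1, 1103: -0.4, 1104: 0.0})
```

This sketch forces exact complements, whereas the documented property is only approximate (≈ 1), so the real method should not be assumed to fill the mirrored cell this way.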
- predict_proba(X: DataFrame) → Series
Route features through base models and meta-learner.
For each base model, generates predictions by passing stateless models only X[base_model.feature_names_] and stateful models the full X. Assembles base predictions and contextual features into a meta-input DataFrame whose columns follow self.meta_column_order, then calls the meta-learner.
- Parameters:
X – Feature DataFrame with at least the columns required by each base model and all contextual features.
- Returns:
Series of ensemble win probabilities, indexed like X.
- Raises:
ValueError – If any column in meta_column_order is missing from the assembled meta-input.
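The routing in predict_proba can be sketched as follows, with dicts of column lists standing in for DataFrames. The toy model classes, the is_stateful flag, the name attribute, and the ensemble_predict function are hypothetical; how the real implementation tells stateless from stateful models is not stated in this reference.

```python
Columns = dict[str, list[float]]


class StatelessToy:
    """Hypothetical stateless base model: sees only its own feature columns."""

    name = "stateless_toy"
    feature_names_ = ["seed_diff"]
    is_stateful = False  # assumed flag; the real dispatch criterion may differ

    def predict_proba(self, X: Columns) -> list[float]:
        return [0.5 - 0.02 * d for d in X["seed_diff"]]


class StatefulToy:
    """Hypothetical stateful base model: receives the full feature set."""

    name = "stateful_toy"
    is_stateful = True

    def predict_proba(self, X: Columns) -> list[float]:
        return [0.6 for _ in X["seed_diff"]]


class MeanMeta:
    """Hypothetical meta-learner: averages the two base prediction columns."""

    def predict_proba(self, meta: Columns) -> list[float]:
        cols = [meta["stateless_toy"], meta["stateful_toy"]]
        return [sum(vals) / len(vals) for vals in zip(*cols)]


def ensemble_predict(X, base_models, meta_learner,
                     contextual_features, meta_column_order):
    assembled: Columns = {}
    for model in base_models:
        # Stateless models get a column subset; stateful models get all of X.
        if model.is_stateful:
            inputs = X
        else:
            inputs = {c: X[c] for c in model.feature_names_}
        assembled[model.name] = model.predict_proba(inputs)
    for col in contextual_features:
        assembled[col] = X[col]
    missing = [c for c in meta_column_order if c not in assembled]
    if missing:
        raise ValueError(f"Missing meta-input columns: {missing}")
    # Reorder to the column order the meta-learner expects.
    meta_input = {c: assembled[c] for c in meta_column_order}
    return meta_learner.predict_proba(meta_input)


X = {"seed_diff": [-5.0, 5.0], "tempo": [68.0, 71.0]}
probs = ensemble_predict(
    X,
    [StatelessToy(), StatefulToy()],
    MeanMeta(),
    contextual_features=["seed_diff"],
    meta_column_order=["stateless_toy", "stateful_toy", "seed_diff"],
)
```

Checking for missing columns before reordering keeps the failure explicit: a stale meta_column_order surfaces as a ValueError rather than a silent column mismatch in the meta-learner.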
- class ncaa_eval.model.ensemble.StackedEnsembleConfig(*, model_name: str = 'ensemble', calibration_method: Literal['isotonic', 'sigmoid'] | None = None, base_model_types: list[str] = <factory>, contextual_features: list[str] = <factory>)
Bases: ModelConfig
Configuration record for a stacked ensemble.
Stores base model types and contextual feature names for serialisation and run-tracking purposes.
- base_model_types: list[str]
- contextual_features: list[str]
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_name: str