ncaa_eval.model.xgboost_model module¶
XGBoost gradient-boosting model — reference stateless model.
Wraps xgboost.XGBClassifier behind the Model ABC,
providing fit / predict_proba / save / load with XGBoost’s
native UBJSON persistence format.
- class ncaa_eval.model.xgboost_model.XGBoostModel(config: XGBoostModelConfig | None = None, *, batch_rating_types: tuple[Literal['srs', 'ridge', 'colley'], ...] = ('srs',), graph_features_enabled: bool = False, ordinal_composite: Literal['simple_average', 'weighted', 'pca'] | None = None)[source]¶
Bases: Model

XGBoost binary classifier wrapping XGBClassifier.

This is a stateless model — it implements Model directly (no StatefulModel lifecycle hooks).

Label balance convention: The feature server typically assigns team_a = w_team_id (the winner), so y may be heavily biased toward 1. Callers should either randomise team assignment before training (recommended) or set scale_pos_weight in the config to count(y==0) / count(y==1). The default scale_pos_weight is None (XGBoost default = 1.0), appropriate when team assignment is randomised.

- fit(X: DataFrame, y: Series) → None[source]¶
Train on feature matrix X and binary labels y.
Automatically splits X into train/validation sets using validation_fraction from the config. The validation set is used for early stopping via eval_set.

Label balance convention: team_a assignment in the feature server is typically w_team_id (the winner). If labels are imbalanced, either randomise team assignment upstream or set scale_pos_weight = count(y==0) / count(y==1) in the XGBoostModelConfig.

- Raises:
ValueError – If X is empty.
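The label-balance convention above can be sketched as follows. This is a minimal illustration, not part of the library: the team names and the coin-flip randomisation are hypothetical, and the trailing split mimics what fit() does internally with validation_fraction.

```python
import random

# Hypothetical raw rows: the feature server assigned team_a = winner, so every label is 1.
rows = [("duke", "unc", 1), ("gonzaga", "baylor", 1), ("uconn", "purdue", 1)]

random.seed(0)
balanced = []
for team_a, team_b, label in rows:
    if random.random() < 0.5:
        # Swap the team assignment and flip the label to match.
        team_a, team_b, label = team_b, team_a, 1 - label
    balanced.append((team_a, team_b, label))

labels = [lab for _, _, lab in balanced]
# If assignment were NOT randomised, the documented fallback would be:
#   scale_pos_weight = labels.count(0) / labels.count(1)

# fit() carves off a validation tail for early stopping; a rough equivalent:
val_frac = 0.1
n_val = max(1, int(len(balanced) * val_frac))
train, val = balanced[:-n_val], balanced[-n_val:]
```

With randomised assignment, scale_pos_weight can stay at its default of None.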
- get_config() → XGBoostModelConfig[source]¶
Return the Pydantic-validated configuration for this model.
- get_feature_importances() → list[tuple[str, float]] | None[source]¶
Return feature name/importance pairs from the fitted classifier.
- classmethod load(path: Path) → Self[source]¶
Load a previously-saved XGBoost model from path.
- Raises:
FileNotFoundError – If either config.json or model.ubj is missing.
- predict_proba(X: DataFrame) → Series[source]¶
Return P(team_a wins) for each row of X.
- Raises:
RuntimeError – If called before fit().
- save(path: Path) → None[source]¶
Persist the trained model to path directory.
Writes four files:

- model.ubj — XGBoost native UBJSON format (stable across versions)
- config.json — Pydantic-serialised hyperparameter config
- feature_names.json — JSON array of feature column names
- feature_config.json — FeatureConfig sidecar

- Raises:
RuntimeError – If called before fit().
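The save/load contract above implies a simple precondition check. The sketch below is illustrative only (check_saved_model is a hypothetical helper, not part of the library); it mirrors the documented load() behaviour of raising FileNotFoundError when config.json or model.ubj is absent.

```python
from pathlib import Path
import tempfile

def check_saved_model(path: Path) -> None:
    # Mirrors the documented load() contract: both files must be present.
    for name in ("config.json", "model.ubj"):
        if not (path / name).exists():
            raise FileNotFoundError(f"{name} missing from {path}")

with tempfile.TemporaryDirectory() as d:
    target = Path(d)
    (target / "config.json").write_text("{}")
    try:
        check_saved_model(target)   # model.ubj absent, so this should raise
        raised = False
    except FileNotFoundError:
        raised = True
    (target / "model.ubj").write_bytes(b"")
    check_saved_model(target)       # both files present: no error
```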
- class ncaa_eval.model.xgboost_model.XGBoostModelConfig(*, model_name: Literal['xgboost'] = 'xgboost', calibration_method: Literal['isotonic', 'sigmoid'] | None = None, n_estimators: int = 500, max_depth: int = 5, learning_rate: float = 0.05, subsample: float = 0.8, colsample_bytree: float = 0.8, min_child_weight: int = 3, reg_alpha: float = 0.0, reg_lambda: float = 1.0, early_stopping_rounds: int = 50, validation_fraction: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.1, scale_pos_weight: float | None = None)[source]¶
Bases: ModelConfig

Hyperparameters for the XGBoost gradient-boosting model.

Defaults from specs/research/modeling-approaches.md §5.5 and §6.4.

Label balance: Set scale_pos_weight = count(y==0) / count(y==1) when training labels are imbalanced (e.g. team_a is always the winner). Leave as None (XGBoost default = 1.0) when team assignment is randomised before training.

- colsample_bytree: float¶
- early_stopping_rounds: int¶
- learning_rate: float¶
- max_depth: int¶
- min_child_weight: int¶
- model_config = {}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_name: Literal['xgboost']¶
- n_estimators: int¶
- reg_alpha: float¶
- reg_lambda: float¶
- scale_pos_weight: float | None¶
- subsample: float¶
- validation_fraction: Annotated[float, Field(gt=0.0, lt=1.0)]¶
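For reference, the defaults from the signature above can be collected into a plain mapping. This is a sketch, not the library's own serialisation; the label counts used to derive scale_pos_weight are hypothetical, and with randomised team assignment you would leave it as None instead.

```python
n_neg, n_pos = 300, 900  # hypothetical label counts for an imbalanced training set

defaults = {
    "model_name": "xgboost",
    "calibration_method": None,
    "n_estimators": 500,
    "max_depth": 5,
    "learning_rate": 0.05,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "min_child_weight": 3,
    "reg_alpha": 0.0,
    "reg_lambda": 1.0,
    "early_stopping_rounds": 50,
    "validation_fraction": 0.1,  # must satisfy 0.0 < value < 1.0
    # Documented rule for imbalanced labels: count(y==0) / count(y==1).
    "scale_pos_weight": n_neg / n_pos,
}
```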