ncaa_eval.ingest.connectors.kaggle module
Kaggle data source connector for NCAA March Madness competition data.
Downloads and parses CSV files from the Kaggle March Machine Learning Mania
competition. The download() method handles the network-dependent download
step while the fetch_*() methods perform pure CSV parsing, making it
straightforward to test without network access.
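Because parsing is separate from downloading, the parsing path can be exercised against a hand-written fixture file. The following is a minimal sketch of that pattern using only the standard library. OfflineConnector and fetch_rows are hypothetical stand-ins, not part of ncaa_eval, and the fixture values are illustrative.

```python
import csv
import tempfile
from pathlib import Path


class OfflineConnector:
    """Hypothetical stand-in illustrating the download/parse split."""

    def __init__(self, extract_dir: Path) -> None:
        self.extract_dir = extract_dir

    def download(self) -> None:
        # The network-dependent step would live here; the test never calls it.
        raise RuntimeError("network access is not available in this sketch")

    def fetch_rows(self, filename: str) -> list[dict[str, str]]:
        # Pure parsing: only reads a file that is already on disk.
        path = self.extract_dir / filename
        with open(path, newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))


def parse_fixture() -> list[dict[str, str]]:
    # Write a tiny fixture CSV, then parse it without touching the network.
    with tempfile.TemporaryDirectory() as tmp:
        extract_dir = Path(tmp)
        (extract_dir / "MSeasons.csv").write_text(
            "Season,DayZero\n2024,11/06/2023\n",  # illustrative values
            encoding="utf-8",
        )
        return OfflineConnector(extract_dir).fetch_rows("MSeasons.csv")


rows = parse_fixture()
```

A test for the real connector would presumably follow the same shape: place fixture CSVs in extract_dir, skip download(), and assert on the output of the fetch_*() methods.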
- class ncaa_eval.ingest.connectors.kaggle.KaggleConnector(extract_dir: Path, competition: str = 'march-machine-learning-mania-2026')
Bases: Connector
Connector for Kaggle March Machine Learning Mania competition data.
- Parameters:
extract_dir – Local directory where CSV files are downloaded/extracted.
competition – Kaggle competition slug.
- download(*, force: bool = False) → None
Download and extract competition CSV files via the Kaggle API.
- Parameters:
force – Re-download even if files already exist.
- Raises:
AuthenticationError – Credentials missing or invalid.
NetworkError – Download failed due to connection issues.
- fetch_games(season: int) → list[Game]
Parse regular-season and tournament CSVs into Game models.
Games from MRegularSeasonCompactResults.csv have is_tournament=False; games from MNCAATourneyCompactResults.csv have is_tournament=True.
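The tagging rule above can be sketched as follows. This is an illustration under stated assumptions, not the connector's actual code: GameSketch is a hypothetical stand-in for the package's Game model, parse_games is a hypothetical helper, the column names are assumed to follow the Kaggle compact-results layout, and the sample rows are made up.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class GameSketch:
    """Hypothetical stand-in for the package's Game model."""

    season: int
    day_num: int
    w_team_id: int
    l_team_id: int
    w_score: int
    l_score: int
    is_tournament: bool


def parse_games(csv_text: str, season: int, *, is_tournament: bool) -> list[GameSketch]:
    # Keep only rows for the requested season and tag each with the source flag.
    games = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row["Season"]) != season:
            continue
        games.append(
            GameSketch(
                season=int(row["Season"]),
                day_num=int(row["DayNum"]),
                w_team_id=int(row["WTeamID"]),
                l_team_id=int(row["LTeamID"]),
                w_score=int(row["WScore"]),
                l_score=int(row["LScore"]),
                is_tournament=is_tournament,
            )
        )
    return games


HEADER = "Season,DayNum,WTeamID,WScore,LTeamID,LScore,WLoc,NumOT\n"
# Illustrative rows, not real results.
regular_csv = HEADER + "2024,10,1101,70,1102,65,H,0\n2023,12,1103,80,1104,60,A,0\n"
tourney_csv = HEADER + "2024,136,1101,75,1103,71,N,1\n"

games = parse_games(regular_csv, 2024, is_tournament=False) + parse_games(
    tourney_csv, 2024, is_tournament=True
)
```

Here the flag comes from which file a row was read from rather than from any column in the data, which matches the behaviour described for fetch_games().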
- fetch_seasons() → list[Season]
Parse MSeasons.csv into Season models.
Delegates to load_day_zeros() (which already reads and validates MSeasons.csv) to avoid a second disk read and Pandera validation pass.
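The delegation described above might look roughly like this. It is a sketch only: SeasonSketch, load_day_zeros_sketch, and fetch_seasons_sketch are hypothetical names, the real load_day_zeros() signature and the Season model's fields are not shown in this page, and the MM/DD/YYYY date format for the DayZero column is an assumption.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class SeasonSketch:
    """Hypothetical stand-in for the package's Season model."""

    year: int
    day_zero: date


def load_day_zeros_sketch(csv_text: str) -> dict[int, date]:
    # Single place where the seasons file is read and its values are checked.
    day_zeros = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed date format: MM/DD/YYYY.
        day_zeros[int(row["Season"])] = datetime.strptime(
            row["DayZero"], "%m/%d/%Y"
        ).date()
    return day_zeros


def fetch_seasons_sketch(csv_text: str) -> list[SeasonSketch]:
    # Build models from the already-parsed mapping instead of re-reading the file.
    day_zeros = load_day_zeros_sketch(csv_text)
    return [SeasonSketch(year, day) for year, day in sorted(day_zeros.items())]


# Illustrative values.
sample = "Season,DayZero\n2024,11/06/2023\n2023,10/31/2022\n"
seasons = fetch_seasons_sketch(sample)
```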
- fetch_team_spellings() → dict[str, int]
Parse MTeamSpellings.csv into a spelling → TeamID mapping.
Returns every alternate spelling (lower-cased) for each team, which provides much wider coverage than the canonical names in MTeams.csv when resolving ESPN team name strings to Kaggle IDs.
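A minimal sketch of building and using such a mapping, assuming the file has TeamNameSpelling and TeamID columns. parse_spellings is a hypothetical helper rather than the connector's implementation, and the sample rows are illustrative.

```python
import csv
import io


def parse_spellings(csv_text: str) -> dict[str, int]:
    # Map each lower-cased alternate spelling to its TeamID.
    mapping: dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        mapping[row["TeamNameSpelling"].strip().lower()] = int(row["TeamID"])
    return mapping


# Illustrative rows.
sample = (
    "TeamNameSpelling,TeamID\n"
    "unc,1314\n"
    "north carolina,1314\n"
    "N Carolina,1314\n"
)
spellings = parse_spellings(sample)

# Lower-case the incoming name before the lookup so it matches the stored keys.
team_id = spellings.get("North Carolina".lower())
missing = spellings.get("not a real team")
```

Since the keys are lower-cased, callers resolving an external name such as an ESPN team string would need to lower-case it before the lookup, and should expect a miss for names that have no listed spelling.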