ncaa_eval.ingest.connectors.kaggle module

Kaggle data source connector for NCAA March Madness competition data.

Downloads and parses CSV files from the Kaggle March Machine Learning Mania competition. The download() method handles the network-dependent download step while the fetch_*() methods perform pure CSV parsing, making it straightforward to test without network access.
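A minimal sketch of why this split keeps tests offline. It assumes a hypothetical `SketchConnector` whose network step is an injected `fetcher` callable; the real connector calls the Kaggle API there instead. The sketch also mimics the documented `force` behavior: the download is skipped when CSVs already exist unless `force=True`. Class, method, and helper names here are illustrative, not the module's API.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable


class SketchConnector:
    """Hypothetical stand-in for KaggleConnector; not the real implementation."""

    def __init__(self, extract_dir: Path, fetcher: Callable[[Path], None]) -> None:
        self.extract_dir = extract_dir
        # Hypothetical injected downloader; the real connector uses the Kaggle API here.
        self._fetcher = fetcher

    def download(self, *, force: bool = False) -> None:
        """Network-dependent step: skipped when CSVs already exist, unless forced."""
        self.extract_dir.mkdir(parents=True, exist_ok=True)
        if not force and any(self.extract_dir.glob("*.csv")):
            return
        self._fetcher(self.extract_dir)

    def fetch_team_names(self) -> dict[int, str]:
        """Pure parsing step: reads only files already on disk."""
        with open(self.extract_dir / "MTeams.csv", newline="") as fh:
            return {int(row["TeamID"]): row["TeamName"] for row in csv.DictReader(fh)}


calls: list[Path] = []


def fake_fetcher(dest: Path) -> None:
    # Writes a tiny fixture instead of touching the network.
    calls.append(dest)
    (dest / "MTeams.csv").write_text("TeamID,TeamName\n1101,Abilene Chr\n")


with tempfile.TemporaryDirectory() as tmp:
    conn = SketchConnector(Path(tmp) / "kaggle", fake_fetcher)
    conn.download()            # runs the fetcher
    conn.download()            # skipped: MTeams.csv already exists
    conn.download(force=True)  # runs again despite existing files
    teams = conn.fetch_team_names()
```

Because parsing only touches files on disk, a test can drop fixture CSVs into a temporary directory and never call `download()` at all.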

class ncaa_eval.ingest.connectors.kaggle.KaggleConnector(extract_dir: Path, competition: str = 'march-machine-learning-mania-2026')

Bases: Connector

Connector for Kaggle March Machine Learning Mania competition data.

Parameters:
  • extract_dir – Local directory where CSV files are downloaded/extracted.

  • competition – Kaggle competition slug.

download(*, force: bool = False) → None

Download and extract competition CSV files via the Kaggle API.

Parameters:

force – Re-download even if files already exist.

Raises:
fetch_games(season: int) → list[Game]

Parse regular-season and tournament CSVs into Game models.

Games from MRegularSeasonCompactResults.csv have is_tournament=False; games from MNCAATourneyCompactResults.csv have is_tournament=True.
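A sketch of the tagging step, assuming the standard compact-results columns (`Season`, `DayNum`, `WTeamID`, `WScore`, `LTeamID`, `LScore`, `WLoc`, `NumOT`). `GameRow` and `parse_games` are hypothetical stand-ins for the `Game` model and the connector's internals, and the fixture rows are made up.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class GameRow:
    """Hypothetical stand-in for the Game model; real field names may differ."""
    season: int
    day_num: int
    w_team_id: int
    l_team_id: int
    is_tournament: bool


def parse_games(csv_text: str, season: int, is_tournament: bool) -> list[GameRow]:
    """Keep rows for one season and tag each with the given tournament flag."""
    return [
        GameRow(
            season=int(row["Season"]),
            day_num=int(row["DayNum"]),
            w_team_id=int(row["WTeamID"]),
            l_team_id=int(row["LTeamID"]),
            is_tournament=is_tournament,
        )
        for row in csv.DictReader(io.StringIO(csv_text))
        if int(row["Season"]) == season
    ]


header = "Season,DayNum,WTeamID,WScore,LTeamID,LScore,WLoc,NumOT\n"
regular = header + "2024,10,1101,70,1102,65,H,0\n2023,12,1103,80,1104,60,A,0\n"
tourney = header + "2024,136,1101,75,1105,70,N,1\n"

# The flag comes from which file a row was read from, not from the row itself.
games = parse_games(regular, 2024, is_tournament=False) + parse_games(
    tourney, 2024, is_tournament=True
)
```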

fetch_seasons() → list[Season]

Parse MSeasons.csv into Season models.

Delegates to load_day_zeros() (which already reads and validates MSeasons.csv) to avoid a second disk read and Pandera validation pass.
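A sketch of the delegation pattern, assuming a hypothetical `SeasonSource` that holds the CSV text in memory and counts how often it is parsed. The `SeasonRow` type, the `MM/DD/YYYY` DayZero format, and the fixture values are assumptions for illustration; in the real connector the counted step would be the disk read plus Pandera validation.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class SeasonRow:
    """Hypothetical stand-in for the Season model."""
    year: int
    day_zero: date


class SeasonSource:
    """Hypothetical class showing fetch_seasons() reusing the cached day-zero map."""

    def __init__(self, mseasons_csv: str) -> None:
        self._csv = mseasons_csv
        self._day_zeros: Optional[dict[int, date]] = None
        self.parse_count = 0  # how many times the CSV was actually parsed

    def load_day_zeros(self) -> dict[int, date]:
        if self._day_zeros is None:
            self.parse_count += 1
            rows = csv.DictReader(io.StringIO(self._csv))
            # Assumes DayZero is stored as MM/DD/YYYY.
            self._day_zeros = {
                int(r["Season"]): datetime.strptime(r["DayZero"], "%m/%d/%Y").date()
                for r in rows
            }
        return self._day_zeros

    def fetch_seasons(self) -> list[SeasonRow]:
        # Delegates to the cached loader instead of re-reading the CSV.
        return [SeasonRow(y, d) for y, d in sorted(self.load_day_zeros().items())]


# Illustrative values only, not taken from the real MSeasons.csv.
src = SeasonSource("Season,DayZero\n2024,11/06/2023\n2025,11/04/2024\n")
seasons = src.fetch_seasons()
day_zeros = src.load_day_zeros()  # served from the cache, no second parse
```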

fetch_team_spellings() → dict[str, int]

Parse MTeamSpellings.csv into a spelling → TeamID mapping.

Returns every alternate spelling (lower-cased) for each team, which provides much wider coverage than the canonical names in MTeams.csv when resolving ESPN team name strings to Kaggle IDs.
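A sketch of building and querying the spelling map, assuming the file's `TeamNameSpelling` and `TeamID` columns. `parse_spellings` and `resolve` are hypothetical helpers, and the fixture rows and ID are illustrative. Lower-casing both the keys and the lookup string is what makes case variants such as `"UNC"` resolve.

```python
import csv
import io
from typing import Optional


def parse_spellings(csv_text: str) -> dict[str, int]:
    """Map each lower-cased alternate spelling to its TeamID."""
    mapping: dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        mapping[row["TeamNameSpelling"].strip().lower()] = int(row["TeamID"])
    return mapping


def resolve(name: str, spellings: dict[str, int]) -> Optional[int]:
    """Hypothetical lookup for an external (e.g. ESPN) name; None if unknown."""
    return spellings.get(name.strip().lower())


# Illustrative fixture; several spellings point at the same TeamID.
spellings = parse_spellings(
    "TeamNameSpelling,TeamID\nnorth carolina,1314\nunc,1314\nnorth-carolina,1314\n"
)
```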

fetch_teams() → list[Team]

Parse MTeams.csv into Team models.

Reads MTeams.csv, validates required columns, then constructs Team models from each row’s TeamID and TeamName.
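A sketch of the validate-then-construct flow using only the standard library. The real method validates with its own schema (likely Pandera, as elsewhere in the module), so `REQUIRED_COLUMNS`, `TeamRow`, `parse_teams`, and the `ValueError` are assumptions for illustration.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical: the two columns the Team model needs.
REQUIRED_COLUMNS = {"TeamID", "TeamName"}


@dataclass(frozen=True)
class TeamRow:
    """Hypothetical stand-in for the Team model."""
    team_id: int
    name: str


def parse_teams(csv_text: str) -> list[TeamRow]:
    reader = csv.DictReader(io.StringIO(csv_text))
    # Fail early if the header lacks a required column.
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"MTeams.csv is missing columns: {sorted(missing)}")
    # Extra columns (e.g. FirstD1Season) are ignored.
    return [TeamRow(int(r["TeamID"]), r["TeamName"]) for r in reader]


teams = parse_teams("TeamID,TeamName,FirstD1Season,LastD1Season\n1101,Abilene Chr,2014,2025\n")
```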

load_day_zeros() → dict[int, date]

Load and cache the season → DayZero mapping.

Returns:

Mapping of season year to the date of Day 0 for that season.
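Kaggle's data description defines each game's `DayNum` as a day offset from its season's DayZero, so the returned mapping is enough to recover calendar dates. A sketch, assuming a hypothetical helper `day_num_to_date` and an illustrative DayZero value:

```python
from datetime import date, timedelta


def day_num_to_date(season: int, day_num: int, day_zeros: dict[int, date]) -> date:
    """Hypothetical helper: DayNum counts days after the season's DayZero."""
    return day_zeros[season] + timedelta(days=day_num)


day_zeros = {2024: date(2023, 11, 6)}  # illustrative value
game_date = day_num_to_date(2024, 136, day_zeros)  # 2024-03-21
```

An unknown season raises `KeyError`, which surfaces missing MSeasons.csv rows instead of silently producing a wrong date.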