ncaa_eval.ingest.connectors package

Submodules

Module contents

Data source connectors for NCAA basketball data ingestion.

exception ncaa_eval.ingest.connectors.AuthenticationError

Bases: ConnectorError

Credentials missing, invalid, or expired.

class ncaa_eval.ingest.connectors.Connector

Bases: ABC

Abstract base class for NCAA data source connectors.

All connectors must implement fetch_games(), the one universal capability. fetch_teams() and fetch_seasons() are optional: subclasses that do not support them inherit the default implementation, which raises NotImplementedError. Callers should either check the connector type with isinstance() before calling an optional method, or call it and handle NotImplementedError.
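The optional-capability contract described above can be sketched as follows. This is a minimal illustration, not the package's code: the Connector class here is a stand-in that mirrors only the documented behaviour, and GamesOnlyConnector and teams_or_empty() are hypothetical names introduced for the example.

```python
from abc import ABC, abstractmethod


class Connector(ABC):
    """Stand-in mirroring the documented contract (not the real class)."""

    @abstractmethod
    def fetch_games(self, season: int) -> list:
        """Required capability: every connector implements this."""

    def fetch_teams(self) -> list:
        # Default for connectors without team master data.
        raise NotImplementedError("fetch_teams is not supported")

    def fetch_seasons(self) -> list:
        # Default for connectors without season master data.
        raise NotImplementedError("fetch_seasons is not supported")


class GamesOnlyConnector(Connector):
    """Hypothetical connector that implements only the required method."""

    def fetch_games(self, season: int) -> list:
        return [{"season": season, "w_team_id": 1101, "l_team_id": 1102}]


def teams_or_empty(connector: Connector) -> list:
    """Return team data when supported, otherwise an empty list."""
    try:
        return connector.fetch_teams()
    except NotImplementedError:
        return []


connector = GamesOnlyConnector()
games = connector.fetch_games(2025)
teams = teams_or_empty(connector)
```

Handling NotImplementedError keeps the caller independent of concrete connector types; an isinstance() check is the alternative when the caller already knows which classes provide the capability.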

abstractmethod fetch_games(season: int) → list[Game]

Fetch game results for a given season year.

fetch_seasons() → list[Season]

Fetch available seasons from the source.

Optional capability — not all connectors provide season master data.

Raises:

NotImplementedError – If this connector does not support fetching seasons.

fetch_teams() → list[Team]

Fetch team data from the source.

Optional capability — not all connectors provide team master data.

Raises:

NotImplementedError – If this connector does not support fetching teams.

exception ncaa_eval.ingest.connectors.ConnectorError

Bases: Exception

Base exception for all connector errors.

exception ncaa_eval.ingest.connectors.DataFormatError

Bases: ConnectorError

Raw data (CSV / API response) does not match the expected schema.

class ncaa_eval.ingest.connectors.EspnConnector(team_name_to_id: dict[str, int], season_day_zeros: dict[int, date])

Bases: Connector

Connector for ESPN game data via the cbbpy scraper.

Parameters:
  • team_name_to_id – Mapping from team name strings to Kaggle TeamIDs.

  • season_day_zeros – Mapping from season year to DayZero date.

fetch_games(season: int) → list[Game]

Fetch game results for season from ESPN via cbbpy.

Uses get_team_schedule() for each team in the mapping and deduplicates by ESPN game ID.
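When both participants of a game are in the mapping, the same game shows up on two team schedules, which is why deduplication is needed. The sketch below shows one way to do it; the row layout (plain dicts with a game_id key) and the helper name dedupe_by_game_id() are assumptions for illustration, not the structures EspnConnector actually uses.

```python
def dedupe_by_game_id(schedules):
    """Merge per-team schedules, keeping the first row seen for each game ID."""
    unique = {}
    for schedule in schedules:
        for row in schedule:
            # setdefault leaves an existing entry untouched, so the first
            # occurrence of a game ID wins and later copies are ignored.
            unique.setdefault(row["game_id"], row)
    return list(unique.values())


# Illustrative data: game "401" appears on both teams' schedules.
team_a = [
    {"game_id": "401", "team": "A", "opponent": "B"},
    {"game_id": "402", "team": "A", "opponent": "C"},
]
team_b = [{"game_id": "401", "team": "B", "opponent": "A"}]

deduped = dedupe_by_game_id([team_a, team_b])
```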

class ncaa_eval.ingest.connectors.KaggleConnector(extract_dir: Path, competition: str = 'march-machine-learning-mania-2026')

Bases: Connector

Connector for Kaggle March Machine Learning Mania competition data.

Parameters:
  • extract_dir – Local directory where CSV files are downloaded/extracted.

  • competition – Kaggle competition slug.

download(*, force: bool = False) → None

Download and extract competition CSV files via the Kaggle API.

Parameters:

force – Re-download even if files already exist.

Raises:

fetch_games(season: int) → list[Game]

Parse regular-season and tournament CSVs into Game models.

Games from MRegularSeasonCompactResults.csv have is_tournament=False; games from MNCAATourneyCompactResults.csv have is_tournament=True.
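The flagging rule above can be sketched as a single parser called once per file. This is an assumption-laden illustration: GameRow and parse_results() are hypothetical, the column names follow the Kaggle compact-results layout as commonly published, and the CSV text is made-up sample data rather than real results.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class GameRow:
    """Hypothetical stand-in for the package's Game model."""
    season: int
    day_num: int
    w_team_id: int
    l_team_id: int
    is_tournament: bool


def parse_results(csv_text: str, season: int, *, is_tournament: bool) -> list[GameRow]:
    """Parse one compact-results CSV, keeping only rows for the given season."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        if int(rec["Season"]) != season:
            continue
        rows.append(
            GameRow(
                season=season,
                day_num=int(rec["DayNum"]),
                w_team_id=int(rec["WTeamID"]),
                l_team_id=int(rec["LTeamID"]),
                is_tournament=is_tournament,
            )
        )
    return rows


# Made-up sample contents standing in for the two CSV files.
REGULAR = (
    "Season,DayNum,WTeamID,WScore,LTeamID,LScore\n"
    "2025,10,1101,70,1102,65\n"
    "2024,12,1103,80,1104,75\n"
)
TOURNEY = (
    "Season,DayNum,WTeamID,WScore,LTeamID,LScore\n"
    "2025,136,1101,60,1103,58\n"
)

games = parse_results(REGULAR, 2025, is_tournament=False)
games += parse_results(TOURNEY, 2025, is_tournament=True)
```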

fetch_seasons() → list[Season]

Parse MSeasons.csv into Season models.

Delegates to load_day_zeros() (which already reads and validates MSeasons.csv) to avoid a second disk read and Pandera validation pass.
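The delegation described above amounts to a load-once cache that both methods share. The sketch below illustrates the pattern only: SeasonSource, _read_seasons_file(), and read_count are hypothetical, the dates are illustrative values, and the real connector returns Season models rather than bare years.

```python
from datetime import date


class SeasonSource:
    """Hypothetical stand-in showing the cache-and-delegate pattern."""

    def __init__(self):
        self._day_zeros = None
        self.read_count = 0  # counts simulated disk reads, for illustration

    def _read_seasons_file(self):
        # Stands in for reading and validating the seasons CSV.
        self.read_count += 1
        return {2024: date(2023, 11, 6), 2025: date(2024, 11, 4)}

    def load_day_zeros(self):
        # Read on first use, then serve the cached mapping.
        if self._day_zeros is None:
            self._day_zeros = self._read_seasons_file()
        return self._day_zeros

    def fetch_seasons(self):
        # Reuse the cached mapping instead of reading the file again.
        return sorted(self.load_day_zeros())


source = SeasonSource()
source.load_day_zeros()
seasons = source.fetch_seasons()
```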

fetch_team_spellings() → dict[str, int]

Parse MTeamSpellings.csv into a spelling → TeamID mapping.

Returns every alternate spelling (lower-cased) for each team, which provides much wider coverage than the canonical names in MTeams.csv when resolving ESPN team name strings to Kaggle IDs.
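A lower-cased spelling map of this kind might be built and queried as sketched below. The column names TeamNameSpelling and TeamID are assumed from the Kaggle file layout, the helper names are hypothetical, and the team names and IDs are invented sample data.

```python
import csv
import io


def build_spelling_map(csv_text: str) -> dict[str, int]:
    """Map each lower-cased alternate spelling to its team ID."""
    mapping = {}
    for rec in csv.DictReader(io.StringIO(csv_text)):
        mapping[rec["TeamNameSpelling"].strip().lower()] = int(rec["TeamID"])
    return mapping


def resolve(mapping: dict[str, int], name: str):
    """Normalise an incoming name the same way as the keys, then look it up."""
    return mapping.get(name.strip().lower())


# Invented sample rows; real files contain many spellings per team.
SPELLINGS = (
    "TeamNameSpelling,TeamID\n"
    "example state,9001\n"
    "example st,9001\n"
    "sample university,9002\n"
)

spellings = build_spelling_map(SPELLINGS)
team_id = resolve(spellings, "  Example St ")
missing = resolve(spellings, "Unknown College")
```

Returning None for unknown names lets the caller decide whether to skip the game, log it, or raise.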

fetch_teams() → list[Team]

Parse MTeams.csv into Team models.

Reads MTeams.csv, validates required columns, then constructs Team models from each row’s TeamID and TeamName.

load_day_zeros() → dict[int, date]

Load and cache the season → DayZero mapping.

Returns:

Mapping of season year to the date of Day 0 for that season.
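One typical use of such a mapping is converting a day offset into a calendar date. The sketch below assumes that a game's day number counts days elapsed since Day 0 of its season; game_date() is a hypothetical helper and the DayZero value is illustrative.

```python
from datetime import date, timedelta


def game_date(day_zeros: dict[int, date], season: int, day_num: int) -> date:
    """Convert a (season, day offset) pair to a calendar date."""
    return day_zeros[season] + timedelta(days=day_num)


# Illustrative mapping; real values come from load_day_zeros().
day_zeros = {2025: date(2024, 11, 4)}

opening_day = game_date(day_zeros, 2025, 0)
later_game = game_date(day_zeros, 2025, 10)
```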

exception ncaa_eval.ingest.connectors.NetworkError

Bases: ConnectorError

Connection failure, timeout, or HTTP error.
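Because every exception on this page derives from ConnectorError, callers can handle specific failures first and fall back to the base class. The sketch below redefines the hierarchy locally so it runs on its own; run_ingest() and the failing callables are hypothetical, and the retry/abort policy is only one possible choice.

```python
class ConnectorError(Exception):
    """Stand-in for the documented base exception."""


class AuthenticationError(ConnectorError):
    """Stand-in: credentials missing, invalid, or expired."""


class DataFormatError(ConnectorError):
    """Stand-in: raw data does not match the expected schema."""


class NetworkError(ConnectorError):
    """Stand-in: connection failure, timeout, or HTTP error."""


def run_ingest(action) -> str:
    """Run an ingest step and map failures to a coarse outcome string."""
    try:
        action()
    except NetworkError:
        # Possibly transient, so a caller might retry.
        return "retry"
    except ConnectorError as exc:
        # Any other connector failure: report the specific type.
        return f"abort: {type(exc).__name__}"
    return "ok"


def flaky():
    raise NetworkError("timed out")


def bad_credentials():
    raise AuthenticationError("token expired")


outcomes = [run_ingest(flaky), run_ingest(bad_credentials), run_ingest(lambda: None)]
```

The more specific except clause has to come first; placing ConnectorError above NetworkError would swallow the network case.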