Getting Started Tutorial

This tutorial walks you through the full NCAA_eval pipeline — from syncing data to viewing model predictions in the interactive dashboard.

Prerequisites: You have already installed the project (poetry install) and configured your Kaggle API credentials. See the README for setup instructions.

Step 1: Sync Data

Download NCAA game data from Kaggle (historical seasons 1985–2025) and ESPN (current-season scores):

python sync.py --source all --dest data/

Sample output (first run — exact counts vary by season):

[kaggle] teams: 362 written
[kaggle] seasons: 41 written
[kaggle] season 1985: 2526 games written
[kaggle] season 1986: 2614 games written
  ...
[kaggle] season 2025: 4545 games written
[espn] season 2025: 4581 games written

Sync complete in 45.2s — teams: 362, seasons: 41, games: 150352, cache hits: 0

Tip

Subsequent runs skip already-cached files automatically. Use --force-refresh to re-download everything.
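
For example, to force a full re-download from both sources:

python sync.py --source all --dest data/ --force-refresh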

Step 2: Train an Elo Model

The Elo model is a stateful rating system — it maintains per-team ratings that update game-by-game. It requires no feature engineering and is a good first model to train:

python -m ncaa_eval.cli train --model elo

The CLI shows a Rich progress bar while building features, then runs a walk-forward backtest and prints a summary table:

Building features... ━━━━━━━━━━━━━━━━━━━━ 100% 0:00:12
Training elo on seasons 2015–2025...
Running walk-forward backtest...
Backtest metrics persisted.
Model artifacts persisted.
       Training Results
┌──────────────────────┬──────────┐
│ Field                │ Value    │
├──────────────────────┼──────────┤
│ Run ID               │ <uuid>   │
│ Model                │ elo      │
│ Seasons              │ 2015–2025│
│ Games trained        │ 55000    │
│ Tournament preds     │ 630      │
│ Git hash             │ abc1234  │
└──────────────────────┴──────────┘
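
Conceptually, the Elo update after each game looks roughly like the sketch below. This is a minimal illustration of the general Elo mechanism, not ncaa_eval's actual implementation; the 1500 starting rating, the 400-point scale, and the parameter name k are conventional Elo assumptions (the project's real hyperparameters, such as k_regular, are covered in the User Guide):

# Minimal illustration of an Elo-style update (not ncaa_eval's actual code;
# the k value, 400-point scale, and example ratings are illustrative assumptions).
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that team A beats team B under the logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a: float, rating_b: float, a_won: bool, k: float = 40.0) -> tuple[float, float]:
    """Shift both ratings toward the observed result, scaled by k."""
    p_a = expected_score(rating_a, rating_b)
    delta = k * ((1.0 if a_won else 0.0) - p_a)
    return rating_a + delta, rating_b - delta

# Example: a 1600-rated team upsets a 1700-rated team; both ratings move toward each other.
print(update(1600.0, 1700.0, a_won=True))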

The --model flag selects from registered model plugins. To see all available models:

python -c "from ncaa_eval.model import list_models; print(list_models())"
['elo', 'logistic_regression', 'xgboost']

Customize Hyperparameters

Override any Elo hyperparameter via a JSON config file; for example, save the following as my_elo_config.json:

{
  "k_regular": 40.0,
  "mean_reversion_fraction": 0.30
}
python -m ncaa_eval.cli train --model elo --config my_elo_config.json
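
For intuition on these two values: k_regular scales how far ratings move after each regular-season game (larger values react faster to recent results), and mean_reversion_fraction pulls ratings back toward the league average between seasons. Assuming the fraction is the share of the gap to a league mean of 1500 that is removed (an illustrative assumption, not something the config itself specifies), a team finishing one season at 1700 with mean_reversion_fraction of 0.30 would start the next season at 1700 - 0.30 * (1700 - 1500) = 1640.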

See the User Guide — Stateful Models for the full list of Elo hyperparameters.

Step 3: Train an XGBoost Model

XGBoost is a stateless model — it takes a feature matrix as input and learns which features best predict game outcomes:

python -m ncaa_eval.cli train --model xgboost

The output follows the same format as the Elo training above (progress bar, training message, backtest, and summary table). When the feature-engineering pipeline provides strong signal, XGBoost typically outperforms Elo, producing lower Log Loss and higher ROC-AUC. See the User Guide — Stateless Models for hyperparameter details.
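
To make the "stateless" label concrete, such a model sees each game as one row of pre-computed features plus a 0/1 outcome label. The sketch below shows the shape of that interface only; the feature names (seed_diff, elo_diff, off_eff_diff) and the synthetic data are assumptions, not the project's actual feature set or pipeline:

# Illustrative only: a stateless model consumes a feature matrix X and labels y.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                          # hypothetical columns: seed_diff, elo_diff, off_eff_diff
y = (X[:, 1] + rng.normal(size=1000) > 0).astype(int)   # 1 = first team won

model = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X, y)
print(model.predict_proba(X[:5])[:, 1])                 # predicted win probabilities for five games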

Adjust the Training Window

Train on more (or fewer) seasons using --start-year and --end-year:

python -m ncaa_eval.cli train --model xgboost --start-year 2010 --end-year 2024

Step 4: Launch the Dashboard

Start the Streamlit dashboard to explore your model results:

streamlit run dashboard/app.py

The dashboard opens in your browser at http://localhost:8501.
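
If port 8501 is already in use, Streamlit's standard --server.port flag selects a different one:

streamlit run dashboard/app.py --server.port 8502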

Dashboard Navigation

The dashboard has four pages organized into two sections:

Lab (model analysis):

  • Backtest Leaderboard — Compare all trained models side-by-side on Log Loss, Brier Score, ROC-AUC, and ECE. Color-coded cells highlight the best and worst performers.

  • Model Deep Dive — Inspect a single model’s calibration via reliability diagrams, per-year metric breakdowns, and feature importance (XGBoost only).

Presentation (tournament predictions):

  • Bracket Visualizer — View the model’s predicted bracket with advancement probabilities, pairwise win probabilities, and expected points per team under your chosen scoring rule.

  • Pool Scorer — Score your bracket against thousands of simulated tournament outcomes to see your expected point distribution. Export the bracket as CSV.

Step 5: Interpret Your Results

Compare Models on the Leaderboard

The Leaderboard shows key metrics for every training run:

Metric        Better When   Random Baseline
Log Loss      Lower         0.693
Brier Score   Lower         0.25
ROC-AUC       Higher        0.5
ECE           Lower         —

Tip

For a detailed explanation of each metric (formulas, interpretation, worked examples), see the User Guide — Evaluation Metrics.
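
As a quick sanity check of the baselines in the table: predicting 0.5 for every game gives Log Loss = -ln(0.5) ≈ 0.693 and Brier Score = (0.5 - outcome)^2 = 0.25. A minimal sketch using standard scikit-learn metrics (shown for illustration; this is not the project's own evaluation code):

# Worked example: metrics for an uninformative 0.5 predictor vs. a sharper one.
import numpy as np
from sklearn.metrics import log_loss, brier_score_loss, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0])
coin_flip = np.full(6, 0.5)
sharper = np.array([0.8, 0.3, 0.7, 0.6, 0.2, 0.4])

print(log_loss(y_true, coin_flip), brier_score_loss(y_true, coin_flip))  # ~0.693, 0.25
print(log_loss(y_true, sharper), brier_score_loss(y_true, sharper))      # lower is better
print(roc_auc_score(y_true, sharper))                                    # higher is better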

Check Calibration in Model Deep Dive

Select a model run and navigate to the Deep Dive page. The reliability diagram shows whether your model’s predicted probabilities match reality:

  • Points on the diagonal = well-calibrated

  • Points above = under-confident (predicts 60% but wins 70%)

  • Points below = over-confident (predicts 80% but wins 65%)

Use the year dropdown to check calibration stability across seasons.
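
Under the hood, a reliability diagram is just binned predictions compared against observed win rates. A minimal sketch with scikit-learn's calibration_curve (illustrative only; the dashboard's own implementation may differ, and the data below is simulated):

# Bin predicted probabilities and compare each bin's mean prediction to the
# observed win rate in that bin; points off the diagonal indicate miscalibration.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(1)
y_prob = rng.uniform(0.05, 0.95, size=2000)              # predicted win probabilities
y_true = (rng.uniform(size=2000) < y_prob).astype(int)   # simulated outcomes (well-calibrated by construction)

frac_won, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
for p, f in zip(mean_pred, frac_won):
    print(f"predicted ~{p:.2f} -> actually won {f:.2f}")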

Build a Bracket in the Bracket Visualizer

  1. Select a model run and tournament year

  2. Choose “Analytical (exact)” for fast expected points, or “Monte Carlo” for score distributions (see the sketch after this list)

  3. Review the Expected Points table — teams at the top are the most valuable bracket picks under your scoring rule

  4. Check the Advancement Heatmap to see each team’s probability of reaching each round
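
The analytical expected-points computation in step 2 reduces to a weighted sum: a team's expected points is the probability it wins its game in each round times the points a win in that round is worth under your scoring rule. A minimal sketch, where both the probabilities and the 1/2/4/8/16/32 doubling point values are made-up assumptions for illustration:

# Expected points = sum over rounds of P(team wins its game in that round) * points per win.
advance_prob = {"R64": 0.95, "R32": 0.70, "S16": 0.45, "E8": 0.25, "F4": 0.12, "Champ": 0.06}
round_points = {"R64": 1, "R32": 2, "S16": 4, "E8": 8, "F4": 16, "Champ": 32}

expected_points = sum(advance_prob[r] * round_points[r] for r in round_points)
print(f"Expected points: {expected_points:.2f}")  # higher = more valuable bracket pick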

Step 6: Iterate and Improve

The typical workflow loop is:

  1. Train a new model (or retrain with different hyperparameters)

  2. Compare on the Leaderboard — did metrics improve?

  3. Inspect calibration on the Deep Dive page

  4. Build a bracket using the Bracket Visualizer

  5. Score the bracket on the Pool Scorer page

  6. Repeat

Tip

Try training a Logistic Regression model as a simple baseline: python -m ncaa_eval.cli train --model logistic_regression

Next Steps