Getting Started Tutorial

This tutorial walks you through the full NCAA_eval pipeline — from syncing data to viewing model predictions in the interactive dashboard.

Prerequisites: You have already installed the project (poetry install) and configured your Kaggle API credentials. See the README for setup instructions.

Step 1: Sync Data

Download NCAA game data from Kaggle (historical seasons 1985–2025) and ESPN (current-season scores):

python sync.py --source all --dest data/

Sample output (first run — exact counts vary by season):

[kaggle] teams: 362 written
[kaggle] seasons: 41 written
[kaggle] season 1985: 2526 games written
[kaggle] season 1986: 2614 games written
  ...
[kaggle] season 2025: 4545 games written
[espn] season 2025: 4581 games written

Sync complete in 45.2s — teams: 362, seasons: 41, games: 150352, cache hits: 0

Tip

Subsequent runs skip already-cached files automatically. Use --force-refresh to re-download everything.
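
For example, to force a full re-download from both sources:

python sync.py --source all --dest data/ --force-refresh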

Step 2: Train an Elo Model

The Elo model is a stateful rating system — it maintains per-team ratings that update game-by-game. It requires no feature engineering and is a good first model to train:

python -m ncaa_eval.cli train --model elo

The CLI shows a Rich progress bar while building features, then runs a walk-forward backtest and prints a summary table:

Building features... ━━━━━━━━━━━━━━━━━━━━ 100% 0:00:12
Training elo on seasons 2015–2025...
Running walk-forward backtest...
Backtest metrics persisted.
Model artifacts persisted.
       Training Results
┌──────────────────────┬──────────┐
│ Field                │ Value    │
├──────────────────────┼──────────┤
│ Run ID               │ <uuid>   │
│ Model                │ elo      │
│ Seasons              │ 2015–2025│
│ Games trained        │ 55000    │
│ Tournament preds     │ 630      │
│ Git hash             │ abc1234  │
└──────────────────────┴──────────┘
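
Conceptually, the Elo update after each game looks roughly like the sketch below. This is a minimal illustration of the general Elo mechanism, not ncaa_eval's actual implementation; the 1500 starting rating, the 400-point scale, and the parameter name k are conventional Elo assumptions (the project's real hyperparameters, such as k_regular, are covered in the User Guide):

# Minimal illustration of an Elo-style update (not ncaa_eval's actual code;
# the k value, 400-point scale, and example ratings are illustrative assumptions).
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that team A beats team B under the logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a: float, rating_b: float, a_won: bool, k: float = 40.0) -> tuple[float, float]:
    """Shift both ratings toward the observed result, scaled by k."""
    p_a = expected_score(rating_a, rating_b)
    delta = k * ((1.0 if a_won else 0.0) - p_a)
    return rating_a + delta, rating_b - delta

# Example: a 1600-rated team upsets a 1700-rated team; both ratings move toward each other.
print(update(1600.0, 1700.0, a_won=True))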

The --model flag selects from registered model plugins. To see all available models:

python -c "from ncaa_eval.model import list_models; print(list_models())"
['elo', 'logistic_regression', 'xgboost']

Customize Hyperparameters

Override any Elo hyperparameter via a JSON config file; for example, save the following as my_elo_config.json:

{
  "k_regular": 40.0,
  "mean_reversion_fraction": 0.30
}
python -m ncaa_eval.cli train --model elo --config my_elo_config.json
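
For intuition on these two values: k_regular scales how far ratings move after each regular-season game (larger values react faster to recent results), and mean_reversion_fraction pulls ratings back toward the league average between seasons. Assuming the fraction is the share of the gap to a league mean of 1500 that is removed (an illustrative assumption, not something the config itself specifies), a team finishing one season at 1700 with mean_reversion_fraction of 0.30 would start the next season at 1700 - 0.30 * (1700 - 1500) = 1640.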

See the User Guide — Stateful Models for the full list of Elo hyperparameters.

Step 3: Train an XGBoost Model

XGBoost is a stateless model — it takes a feature matrix as input and learns which features best predict game outcomes:

python -m ncaa_eval.cli train --model xgboost

The output follows the same format as the Elo training above (progress bar, training message, backtest, and summary table). When the feature-engineering pipeline provides strong signal, XGBoost typically outperforms Elo, producing lower Log Loss and higher ROC-AUC. See the User Guide — Stateless Models for hyperparameter details.
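
To make the "stateless" label concrete, such a model sees each game as one row of pre-computed features plus a 0/1 outcome label. The sketch below shows the shape of that interface only; the feature names (seed_diff, elo_diff, off_eff_diff) and the synthetic data are assumptions, not the project's actual feature set or pipeline:

# Illustrative only: a stateless model consumes a feature matrix X and labels y.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                          # hypothetical columns: seed_diff, elo_diff, off_eff_diff
y = (X[:, 1] + rng.normal(size=1000) > 0).astype(int)   # 1 = first team won

model = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X, y)
print(model.predict_proba(X[:5])[:, 1])                 # predicted win probabilities for five games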

Adjust the Training Window

Train on more (or fewer) seasons using --start-year and --end-year:

python -m ncaa_eval.cli train --model xgboost --start-year 2010 --end-year 2024

Step 4: Launch the Dashboard

Start the Streamlit dashboard to explore your model results:

streamlit run dashboard/app.py

The dashboard opens in your browser at http://localhost:8501.
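
If port 8501 is already in use, Streamlit's standard --server.port flag selects a different one:

streamlit run dashboard/app.py --server.port 8502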

Dashboard Navigation

The dashboard has four pages organized into two sections:

Lab (model analysis):

  • Backtest Leaderboard — Compare all trained models side-by-side on Log Loss, Brier Score, ROC-AUC, and ECE. Color-coded cells highlight the best and worst performers.

  • Model Deep Dive — Inspect a single model’s calibration via reliability diagrams, per-year metric breakdowns, and feature importance (XGBoost only).

Presentation (tournament predictions):

  • Bracket Visualizer — View the model’s predicted bracket with advancement probabilities, pairwise win probabilities, and expected points per team under your chosen scoring rule.

  • Pool Scorer — Score your bracket against thousands of simulated tournament outcomes to see your expected point distribution. Export the bracket as CSV.

Step 5: Interpret Your Results

Compare Models on the Leaderboard

The Leaderboard shows key metrics for every training run:

Metric        Better When   Random Baseline
Log Loss      Lower         0.693
Brier Score   Lower         0.25
ROC-AUC       Higher        0.5
ECE           Lower         —

Tip

For a detailed explanation of each metric (formulas, interpretation, worked examples), see the User Guide — Evaluation Metrics.
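
As a quick sanity check of the baselines in the table: predicting 0.5 for every game gives Log Loss = -ln(0.5) ≈ 0.693 and Brier Score = (0.5 - outcome)^2 = 0.25. A minimal sketch using standard scikit-learn metrics (shown for illustration; this is not the project's own evaluation code):

# Worked example: metrics for an uninformative 0.5 predictor vs. a sharper one.
import numpy as np
from sklearn.metrics import log_loss, brier_score_loss, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0])
coin_flip = np.full(6, 0.5)
sharper = np.array([0.8, 0.3, 0.7, 0.6, 0.2, 0.4])

print(log_loss(y_true, coin_flip), brier_score_loss(y_true, coin_flip))  # ~0.693, 0.25
print(log_loss(y_true, sharper), brier_score_loss(y_true, sharper))      # lower is better
print(roc_auc_score(y_true, sharper))                                    # higher is better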

Check Calibration in Model Deep Dive

Select a model run and navigate to the Deep Dive page. The reliability diagram shows whether your model’s predicted probabilities match reality:

  • Points on the diagonal = well-calibrated

  • Points above = under-confident (predicts 60% but wins 70%)

  • Points below = over-confident (predicts 80% but wins 65%)

Use the year dropdown to check calibration stability across seasons.
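
Under the hood, a reliability diagram is just binned predictions compared against observed win rates. A minimal sketch with scikit-learn's calibration_curve (illustrative only; the dashboard's own implementation may differ, and the data below is simulated):

# Bin predicted probabilities and compare each bin's mean prediction to the
# observed win rate in that bin; points off the diagonal indicate miscalibration.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(1)
y_prob = rng.uniform(0.05, 0.95, size=2000)              # predicted win probabilities
y_true = (rng.uniform(size=2000) < y_prob).astype(int)   # simulated outcomes (well-calibrated by construction)

frac_won, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
for p, f in zip(mean_pred, frac_won):
    print(f"predicted ~{p:.2f} -> actually won {f:.2f}")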

Build a Bracket in the Bracket Visualizer

  1. Select a model run and tournament year

  2. Choose “Analytical (exact)” for fast expected points, or “Monte Carlo” for score distributions (see the sketch after this list)

  3. Review the Expected Points table — teams at the top are the most valuable bracket picks under your scoring rule

  4. Check the Advancement Heatmap to see each team’s probability of reaching each round
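
The analytical expected-points computation in step 2 reduces to a weighted sum: a team's expected points is the probability it wins its game in each round times the points a win in that round is worth under your scoring rule. A minimal sketch, where both the probabilities and the 1/2/4/8/16/32 doubling point values are made-up assumptions for illustration:

# Expected points = sum over rounds of P(team wins its game in that round) * points per win.
advance_prob = {"R64": 0.95, "R32": 0.70, "S16": 0.45, "E8": 0.25, "F4": 0.12, "Champ": 0.06}
round_points = {"R64": 1, "R32": 2, "S16": 4, "E8": 8, "F4": 16, "Champ": 32}

expected_points = sum(advance_prob[r] * round_points[r] for r in round_points)
print(f"Expected points: {expected_points:.2f}")  # higher = more valuable bracket pick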

Step 6: Iterate and Improve

The typical workflow loop is:

  1. Train a new model (or retrain with different hyperparameters)

  2. Compare on the Leaderboard — did metrics improve?

  3. Inspect calibration on the Deep Dive page

  4. Build a bracket using the Bracket Visualizer

  5. Score the bracket on the Pool Scorer page

  6. Repeat

Tip

Try training a Logistic Regression model as a simple baseline: python -m ncaa_eval.cli train --model logistic_regression

Next Steps