# Getting Started Tutorial

This tutorial walks you through the full NCAA_eval pipeline — from syncing data to viewing model predictions in the interactive dashboard.

**Prerequisites:** You have already installed the project (`poetry install`) and configured your Kaggle API credentials. See the [README](../../README.md) for setup instructions.

## Step 1: Sync Data

Download NCAA game data from Kaggle (historical seasons 1985–2025) and ESPN (current-season scores):

```bash
python sync.py --source all --dest data/
```

Sample output (first run — exact counts vary by season):

```text
[kaggle] teams: 362 written
[kaggle] seasons: 41 written
[kaggle] season 1985: 2526 games written
[kaggle] season 1986: 2614 games written
...
[kaggle] season 2025: 4545 games written
[espn] season 2025: 4581 games written
Sync complete in 45.2s — teams: 362, seasons: 41, games: 150352, cache hits: 0
```

```{tip}
Subsequent runs skip already-cached files automatically. Use `--force-refresh` to re-download everything.
```

## Step 2: Train an Elo Model

The Elo model is a stateful rating system — it maintains per-team ratings that update game-by-game. It requires no feature engineering and is a good first model to train:

```bash
python -m ncaa_eval.cli train --model elo
```

The CLI shows a Rich progress bar while building features, then runs a walk-forward backtest and prints a summary table:

```text
Building features... ━━━━━━━━━━━━━━━━━━━━ 100% 0:00:12
Training elo on seasons 2015–2025...
Running walk-forward backtest...
Backtest metrics persisted.
Model artifacts persisted.

Training Results
┌──────────────────────┬──────────┐
│ Field                │ Value    │
├──────────────────────┼──────────┤
│ Run ID               │          │
│ Model                │ elo      │
│ Seasons              │ 2015–2025│
│ Games trained        │ 55000    │
│ Tournament preds     │ 630      │
│ Git hash             │ abc1234  │
└──────────────────────┴──────────┘
```

The `--model` flag selects from registered model plugins. To see all available models:

```bash
python -c "from ncaa_eval.model import list_models; print(list_models())"
```

```text
['elo', 'logistic_regression', 'xgboost']
```

### Customize Hyperparameters

Override any Elo hyperparameter via a JSON config file:

```json
{
  "k_regular": 40.0,
  "mean_reversion_fraction": 0.30
}
```

```bash
python -m ncaa_eval.cli train --model elo --config my_elo_config.json
```

See the [User Guide — Stateful Models](../user-guide.md#stateful-models) for the full list of Elo hyperparameters.

## Step 3: Train an XGBoost Model

XGBoost is a stateless model — it takes a feature matrix as input and learns which features best predict game outcomes:

```bash
python -m ncaa_eval.cli train --model xgboost
```

The output follows the same format as the Elo training above (progress bar, training message, backtest, and summary table). When the feature engineering pipeline provides strong signal, XGBoost typically outperforms Elo, producing lower Log Loss and higher ROC-AUC. See the [User Guide — Stateless Models](../user-guide.md#stateless-models) for hyperparameter details.

### Adjust the Training Window

Train on more (or fewer) seasons using `--start-year` and `--end-year`:

```bash
python -m ncaa_eval.cli train --model xgboost --start-year 2010 --end-year 2024
```

## Step 4: Launch the Dashboard

Start the Streamlit dashboard to explore your model results:

```bash
streamlit run dashboard/app.py
```

The dashboard opens in your browser at `http://localhost:8501`.
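If port 8501 is already in use on your machine, you can pass Streamlit's standard `--server.port` option to serve the dashboard on a different port (the port number below is arbitrary):

```bash
# Serve the dashboard on an alternate port instead of the default 8501
streamlit run dashboard/app.py --server.port 8502
```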
### Dashboard Navigation

The dashboard has four pages organized into two sections:

**Lab** (model analysis):

- **Backtest Leaderboard** — Compare all trained models side-by-side on Log Loss, Brier Score, ROC-AUC, and ECE. Color-coded cells highlight the best and worst performers.
- **Model Deep Dive** — Inspect a single model's calibration via reliability diagrams, per-year metric breakdowns, and feature importance (XGBoost only).

**Presentation** (tournament predictions):

- **Bracket Visualizer** — View the model's predicted bracket with advancement probabilities, pairwise win probabilities, and expected points per team under your chosen scoring rule.
- **Pool Scorer** — Score your bracket against thousands of simulated tournament outcomes to see your expected point distribution. Export the bracket as CSV.

### Sidebar Filters

Use the sidebar to control what you see:

| Filter | Options | Effect |
|--------|---------|--------|
| **Tournament Year** | Any year with tournament data | Filters all pages to that year |
| **Model Run** | Any completed training run | Selects which model's predictions to display |
| **Scoring Format** | Standard, Fibonacci, Seed-Diff Bonus, Custom | Changes how bracket points are calculated |

```{tip}
All pages update automatically when you change sidebar filters. Start on the Leaderboard to compare models, then click a model run to dive into its details.
```

## Step 5: Interpret Your Results

### Compare Models on the Leaderboard

The Leaderboard shows key metrics for every training run:

| Metric | Better When | Random Baseline |
|--------|-------------|-----------------|
| Log Loss | Lower | 0.693 |
| Brier Score | Lower | 0.25 |
| ROC-AUC | Higher | 0.5 |
| ECE | Lower | — |

```{tip}
For a detailed explanation of each metric (formulas, interpretation, worked examples), see the [User Guide — Evaluation Metrics](../user-guide.md#evaluation-metrics).
```

### Check Calibration in Model Deep Dive

Select a model run and navigate to the Deep Dive page. The reliability diagram shows whether your model's predicted probabilities match reality:

- **Points on the diagonal** = well-calibrated
- **Points above** = under-confident (predicts 60% but wins 70%)
- **Points below** = over-confident (predicts 80% but wins 65%)

Use the year dropdown to check calibration stability across seasons.

### Build a Bracket in the Bracket Visualizer

1. Select a model run and tournament year
2. Choose "Analytical (exact)" for fast expected points, or "Monte Carlo" for score distributions
3. Review the **Expected Points table** — teams at the top are the most valuable bracket picks under your scoring rule
4. Check the **Advancement Heatmap** to see each team's probability of reaching each round

## Step 6: Iterate and Improve

The typical workflow loop is:

1. **Train** a new model (or retrain with different hyperparameters)
2. **Compare** on the Leaderboard — did metrics improve?
3. **Inspect** calibration on the Deep Dive page
4. **Build a bracket** using the Bracket Visualizer
5. **Score** the bracket on the Pool Scorer page
6. **Repeat**
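A concrete pass through this loop, using only the commands shown earlier in this tutorial (the config filename is illustrative):

```bash
# 1. Retrain Elo with tweaked hyperparameters (config filename is an example)
python -m ncaa_eval.cli train --model elo --config my_elo_config.json

# 1b. Train a challenger model for comparison
python -m ncaa_eval.cli train --model xgboost

# 2-5. Relaunch the dashboard: compare the new runs on the Leaderboard,
#      inspect calibration in Model Deep Dive, then build and score a bracket
streamlit run dashboard/app.py
```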
```{tip}
Try training a Logistic Regression model as a simple baseline: `python -m ncaa_eval.cli train --model logistic_regression`
```

## Next Steps

- **Build an ensemble** — See the [Ensemble Tutorial](../../notebooks/tutorials/03_ensemble_model.ipynb)
- **Create a custom model** — See the [Custom Model Tutorial](custom-model.md)
- **Add a custom metric** — See the [Custom Metric Tutorial](custom-metric.md)
- **Deep dive into metrics** — See the [User Guide](../user-guide.md)