# Test Approach Guide: Example-Based vs Property-Based vs Fuzz-Based Testing

This guide explains **how you write tests** - the approach used to specify test inputs and expected behavior.

## Overview

**Test approach** is one of the four orthogonal dimensions of testing. It's orthogonal to test scope - you can use any approach for unit tests, integration tests, or any other test type.

The project supports three test approaches:

- **Example-based:** Concrete input/output pairs for known scenarios
- **Property-based:** Invariants tested across generated inputs (Hypothesis)
- **Fuzz-based:** Random/mutated inputs for crash resilience (Hypothesis)

---

## Example-Based Testing (Standard Pytest)

### What it is

Tests that specify concrete input examples and their expected outputs.

### When to use

- Testing **specific known scenarios** (e.g., "log loss of [0.9] vs [1] is 0.105")
- Regression tests for previously discovered bugs
- Boundary conditions you want to verify explicitly
- Cases where you have domain knowledge of important edge cases

### Strengths

- **Easy to understand:** Concrete examples are easy to read and debug
- **Fast to write:** Just specify input → expected output
- **Precise:** You control exactly what scenarios are tested
- **Fast to run:** Tests a fixed number of cases

### Weaknesses

- **Limited coverage:** Only tests the scenarios you thought of
- **May miss edge cases:** Humans are bad at imagining all corner cases

### Techniques

**Simple assertions:** Basic input → output tests

```python
def test_elo_update_increases_winner_rating(elo_config, sample_game):
    """Verify Elo rating increases for the winning team."""
    engine = EloFeatureEngine(elo_config)
    initial_rating = engine.get_rating(sample_game.w_team_id)
    engine.update_game(sample_game)
    assert engine.get_rating(sample_game.w_team_id) > initial_rating
```

**Parametrized tests:** `@pytest.mark.parametrize` to test multiple scenarios with the same logic

```python
@pytest.mark.parametrize("predictions,actuals,expected", [
    ([1.0, 0.0, 1.0], [1, 0, 1], 0.0),   # Perfect predictions
    ([0.9, 0.1, 0.9], [1, 0, 1], 0.01),  # Near-perfect predictions
    ([0.5, 0.5, 0.5], [1, 0, 1], 0.25),  # Random guessing
])
def test_brier_score_known_cases(predictions, actuals, expected):
    """Verify Brier score for known prediction scenarios."""
    result = brier_score(np.array(predictions), np.array(actuals))
    assert abs(result - expected) < 0.01
```

### Pre-commit eligibility

✅ **YES** (if fast) - Mark as `@pytest.mark.smoke`

---

## Property-Based Testing (Hypothesis)

### What it is

Tests that specify **invariants** (properties that should always hold) and let Hypothesis automatically generate hundreds of test cases.

### When to use

- Testing **invariants** across many inputs (e.g., "output length always equals input length")
- **Mathematical properties** (e.g., "probabilities always in [0, 1]", "log loss always non-negative")
- **Discovering edge cases** you didn't think of
- Functions with **complex input spaces** where manual examples would be tedious

### Strengths

- **Comprehensive coverage:** Tests hundreds of generated cases automatically
- **Finds edge cases:** Often discovers bugs in corner cases humans miss
- **Self-documenting:** The invariant clearly states what property is being tested
- **Shrinking:** When a test fails, Hypothesis finds the minimal failing case

### Weaknesses

- **Slower:** Generates and runs many test cases (100+ by default)
- **Harder to debug:** Failures may involve unusual generated inputs
- **Requires invariant thinking:** Need to identify what properties should hold

### When to use Hypothesis vs. Parametrized Pytest

- Use **Hypothesis** when you can state an *invariant* (e.g., "output length = input length for all inputs")
- Use **Parametrized Pytest** when you have *specific scenarios* to verify (e.g., "for input X, output is Y")

### Examples

**Property-based unit test:**

```python
from hypothesis import given, strategies as st

@pytest.mark.property
@given(data=st.lists(st.integers(), min_size=1, max_size=100))
def test_rolling_average_length_invariant(data):
    """Verify rolling average output length matches input (invariant holds for all inputs)."""
    series = pd.Series(data)
    result = calculate_rolling_average(series, window=5)
    assert len(result) == len(series)  # Invariant: length preserved
```

**Property-based integration test:**

```python
@pytest.mark.integration
@pytest.mark.property
@given(cutoff_year=st.integers(2015, 2025))
def test_temporal_boundary_invariant(cutoff_year):
    """Verify API never returns data beyond cutoff (invariant holds for all years)."""
    api = ChronologicalDataServer()
    games = api.get_games_before(cutoff_year=cutoff_year)
    # Invariant: all games before or at cutoff year
    assert all(game.season <= cutoff_year for game in games)
```

**Property-based test with complex strategy:**

```python
@pytest.mark.property
@given(
    predictions=st.lists(st.floats(0, 1), min_size=1, max_size=1000),
    actuals=st.lists(st.integers(0, 1), min_size=1, max_size=1000),
)
def test_brier_score_always_non_negative(predictions, actuals):
    """Verify Brier score is always >= 0 (mathematical property)."""
    # Make lists the same length
    min_len = min(len(predictions), len(actuals))
    preds = np.array(predictions[:min_len])
    acts = np.array(actuals[:min_len])
    result = brier_score(preds, acts)
    assert result >= 0  # Invariant: Brier score cannot be negative
```

### Hypothesis Strategies (Common Patterns)

```python
from hypothesis import strategies as st

# Basic strategies
st.integers(min_value=0, max_value=100)  # Integers in range
st.floats(min_value=0.0, max_value=1.0)  # Floats in range
st.text(min_size=1, max_size=50)         # Random strings
st.booleans()                            # True or False

# Collection strategies
st.lists(st.integers(), min_size=1, max_size=100)    # Lists of integers
st.sets(st.text(), min_size=0, max_size=10)          # Sets of strings
st.dictionaries(keys=st.text(), values=st.floats())  # Dictionaries

# Composite strategies
st.tuples(st.integers(), st.floats())  # Tuples with mixed types
st.one_of(st.integers(), st.none())    # Either int or None

# Data-driven strategies (most powerful)
st.data()  # Draw values dynamically within the test
```

### Pre-commit eligibility

❌ **NO** - Hypothesis is slow (generates 100+ test cases)

Mark as `@pytest.mark.property` or `@pytest.mark.slow`

---

## Fuzz-Based Testing (Hypothesis)

### What it is

Tests that feed **random or mutated inputs** to find crashes, unhandled exceptions, and edge cases in error handling.

Unlike property-based testing (which verifies invariants), fuzz testing focuses on **resilience** - ensuring the code doesn't crash on unexpected inputs.
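The contrast can be sketched on a single function. The sketch below uses `parse_score`, a hypothetical parser invented purely for illustration: the property test asserts a correctness invariant (round-tripping a valid score), while the fuzz test asserts only that arbitrary text never crashes the parser.

```python
from hypothesis import given, strategies as st

def parse_score(text: str) -> int:
    """Hypothetical parser: returns a non-negative score or raises ValueError."""
    value = int(text)  # raises ValueError for non-numeric text
    if value < 0:
        raise ValueError("score must be non-negative")
    return value

@given(score=st.integers(min_value=0))
def test_parse_score_roundtrip(score):
    # Property: parsing the string form of any valid score returns that score
    assert parse_score(str(score)) == score

@given(text=st.text())
def test_parse_score_never_crashes(text):
    # Fuzz: arbitrary text either parses or raises the documented ValueError;
    # any other exception is a crash and fails the test
    try:
        assert parse_score(text) >= 0
    except ValueError:
        pass  # expected for malformed or negative input
```

Both tests run the same code; only the assertion differs - the property test would catch a parser that returns the wrong number, while the fuzz test would catch one that blows up on unexpected input.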
### When to use

- Testing **error handling** and crash resilience (e.g., "parser handles malformed CSV gracefully")
- **Input validation** robustness (e.g., "API rejects invalid data without crashing")
- **Security testing** for injection vulnerabilities (e.g., "SQL queries handle special characters")
- **Data parsing** resilience (e.g., "ingestion handles encoding errors")

### Strengths

- **Finds crashes:** Discovers inputs that cause unhandled exceptions
- **Tests error paths:** Validates that error handling code actually works
- **Security-focused:** Can find injection vulnerabilities and edge cases
- **Low effort:** Generate random inputs without thinking about specific cases

### Weaknesses

- **Slower:** Generates many random test cases
- **May produce noise:** Random inputs may trigger expected errors (need to filter)
- **Less targeted:** Not testing correctness, just resilience

### When to use Fuzz vs Property-Based vs Example-Based

- Use **Fuzz-based** when testing *error handling and crash resilience*
- Use **Property-based** when testing *invariants* (e.g., "output length = input length")
- Use **Example-based** when testing *specific scenarios* (e.g., "for input X, output is Y")

### Examples

**Fuzz-based unit test (data parsing):**

```python
from hypothesis import given, strategies as st

@pytest.mark.slow
@given(text=st.text())
def test_parse_team_name_never_crashes(text):
    """Verify parser handles arbitrary text without crashing."""
    try:
        result = parse_team_name(text)
        # If parsing succeeds, result should be valid
        assert isinstance(result, str)
    except ValueError:
        # Expected error for invalid input is acceptable
        pass
```

**Fuzz-based integration test (CSV ingestion):**

```python
@pytest.mark.integration
@pytest.mark.slow
@given(data=st.binary())
def test_ingest_csv_handles_malformed_data(data, tmp_path):
    """Verify CSV ingestion handles malformed files gracefully."""
    # Write random binary data to temp file
    csv_file = tmp_path / "malformed.csv"
    csv_file.write_bytes(data)
    # Ingestion should either succeed or raise expected error
    try:
        result = ingest_game_data(csv_file)
        assert isinstance(result, pd.DataFrame)
    except (UnicodeDecodeError, pd.errors.ParserError):
        # Expected errors for malformed data
        pass
```

**Fuzz-based test (API validation):**

```python
@pytest.mark.slow
@given(
    season=st.integers(),  # Any integer, including negatives
    game_id=st.text(),     # Any string, including empty/special chars
)
def test_api_validates_inputs_safely(season, game_id):
    """Verify API validation doesn't crash on invalid inputs."""
    api = ChronologicalDataServer()
    try:
        # API should either return data or raise ValueError
        result = api.get_game(season=season, game_id=game_id)
        assert result is not None
    except ValueError as e:
        # Expected validation error
        assert "invalid" in str(e).lower() or "not found" in str(e).lower()
```

### Hypothesis Strategies for Fuzzing

```python
from hypothesis import strategies as st

# Basic fuzzing strategies
st.text()                  # Any Unicode text (including empty, special chars)
st.binary()                # Random byte sequences
st.integers()              # Any integer (including negatives, zero, large values)
st.floats(allow_nan=True)  # Floats including NaN, inf, -inf

# Targeted fuzzing
st.text(alphabet=st.characters(blacklist_categories=("Cs",)))  # Valid Unicode only
st.text(min_size=0, max_size=1000)     # Bounded text
st.binary(min_size=0, max_size=10000)  # Bounded binary data

# Domain-specific fuzzing
st.text().filter(lambda x: ";" in x or "'" in x)  # SQL injection patterns
st.text().map(lambda x: x + "\x00" + x)           # Null byte injection
```

### Pre-commit eligibility

❌ **NO** - Fuzz testing is slow (generates many random test cases)

Mark as `@pytest.mark.slow`

---

## Decision Tree: Choosing Your Test Approach

```
Are you testing error handling / crash resilience?
├─ YES → Use Fuzz-Based Testing (@pytest.mark.slow, Hypothesis)
│        - Generate random/mutated inputs to find crashes
│        - Examples: CSV parsing, API validation, input sanitization
│
└─ NO → Does the test verify a specific known scenario?
    ├─ YES → Use Example-Based Testing
    │        - Parametrize if testing multiple similar scenarios
    │        - Examples: regression tests, boundary conditions, known edge cases
    │
    └─ NO → Can you state an invariant that should hold for all inputs?
        ├─ YES → Use Property-Based Testing (Hypothesis)
        │        - State the invariant clearly
        │        - Let Hypothesis generate test cases
        │        - Examples: "length preserved", "result always positive", "sum = 1.0"
        │
        └─ NO → Use Example-Based Testing
                 - If you can't state an invariant, test specific examples
                 - Consider if the function is too complex (may need refactoring)
```

---

## Combining Multiple Approaches

Many functions benefit from **multiple** test approaches - each approach tests different aspects:

```python
# Example-based: Test specific known cases
@pytest.mark.parametrize("input,expected", [
    ("St. Mary's", "Saint Mary's"),
    ("UNC", "North Carolina"),
])
def test_clean_team_name_known_cases(input, expected):
    """Verify normalization for known abbreviations."""
    assert clean_team_name(input) == expected

# Property-based: Test invariants
@pytest.mark.property
@given(name=st.text(min_size=1))
def test_clean_team_name_never_empty(name):
    """Verify normalization never returns empty string."""
    result = clean_team_name(name)
    assert len(result) > 0  # Invariant: output is never empty

# Fuzz-based: Test crash resilience
@pytest.mark.slow
@given(name=st.text())  # Including empty strings, special chars
def test_clean_team_name_never_crashes(name):
    """Verify normalization handles any text without crashing."""
    try:
        result = clean_team_name(name)
        assert isinstance(result, str)
    except ValueError:
        # Expected error for invalid team names
        pass
```

**Why use all three?**

- **Example-based:** Verifies correctness for known scenarios (e.g., "UNC" → "North Carolina")
- **Property-based:** Verifies invariants hold universally (e.g., output never empty for valid input)
- **Fuzz-based:** Verifies resilience to unexpected inputs (e.g., doesn't crash on special characters)

---

## See Also

- [Test Scope Guide](test-scope-guide.md) - Unit vs Integration tests
- [Test Purpose Guide](test-purpose-guide.md) - Functional, Performance, Regression
- [Execution Guide](execution.md) - When tests/checks run (4-tier model)
- [Quality Assurance Guide](quality.md) - Mutation testing, coverage analysis