Test Approach Guide: Example-Based vs Property-Based vs Fuzz-Based Testing

This guide explains how you write tests - the approach used to specify test inputs and expected behavior.

Overview

Test approach is one of the four orthogonal dimensions of testing. It’s orthogonal to test scope - you can use any approach for unit tests, integration tests, or any other test type.

The project supports three test approaches:

  • Example-based: Concrete input/output pairs for known scenarios

  • Property-based: Invariants tested across generated inputs (Hypothesis)

  • Fuzz-based: Random/mutated inputs for crash resilience (Hypothesis)


Example-Based Testing (Standard Pytest)

What it is

Tests that specify concrete input examples and their expected outputs.

When to use

  • Testing specific known scenarios (e.g., “log loss of [0.9] vs [1] is 0.105”)

  • Regression tests for previously discovered bugs

  • Boundary conditions you want to verify explicitly

  • Cases where you have domain knowledge of important edge cases

Strengths

  • Easy to understand: Concrete examples are easy to read and debug

  • Fast to write: Just specify input → expected output

  • Precise: You control exactly what scenarios are tested

  • Fast to run: Tests a fixed number of cases

Weaknesses

  • Limited coverage: Only tests the scenarios you thought of

  • May miss edge cases: Humans are bad at imagining all corner cases

Techniques

Simple assertions: Basic input → output tests

def test_elo_update_increases_winner_rating(elo_config, sample_game):
    """Verify Elo rating increases for the winning team."""
    engine = EloFeatureEngine(elo_config)
    initial_rating = engine.get_rating(sample_game.w_team_id)
    engine.update_game(sample_game)
    assert engine.get_rating(sample_game.w_team_id) > initial_rating

Parametrized tests: @pytest.mark.parametrize to test multiple scenarios with same logic

@pytest.mark.parametrize("predictions,actuals,expected", [
    ([1.0, 0.0, 1.0], [1, 0, 1], 0.0),       # Perfect predictions
    ([0.9, 0.1, 0.9], [1, 0, 1], 0.01),      # Near-perfect predictions
    ([0.5, 0.5, 0.5], [1, 0, 1], 0.25),      # Random guessing
])
def test_brier_score_known_cases(predictions, actuals, expected):
    """Verify Brier score for known prediction scenarios."""
    result = brier_score(np.array(predictions), np.array(actuals))
    assert abs(result - expected) < 0.01
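The brier_score under test is not shown in this guide; a minimal sketch consistent with the cases above (the project's real implementation may differ) is:

```python
import numpy as np

def brier_score(predictions: np.ndarray, actuals: np.ndarray) -> float:
    """Mean squared difference between predicted probabilities and binary outcomes."""
    return float(np.mean((predictions - actuals) ** 2))
```

With this definition, perfect predictions score 0.0 and uniform 0.5 guesses score 0.25, matching the parametrized cases.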

Pre-commit eligibility

YES (if fast) - Mark as @pytest.mark.smoke


Property-Based Testing (Hypothesis)

What it is

Tests that specify invariants (properties that should always hold) and let Hypothesis automatically generate hundreds of test cases.

When to use

  • Testing invariants across many inputs (e.g., “output length always equals input length”)

  • Mathematical properties (e.g., “probabilities always in [0, 1]”, “log loss always non-negative”)

  • Discovering edge cases you didn’t think of

  • Functions with complex input spaces where manual examples would be tedious

Strengths

  • Comprehensive coverage: Tests hundreds of generated cases automatically

  • Finds edge cases: Often discovers bugs in corner cases humans miss

  • Self-documenting: The invariant clearly states what property is being tested

  • Shrinking: When a test fails, Hypothesis automatically shrinks it to a minimal failing case

Weaknesses

  • Slower: Generates and runs many test cases (100+ by default)

  • Harder to debug: Failures may involve unusual generated inputs

  • Requires invariant thinking: Need to identify what properties should hold

When to use Hypothesis vs. Parametrized Pytest

  • Use Hypothesis when you can state an invariant (e.g., “output length = input length for all inputs”)

  • Use Parametrized Pytest when you have specific scenarios to verify (e.g., “for input X, output is Y”)

Examples

Property-based unit test:

from hypothesis import given, strategies as st

@pytest.mark.property
@given(data=st.lists(st.integers(), min_size=1, max_size=100))
def test_rolling_average_length_invariant(data):
    """Verify rolling average output length matches input (invariant holds for all inputs)."""
    series = pd.Series(data)
    result = calculate_rolling_average(series, window=5)
    assert len(result) == len(series)  # Invariant: length preserved
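A calculate_rolling_average satisfying this invariant could be as simple as the following sketch (hypothetical; the project's version may differ):

```python
import pandas as pd

def calculate_rolling_average(series: pd.Series, window: int) -> pd.Series:
    # rolling() always preserves length; min_periods=1 additionally
    # emits real values (not NaN) for the first window-1 rows
    return series.rolling(window=window, min_periods=1).mean()
```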

Property-based integration test:

@pytest.mark.integration
@pytest.mark.property
@given(cutoff_year=st.integers(2015, 2025))
def test_temporal_boundary_invariant(cutoff_year):
    """Verify API never returns data beyond cutoff (invariant holds for all years)."""
    api = ChronologicalDataServer()
    games = api.get_games_before(cutoff_year=cutoff_year)

    # Invariant: all games before or at cutoff year
    assert all(game.season <= cutoff_year for game in games)

Property-based test with complex strategy:

@pytest.mark.property
@given(
    predictions=st.lists(st.floats(0, 1), min_size=1, max_size=1000),
    actuals=st.lists(st.integers(0, 1), min_size=1, max_size=1000),
)
def test_brier_score_always_non_negative(predictions, actuals):
    """Verify Brier score is always >= 0 (mathematical property)."""
    # Make lists same length
    min_len = min(len(predictions), len(actuals))
    preds = np.array(predictions[:min_len])
    acts = np.array(actuals[:min_len])

    result = brier_score(preds, acts)
    assert result >= 0  # Invariant: Brier score cannot be negative

Hypothesis Strategies (Common Patterns)

from hypothesis import strategies as st

# Basic strategies
st.integers(min_value=0, max_value=100)        # Integers in range
st.floats(min_value=0.0, max_value=1.0)        # Floats in range
st.text(min_size=1, max_size=50)               # Random strings
st.booleans()                                   # True or False

# Collection strategies
st.lists(st.integers(), min_size=1, max_size=100)  # Lists of integers
st.sets(st.text(), min_size=0, max_size=10)         # Sets of strings
st.dictionaries(keys=st.text(), values=st.floats()) # Dictionaries

# Composite strategies
st.tuples(st.integers(), st.floats())           # Tuples with mixed types
st.one_of(st.integers(), st.none())             # Either int or None

# Data-driven strategies (most powerful)
st.data()  # Draw values dynamically within test
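st.data() deserves a concrete sketch: it lets a later draw depend on an earlier one, which plain @given arguments cannot express (the test below is illustrative):

```python
from hypothesis import given, strategies as st

@given(data=st.data())
def test_index_drawn_from_list_is_valid(data):
    # Draw a list first, then an index whose range depends on its length
    xs = data.draw(st.lists(st.integers(), min_size=1))
    i = data.draw(st.integers(min_value=0, max_value=len(xs) - 1))
    assert xs[i] in xs  # The dependent draw is always in bounds
```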

Pre-commit eligibility

NO - Property-based tests are slow (Hypothesis generates 100 examples per test by default)

Mark as @pytest.mark.property or @pytest.mark.slow


Fuzz-Based Testing (Hypothesis)

What it is

Tests that feed random or mutated inputs to find crashes, unhandled exceptions, and edge cases in error handling. Unlike property-based testing (which verifies invariants), fuzz testing focuses on resilience - ensuring the code doesn’t crash on unexpected inputs.

When to use

  • Testing error handling and crash resilience (e.g., “parser handles malformed CSV gracefully”)

  • Input validation robustness (e.g., “API rejects invalid data without crashing”)

  • Security testing for injection vulnerabilities (e.g., “SQL queries handle special characters”)

  • Data parsing resilience (e.g., “ingestion handles encoding errors”)

Strengths

  • Finds crashes: Discovers inputs that cause unhandled exceptions

  • Tests error paths: Validates that error handling code actually works

  • Security-focused: Can find injection vulnerabilities and edge cases

  • Low effort: Generate random inputs without thinking about specific cases

Weaknesses

  • Slower: Generates many random test cases

  • May produce noise: Random inputs may trigger expected errors (need to filter)

  • Less targeted: Not testing correctness, just resilience

When to use Fuzz vs Property-Based vs Example-Based

  • Use Fuzz-based when testing error handling and crash resilience

  • Use Property-based when testing invariants (e.g., “output length = input length”)

  • Use Example-based when testing specific scenarios (e.g., “for input X, output is Y”)

Examples

Fuzz-based unit test (data parsing):

from hypothesis import given, strategies as st

@pytest.mark.slow
@given(text=st.text())
def test_parse_team_name_never_crashes(text):
    """Verify parser handles arbitrary text without crashing."""
    try:
        result = parse_team_name(text)
        # If parsing succeeds, result should be valid
        assert isinstance(result, str)
    except ValueError:
        # Expected error for invalid input is acceptable
        pass

Fuzz-based integration test (CSV ingestion):

from hypothesis import HealthCheck, given, settings, strategies as st

@pytest.mark.integration
@pytest.mark.slow
# tmp_path is a function-scoped fixture; Hypothesis's health check requires
# acknowledging that it is not reset between generated examples
@settings(suppress_health_check=[HealthCheck.function_scoped_fixture])
@given(data=st.binary())
def test_ingest_csv_handles_malformed_data(data, tmp_path):
    """Verify CSV ingestion handles malformed files gracefully."""
    # Write random binary data to temp file
    csv_file = tmp_path / "malformed.csv"
    csv_file.write_bytes(data)

    # Ingestion should either succeed or raise expected error
    try:
        result = ingest_game_data(csv_file)
        assert isinstance(result, pd.DataFrame)
    except (UnicodeDecodeError, pd.errors.ParserError):
        # Expected errors for malformed data
        pass

Fuzz-based test (API validation):

@pytest.mark.slow
@given(
    season=st.integers(),  # Any integer, including negatives
    game_id=st.text(),     # Any string, including empty/special chars
)
def test_api_validates_inputs_safely(season, game_id):
    """Verify API validation doesn't crash on invalid inputs."""
    api = ChronologicalDataServer()

    try:
        # API should either return data or raise ValueError
        result = api.get_game(season=season, game_id=game_id)
        assert result is not None
    except ValueError as e:
        # Expected validation error
        assert "invalid" in str(e).lower() or "not found" in str(e).lower()

Hypothesis Strategies for Fuzzing

from hypothesis import strategies as st

# Basic fuzzing strategies
st.text()                          # Any Unicode text (including empty, special chars)
st.binary()                        # Random byte sequences
st.integers()                      # Any integer (including negatives, zero, large values)
st.floats(allow_nan=True)          # Floats including NaN, inf, -inf

# Targeted fuzzing
st.text(alphabet=st.characters(blacklist_categories=("Cs",)))  # Valid Unicode only
st.text(min_size=0, max_size=1000)  # Bounded text
st.binary(min_size=0, max_size=10000)  # Bounded binary data

# Domain-specific fuzzing
st.text().map(lambda x: x + "'; --")     # Append SQL metacharacters (.map() avoids the
                                         # rejection cost of .filter(), which discards
                                         # most draws and can trip a health check)
st.text().map(lambda x: x + "\x00" + x)  # Null byte injection

Pre-commit eligibility

NO - Fuzz testing is slow (generates many random test cases)

Mark as @pytest.mark.slow


Decision Tree: Choosing Your Test Approach

Are you testing error handling / crash resilience?
├─ YES → Use Fuzz-Based Testing (@pytest.mark.slow, Hypothesis)
│         - Generate random/mutated inputs to find crashes
│         - Examples: CSV parsing, API validation, input sanitization
│
└─ NO → Does the test verify a specific known scenario?
         ├─ YES → Use Example-Based Testing
         │         - Parametrize if testing multiple similar scenarios
         │         - Examples: regression tests, boundary conditions, known edge cases
         │
         └─ NO → Can you state an invariant that should hold for all inputs?
                  ├─ YES → Use Property-Based Testing (Hypothesis)
                  │         - State the invariant clearly
                  │         - Let Hypothesis generate test cases
                  │         - Examples: "length preserved", "result always positive", "sum = 1.0"
                  │
                  └─ NO → Use Example-Based Testing
                            - If you can't state an invariant, test specific examples
                            - Consider if the function is too complex (may need refactoring)

Combining Multiple Approaches

Many functions benefit from multiple test approaches - each approach tests different aspects:

# Example-based: Test specific known cases
@pytest.mark.parametrize("input,expected", [
    ("St. Mary's", "Saint Mary's"),
    ("UNC", "North Carolina"),
])
def test_clean_team_name_known_cases(input, expected):
    """Verify normalization for known abbreviations."""
    assert clean_team_name(input) == expected

# Property-based: Test invariants
@pytest.mark.property
@given(name=st.text(min_size=1))
def test_clean_team_name_never_empty(name):
    """Verify normalization never returns empty string."""
    result = clean_team_name(name)
    assert len(result) > 0  # Invariant: output is never empty

# Fuzz-based: Test crash resilience
@pytest.mark.slow
@given(name=st.text())  # Including empty strings, special chars
def test_clean_team_name_never_crashes(name):
    """Verify normalization handles any text without crashing."""
    try:
        result = clean_team_name(name)
        assert isinstance(result, str)
    except ValueError:
        # Expected error for invalid team names
        pass
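For context, a clean_team_name in the spirit of these tests might look like the sketch below (the alias table and error message are illustrative assumptions):

```python
ALIASES = {"St. Mary's": "Saint Mary's", "UNC": "North Carolina"}

def clean_team_name(name: str) -> str:
    """Normalize a team name, expanding known abbreviations."""
    cleaned = name.strip()
    if not cleaned:
        # Whitespace-only input is not a valid team name
        raise ValueError(f"invalid team name: {name!r}")
    return ALIASES.get(cleaned, cleaned)
```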

Why use all three?

  • Example-based: Verifies correctness for known scenarios (e.g., “UNC” → “North Carolina”)

  • Property-based: Verifies invariants hold universally (e.g., output never empty for valid input)

  • Fuzz-based: Verifies resilience to unexpected inputs (e.g., doesn’t crash on special characters)


See Also