# Test Approach Guide: Example-Based vs Property-Based vs Fuzz-Based Testing

This guide explains *how* you write tests - the approach used to specify test inputs and expected behavior.

## Overview

Test approach is one of the four orthogonal dimensions of testing. It is orthogonal to test scope - you can use any approach for unit tests, integration tests, or any other test type.

The project supports three test approaches:

- **Example-based**: concrete input/output pairs for known scenarios
- **Property-based**: invariants tested across generated inputs (Hypothesis)
- **Fuzz-based**: random/mutated inputs for crash resilience (Hypothesis)
## Example-Based Testing (Standard Pytest)

### What it is

Tests that specify concrete input examples and their expected outputs.

### When to use

- Testing specific known scenarios (e.g., "log loss of prediction 0.9 against actual 1 is 0.105")
- Regression tests for previously discovered bugs
- Boundary conditions you want to verify explicitly
- Cases where you have domain knowledge of important edge cases

### Strengths

- **Easy to understand**: concrete examples are easy to read and debug
- **Fast to write**: just specify input → expected output
- **Precise**: you control exactly which scenarios are tested
- **Fast to run**: tests a fixed number of cases

### Weaknesses

- **Limited coverage**: only tests the scenarios you thought of
- **May miss edge cases**: humans are bad at imagining every corner case
### Techniques

**Simple assertions**: basic input → output tests

```python
def test_elo_update_increases_winner_rating(elo_config, sample_game):
    """Verify Elo rating increases for the winning team."""
    engine = EloFeatureEngine(elo_config)
    initial_rating = engine.get_rating(sample_game.w_team_id)
    engine.update_game(sample_game)
    assert engine.get_rating(sample_game.w_team_id) > initial_rating
```

**Parametrized tests**: `@pytest.mark.parametrize` to test multiple scenarios with the same logic

```python
import numpy as np
import pytest

@pytest.mark.parametrize("predictions,actuals,expected", [
    ([1.0, 0.0, 1.0], [1, 0, 1], 0.0),   # Perfect predictions
    ([0.9, 0.1, 0.9], [1, 0, 1], 0.01),  # Near-perfect predictions
    ([0.5, 0.5, 0.5], [1, 0, 1], 0.25),  # Random guessing
])
def test_brier_score_known_cases(predictions, actuals, expected):
    """Verify Brier score for known prediction scenarios."""
    result = brier_score(np.array(predictions), np.array(actuals))
    assert abs(result - expected) < 0.01
```
### Pre-commit eligibility

✅ **YES** (if fast) - mark as `@pytest.mark.smoke`
## Property-Based Testing (Hypothesis)

### What it is

Tests that specify invariants (properties that should always hold) and let Hypothesis automatically generate hundreds of test cases.

### When to use

- Testing invariants across many inputs (e.g., "output length always equals input length")
- Mathematical properties (e.g., "probabilities are always in [0, 1]", "log loss is always non-negative")
- Discovering edge cases you didn't think of
- Functions with complex input spaces where manual examples would be tedious

### Strengths

- **Comprehensive coverage**: tests hundreds of generated cases automatically
- **Finds edge cases**: often discovers bugs in corner cases humans miss
- **Self-documenting**: the invariant clearly states what property is being tested
- **Failure shrinking**: when a test fails, Hypothesis finds a minimal failing case

### Weaknesses

- **Slower**: generates and runs many test cases (100+ by default)
- **Harder to debug**: failures may involve unusual generated inputs
- **Requires invariant thinking**: you need to identify which properties should hold

### When to use Hypothesis vs. Parametrized Pytest

- Use **Hypothesis** when you can state an invariant (e.g., "output length = input length for all inputs")
- Use **parametrized pytest** when you have specific scenarios to verify (e.g., "for input X, output is Y")
### Examples

Property-based unit test:

```python
import pandas as pd
import pytest
from hypothesis import given, strategies as st

@pytest.mark.property
@given(data=st.lists(st.integers(), min_size=1, max_size=100))
def test_rolling_average_length_invariant(data):
    """Verify rolling average output length matches input (invariant holds for all inputs)."""
    series = pd.Series(data)
    result = calculate_rolling_average(series, window=5)
    assert len(result) == len(series)  # Invariant: length preserved
```

Property-based integration test:

```python
@pytest.mark.integration
@pytest.mark.property
@given(cutoff_year=st.integers(2015, 2025))
def test_temporal_boundary_invariant(cutoff_year):
    """Verify the API never returns data beyond the cutoff (invariant holds for all years)."""
    api = ChronologicalDataServer()
    games = api.get_games_before(cutoff_year=cutoff_year)
    # Invariant: all games are at or before the cutoff year
    assert all(game.season <= cutoff_year for game in games)
```

Property-based test with a complex strategy:

```python
import numpy as np

@pytest.mark.property
@given(
    predictions=st.lists(st.floats(0, 1), min_size=1, max_size=1000),
    actuals=st.lists(st.integers(0, 1), min_size=1, max_size=1000),
)
def test_brier_score_always_non_negative(predictions, actuals):
    """Verify Brier score is always >= 0 (mathematical property)."""
    # Truncate to a common length since the two lists are drawn independently
    min_len = min(len(predictions), len(actuals))
    preds = np.array(predictions[:min_len])
    acts = np.array(actuals[:min_len])
    result = brier_score(preds, acts)
    assert result >= 0  # Invariant: Brier score cannot be negative
```
### Hypothesis Strategies (Common Patterns)

```python
from hypothesis import strategies as st

# Basic strategies
st.integers(min_value=0, max_value=100)              # Integers in range
st.floats(min_value=0.0, max_value=1.0)              # Floats in range
st.text(min_size=1, max_size=50)                     # Random strings
st.booleans()                                        # True or False

# Collection strategies
st.lists(st.integers(), min_size=1, max_size=100)    # Lists of integers
st.sets(st.text(), min_size=0, max_size=10)          # Sets of strings
st.dictionaries(keys=st.text(), values=st.floats())  # Dictionaries

# Composite strategies
st.tuples(st.integers(), st.floats())                # Tuples with mixed types
st.one_of(st.integers(), st.none())                  # Either int or None

# Data-driven strategies (most powerful)
st.data()                                            # Draw values dynamically within the test
```
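As a sketch of how `st.data()` is used (the test below is illustrative, not taken from the project's suite): it lets a later draw depend on an earlier one, which ordinary `@given` arguments cannot do - here, drawing a window size that never exceeds the length of the list drawn first.

```python
from hypothesis import given, settings, strategies as st

@settings(max_examples=25)
@given(data=st.data())
def test_window_never_exceeds_length(data):
    """Draw a list first, then a window size bounded by its length."""
    values = data.draw(st.lists(st.integers(), min_size=1, max_size=50))
    # The second draw's bounds depend on the first draw's result
    window = data.draw(st.integers(min_value=1, max_value=len(values)))
    assert 1 <= window <= len(values)
```

The trade-off is that dynamically drawn values appear in the failure report rather than in the test signature, so failures are slightly harder to read.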
### Pre-commit eligibility

❌ **NO** - Hypothesis is slow (generates 100+ test cases). Mark as `@pytest.mark.property` or `@pytest.mark.slow`.
## Fuzz-Based Testing (Hypothesis)

### What it is

Tests that feed random or mutated inputs to find crashes, unhandled exceptions, and gaps in error handling. Unlike property-based testing (which verifies invariants), fuzz testing focuses on *resilience* - ensuring the code doesn't crash on unexpected inputs.

### When to use

- Testing error handling and crash resilience (e.g., "parser handles malformed CSV gracefully")
- Input validation robustness (e.g., "API rejects invalid data without crashing")
- Security testing for injection vulnerabilities (e.g., "SQL queries handle special characters")
- Data parsing resilience (e.g., "ingestion handles encoding errors")

### Strengths

- **Finds crashes**: discovers inputs that cause unhandled exceptions
- **Tests error paths**: validates that error-handling code actually works
- **Security-focused**: can find injection vulnerabilities and edge cases
- **Low effort**: generates random inputs without requiring you to think of specific cases

### Weaknesses

- **Slower**: generates many random test cases
- **May produce noise**: random inputs may trigger expected errors (which the test must filter out)
- **Less targeted**: tests resilience, not correctness

### When to use Fuzz vs Property-Based vs Example-Based

- Use **fuzz-based** when testing error handling and crash resilience
- Use **property-based** when testing invariants (e.g., "output length = input length")
- Use **example-based** when testing specific scenarios (e.g., "for input X, output is Y")
### Examples

Fuzz-based unit test (data parsing):

```python
import pytest
from hypothesis import given, strategies as st

@pytest.mark.slow
@given(text=st.text())
def test_parse_team_name_never_crashes(text):
    """Verify the parser handles arbitrary text without crashing."""
    try:
        result = parse_team_name(text)
        # If parsing succeeds, the result should be valid
        assert isinstance(result, str)
    except ValueError:
        # An expected error for invalid input is acceptable
        pass
```

Fuzz-based integration test (CSV ingestion):

```python
import pandas as pd

@pytest.mark.integration
@pytest.mark.slow
@given(data=st.binary())
def test_ingest_csv_handles_malformed_data(data, tmp_path):
    """Verify CSV ingestion handles malformed files gracefully."""
    # Write random binary data to a temp file
    csv_file = tmp_path / "malformed.csv"
    csv_file.write_bytes(data)
    # Ingestion should either succeed or raise an expected error
    try:
        result = ingest_game_data(csv_file)
        assert isinstance(result, pd.DataFrame)
    except (UnicodeDecodeError, pd.errors.ParserError):
        # Expected errors for malformed data
        pass
```

Fuzz-based test (API validation):

```python
@pytest.mark.slow
@given(
    season=st.integers(),  # Any integer, including negatives
    game_id=st.text(),     # Any string, including empty/special chars
)
def test_api_validates_inputs_safely(season, game_id):
    """Verify API validation doesn't crash on invalid inputs."""
    api = ChronologicalDataServer()
    try:
        # The API should either return data or raise ValueError
        result = api.get_game(season=season, game_id=game_id)
        assert result is not None
    except ValueError as e:
        # Expected validation error
        assert "invalid" in str(e).lower() or "not found" in str(e).lower()
```
### Hypothesis Strategies for Fuzzing

```python
from hypothesis import strategies as st

# Basic fuzzing strategies
st.text()                  # Any Unicode text (including empty strings and special chars)
st.binary()                # Random byte sequences
st.integers()              # Any integer (negatives, zero, large values)
st.floats(allow_nan=True)  # Floats including NaN, inf, -inf

# Targeted fuzzing
st.text(alphabet=st.characters(blacklist_categories=("Cs",)))  # Valid Unicode only (no surrogates)
st.text(min_size=0, max_size=1000)                             # Bounded text
st.binary(min_size=0, max_size=10000)                          # Bounded binary data

# Domain-specific fuzzing
st.text().filter(lambda x: ";" in x or "'" in x)  # SQL injection patterns
st.text().map(lambda x: x + "\x00" + x)           # Null byte injection
```
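To show how a domain-specific strategy plugs into a fuzz test, here is a self-contained sketch. The `quote_literal` helper is hypothetical, defined inline purely for illustration; the `.map` guarantees every generated input carries injection-style characters, avoiding the wasted draws a `.filter` would reject.

```python
from hypothesis import given, settings, strategies as st

def quote_literal(value: str) -> str:
    """Hypothetical escaping helper: doubles single quotes and wraps in quotes."""
    return "'" + value.replace("'", "''") + "'"

# Every generated input ends with a classic injection suffix
injection_texts = st.text(max_size=50).map(lambda x: x + "'; --")

@settings(max_examples=50)
@given(payload=injection_texts)
def test_quote_literal_neutralizes_quotes(payload):
    """The escaped literal stays wrapped, with no lone interior quote."""
    quoted = quote_literal(payload)
    assert quoted.startswith("'") and quoted.endswith("'")
    # Doubling means interior quotes always come in pairs
    assert quoted[1:-1].count("'") % 2 == 0
```

The same pattern applies to the null-byte strategy above: map the payload in, then assert the code under test either handles it or raises a documented error.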
### Pre-commit eligibility

❌ **NO** - fuzz testing is slow (generates many random test cases). Mark as `@pytest.mark.slow`.
## Decision Tree: Choosing Your Test Approach

```text
Are you testing error handling / crash resilience?
├─ YES → Use Fuzz-Based Testing (@pytest.mark.slow, Hypothesis)
│        - Generate random/mutated inputs to find crashes
│        - Examples: CSV parsing, API validation, input sanitization
│
└─ NO → Does the test verify a specific known scenario?
    ├─ YES → Use Example-Based Testing
    │        - Parametrize if testing multiple similar scenarios
    │        - Examples: regression tests, boundary conditions, known edge cases
    │
    └─ NO → Can you state an invariant that should hold for all inputs?
        ├─ YES → Use Property-Based Testing (Hypothesis)
        │        - State the invariant clearly
        │        - Let Hypothesis generate test cases
        │        - Examples: "length preserved", "result always positive", "sum = 1.0"
        │
        └─ NO → Use Example-Based Testing
                 - If you can't state an invariant, test specific examples
                 - Consider whether the function is too complex (it may need refactoring)
```
## Combining Multiple Approaches

Many functions benefit from multiple test approaches - each approach tests a different aspect:

```python
import pytest
from hypothesis import given, strategies as st

# Example-based: test specific known cases
@pytest.mark.parametrize("raw,expected", [
    ("St. Mary's", "Saint Mary's"),
    ("UNC", "North Carolina"),
])
def test_clean_team_name_known_cases(raw, expected):
    """Verify normalization for known abbreviations."""
    assert clean_team_name(raw) == expected

# Property-based: test invariants
@pytest.mark.property
@given(name=st.text(min_size=1))
def test_clean_team_name_never_empty(name):
    """Verify normalization never returns an empty string."""
    result = clean_team_name(name)
    assert len(result) > 0  # Invariant: output is never empty

# Fuzz-based: test crash resilience
@pytest.mark.slow
@given(name=st.text())  # Including empty strings and special chars
def test_clean_team_name_never_crashes(name):
    """Verify normalization handles any text without crashing."""
    try:
        result = clean_team_name(name)
        assert isinstance(result, str)
    except ValueError:
        # Expected error for invalid team names
        pass
```

Why use all three?

- **Example-based** verifies correctness for known scenarios (e.g., "UNC" → "North Carolina")
- **Property-based** verifies invariants hold universally (e.g., output is never empty for valid input)
- **Fuzz-based** verifies resilience to unexpected inputs (e.g., no crash on special characters)
## See Also

- **Test Scope Guide** - unit vs integration tests
- **Test Purpose Guide** - functional, performance, regression
- **Execution Guide** - when tests/checks run (4-tier model)
- **Quality Assurance Guide** - mutation testing, coverage analysis