Design Philosophy¶
This document explains the core design philosophy behind pytest-test-categories and the reasoning for its strict enforcement approach.
The “No Escape Hatches” Philosophy¶
pytest-test-categories is built on a fundamental principle: small tests must be truly hermetic. This means:
No network access (not even localhost for small tests)
No filesystem access (ALL access blocked, including tmp_path)
No subprocess spawning
No database connections (including in-memory SQLite)
No sleep calls
Must complete within 1 second
Unlike many testing tools that provide optional enforcement or easy overrides, pytest-test-categories is intentionally strict. This design choice is rooted in the testing philosophy described in Google’s “Software Engineering at Google”.
Why Strictness Matters¶
1. Flaky tests are expensive
Flaky tests erode trust in the test suite. When developers can’t rely on test results, they:
Ignore legitimate failures
Re-run tests multiple times “just in case”
Spend hours debugging phantom failures
Eventually stop writing tests altogether
By enforcing hermeticity at the plugin level, we eliminate entire categories of flakiness.
2. Escape hatches become the norm
When a tool provides easy ways to bypass restrictions, teams inevitably use them:
import pytest
import requests

# "Just this once" becomes standard practice
@pytest.mark.small
@pytest.mark.allow_network  # Hypothetical escape hatch
def test_user_service():
    response = requests.get("http://api.example.com/users")  # Still flaky!
    ...
pytest-test-categories deliberately does not provide such markers. If your test needs network access, it should be marked as @pytest.mark.medium or larger.
3. Categories have meaning
The test size categories are not arbitrary labels - they define contracts:
| Category | Meaning | Resources |
|---|---|---|
| Small | Unit test, single process, in-memory | None (hermetic) |
| Medium | Integration test, single machine | Localhost network, filesystem |
| Large | System test, multi-machine possible | Full network, external services |
| XLarge | Performance/stress test | Unlimited |
If a “small” test can access the network, it’s not really a small test. The label loses meaning.
Trade-offs and Design Decisions¶
Decision: Block all network for small tests¶
What we chose: Small tests cannot make any network connections, not even to localhost.
Alternative considered: Allow localhost connections for small tests.
Rationale: Even localhost connections can be flaky:
Port conflicts in parallel test execution
Service startup timing issues
Resource exhaustion under load
If you need localhost, use @pytest.mark.medium. Medium tests allow localhost-only connections.
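As a rough illustration of how a medium test might sidestep the port-conflict problem, the sketch below binds to port 0 so the operating system picks a free port. It uses only the standard library. The handler and test names are hypothetical, and the marker appears as a comment so the snippet runs without pytest installed.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler standing in for a local service."""

    def do_GET(self):
        body = b"pong"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


# In a real suite this would be decorated with @pytest.mark.medium.
def test_ping_over_localhost():
    # Port 0 asks the OS for any free port, so parallel runs do not collide.
    server = HTTPServer(("127.0.0.1", 0), PingHandler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/ping")
        response = conn.getresponse()
        status, body = response.status, response.read()
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    assert status == 200
    assert body == b"pong"
    return status, body
```

The socket is already listening once `HTTPServer` is constructed, so the client does not need to wait for the service to start. That addresses two of the three concerns above, but it is still a real connection, which is why the test belongs in the medium category.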
Decision: Block in-memory SQLite¶
What we chose: Even :memory: SQLite databases are blocked in small tests.
Alternative considered: Allow in-memory databases since they don’t involve I/O.
Rationale:
Consistency - database usage is database usage, regardless of storage
Design smell - if you need a database, your test might be too integrated
Encourages better patterns - use repository fakes, not in-memory databases
import sqlite3
import pytest

# Instead of this:
@pytest.mark.small
def test_user_repository():
    conn = sqlite3.connect(":memory:")  # Blocked!
    repo = UserRepository(conn)
    ...

# Do this:
@pytest.mark.small
def test_user_repository():
    repo = FakeUserRepository()  # In-memory, no database
    ...
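A repository fake can be very small. The sketch below is one possible shape, not the plugin's or any particular project's implementation: the `User` type and the `add`/`get` methods are assumptions made for illustration, and the marker is shown as a comment so the snippet runs without pytest installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical domain object used only for this illustration."""
    user_id: int
    name: str


class FakeUserRepository:
    """Dict-backed stand-in for a database-backed repository."""

    def __init__(self):
        self._users = {}

    def add(self, user):
        self._users[user.user_id] = user

    def get(self, user_id):
        # Return None for unknown IDs, like a query that finds no row.
        return self._users.get(user_id)


# In a real suite this would be decorated with @pytest.mark.small.
def test_user_repository_round_trip():
    repo = FakeUserRepository()
    repo.add(User(user_id=1, name="Ada"))
    assert repo.get(1) == User(user_id=1, name="Ada")
    assert repo.get(2) is None
```

Because the fake holds plain Python objects, the test touches no database driver at all, which keeps it within the small-test contract.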
Decision: Enforcement modes exist, but for migration only¶
What we chose: Three enforcement modes - strict, warn, and off.
Why have modes if we’re strict?
The modes exist to support gradual adoption, not permanent bypass:
off: For initial exploration - see what would fail
warn: For migration - fix violations incrementally
strict: The destination - where you should end up
# pyproject.toml - a migration journey
# Only one value is active at a time; change it as the migration progresses.
[tool.pytest.ini_options]
# Week 1: Discovery
test_categories_enforcement = "off"
# Week 2-4: Migration
# test_categories_enforcement = "warn"
# Week 5+: Enforced
# test_categories_enforcement = "strict"
Decision: No per-test exemptions¶
What we chose: No @pytest.mark.allow_network or similar markers.
Alternative considered: Per-test overrides like pytest-socket provides.
Rationale: Per-test overrides defeat the purpose:
They proliferate (every test becomes “special”)
They’re never removed (technical debt)
They make the category meaningless
Instead, use the right category:
# Wrong: Forcing a square peg into a round hole
@pytest.mark.small
@pytest.mark.allow_network  # DON'T DO THIS
def test_external_api():
    ...

# Right: Use the appropriate category
@pytest.mark.medium  # Honest about what the test does
def test_external_api():
    ...
Comparison with Other Approaches¶
pytest-socket¶
pytest-socket blocks network access and provides allowlisting:
@pytest.mark.enable_socket
def test_with_network():
    ...
Difference: pytest-test-categories ties network blocking to test size categories. You don’t opt in or out of blocking - you declare what kind of test you’re writing.
pyfakefs¶
pyfakefs provides a fake filesystem for testing:
def test_file_operations(fs):
    fs.create_file("/path/to/file.txt")
    ...
Difference: pyfakefs replaces the filesystem with an in-memory fake. pytest-test-categories blocks access entirely for small tests. These are complementary approaches:
Use pyfakefs in small tests when you need filesystem semantics
Use tmp_path in medium tests when you need real filesystem access
pytest-test-categories enforces which approach is valid for each test size
freezegun/time-machine¶
These tools freeze or control time:
from freezegun import freeze_time

@freeze_time("2024-01-01")
def test_time_dependent():
    ...
Difference: Time-freezing is about testing time-dependent logic. pytest-test-categories’ sleep blocking is about preventing arbitrary delays:
import time
from datetime import datetime

import pytest
from freezegun import freeze_time

# This is fine - testing time logic
@pytest.mark.small
@freeze_time("2024-01-01")
def test_date_formatting():
    assert format_date(datetime.now()) == "January 1, 2024"

# This is blocked - arbitrary delay
@pytest.mark.small
def test_polling():
    time.sleep(0.1)  # Blocked! Use proper synchronization
    check_state()
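For polling code, one possible way to avoid real delays is to make the pause injectable, so a small test can pass a fake that only records what was requested. The `wait_until` helper and its parameters below are hypothetical, a sketch of the idea rather than anything the plugin provides.

```python
import time


def wait_until(predicate, attempts=5, delay=0.1, sleep=time.sleep):
    """Poll predicate up to `attempts` times, pausing via `sleep` between tries."""
    for attempt in range(attempts):
        if predicate():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# In a real suite this would be decorated with @pytest.mark.small.
def test_wait_until_retries_without_sleeping():
    results = iter([False, False, True])
    recorded_delays = []

    # The fake records the requested delay instead of pausing the test.
    ready = wait_until(lambda: next(results), sleep=recorded_delays.append)

    assert ready is True
    assert recorded_delays == [0.1, 0.1]
```

Production callers keep the default `time.sleep`, while the test verifies the retry logic without waiting.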
The Distribution Target Philosophy¶
pytest-test-categories enforces not just individual test behavior but also the overall test distribution:
80% small tests (hermetic, fast, reliable)
15% medium tests (integration, localhost allowed)
5% large/xlarge tests (system tests, full access)
Why enforce distribution?¶
The Testing Pyramid: Teams naturally drift toward larger tests because they’re “easier” to write:
No need for mocks or fakes
Can test “the real thing”
Less upfront design
But larger tests are slower and less reliable. Enforcing distribution keeps teams honest.
Economics: If 80% of tests run in under 1 second each, your test suite stays fast even as it grows. If most tests are large, CI becomes a bottleneck.
Tolerance bands¶
The targets have tolerance bands to be practical:
Small: 80% (+/-5%) = 75-85%
Medium: 15% (+/-5%) = 10-20%
Large/XLarge: 5% (+/-3%) = 2-8%
These allow normal variation while preventing drift.
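The band check itself is simple arithmetic. The sketch below shows one way it could be computed from per-category test counts. It is not the plugin's implementation, and the function name, category keys, and return shape are assumptions.

```python
# Target bands from the list above, as (low, high) percentages.
BANDS = {
    "small": (75.0, 85.0),
    "medium": (10.0, 20.0),
    "large_xlarge": (2.0, 8.0),
}


def distribution_violations(counts):
    """Return {category: percentage} for every category outside its band."""
    total = sum(counts.values())
    if total == 0:
        return {}
    violations = {}
    for category, (low, high) in BANDS.items():
        percentage = 100.0 * counts.get(category, 0) / total
        if not low <= percentage <= high:
            violations[category] = round(percentage, 1)
    return violations
```

A suite with 800 small, 150 medium, and 50 large/xlarge tests sits exactly on the targets and yields no violations. A suite with 600, 300, and 100 is outside all three bands at 60%, 30%, and 10%.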
Actionable Error Messages¶
When pytest-test-categories blocks something, it tells you exactly how to fix it:
============================================================
HermeticityViolationError
============================================================
Test: test_fetch_user_profile (tests/test_users.py:42)
Category: SMALL
Violation: Network access attempted
Details:
Attempted connection to: api.example.com:443
Small tests have restricted resource access. Options:
1. Mock the network call using responses, httpretty, or respx
2. Use dependency injection to provide a fake HTTP client
3. Change test category to @pytest.mark.medium (if network is required)
Documentation: https://pytest-test-categories.readthedocs.io/
============================================================
This philosophy of helpful errors is intentional:
Don’t just say “no” - explain why
Provide multiple solutions - one might fit better
Link to documentation - for deeper learning
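Option 2 in the sample message, dependency injection with a fake HTTP client, might look roughly like the sketch below. `FakeHttpClient`, `UserService`, and their methods are hypothetical names chosen for illustration, and the marker is shown as a comment so the snippet runs without pytest installed.

```python
class FakeHttpClient:
    """Hypothetical stand-in that returns canned JSON instead of using the network."""

    def __init__(self, canned):
        self._canned = canned
        self.requested_urls = []

    def get_json(self, url):
        self.requested_urls.append(url)
        return self._canned[url]


class UserService:
    """Hypothetical service that receives its HTTP client from the caller."""

    def __init__(self, client, base_url):
        self._client = client
        self._base_url = base_url

    def display_name(self, user_id):
        profile = self._client.get_json(f"{self._base_url}/users/{user_id}")
        return f"{profile['first']} {profile['last']}"


# In a real suite this would be decorated with @pytest.mark.small.
def test_fetch_user_profile():
    url = "https://api.example.com/users/7"
    client = FakeHttpClient({url: {"first": "Grace", "last": "Hopper"}})
    service = UserService(client, "https://api.example.com")

    assert service.display_name(7) == "Grace Hopper"
    assert client.requested_urls == [url]
```

The service never opens a socket in this test, so the logic that formats the profile can be verified hermetically, while a separate medium or large test can cover the real HTTP client.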
Summary¶
pytest-test-categories is opinionated by design. It embodies the belief that:
Test reliability is non-negotiable - flaky tests destroy developer trust
Categories should mean something - a small test is hermetic or it’s not small
Strictness enables speed - hermetic tests can run in parallel, always
Gradual adoption, then enforcement - modes are for migration, not bypass
Help, don’t just block - every error should include remediation guidance
This philosophy produces test suites that are fast, reliable, and meaningful.