Database Isolation for Hermetic Tests¶
Database isolation is a test enforcement mechanism that prevents small tests from connecting to databases during execution. This ensures tests are hermetic and run entirely in memory with no external state.
When enabled, the pytest-test-categories plugin intercepts database connection attempts and either blocks them or warns about them, depending on your configuration.
Why Database Isolation Matters¶
Tests that connect to databases introduce several problems:
External State¶
Databases maintain state outside the test process:
Data persists between test runs
Schema changes affect test behavior
Indexes and statistics change query plans
Connection pooling introduces shared state
Non-Determinism¶
Shared databases cause test interference:
Tests may see data from other tests
Parallel execution causes race conditions
Cleanup failures leave orphaned data
Sequence/auto-increment values vary
I/O Overhead¶
Database connections involve significant I/O:
Connection establishment (TCP handshake, authentication)
Query parsing and planning
Data serialization and deserialization
Network round trips for each query
Environment Coupling¶
Tests become dependent on database availability:
Database must be running for tests to pass
Schema must match expected structure
Configuration (host, port, credentials) varies between environments
Database version differences cause subtle bugs
Test Size Restrictions¶
Database isolation follows Google’s test size definitions from “Software Engineering at Google”:
Test Size |
Database Access |
Rationale |
|---|---|---|
Small |
Blocked (including |
Must be hermetic, run in memory only |
Medium |
Allowed |
May use databases for integration tests |
Large |
Allowed |
May access production-like databases |
XLarge |
Allowed |
May access real databases |
Small Tests¶
Small tests run entirely in memory:
Fast: No database connection overhead
Hermetic: No dependency on database state
Deterministic: Same input always produces same output
Parallelizable: No database contention
Database isolation enforces hermeticity by blocking all database connections in small tests, including in-memory SQLite (:memory:).
Why Block In-Memory SQLite?¶
You might wonder why :memory: SQLite is also blocked for small tests. While it doesn’t perform file I/O, it still represents a database usage pattern that:
Encourages reliance on database behavior in unit tests
Makes tests slower than pure in-memory data structures
Creates coupling between business logic and persistence
For true unit tests, use plain Python data structures (dicts, lists, dataclasses) instead.
Medium, Large, and XLarge Tests¶
These tests may access databases freely, enabling:
Repository layer integration tests
Schema migration tests
Query performance tests
Data integrity tests
How It Works¶
The plugin intercepts database connections by patching database library connection functions:
Patched Entry Points¶
The following database entry points are intercepted:
Always patched (standard library):
sqlite3.connect- SQLite database connections
Patched if installed:
psycopg2.connect/psycopg.connect- PostgreSQLpymysql.connect- MySQLpymongo.MongoClient- MongoDBredis.Redis/redis.StrictRedis- Redissqlalchemy.create_engine- SQLAlchemy ORM
Connection Interception¶
When a test attempts to connect:
The blocker intercepts the connection call
It extracts the library name and connection string
It checks if the connection is allowed based on test size
For violations, it either raises an exception (STRICT) or warns (WARN)
Enabling Database Isolation¶
Database isolation is controlled by the test_categories_enforcement configuration option.
Configuration via pyproject.toml¶
[tool.pytest.ini_options]
# Enable database isolation enforcement
test_categories_enforcement = "strict"
Configuration via pytest.ini¶
[pytest]
test_categories_enforcement = strict
Configuration via Command Line¶
pytest --test-categories-enforcement=strict
Enforcement Modes¶
The plugin supports three enforcement modes:
STRICT Mode¶
test_categories_enforcement = "strict"
In strict mode, database violations immediately fail the test with a detailed error message:
[TC004] Database Connection Violation
Test: tests/test_repos.py::test_user_repository
Category: SMALL
What happened:
Attempted sqlite3 database connection to: :memory:
How to fix:
1. Mock sqlite3.connect using pytest-mock (mocker.patch)
2. Use dependency injection to provide a fake database/repository
3. Use in-memory data structures (dict, list) for test data
4. Test business logic separately from database operations
5. Change test category to @pytest.mark.medium (if database access is required)
Documentation: https://pytest-test-categories.readthedocs.io/errors/TC004
Use strict mode in CI pipelines to catch violations before merge.
WARN Mode¶
test_categories_enforcement = "warn"
In warn mode, database violations emit a warning but allow the test to continue:
PytestWarning: Database connection violation in test_user_repository:
attempted sqlite3.connect: :memory:
Use warn mode during migration to identify violations without breaking the build.
OFF Mode¶
test_categories_enforcement = "off"
In off mode, database isolation is disabled entirely.
Common Remediation Strategies¶
1. Use In-Memory Data Structures¶
Replace databases with simple Python structures:
from dataclasses import dataclass
import pytest
@dataclass
class User:
id: str
name: str
email: str
# Instead of a real repository with database
class FakeUserRepository:
def __init__(self):
self._users: dict[str, User] = {}
def save(self, user: User) -> None:
self._users[user.id] = user
def get(self, user_id: str) -> User | None:
return self._users.get(user_id)
@pytest.mark.small
def test_user_service():
repo = FakeUserRepository()
user = User(id="123", name="Alice", email="alice@example.com")
repo.save(user)
retrieved = repo.get("123")
assert retrieved.name == "Alice"
2. Use Dependency Injection¶
Design code to accept repository interfaces:
from abc import ABC, abstractmethod
from dataclasses import dataclass
import pytest
@dataclass
class User:
id: str
name: str
class UserRepository(ABC):
@abstractmethod
def get(self, user_id: str) -> User | None: ...
@abstractmethod
def save(self, user: User) -> None: ...
# Production implementation
class SqlUserRepository(UserRepository):
def __init__(self, connection):
self.conn = connection
def get(self, user_id: str) -> User | None:
cursor = self.conn.execute(
"SELECT id, name FROM users WHERE id = ?", (user_id,)
)
row = cursor.fetchone()
return User(id=row[0], name=row[1]) if row else None
def save(self, user: User) -> None:
self.conn.execute(
"INSERT INTO users (id, name) VALUES (?, ?)",
(user.id, user.name)
)
# Test implementation
class FakeUserRepository(UserRepository):
def __init__(self):
self._users: dict[str, User] = {}
def get(self, user_id: str) -> User | None:
return self._users.get(user_id)
def save(self, user: User) -> None:
self._users[user.id] = user
# Service that uses repository
class UserService:
def __init__(self, repo: UserRepository):
self.repo = repo
def create_user(self, name: str) -> User:
import uuid
user = User(id=str(uuid.uuid4()), name=name)
self.repo.save(user)
return user
# Small test with fake repository
@pytest.mark.small
def test_create_user():
repo = FakeUserRepository()
service = UserService(repo)
user = service.create_user("Alice")
assert user.name == "Alice"
assert repo.get(user.id) is not None
3. Mock Database Connections¶
Use pytest-mock to intercept database calls:
import pytest
@pytest.mark.small
def test_user_lookup(mocker):
mock_conn = mocker.MagicMock()
mock_cursor = mocker.MagicMock()
mock_conn.execute.return_value = mock_cursor
mock_cursor.fetchone.return_value = ("123", "Alice")
mocker.patch("sqlite3.connect", return_value=mock_conn)
from myapp.repos import get_user
user = get_user("123")
assert user.name == "Alice"
4. Test Business Logic Separately¶
Separate business logic from persistence:
from dataclasses import dataclass
import pytest
@dataclass
class Order:
items: list[str]
subtotal: float
tax: float
total: float
# Pure business logic - no database
def calculate_order_totals(items: list[str], prices: dict[str, float], tax_rate: float) -> Order:
subtotal = sum(prices.get(item, 0) for item in items)
tax = subtotal * tax_rate
total = subtotal + tax
return Order(items=items, subtotal=subtotal, tax=tax, total=total)
# Small test for pure logic
@pytest.mark.small
def test_calculate_order_totals():
prices = {"apple": 1.00, "banana": 0.50}
order = calculate_order_totals(
items=["apple", "banana"],
prices=prices,
tax_rate=0.10,
)
assert order.subtotal == 1.50
assert order.tax == 0.15
assert order.total == 1.65
5. Use Fixtures for Medium Tests¶
Create database fixtures for integration testing:
import sqlite3
import pytest
@pytest.fixture
def db_connection():
"""Create an in-memory SQLite database for medium tests."""
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE users (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE NOT NULL
)
""")
yield conn
conn.close()
@pytest.mark.medium
def test_user_repository_integration(db_connection):
from myapp.repos import SqlUserRepository, User
repo = SqlUserRepository(db_connection)
user = User(id="123", name="Alice", email="alice@example.com")
repo.save(user)
retrieved = repo.get("123")
assert retrieved.name == "Alice"
6. Use Docker Containers for Real Databases¶
For production-like integration tests:
import pytest
@pytest.fixture(scope="session")
def postgres_container():
"""Start a PostgreSQL container for integration tests."""
import docker
client = docker.from_env()
container = client.containers.run(
"postgres:15",
detach=True,
ports={"5432/tcp": None},
environment={
"POSTGRES_PASSWORD": "test",
"POSTGRES_DB": "testdb",
},
)
# Wait for postgres to be ready
import time
time.sleep(2)
port = container.ports["5432/tcp"][0]["HostPort"]
yield f"postgresql://postgres:test@localhost:{port}/testdb"
container.stop()
container.remove()
@pytest.mark.medium
def test_postgres_integration(postgres_container):
import psycopg2
conn = psycopg2.connect(postgres_container)
# Test with real PostgreSQL
...
7. Change Test Size¶
If the test legitimately requires database access:
import pytest
@pytest.mark.medium # Medium tests can access databases
def test_database_migration(db_connection):
from myapp.migrations import apply_migrations
apply_migrations(db_connection)
# Verify schema
cursor = db_connection.execute(
"SELECT name FROM sqlite_master WHERE type='table'"
)
tables = {row[0] for row in cursor.fetchall()}
assert "users" in tables
Best Practices¶
1. Start with WARN Mode¶
When first enabling database isolation, use warn mode to identify all violations:
pytest --test-categories-enforcement=warn 2>&1 | grep "Database connection violation"
2. Use the Repository Pattern¶
Separate data access from business logic:
+------------------+ +-------------------+ +----------+
| Business Logic | --> | Repository | --> | Database |
| (Small Tests) | | (Interface) | | |
+------------------+ +-------------------+ +----------+
^
|
+-------+-------+
| |
+-----+-----+ +-----+-----+
| Fake | | Real |
| Repository| | Repository|
| (Tests) | | (Prod) |
+-----------+ +-----------+
3. Use Factories for Test Data¶
Create test data without database:
from dataclasses import dataclass, field
import uuid
@dataclass
class User:
id: str = field(default_factory=lambda: str(uuid.uuid4()))
name: str = "Test User"
email: str = "test@example.com"
# Easy to create test data
def test_user_email_validation():
user = User(email="invalid")
assert not is_valid_email(user.email)
4. Test Queries Separately¶
If you need to test SQL queries, do it in medium tests:
import pytest
# Small test: Test query building logic
@pytest.mark.small
def test_build_search_query():
from myapp.queries import build_user_search_query
query, params = build_user_search_query(name="Alice", active=True)
assert "WHERE" in query
assert "name LIKE ?" in query
assert "active = ?" in query
assert params == ("%Alice%", True)
# Medium test: Test query execution
@pytest.mark.medium
def test_execute_search_query(db_connection):
# Seed data
db_connection.execute("INSERT INTO users VALUES ('1', 'Alice', 1)")
from myapp.repos import search_users
results = search_users(db_connection, name="Alice")
assert len(results) == 1
Troubleshooting¶
“DatabaseViolationError” for :memory: SQLite¶
Even in-memory SQLite is blocked for small tests. This is intentional - use Python data structures instead.
Solution: Use dicts, lists, or dataclasses for test data.
“psycopg2.connect not being blocked”¶
Ensure the library is installed in your test environment. The plugin only patches libraries that are available.
“SQLAlchemy create_engine not blocked”¶
SQLAlchemy’s create_engine is lazy - it doesn’t connect until you actually execute a query. The connection happens when:
You call
engine.connect()You execute a query
You create a session that executes queries
The plugin intercepts the underlying database connection, not create_engine.
“Tests pass locally but fail in CI”¶
Common causes:
Different database versions
Schema differences
Missing migrations
Environment variable configuration
Solution: Use containerized databases with pinned versions for CI.