Test Sizes¶

pytest-test-categories implements Google’s test size taxonomy, which categorizes tests by their execution characteristics and resource requirements.

Overview¶

Test sizes provide a vocabulary for describing what a test does and what resources it needs. This taxonomy helps teams:

Set appropriate expectations for test execution time
Design tests with the right level of isolation
Maintain a balanced test pyramid
Optimize CI/CD pipeline performance

The Four Test Sizes¶

Small Tests¶

Small tests are the foundation of your test suite. They should make up approximately 80% of all tests.

Characteristics:

Execute in under 1 second
Run entirely in memory
No network access
No filesystem access (except for test fixtures)
No database access
Fully deterministic

Use cases:

Unit tests for pure functions
Testing business logic in isolation
Validator and parser tests
Data transformation tests

Example:

import pytest

@pytest.mark.small
def test_calculate_discount():
    """Small test for pure business logic."""
    from myapp.pricing import calculate_discount

    result = calculate_discount(original_price=100, discount_percent=20)

    assert result == 80.0

@pytest.mark.small
def test_email_validation():
    """Small test for input validation."""
    from myapp.validators import is_valid_email

    assert is_valid_email("user@example.com") is True
    assert is_valid_email("invalid-email") is False
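
The "fully deterministic" and "runs entirely in memory" characteristics often come down to passing inputs such as the current time into the code under test instead of reading them from the environment. The following is a minimal sketch of that idea; `is_expired` and its parameters are hypothetical names used only for illustration and are not part of the plugin or of `myapp`.

```python
from datetime import datetime, timedelta


def is_expired(issued_at: datetime, ttl: timedelta, now: datetime) -> bool:
    """Return True once the TTL has elapsed; `now` is passed in, never read."""
    return now - issued_at >= ttl


def test_token_expiry_is_deterministic():
    """No clock, network, or disk access, so the result never varies."""
    issued = datetime(2024, 1, 1, 12, 0, 0)
    ttl = timedelta(minutes=30)

    assert is_expired(issued, ttl, now=issued + timedelta(minutes=29)) is False
    assert is_expired(issued, ttl, now=issued + timedelta(minutes=30)) is True
```

In a real suite this test would carry the `@pytest.mark.small` marker like the examples above; it is omitted here so the sketch runs without pytest installed.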

Medium Tests¶

Medium tests may access local services and typically make up about 15% of your test suite.

Characteristics:

Execute in under 5 minutes
May access localhost services
May use local databases (PostgreSQL, MySQL, SQLite)
May use local caches (Redis, Memcached)
Should still be deterministic

Use cases:

Repository layer tests with real databases
Cache integration tests
Local service integration tests
Tests using Docker containers

Example:

import pytest

@pytest.mark.medium
def test_user_repository_creates_user(postgres_connection):
    """Medium test using a local PostgreSQL database."""
    from myapp.repositories import UserRepository

    repo = UserRepository(postgres_connection)
    user = repo.create(name="Alice", email="alice@example.com")

    assert user.id is not None
    assert user.name == "Alice"

@pytest.mark.medium
def test_cache_stores_and_retrieves(redis_client):
    """Medium test using a local Redis instance."""
    from myapp.cache import UserCache

    cache = UserCache(redis_client)
    cache.set("user:123", {"name": "Bob"})

    result = cache.get("user:123")
    assert result["name"] == "Bob"

Large Tests¶

Large tests may access external services. Together with XLarge tests, they typically make up about 5% of your test suite.

Characteristics:

Execute in under 15 minutes
May access external networks
May call real APIs
May use staging environments
May be non-deterministic

Use cases:

End-to-end workflow tests
External API integration tests
Staging environment tests
Contract tests

Example:

import pytest

@pytest.mark.large
def test_payment_workflow_end_to_end(staging_client):
    """Large test for complete payment workflow."""
    # Create order
    order = staging_client.create_order(items=["SKU001", "SKU002"])

    # Process payment
    payment = staging_client.process_payment(order.id, amount=order.total)

    # Verify order completed
    assert payment.status == "completed"
    assert staging_client.get_order(order.id).status == "paid"

@pytest.mark.large
def test_external_api_integration():
    """Large test calling an external API."""
    import httpx

    response = httpx.get("https://api.example.com/health")

    assert response.status_code == 200

XLarge Tests¶

XLarge tests are for extended scenarios such as stress runs and benchmarks. Their resource access matches that of large tests.

Characteristics:

Execute in under 15 minutes (same as Large)
May access external networks
Used for extended or stress testing scenarios

Use cases:

Performance benchmarks
Stress tests
Migration testing
Large data set processing

Example:

import pytest

@pytest.mark.xlarge
def test_bulk_import_performance(large_dataset):
    """XLarge test for bulk import performance."""
    from myapp.importers import BulkImporter

    importer = BulkImporter()
    result = importer.import_records(large_dataset)

    assert result.success_count == len(large_dataset)
    assert result.duration_seconds < 300  # 5 minutes max
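
The time limits quoted for the four sizes can be gathered into a single lookup table. The following is a minimal sketch for illustration only: `TIME_LIMITS_SECONDS` and `exceeds_limit` are hypothetical names, not part of the plugin's API, and the plugin's own enforcement may work differently.

```python
# Time limits in seconds, taken from the characteristics listed above.
# Hypothetical helper for illustration; not part of pytest-test-categories.
TIME_LIMITS_SECONDS = {
    "small": 1,
    "medium": 5 * 60,
    "large": 15 * 60,
    "xlarge": 15 * 60,  # same limit as large
}


def exceeds_limit(size: str, duration_seconds: float) -> bool:
    """Return True if a test of the given size ran at or past its limit."""
    return duration_seconds >= TIME_LIMITS_SECONDS[size]
```

For instance, `exceeds_limit("small", 1.5)` reports a violation, while the same 1.5-second run is well within the medium limit.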

Choosing the Right Size¶

Use this decision tree to choose the appropriate test size:

1. Does the test need external network access?
   - Yes: Use @pytest.mark.large or @pytest.mark.xlarge
   - No: Continue to step 2
2. Does the test need local services (databases, caches)?
   - Yes: Use @pytest.mark.medium
   - No: Continue to step 3
3. Can the test complete in under 1 second?
   - Yes: Use @pytest.mark.small
   - No: Consider refactoring or use @pytest.mark.medium
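
The decision tree above can also be written as a short function. This is only a sketch of the three questions in order; `choose_test_size` and its parameter names are hypothetical and do not exist in the plugin.

```python
def choose_test_size(
    needs_external_network: bool,
    needs_local_services: bool,
    runs_under_one_second: bool,
) -> str:
    """Walk the three questions in order and return a marker name."""
    if needs_external_network:
        return "large"  # or "xlarge" for extended scenarios
    if needs_local_services:
        return "medium"
    if runs_under_one_second:
        return "small"
    # Slow but self-contained: refactor if possible, otherwise medium.
    return "medium"
```

The order of the checks matters: a test that touches an external network is large even if it happens to finish in under a second.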

What Counts as Medium¶

The boundaries of “medium” can be confusing, on both the small side and the large side. Here is explicit guidance:

Allowed in Medium Tests¶

Resource	Examples	Notes
Localhost HTTP	Test spins up `httpx.MockTransport`, Flask test client	Server created and controlled by the test
Local Database	SQLite in `tmp_path`, PostgreSQL in Docker	Isolated per-test instance
Filesystem	`tmp_path` fixture, tempfile	Only within test-controlled directories
In-Memory Stores	Redis mock, in-memory SQLite	No persistent state between tests

NOT What We Mean by Medium¶

Scenario	Why It’s Wrong	Correct Category
Docker-compose sprawl	Orchestration = Large	`@pytest.mark.large`
LocalStack / moto	AWS simulation = external-like complexity	`@pytest.mark.large`
Your laptop’s Postgres	Shared state, not isolated	`@pytest.mark.large`
“Kinda integration”	If you’re hedging, it’s probably Large	`@pytest.mark.large`
Staging environment	External network	`@pytest.mark.large`

The Heuristic¶

If it requires orchestration, it’s Large.

Medium tests should be:

Self-contained: The test creates what it needs
Isolated: No shared state with other tests
Fast enough: Under 5 minutes
Localhost-only: No external network calls
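
As a rough illustration of "self-contained" and "isolated", the sketch below creates a throwaway SQLite database, uses it, and discards it, so nothing is shared between calls. The function name `count_users_in_fresh_db` and the `users` table are assumptions made for this example only.

```python
import sqlite3
import tempfile
from pathlib import Path


def count_users_in_fresh_db(names: list[str]) -> int:
    """Create a throwaway SQLite file, insert rows, and count them."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        conn = sqlite3.connect(Path(tmp_dir) / "test.db")
        try:
            conn.execute("CREATE TABLE users (name TEXT NOT NULL)")
            conn.executemany(
                "INSERT INTO users (name) VALUES (?)",
                [(name,) for name in names],
            )
            conn.commit()
            (count,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
            return count
        finally:
            # Close before the directory is removed so cleanup succeeds.
            conn.close()
```

Inside a pytest test, the built-in `tmp_path` fixture plays the role of the temporary directory here, giving each test its own location.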

Example: When to Choose Medium vs. Large¶

import sqlite3

import pytest

# UserRepository, User, and ServiceClient are application classes, assumed
# to be importable from the code under test. docker_compose is assumed to
# be a fixture supplied by your project or a Docker plugin.

# MEDIUM: Test creates and controls the database
@pytest.mark.medium
def test_user_repository(tmp_path):
    db = sqlite3.connect(tmp_path / "test.db")
    repo = UserRepository(db)
    repo.create(User(name="Alice"))
    assert repo.count() == 1

# LARGE: Test uses external orchestration
@pytest.mark.large
def test_user_service_with_docker(docker_compose):
    # docker-compose.yml defines postgres, redis, etc.
    client = ServiceClient(docker_compose.get_url("api"))
    client.create_user("Alice")
    assert client.get_users() == ["Alice"]

Using Base Test Classes¶

As an alternative to markers, inherit from base test classes:

from pytest_test_categories import SmallTest, MediumTest, LargeTest, XLargeTest

class TestCalculator(SmallTest):
    """All tests in this class are marked as small."""

    def test_add(self):
        assert Calculator().add(1, 2) == 3

    def test_subtract(self):
        assert Calculator().subtract(5, 3) == 2

class TestDatabaseOperations(MediumTest):
    """All tests in this class are marked as medium."""

    def test_insert(self, db):
        db.insert({"key": "value"})
        assert db.count() == 1

Test Pyramid¶

A healthy test suite follows the test pyramid pattern:

         /\
        /  \
       / L  \     5% Large/XLarge
      /------\
     /   M    \   15% Medium
    /----------\
   /     S      \ 80% Small
  /--------------\

This distribution optimizes for:

Fast feedback: Most tests run quickly
High confidence: Comprehensive coverage at unit level
Cost efficiency: Fewer expensive integration tests
Maintainability: Small tests are easier to maintain

See Distribution Validation for how the plugin enforces this distribution.
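
To see how a suite compares with the 80/15/5 target, the share of each size can be computed from a list of test sizes. The sketch below is a hand-rolled illustration, not the plugin's validation logic; `TARGET_SHARE` and `size_distribution` are hypothetical names.

```python
from collections import Counter

# Target shares from the pyramid above; large and xlarge share one bucket.
TARGET_SHARE = {"small": 0.80, "medium": 0.15, "large": 0.05}


def size_distribution(sizes: list[str]) -> dict[str, float]:
    """Return the fraction of tests that falls into each pyramid bucket."""
    counts = Counter("large" if s == "xlarge" else s for s in sizes)
    total = sum(counts.values())
    if total == 0:
        return {bucket: 0.0 for bucket in TARGET_SHARE}
    return {bucket: counts[bucket] / total for bucket in TARGET_SHARE}
```

Comparing the returned fractions with `TARGET_SHARE` shows which layer of the pyramid is over- or under-represented.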