Test Patterns and Best Practices

This guide documents the specific test patterns, architectural decisions, and best practices used in the ADK Agents integration test suite.

Core Test Patterns

1. Agent Lifecycle Testing Pattern

The agent lifecycle pattern validates complete conversation turns with proper context management.

Pattern Structure

class TestAgentLifecycle:
    @pytest.fixture
    def context_manager(self):
        """Create context manager with proper mocking."""
        return ContextManager(
            model_name="test-model",
            max_llm_token_limit=100000,
            llm_client=create_mock_llm_client()
        )
    
    def test_complete_turn_execution(self, context_manager):
        """Test complete conversation turn with context updates."""
        # Arrange - Setup initial state
        turn_number = context_manager.start_new_turn("Test user message")
        
        # Act - Execute agent operations
        context_manager.add_code_snippet("test.py", "content", 1, 10)
        context_manager.add_tool_result("test_tool", {"result": "success"})
        
        # Assert - Verify expected outcomes
        context_dict, token_count = context_manager.assemble_context(10000)
        assert len(context_dict["conversation_history"]) == 1
        assert token_count > 0

Key Principles

  1. Complete Turn Simulation - Test full conversation cycles
  2. Context State Validation - Verify context is properly maintained
  3. Token Management - Ensure token limits are respected
  4. Multi-turn Correlation - Test conversation continuity (see the sketch after this list)
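
To exercise principles 3 and 4 directly, a follow-up test can add a second turn and assert on the assembled result. This is a minimal sketch written as an extra method on the TestAgentLifecycle class above; it assumes assemble_context keeps all turns while staying within the requested token budget.

    def test_multi_turn_context_respects_token_budget(self, context_manager):
        """Sketch: conversation continuity plus token budget enforcement."""
        context_manager.start_new_turn("Explain the authentication module")
        context_manager.add_code_snippet("src/auth.py", "class AuthManager: ...", 1, 5)

        context_manager.start_new_turn("Now add token refresh support")

        context_dict, token_count = context_manager.assemble_context(10000)

        # Continuity: both turns are represented in the assembled context
        assert len(context_dict["conversation_history"]) == 2
        # Token management: the assembled context stays within the requested budget
        assert 0 < token_count <= 10000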

2. Workflow Orchestration Pattern

The workflow orchestration pattern tests all four workflow types with proper state management.

Pattern Structure

class TestWorkflowOrchestration:
    @pytest.mark.asyncio
    async def test_sequential_workflow_execution(self, mock_agents):
        """Test sequential workflow with proper dependency management."""
        # Arrange - Setup workflow steps
        workflow_steps = [
            {"agent": "analysis", "task": "analyze_code"},
            {"agent": "implementation", "task": "implement_fix"},
            {"agent": "testing", "task": "run_tests"}
        ]
        
        # Act - Execute workflow
        results = await execute_sequential_workflow(workflow_steps, mock_agents)
        
        # Assert - Verify execution order and results
        assert len(results) == 3
        assert all(r.success for r in results)
        assert results[0].start_time < results[1].start_time < results[2].start_time

Workflow Types

  1. Sequential Workflows - Step-by-step execution with dependencies
  2. Parallel Workflows - Concurrent execution where possible (see the sketch after this list)
  3. Iterative Workflows - Repeated cycles with feedback loops
  4. Human-in-Loop Workflows - Human approval and intervention points
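
A parallel workflow (type 2 above) can be verified without asserting on wall-clock timings, which keeps the test stable. The sketch below is self-contained; the step body is a stand-in for a real agent call, and the concurrency counter is what the test checks.

import asyncio
import pytest

class TestParallelWorkflow:
    @pytest.mark.asyncio
    async def test_parallel_workflow_overlaps_independent_steps(self):
        """Sketch: track concurrency directly instead of comparing timestamps."""
        in_flight = 0
        max_in_flight = 0

        async def run_step(agent_name):
            nonlocal in_flight, max_in_flight
            in_flight += 1
            max_in_flight = max(max_in_flight, in_flight)
            await asyncio.sleep(0.01)  # stand-in for real agent work
            in_flight -= 1
            return agent_name

        results = await asyncio.gather(
            *(run_step(name) for name in ("analysis", "testing", "documentation"))
        )

        assert sorted(results) == ["analysis", "documentation", "testing"]
        assert max_in_flight == 3  # all independent steps were in flight at once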

3. Context Management Testing Pattern

The context management pattern validates advanced context features like smart prioritization and RAG integration.

Pattern Structure

class TestContextManagement:
    def test_smart_prioritization(self, prioritizer, test_data):
        """Test smart prioritization with relevance scoring."""
        # Arrange - Setup diverse content
        code_snippets = create_diverse_code_snippets()
        current_context = "Fix authentication security issues"
        
        # Act - Apply prioritization
        prioritized = prioritizer.prioritize_code_snippets(
            code_snippets, current_context, current_turn=5
        )
        
        # Assert - Verify prioritization logic
        assert len(prioritized) == len(code_snippets)
        assert prioritized[0]["_relevance_score"].final_score >= prioritized[-1]["_relevance_score"].final_score
        
        # Verify security-related content is prioritized
        security_items = [item for item in prioritized if "auth" in item["file_path"].lower()]
        assert len(security_items) > 0

Advanced Features

  1. Smart Prioritization - Content relevance scoring
  2. Cross-turn Correlation - Relationship detection across conversations (see the sketch after this list)
  3. Intelligent Summarization - Context-aware content reduction
  4. Dynamic Context Expansion - Automatic content discovery
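
Feature 2 above (cross-turn correlation) can be tested on its own. The correlator below is a hypothetical stand-in based on naive keyword overlap, used only to show the shape of the assertion; the real detector's API will differ.

def test_cross_turn_correlation_links_related_turns():
    """Sketch: later turns should correlate with the earlier turns they build on."""
    turns = [
        {"turn_number": 1, "user_message": "Review the authentication flow in src/auth.py"},
        {"turn_number": 2, "user_message": "Refactor the database migration scripts"},
        {"turn_number": 3, "user_message": "Add token refresh to the authentication flow"},
    ]

    def correlate(turn_a, turn_b):
        # Hypothetical relationship score: overlap of significant words
        words_a = {w for w in turn_a["user_message"].lower().split() if len(w) > 4}
        words_b = {w for w in turn_b["user_message"].lower().split() if len(w) > 4}
        return len(words_a & words_b) / max(len(words_a | words_b), 1)

    # Turn 3 continues turn 1's topic, so it should score higher against turn 1 than turn 2
    assert correlate(turns[2], turns[0]) > correlate(turns[2], turns[1])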

4. Tool Orchestration Pattern

The tool orchestration pattern tests complex tool coordination with error handling and recovery.

Pattern Structure

class TestToolOrchestration:
    @pytest.mark.asyncio
    async def test_tool_dependency_management(self, orchestrator):
        """Test tool execution with proper dependency handling."""
        # Arrange - Setup dependent tools
        dependencies = [
            ("read_file", {"file_path": "test.py"}, []),
            ("analyze_code", {"file_path": "test.py"}, ["read_file_0"]),
            ("fix_issues", {"file_path": "test.py"}, ["analyze_code_1"])
        ]
        
        # Act - Execute with dependencies
        results = []
        for i, (tool, args, deps) in enumerate(dependencies):
            result = await orchestrator.execute_tool(tool, args, deps, f"{tool}_{i}")
            results.append(result)
        
        # Assert - Verify all tools completed and ran in dependency order
        assert all(r.status == ToolExecutionStatus.COMPLETED for r in results)
        assert results[0].start_time < results[1].start_time < results[2].start_time

Error Handling Features

  1. Automatic Recovery - Retry logic with exponential backoff (see the sketch after this list)
  2. Fallback Strategies - Alternative approaches when primary fails
  3. Error Classification - Different recovery strategies per error type
  4. State Consistency - Proper cleanup on failures
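
Feature 1 above (retry with exponential backoff) is easy to test against a deliberately flaky operation. The retry helper below is a minimal sketch, not the orchestrator's real recovery API.

import asyncio
import pytest

@pytest.mark.asyncio
async def test_retry_with_exponential_backoff_recovers_from_transient_errors():
    """Sketch: two transient failures, then success on the third attempt."""
    attempts = []

    async def flaky_tool():
        attempts.append(1)
        if len(attempts) < 3:
            raise ConnectionError("transient network error")
        return {"result": "success"}

    async def retry_with_backoff(operation, max_retries=5, base_delay=0.01):
        for attempt in range(max_retries):
            try:
                return await operation()
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise
                await asyncio.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, 0.04s, ...

    result = await retry_with_backoff(flaky_tool)

    assert result == {"result": "success"}
    assert len(attempts) == 3  # two failures were absorbed by the retry logic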

5. Performance Testing Pattern

The performance testing pattern validates system behavior under load and stress conditions.

Pattern Structure

class TestPerformanceVerification:
    @pytest.fixture
    def performance_monitor(self):
        """Create performance monitor for metrics collection."""
        return PerformanceMonitor()
    
    @pytest.mark.asyncio
    async def test_load_testing_simulation(self, context_manager, performance_monitor):
        """Test system performance under concurrent load."""
        # Arrange - Setup load test parameters
        concurrent_users = 10
        operations_per_user = 50
        
        performance_monitor.start_monitoring()
        
        # Act - Simulate concurrent users
        async def simulate_user(user_id):
            for op in range(operations_per_user):
                # Simulate user operations
                context_manager.start_new_turn(f"User {user_id} operation {op}")
                # ... perform operations
                performance_monitor.record_operation(success=True)
        
        tasks = [simulate_user(i) for i in range(concurrent_users)]
        await asyncio.gather(*tasks)
        
        metrics = performance_monitor.stop_monitoring()
        
        # Assert - Verify performance thresholds
        assert metrics.success_rate >= 0.95
        assert metrics.throughput_ops_per_sec >= 100
        assert metrics.memory_usage_mb <= 1000

Performance Metrics

  1. Throughput - Operations per second (see the measurement sketch after this list)
  2. Memory Usage - Real-time memory consumption
  3. CPU Usage - Processor utilization
  4. Response Time - Operation completion time
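
These metrics can be derived from simple measurements. A minimal sketch, assuming psutil is available for memory sampling (CPU utilization can be sampled the same way via psutil's cpu_percent):

import time

import psutil  # assumed available for process metrics

def measure_operation_metrics(operation, iterations=1000):
    """Sketch: derive throughput, response time, success rate, and memory usage."""
    process = psutil.Process()
    start = time.perf_counter()
    successes = sum(1 for _ in range(iterations) if operation())
    elapsed = time.perf_counter() - start
    return {
        "throughput_ops_per_sec": iterations / elapsed,
        "avg_response_time_sec": elapsed / iterations,
        "success_rate": successes / iterations,
        "memory_usage_mb": process.memory_info().rss / (1024 * 1024),
    }

def test_trivial_operation_metrics():
    metrics = measure_operation_metrics(lambda: True)
    assert metrics["success_rate"] == 1.0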

Advanced Testing Patterns

1. Mock Strategy Pattern

Comprehensive mocking strategy for external dependencies.

LLM Client Mocking

def create_mock_llm_client():
    """Create realistic LLM client mock."""
    client = AsyncMock()
    
    # Mock generate_content with realistic responses
    client.generate_content.return_value = AsyncMock(
        text="Mock LLM response",
        usage_metadata=AsyncMock(
            prompt_token_count=100,
            candidates_token_count=200,
            total_token_count=300
        )
    )
    
    # Mock count_tokens for token management
    client.count_tokens.return_value = AsyncMock(total_tokens=150)
    
    return client
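
Because the mock mirrors the real client's response shape, tests can await it exactly as they would the real API. A short usage sketch:

import pytest

@pytest.mark.asyncio
async def test_mock_llm_client_reports_token_usage():
    """Sketch: the mock responds like the real client, including usage metadata."""
    client = create_mock_llm_client()

    response = await client.generate_content("Summarize the last tool result")

    assert response.text == "Mock LLM response"
    assert response.usage_metadata.total_token_count == 300
    client.generate_content.assert_awaited_once()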

Session State Mocking

def create_mock_session_state():
    """Create mock session state for multi-agent coordination."""
    return {
        "conversation_id": "test-conversation-123",
        "current_phase": "analysis",
        "shared_context": {},
        "agent_states": {
            "analysis_agent": {"status": "ready", "last_action": None},
            "implementation_agent": {"status": "waiting", "dependencies": ["analysis"]},
            "testing_agent": {"status": "waiting", "dependencies": ["implementation"]}
        },
        "workflow_history": [],
        "error_count": 0,
        "performance_metrics": {
            "total_tokens": 0,
            "execution_time": 0.0,
            "memory_usage": 0.0
        }
    }

2. Fixture Strategy Pattern

Reusable test components for consistent test setup.

Parameterized Fixtures

@pytest.fixture(params=[
    {"workflow_type": "sequential", "agent_count": 3},
    {"workflow_type": "parallel", "agent_count": 5},
    {"workflow_type": "iterative", "agent_count": 2},
    {"workflow_type": "human_in_loop", "agent_count": 4}
])
def workflow_config(request):
    """Parameterized workflow configuration."""
    return request.param

def test_workflow_execution(workflow_config, mock_agents):
    """Test with different workflow configurations."""
    # Test implementation adapts based on workflow_config
    pass

Scoped Fixtures

@pytest.fixture(scope="module")
def performance_test_data():
    """Module-scoped performance test data."""
    return create_performance_test_data()

@pytest.fixture(scope="function")
def isolated_context_manager():
    """Function-scoped context manager for isolation."""
    return ContextManager(test_mode=True)

3. Assertion Strategy Pattern

Comprehensive assertion patterns for different validation types.

Context Validation

def assert_context_consistency(context_dict, expected_components):
    """Assert context has expected components and structure."""
    assert isinstance(context_dict, dict)
    
    for component in expected_components:
        assert component in context_dict, f"Missing context component: {component}"
    
    if "conversation_history" in context_dict:
        assert len(context_dict["conversation_history"]) > 0
        for turn in context_dict["conversation_history"]:
            assert "turn_number" in turn
            assert "user_message" in turn or "agent_message" in turn
    
    if "tool_results" in context_dict:
        for result in context_dict["tool_results"]:
            assert "tool_name" in result
            assert "response" in result
            assert "summary" in result

Performance Validation

def assert_performance_thresholds(metrics, thresholds):
    """Assert performance metrics meet defined thresholds."""
    assert metrics.execution_time <= thresholds["max_execution_time"]
    assert metrics.memory_usage_mb <= thresholds["max_memory_mb"]
    assert metrics.success_rate >= thresholds["min_success_rate"]
    assert metrics.throughput_ops_per_sec >= thresholds["min_throughput"]
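
The thresholds argument is a plain dictionary, and metrics is any object exposing the attributes checked above (normally the output of the test's performance monitor). The values below are illustrative, not project requirements.

from types import SimpleNamespace

thresholds = {
    "max_execution_time": 5.0,   # seconds
    "max_memory_mb": 1000,
    "min_success_rate": 0.95,
    "min_throughput": 100,       # operations per second
}

# Stand-in metrics object; in a real test this comes from the performance monitor
metrics = SimpleNamespace(
    execution_time=3.2, memory_usage_mb=512, success_rate=0.99, throughput_ops_per_sec=150
)
assert_performance_thresholds(metrics, thresholds)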

4. Data Generation Pattern

Realistic test data generation for comprehensive testing.

Content Generation

def create_diverse_code_snippets():
    """Create diverse code snippets for testing prioritization."""
    return [
        {
            "file_path": "src/auth.py",
            "content": "class AuthManager:\n    def authenticate(self, user, password):\n        return jwt.encode({...})",
            "language": "python",
            "complexity": "medium"
        },
        {
            "file_path": "src/config.py", 
            "content": "SECRET_KEY = 'hardcoded-secret-key'\nDEBUG = True",
            "language": "python",
            "complexity": "low"
        },
        {
            "file_path": "tests/test_auth.py",
            "content": "def test_authentication():\n    assert auth_manager.authenticate('user', 'pass')",
            "language": "python",
            "complexity": "low"
        }
    ]

Performance Test Data

def create_performance_test_data():
    """Create realistic performance test data."""
    return {
        "large_files": [generate_large_file_content(i) for i in range(100)],
        "complex_queries": [generate_complex_query(i) for i in range(50)],
        "tool_chains": [generate_tool_chain(i) for i in range(25)],
        "user_scenarios": [generate_user_scenario(i) for i in range(200)]
    }

Best Practices

1. Test Organization

File Structure

tests/
├── integration/
│   ├── test_agent_lifecycle.py
│   ├── test_context_management_advanced.py
│   ├── test_tool_orchestration_advanced.py
│   ├── test_performance_verification.py
│   └── run_integration_tests.py
├── fixtures/
│   ├── test_helpers.py
│   └── mock_data.py
└── conftest.py

Test Class Organization

class TestFeatureArea:
    """Test class for specific feature area."""
    
    # Fixtures specific to this test class
    @pytest.fixture
    def feature_setup(self):
        """Setup for feature-specific tests."""
        pass
    
    # Basic functionality tests
    def test_basic_functionality(self):
        """Test basic feature functionality."""
        pass
    
    # Edge case tests
    def test_edge_cases(self):
        """Test edge cases and boundary conditions."""
        pass
    
    # Error handling tests
    def test_error_handling(self):
        """Test error conditions and recovery."""
        pass
    
    # Performance tests
    @pytest.mark.performance
    def test_performance_characteristics(self):
        """Test performance under load."""
        pass

2. Test Naming Conventions

Descriptive Test Names

# Good: Describes what is being tested and expected outcome
def test_context_assembly_with_token_limit_respects_budget():
    """Test that context assembly respects token budget limits."""
    pass

def test_tool_orchestration_recovers_from_file_not_found_error():
    """Test tool orchestration handles file not found errors with retry."""
    pass

# Bad: Vague or unclear purpose
def test_context():
    pass

def test_tool_error():
    pass

Test Categories

# Mark tests with appropriate categories
@pytest.mark.unit
def test_basic_functionality():
    pass

@pytest.mark.integration
def test_component_interaction():
    pass

@pytest.mark.performance
def test_load_handling():
    pass

@pytest.mark.slow
def test_comprehensive_scenario():
    pass
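
Custom markers should be registered so pytest does not warn about them (and --strict-markers does not fail). One option is to register them in conftest.py via pytest_configure; the names below match the categories above.

# conftest.py
def pytest_configure(config):
    """Register the custom markers used across the test suite."""
    for name, description in [
        ("unit", "fast, isolated unit tests"),
        ("integration", "tests covering component interaction"),
        ("performance", "load and throughput tests"),
        ("slow", "long-running comprehensive scenarios"),
        ("stress", "stress tests that are skipped by default"),
    ]:
        config.addinivalue_line("markers", f"{name}: {description}")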

3. Mocking Best Practices

Realistic Behavior

# Mock with realistic behavior patterns
def create_realistic_llm_mock():
    """Create LLM mock with realistic response patterns."""
    mock = AsyncMock()
    
    # Simulate response time variation
    async def mock_generate(*args, **kwargs):
        await asyncio.sleep(random.uniform(0.1, 0.5))  # Realistic response time
        return generate_realistic_response(*args, **kwargs)
    
    mock.generate_content.side_effect = mock_generate
    return mock

State Consistency

# Maintain consistent state across mock interactions
class StatefulMock:
    def __init__(self):
        self.state = {"calls": 0, "context": {}}
    
    async def mock_method(self, *args, **kwargs):
        self.state["calls"] += 1
        # Update state based on method calls
        return self.generate_response_based_on_state()

4. Error Testing Patterns

Comprehensive Error Scenarios

@pytest.mark.parametrize("error_type,expected_recovery", [
    ("file_not_found", "retry_with_alternatives"),
    ("permission_denied", "escalate_permissions"),
    ("timeout", "extend_timeout_and_retry"),
    ("rate_limit", "exponential_backoff"),
    ("network_error", "fallback_strategy")
])
def test_error_recovery_patterns(error_type, expected_recovery):
    """Test various error recovery patterns."""
    # Simulate specific error type
    # Verify expected recovery behavior
    pass

5. Performance Testing Patterns

Baseline and Regression Testing

def test_performance_regression():
    """Test that performance hasn't regressed."""
    baseline_metrics = load_baseline_metrics()
    current_metrics = measure_current_performance()
    
    # Allow for some variance but catch significant regressions
    assert current_metrics.execution_time <= baseline_metrics.execution_time * 1.1
    assert current_metrics.memory_usage <= baseline_metrics.memory_usage * 1.1

Resource Monitoring

def test_resource_usage_within_limits():
    """Test that resource usage stays within defined limits."""
    with ResourceMonitor() as monitor:
        # Execute test operations
        execute_test_scenario()
    
    # Verify resource usage
    assert monitor.peak_memory_mb <= 1000
    assert monitor.peak_cpu_percent <= 80
    assert monitor.open_file_descriptors <= 100
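
ResourceMonitor is not a pytest built-in; one possible implementation is sketched below, assuming psutil is available. It samples the current process from a background thread so peak values are captured (num_fds is Unix-only).

import threading
import time

import psutil  # assumed available for process metrics

class ResourceMonitor:
    """Sketch: sample process resources in a background thread."""

    def __init__(self, interval=0.1):
        self.interval = interval
        self.peak_memory_mb = 0.0
        self.peak_cpu_percent = 0.0
        self.open_file_descriptors = 0
        self._process = psutil.Process()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._sample, daemon=True)

    def _sample(self):
        while not self._stop.is_set():
            memory_mb = self._process.memory_info().rss / (1024 * 1024)
            self.peak_memory_mb = max(self.peak_memory_mb, memory_mb)
            self.peak_cpu_percent = max(
                self.peak_cpu_percent, self._process.cpu_percent(interval=None)
            )
            self.open_file_descriptors = max(
                self.open_file_descriptors, self._process.num_fds()  # Unix-only
            )
            time.sleep(self.interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()
        return False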

Conftest.py Fixture Patterns

1. Fixture Organization Pattern

The integration test suite uses a dedicated conftest.py file for integration-specific fixtures following Google ADK patterns.

Fixture Hierarchy

# tests/integration/conftest.py
import pytest
from tests.fixtures.test_helpers import create_mock_llm_client

# Session-scoped fixtures for expensive operations
@pytest.fixture(scope="session")
def performance_test_data():
    """Create test data once per session."""
    return create_performance_test_data()

# Function-scoped fixtures for isolated tests
@pytest.fixture(scope="function")
def mock_context_manager(mock_llm_client):
    """Create fresh context manager for each test."""
    return MockContextManager(
        model_name="test-model",
        max_llm_token_limit=100000,
        llm_client=mock_llm_client
    )

# Phase-specific fixtures combining multiple components
@pytest.fixture(scope="function")
def foundation_test_setup(mock_context_manager, mock_agent_pool, workflow_configs):
    """Complete setup for foundation phase tests."""
    return {
        "context_manager": mock_context_manager,
        "agent_pool": mock_agent_pool,
        "workflow_configs": workflow_configs
    }

2. Parametrized Fixture Pattern

Use parametrized fixtures to test multiple scenarios efficiently.

Workflow Scenario Testing

@pytest.fixture(params=[
    {"workflow_type": "sequential", "agent_count": 3},
    {"workflow_type": "parallel", "agent_count": 5},
    {"workflow_type": "iterative", "agent_count": 2},
    {"workflow_type": "human_in_loop", "agent_count": 4}
])
def workflow_scenario(request):
    """Test different workflow scenarios."""
    return request.param

# Usage in tests
def test_workflow_execution(workflow_scenario):
    # Automatically runs with all 4 parameter combinations
    assert workflow_scenario['workflow_type'] in VALID_WORKFLOW_TYPES
    assert workflow_scenario['agent_count'] > 0

Load Testing Scenarios

@pytest.fixture(params=[1, 5, 10, 25, 50])
def load_test_scenario(request):
    """Test different load levels."""
    return {
        "concurrent_users": request.param,
        "operations_per_user": 100,
        "expected_min_throughput": max(50, 500 / request.param)
    }

3. Conditional Fixture Pattern

Use conditional fixtures for environment-specific testing.

Environment-Specific Skipping

def pytest_runtest_setup(item):
    """Setup for each test run with conditional skipping."""
    # Skip performance tests on slow systems
    if "performance" in item.keywords:
        if os.environ.get("SKIP_PERFORMANCE_TESTS", "false").lower() == "true":
            pytest.skip("Performance tests skipped on slow systems")
    
    # Skip stress tests unless explicitly requested
    if "stress" in item.keywords:
        if os.environ.get("RUN_STRESS_TESTS", "false").lower() != "true":
            pytest.skip("Stress tests skipped unless explicitly requested")

4. Fixture Dependency Pattern

Create fixtures that depend on other fixtures for complex setups.

Layered Dependencies

@pytest.fixture(scope="function")
def mock_llm_client():
    """Base LLM client mock."""
    return create_mock_llm_client()

@pytest.fixture(scope="function")
def mock_context_manager(mock_llm_client):
    """Context manager that depends on LLM client."""
    return MockContextManager(llm_client=mock_llm_client)

@pytest.fixture(scope="function")
def mock_devops_agent(mock_context_manager, mock_llm_client):
    """Agent that depends on context manager and LLM client."""
    return MockDevOpsAgent(
        context_manager=mock_context_manager,
        llm_client=mock_llm_client
    )

@pytest.fixture(scope="function")
def complete_integration_setup(mock_devops_agent, mock_performance_monitor):
    """Complete setup depending on multiple components."""
    return {
        "agent": mock_devops_agent,
        "monitor": mock_performance_monitor,
        "ready": True
    }

5. Cleanup Fixture Pattern

Implement automatic cleanup to prevent test pollution.

Automatic Cleanup

@pytest.fixture(scope="function", autouse=True)
def cleanup_after_test():
    """Automatic cleanup after each test."""
    yield
    
    # Clean up any temporary files
    temp_files = ["test_temp_file.txt", "test_context.json"]
    for temp_file in temp_files:
        if os.path.exists(temp_file):
            try:
                os.remove(temp_file)
            except Exception as e:
                logger.warning(f"Failed to cleanup {temp_file}: {e}")

@pytest.fixture(scope="session", autouse=True)
def setup_integration_test_session():
    """Setup integration test session."""
    logger.info("Starting integration test session")
    
    # Create session info
    session_info = {
        "session_id": f"integration_test_session_{int(time.time())}",
        "start_time": time.time()
    }
    
    yield session_info
    
    # Session cleanup
    logger.info("Ending integration test session")

6. Metrics Collection Pattern

Use fixtures to collect test metrics and performance data.

Test Metrics Collection

@pytest.fixture(scope="function")
def test_metrics_collector():
    """Collect test metrics during execution."""
    class MetricsCollector:
        def __init__(self):
            self.metrics = []
            self.start_time = time.time()
        
        def record_metric(self, name: str, value: float, tags: Dict[str, str] = None):
            self.metrics.append({
                "name": name,
                "value": value,
                "timestamp": time.time(),
                "tags": tags or {}
            })
        
        def get_summary(self) -> Dict[str, Any]:
            return {
                "total_metrics": len(self.metrics),
                "execution_time": time.time() - self.start_time,
                "metrics": self.metrics
            }
    
    return MetricsCollector()

# Usage in tests
def test_with_metrics(test_metrics_collector):
    test_metrics_collector.record_metric("test_start", 1.0)
    # ... test operations ...
    test_metrics_collector.record_metric("test_end", 1.0)
    
    summary = test_metrics_collector.get_summary()
    assert summary["total_metrics"] == 2

7. Mock State Management Pattern

Maintain consistent state across mock interactions.

Stateful Mock Fixtures

@pytest.fixture(scope="function")
def stateful_mock_context():
    """Create stateful mock context that maintains consistency."""
    class StatefulMockContext:
        def __init__(self):
            self.code_snippets = []
            self.tool_results = []
            self.conversation_history = []
            self.turn_count = 0
        
        def add_code_snippet(self, file_path, content, start_line=1, end_line=None):
            snippet = {
                "file_path": file_path,
                "content": content,
                "start_line": start_line,
                "end_line": end_line or start_line + len(content.split('\n'))
            }
            self.code_snippets.append(snippet)
            return snippet
        
        def start_new_turn(self, message):
            self.turn_count += 1
            turn = {
                "turn_number": self.turn_count,
                "user_message": message,
                "timestamp": time.time()
            }
            self.conversation_history.append(turn)
            return self.turn_count
        
        def get_state(self):
            return {
                "code_snippets": len(self.code_snippets),
                "tool_results": len(self.tool_results),
                "conversation_history": len(self.conversation_history),
                "turn_count": self.turn_count
            }
    
    return StatefulMockContext()
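
A short usage example shows how the stateful mock keeps its counters consistent across calls:

def test_stateful_mock_keeps_counters_consistent(stateful_mock_context):
    stateful_mock_context.start_new_turn("Review src/auth.py")
    stateful_mock_context.add_code_snippet("src/auth.py", "class AuthManager:\n    pass")

    state = stateful_mock_context.get_state()

    assert state["turn_count"] == 1
    assert state["code_snippets"] == 1
    assert state["conversation_history"] == 1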

8. Error Simulation Pattern

Create fixtures for testing error scenarios.

Error Simulation Configuration

@pytest.fixture(scope="function")
def error_simulation_config():
    """Configuration for error simulation tests."""
    return {
        "error_types": [
            "network_error",
            "timeout_error", 
            "rate_limit_error",
            "authentication_error",
            "validation_error"
        ],
        "error_probabilities": {
            "network_error": 0.1,
            "timeout_error": 0.05,
            "rate_limit_error": 0.02,
            "authentication_error": 0.01,
            "validation_error": 0.03
        },
        "recovery_strategies": {
            "network_error": "retry_with_backoff",
            "timeout_error": "extend_timeout",
            "rate_limit_error": "exponential_backoff",
            "authentication_error": "refresh_credentials",
            "validation_error": "validate_and_retry"
        }
    }
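
The configuration above can drive a simple error injector. The sketch below is seeded so it stays deterministic, and it only checks that every injected error type has a configured recovery strategy; a real test would route each error through the orchestrator's recovery path.

import random

def test_every_injected_error_has_a_recovery_strategy(error_simulation_config):
    """Sketch: probability-driven injection stays within the configured strategies."""
    rng = random.Random(42)  # seeded so the test is deterministic
    injected = set()

    for _ in range(1000):
        for error_type, probability in error_simulation_config["error_probabilities"].items():
            if rng.random() < probability:
                injected.add(error_type)

    assert injected  # with 1000 rounds, at least one error type is injected
    assert injected <= set(error_simulation_config["recovery_strategies"])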

9. Performance Monitoring Pattern

Integrate performance monitoring into fixtures.

Performance Monitoring Integration

@pytest.fixture(scope="function")
def performance_monitor():
    """Monitor test performance metrics."""
    class PerformanceMonitor:
        def __init__(self):
            self.metrics = []
            self.start_time = None
        
        def start_monitoring(self):
            self.start_time = time.time()
        
        def stop_monitoring(self):
            return MagicMock(
                execution_time=time.time() - (self.start_time or time.time()),
                peak_memory_mb=100,
                operations_per_second=100
            )
        
        def record_operation(self, success=True):
            self.metrics.append({
                "success": success,
                "timestamp": time.time()
            })
    
    return PerformanceMonitor()

Testing Anti-Patterns to Avoid

1. Flaky Tests

# Bad: Time-dependent test that can fail randomly
def test_operation_timing():
    start = time.time()
    execute_operation()
    end = time.time()
    assert end - start < 1.0  # Flaky due to system variance

# Good: Test behavior with a generous upper bound, not exact timing
@pytest.mark.timeout(5)  # enforced by the pytest-timeout plugin
def test_operation_completes_within_reasonable_time():
    execute_operation()
    # Test passes as long as the operation finishes before the timeout

2. Overly Complex Tests

# Bad: Testing too many things in one test
def test_everything():
    # Setup for multiple unrelated features
    # Test feature A
    # Test feature B
    # Test feature C
    # Complex assertions mixing concerns
    ...

# Good: Focused tests
def test_specific_feature():
    # Setup for one feature
    # Test one specific aspect
    # Clear, focused assertions
    ...

3. Inappropriate Mocking

# Bad: Over-mocking or under-mocking
def test_with_everything_mocked():
    # Mock everything, including the system under test
    # The test becomes meaningless
    ...

# Good: Mock external dependencies only
def test_with_appropriate_mocks():
    # Mock external services, databases, and APIs
    # Test real integration between internal components
    ...

Conclusion

These test patterns and practices provide a solid foundation for building reliable, maintainable, and comprehensive integration tests. Following them helps your test suite validate the complex interactions in a multi-agent system and give clear feedback on its behavior.

Remember to:

  • Keep tests focused and independent
  • Use realistic mocking strategies
  • Validate both happy path and error conditions
  • Monitor performance and resource usage
  • Maintain clear documentation and naming conventions

For specific implementation examples, see the Integration Testing Guide and Performance Testing Guide.