EDA for the Rest of Us: Event Design Patterns

The most important decision in event-driven architecture isn’t which technology to use—it’s what information to put in your events. Get this wrong, and you’ll end up with a system that’s more complex and tightly coupled than the monolith you were trying to escape.

Looking at successful and failed EDA implementations at companies like Netflix, Uber, and ING Bank¹, one thing becomes clear: event design patterns are what separate the successes from the expensive mistakes. The infrastructure will work whether you choose Kafka, EventBridge, or SQS. But the patterns you choose for generating and structuring events determine whether your system becomes more maintainable or turns into a debugging nightmare.

Research shows that while EDA can provide tremendous benefits, “poor implementations of EDA can make both scalability and resiliency worse than before”². The difference often comes down to how events are designed and structured from the beginning.

Why Event Design Patterns Matter

Every event you publish is a contract with your consumers. Choose the wrong pattern, and you’ll face:

  • Consumers making cascading API calls because events lack necessary context
  • Breaking changes that ripple through your entire system when you need to modify event structure
  • Performance bottlenecks when high-volume events contain too much data
  • Security issues when events expose more information than intended
  • Operational complexity from managing multiple inconsistent event formats

The good news? Most of these problems are avoidable with the right patterns applied consistently.

The Fat Events Anti-Pattern

Before exploring specific patterns, we need to address one of the most common mistakes in event design: Fat Events. These are events that contain data not owned by the domain generating the event.

Consider an order service that publishes an “order submitted” event but also includes the customer’s credit score, loyalty tier, and current inventory levels. The order service doesn’t own any of this data: the credit score and loyalty tier belong to the customer service, and the stock levels belong to the inventory service.

This creates several problems:

  • Data Ownership Violations: Services end up publishing data they don’t actually own or control, violating domain boundaries.
  • Coupling and Fragility: Changes to customer or inventory data structures now require coordinating changes across multiple services that shouldn’t be coupled.
  • Data Staleness: The customer’s loyalty tier might have changed after the order service cached it but before the event was published.
  • Security and Governance: Events expose data that consumers might not be authorized to access, creating compliance and security issues.

The solution is simple: only include data your domain owns. If you need to reference other domains, include identifiers or URLs that consumers can use to fetch that data from the authoritative source.

This principle applies regardless of which pattern you choose. Whether you’re using Event Carried State Transfer, Event Notification, or any other pattern, resist the temptation to include “helpful” data from other domains. It will cause more problems than it solves.
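
As a sketch, here’s what the lean version of that “order submitted” event might look like (all field names are illustrative): the order service publishes only what it owns and points at other domains by identifier or URL.

```python
# A lean "order submitted" event (illustrative field names).
# The credit score, loyalty tier, and inventory levels are gone;
# consumers that need them fetch from the authoritative services.
order_submitted = {
    "eventType": "OrderSubmitted",
    "eventId": "7f9c2ba4-e88f-11ee-a0d4-0242ac120002",
    "occurredAt": "2024-05-01T12:00:00Z",
    "data": {
        "orderId": "order-123",
        "customerId": "customer-456",  # reference, not embedded customer data
        "customerUrl": "https://api.example.com/customers/customer-456",
        "lineItems": [{"sku": "SKU-789", "quantity": 2, "unitPrice": 19.99}],
        "totalAmount": 39.98,
        "currency": "USD",
    },
}
```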

Pattern 1: Event Carried State Transfer

Event Carried State Transfer (ECST) is the pattern where events include state information in the payload, reducing or eliminating the need for consumers to make additional API calls. But there are two distinct approaches, each serving different architectural needs.

Note: The “Full Entity State” and “Contextual State” approaches described here are practical interpretations of Martin Fowler’s Event Carried State Transfer pattern. The terminology helps distinguish between different scopes of state transfer, though these aren’t formally defined terms in the literature.

Full Entity State Approach

This approach includes the complete current state of an entity in every event, regardless of what specifically changed. A “Customer Updated” event would include the complete customer record with all current values—profile information, preferences, subscription details, address, and metadata. This transforms events into complete snapshots that enable consumers to rebuild their entire view of the entity from any single event.
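
A minimal sketch of such an event, with hypothetical fields:

```python
# Full-entity-state "Customer Updated" event (hypothetical fields).
# The complete current record rides along, so a consumer can rebuild
# its replica of the customer from any single event.
customer_updated = {
    "eventType": "CustomerUpdated",
    "occurredAt": "2024-05-01T12:00:00Z",
    "data": {
        "customerId": "customer-456",
        "email": "pat@example.com",
        "name": {"first": "Pat", "last": "Doe"},
        "address": {"street": "1 Main St", "city": "Springfield", "zip": "00000"},
        "preferences": {"newsletter": True, "language": "en"},
        "subscription": {"plan": "pro", "renewsAt": "2025-01-01"},
    },
}
```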

When this works well:

  • Entity lifecycle events (CRUD operations) where consumers maintain replicas
  • Small to medium entities that fit comfortably in message size limits
  • Scenarios where different consumers need different subsets of entity data
  • Analytics and reporting systems that need complete entity snapshots

This pattern works well for user profile synchronization across multiple microservices, where marketing, support, and billing systems can all consume the same events and extract the data relevant to their needs without additional API calls.

The tradeoffs:

  • Massive payloads: Complex entities can create very large events, increasing bandwidth and storage costs
  • Irrelevant data exposure: Consumers get data they don’t need, potentially creating security and compliance issues
  • Schema coupling: Changes to entity structure require coordinating updates across all consumers
  • Evolution complexity: Adding fields to entities means updating event schemas and potentially breaking consumers

Contextual State Approach

This approach includes only the state information relevant to understanding and processing the specific business event. An “Order Submitted” event might include order details, payment information, and shipping address (all relevant to fulfillment processing) but not the customer’s marketing preferences or account history.
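
A sketch of the contextual variant, again with hypothetical fields; note what is deliberately absent:

```python
# Contextual-state "Order Submitted" event (hypothetical fields).
# Carries exactly the context fulfillment needs: items, a payment
# reference, and the shipping address. Marketing preferences and
# account history are deliberately excluded.
order_submitted = {
    "eventType": "OrderSubmitted",
    "occurredAt": "2024-05-01T12:00:00Z",
    "data": {
        "orderId": "order-123",
        "customerId": "customer-456",
        "lineItems": [{"sku": "SKU-789", "quantity": 2}],
        "payment": {"method": "card", "reference": "pm-789", "amount": 39.98},
        "shippingAddress": {"street": "1 Main St", "city": "Springfield"},
    },
}
```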

When this shines:

  • Business process events with focused workflows
  • Large entities where complete state would create oversized messages
  • Data governance scenarios where complete state would expose unnecessary sensitive information
  • High-volume events where payload optimization matters

The tradeoffs:

  • Design complexity: Requires careful analysis of what context each consumer actually needs
  • Still larger than notifications: More expensive than lightweight notification events
  • Potential over-inclusion: Easy to include “just in case” data that creates unnecessary coupling
  • Consumer assumption risk: If you guess wrong about what consumers need, they still end up making API calls

Event Carried State Transfer with AWS Services

When implementing ECST on AWS, the choice of service depends on your specific requirements and constraints.

Amazon SQS is often the starting point for ECST implementations; it excels at point-to-point delivery where each message goes to exactly one consumer:

  • Message limits: 256 KB standard (extendable to 2 GB with S3)
  • Throughput: Nearly unlimited for standard queues; 300 msg/s per API action for FIFO (3,000 with batching)
  • Best for: Reliable point-to-point delivery without strict ordering requirements
    • Built-in dead letter queues for failed message handling
    • Automatic scaling without configuration
    • Long polling reduces API calls and costs

Amazon Kinesis Data Streams becomes the better choice for high-throughput scenarios, handling thousands of events per second with multiple consumers:

  • Capacity: 1 MB records, 1 MB/s per shard write, 2 MB/s read
  • Retention: 24 hours to 365 days with guaranteed ordering per shard
  • Advanced features: On-demand scaling up to 200 MB/s
    • Multiple consumers can replay the same events
    • Checkpointing for exactly-once processing
    • Integration with analytics services (Kinesis Analytics, EMR)

Amazon EventBridge shines when you need sophisticated routing and filtering capabilities across distributed systems:

  • Event handling: 256 KB limit, 10,000 events/s (soft limit)
  • Routing capabilities: Content-based filtering and transformation
  • Enterprise features: Cross-account/region delivery with compliance
    • Built-in schema registry with discovery
    • Archive and replay for debugging
    • Native integration with 20+ AWS services

For most ECST implementations, start with EventBridge + SQS: EventBridge handles the intelligent routing and filtering, while SQS provides reliable delivery to individual consumers. This combination gives you the flexibility to evolve your routing logic without changing consumer code. Move to Kinesis only when you hit throughput limits or need advanced streaming features like real-time analytics.
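
To make that concrete, here is a minimal boto3 sketch of the publishing side. The bus name, source, and detail type are illustrative, and it assumes an EventBridge rule (configured separately) already routes matching events to per-consumer SQS queues.

```python
import json

import boto3

# Publish an ECST event to a custom EventBridge bus (names are illustrative).
events = boto3.client("events")

response = events.put_events(
    Entries=[
        {
            "EventBusName": "orders-bus",
            "Source": "com.example.orders",
            "DetailType": "OrderSubmitted",
            "Detail": json.dumps({
                "orderId": "order-123",
                "customerId": "customer-456",
                "totalAmount": 39.98,
                "currency": "USD",
            }),
        }
    ]
)
# put_events is not all-or-nothing: always check per-entry failures.
assert response["FailedEntryCount"] == 0, response["Entries"]
```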

Pattern 2: Event Notification

Event Notification sends minimal information to notify that something happened, with consumers making additional API calls for details they need. A “Customer Profile Updated” event would include just the customer ID, timestamp, which fields changed, and links to fetch the complete customer data if needed.
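
A sketch of such a notification, with hypothetical fields:

```python
# Thin "Customer Profile Updated" notification (hypothetical fields).
# Just enough to act on: who changed, what changed, where to get details.
profile_updated = {
    "eventType": "CustomerProfileUpdated",
    "occurredAt": "2024-05-01T12:00:00Z",
    "data": {
        "customerId": "customer-456",
        "changedFields": ["email", "address"],
        "links": {"customer": "https://api.example.com/customers/customer-456"},
    },
}
```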

When this works well:

  • High fan-out scenarios with many different consumers
  • Bandwidth or cost constraints
  • Varied consumer needs where only some require full details
  • Large state objects that would create oversized events
  • Security concerns where not all consumers should access complete data

This pattern works well for account balance changes in financial services, where compliance requires that balance queries always hit the authoritative source. Events become lightweight triggers that tell downstream systems “something changed with account X” without including the sensitive balance data.
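
On the consumer side, handling a notification reduces to a callback plus a fetch. A minimal sketch, assuming the hypothetical payload above and the third-party requests library:

```python
import requests  # third-party HTTP client (pip install requests)

def handle_profile_updated(event: dict) -> dict:
    """Fetch current state from the authoritative source on notification.

    The URL comes from the event itself, so the consumer isn't hard-coded
    to the producer's API layout. Endpoint and auth are illustrative.
    """
    url = event["data"]["links"]["customer"]
    response = requests.get(url, timeout=5)
    response.raise_for_status()
    # Note: this returns the *current* state, which may already be newer
    # than the change that triggered the notification.
    return response.json()
```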

The tradeoffs:

  • Additional complexity as consumers must implement API client logic
  • Higher latency from additional round-trips
  • Potential inconsistency if data changes between notification and fetch
  • API dependency creates coupling between event consumers and source services

Event Notification with AWS Services

When implementing Event Notification on AWS, choose based on your scale and routing requirements.

Amazon EventBridge is the recommended starting point for most event notification implementations:

  • Advanced routing: Rule-based filtering on any JSON attribute (see the sketch after this list)
  • Schema management: Automatic discovery and versioning
  • Built-in integrations: Direct delivery to 20+ AWS services
    • Configurable archive retention for event replay
    • Event transformation before delivery
    • Dead letter queues for failed deliveries
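
A minimal sketch of that content-based filtering, assuming the orders-bus and event shape from the ECST example earlier; all names and thresholds are illustrative:

```python
import json

import boto3

# Create a rule that forwards only high-value orders to its targets.
events = boto3.client("events")

events.put_rule(
    Name="high-value-orders",
    EventBusName="orders-bus",
    State="ENABLED",
    EventPattern=json.dumps({
        "source": ["com.example.orders"],
        "detail-type": ["OrderSubmitted"],
        # Numeric matching: only orders of 500 or more pass the filter.
        "detail": {"totalAmount": [{"numeric": [">=", 500]}]},
    }),
)
```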

Amazon SNS excels at massive scale notification scenarios:

  • Scale: Millions of messages/second, 12.5M subscriptions per topic
  • Multi-protocol: SQS, Lambda, HTTP, email, SMS from one topic
  • FIFO topics: 300 TPS (3,000 with batching) for ordered delivery
    • Message filtering reduces downstream processing (see the sketch after this list)
    • Fan-out to thousands of subscribers
    • Cross-region message replication
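
A minimal sketch of publishing with message attributes so filter policies can do that work; the topic ARN and attribute names are illustrative:

```python
import json

import boto3

sns = boto3.client("sns")

sns.publish(
    TopicArn="arn:aws:sns:us-east-1:123456789012:customer-events",
    Message=json.dumps({"customerId": "customer-456", "changedFields": ["email"]}),
    MessageAttributes={
        "eventType": {"DataType": "String",
                      "StringValue": "CustomerProfileUpdated"},
    },
)
# A subscription filter policy such as {"eventType": ["CustomerProfileUpdated"]}
# delivers this message only to subscribers that opted in to it.
```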

For most teams, EventBridge is the right default. Move to SNS when you need massive scale or its unique delivery protocols, such as email and SMS.

Pattern 3: Async Commands

Async Commands enable asynchronous task execution by sending command messages to queues for background processing. Unlike events, which represent facts about the past, commands represent requests for future actions that need to be processed by specific handlers.

When this works well:

  • Task-based UIs where user actions map to specific business operations
  • Explicit intent modeling with validation and authorization before state changes
  • Asynchronous task execution that decouples requestors from executors
  • Complex business processes requiring orchestration

A “Process Payment” command would include the order ID, payment amount, currency, payment method reference, and who requested the processing. The command handler processes this request, validates business rules, and upon success emits events like PaymentProcessed or PaymentFailed.
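
A sketch of that command message, with hypothetical fields:

```python
# "Process Payment" command (hypothetical fields). A command is an
# imperative request aimed at one handler, not a fact about the past.
process_payment = {
    "commandType": "ProcessPayment",
    "commandId": "cmd-0001",
    "requestedBy": "user-42",
    "requestedAt": "2024-05-01T12:00:00Z",
    "data": {
        "orderId": "order-123",
        "amount": 39.98,
        "currency": "USD",
        "paymentMethodRef": "pm-789",  # a token, never raw card data
    },
}
# On success the handler emits PaymentProcessed; on failure, PaymentFailed.
```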

The tradeoffs:

  • Additional complexity from command handling infrastructure
  • Tighter coupling than events as commands target specific handlers
  • More messaging overhead from command-event sequences
  • Need for robust error handling and retry mechanisms

Async Commands with AWS Services

When implementing Async Commands on AWS, SQS is the primary service for reliable command delivery.

Amazon SQS is purpose-built for command queues with reliable point-to-point delivery:

  • FIFO queues ensure commands are processed in order when required
  • Dead letter queues handle failed commands that exceed retry limits
  • Visibility timeouts prevent duplicate processing during command handling

For async command implementations, use SQS queues with your preferred command processing infrastructure. This provides reliable delivery, built-in error handling, and decouples command submission from processing.
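
A minimal sketch of both sides over a FIFO queue; the queue URL and command shape are illustrative:

```python
import json

import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/payment-commands.fifo"

command = {"commandType": "ProcessPayment", "commandId": "cmd-0001",
           "data": {"orderId": "order-123", "amount": 39.98}}

# Producer: group by order so commands for one order stay in order.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps(command),
    MessageGroupId=command["data"]["orderId"],
    MessageDeduplicationId=command["commandId"],
)

# Consumer: long-poll, process, then delete. Messages that aren't deleted
# reappear after the visibility timeout and, with a redrive policy, land
# in the dead letter queue after the retry limit.
messages = sqs.receive_message(
    QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
).get("Messages", [])
for message in messages:
    body = json.loads(message["Body"])
    # ... validate and execute the command here ...
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```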

Pattern 4: Change Data Capture (CDC)

CDC captures changes from existing databases and transforms them into events, enabling event-driven integration without modifying existing applications. When a customer updates their email address in the database, CDC would capture that change and publish an event showing the before and after values, along with metadata about the change.
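
A sketch of what such a change event might carry; the envelope is hypothetical, loosely modeled on common CDC formats:

```python
# CDC change event (hypothetical shape): before/after images plus
# metadata about where and when the change happened.
email_changed = {
    "source": {"database": "crm", "table": "customers"},
    "operation": "UPDATE",
    "timestamp": "2024-05-01T12:00:00Z",
    "before": {"customer_id": "customer-456", "email": "old@example.com"},
    "after": {"customer_id": "customer-456", "email": "new@example.com"},
}
```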

When this works well:

  • Legacy system integration where you can’t modify existing applications
  • Database-first applications built around database operations
  • Real-time synchronization requirements
  • Audit and compliance needs for detailed change records

The tradeoffs:

  • High-change volume that can overwhelm downstream systems
  • Events may lack business context not available in database changes
  • Security concerns about exposing database-level changes
  • Additional operational complexity from CDC infrastructure

Change Data Capture with AWS Services

When implementing CDC on AWS, choose based on your database type and integration requirements.

Amazon DynamoDB Streams provides native CDC for DynamoDB tables:

  • Capture details: Real-time item changes with 24-hour retention
  • Consumers: Maximum of 2 concurrent readers per shard
  • Integration: Direct Lambda triggers for serverless processing (see the sketch after this list)
    • 400 KB record size (same as DynamoDB item limit)
    • Guaranteed ordering per item
    • Exactly-once processing with Lambda
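
A minimal sketch of that Lambda integration. The record fields follow the DynamoDB Streams format; the business logic is a placeholder, and it assumes the stream’s view type is NEW_AND_OLD_IMAGES and that ReportBatchItemFailures is enabled.

```python
def handler(event, context):
    """Process DynamoDB Streams records delivered to Lambda in batches."""
    for record in event["Records"]:
        operation = record["eventName"]    # INSERT | MODIFY | REMOVE
        keys = record["dynamodb"]["Keys"]  # DynamoDB-typed JSON
        if operation == "MODIFY":
            old_image = record["dynamodb"].get("OldImage", {})
            new_image = record["dynamodb"].get("NewImage", {})
            # Compare images here to derive a business-level change event.
    # Empty list signals the whole batch succeeded.
    return {"batchItemFailures": []}
```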

AWS Database Migration Service (DMS) enables CDC for relational databases:

  • Database support: MySQL, PostgreSQL, Oracle, SQL Server, and more
  • Performance: Varies by instance (dms.r5.large ~4,000 TPS)
  • Flexibility: Full load + CDC or CDC only operation modes
    • LOB columns up to 1 GB supported
    • Built-in data transformation
    • Multiple target endpoints (Kinesis, S3, RDS)

Amazon Kinesis Data Streams handles high-volume CDC event streaming:

  • High throughput: Process thousands of changes/second
  • Multiple consumers: Independent processing with replay capability
  • Direct integration: Receives CDC data from DMS
    • 1 MB record size limit
    • Guaranteed ordering per partition key
    • Up to 365 days retention for replay

For most CDC implementations, start with native database streams (DynamoDB Streams) when possible. Use DMS for relational databases that need CDC capabilities without application changes. As documented in AWS Prescriptive Guidance, combining CDC with event-driven patterns enables gradual modernization of legacy systems.

Choosing the Right Pattern

Your choice depends on several factors:

  • Data Requirements: How much data do consumers need, and how fresh must it be?
  • Performance Needs: What are your latency and throughput requirements?
  • Consistency Model: What consistency guarantees do you need?
  • Integration Constraints: Working with existing systems or building greenfield?
  • Team Skills: What patterns can your team effectively implement and maintain?

Most successful systems use multiple patterns simultaneously, choosing the most appropriate one for each type of data and use case.

Pattern Selection Decision Matrix

| Pattern | Event Size | Coupling | Latency | Throughput | Best Use Cases |
|---|---|---|---|---|---|
| Event Notification | Small (< 1 KB) | Lowest | Higher (requires API calls) | Highest | Many diverse consumers; security-sensitive data; bandwidth constraints; unknown consumer needs |
| ECST (Full) | Large (10-256 KB) | Medium | Lower | Medium | Data replication; analytics/reporting; offline processing; known consumer needs |
| ECST (Contextual) | Medium (1-50 KB) | Medium | Lower | Medium-High | Business workflows; specific use cases; performance critical; domain boundaries |
| Async Commands | Small-Medium | Higher | Medium | High | Task execution; user actions; orchestration; explicit operations |
| CDC | Variable | Low | Lowest | Variable | Legacy integration; database sync; audit trails; real-time ETL |

Key Tradeoffs Summary

| Consideration | Event Notification | Event Carried State Transfer |
|---|---|---|
| Runtime Coupling | Higher (API dependency) | Lower (self-contained) |
| Schema Evolution | Easier (minimal contract) | Harder (full contract) |
| Network Efficiency | Lower (multiple calls) | Higher (single message) |
| Data Freshness | Always current | Point-in-time snapshot |
| Consumer Complexity | Higher (API client needed) | Lower (just parse event) |
| Publisher Complexity | Lower (just notify) | Higher (gather all data) |

The Foundation of a Good Event-Driven Architecture

Event design patterns are the foundation that everything else in your event-driven architecture builds upon. Get these patterns right, and you’ll have events that tell clear stories about your business processes. Get them wrong, and you’ll spend more time debugging event flows than building features.

The goal isn’t to follow academic patterns perfectly—it’s to create events that make your system easier to understand, debug, and modify over time. When someone wakes up at 3 AM to debug a production issue, they should be able to read your event logs and understand the sequence of business activities that led to the problem.

Next week, we’ll explore communication patterns—how to choose between EventBridge, SNS, SQS, and Kafka for different scenarios, and when you still need request-response patterns in an event-driven world.

What event design challenges are you facing? Have you found patterns that work particularly well for your domain? I’d love to hear about your experiences.


References

Foundational Event-Driven Architecture Resources

  1. What do you mean by “Event-Driven”? - Martin Fowler’s foundational article defining event-driven architecture patterns (2017)

  2. Enterprise Integration Patterns: Messaging Patterns Overview - Gregor Hohpe and Bobby Woolf’s comprehensive messaging patterns catalog

  3. Domain-Driven Design - Eric Evans’ concepts including domain events and bounded contexts

AWS Official Documentation

  1. AWS Event-Driven Architecture - AWS’s official guide to building event-driven applications

  2. AWS Well-Architected Framework: Event-Driven Architectures - Best practices for serverless event-driven systems

  3. Building Event-Driven Architectures on AWS - AWS Prescriptive Guidance

AWS Service-Specific Documentation

  1. Amazon SQS Developer Guide - Including message size limits and best practices

  2. Amazon Kinesis Data Streams Developer Guide - Stream processing patterns and limits

  3. Amazon EventBridge User Guide - Event routing and schema registry documentation

  4. DynamoDB Streams Developer Guide - Change data capture for DynamoDB

  5. AWS Database Migration Service User Guide - CDC implementation for relational databases

Pattern Implementation Guides

  1. The Event-Carried State Transfer Pattern - Detailed implementation guide with examples

  2. Event Notification Pattern in Microservices - Chris Richardson’s microservices patterns

  3. CQRS (Command Query Responsibility Segregation) - Martin Fowler’s explanation of the CQRS pattern (relevant for Async Commands)

  4. Enriching Event-Driven Architectures with AWS Event Ruler - AWS Compute Blog on event filtering

Industry Case Studies

  1. Netflix: Scaling Event Sourcing for Millions of Devices - Netflix Technology Blog (2017)

  2. Uber: Real-Time Data Infrastructure - Uber Engineering’s event processing architecture (2021)

  3. Building Event-Driven Microservices at ING - InfoQ coverage of ING’s Kafka implementation (2018)

Academic and Research Papers

  1. Event-Driven Architecture: A Review of Definitions, Key Characteristics, and Research Challenges - IEEE International Conference on Distributed Event-Based Systems (2020)

  2. A Systematic Mapping Study on Microservices Architecture in DevOps - Journal of Systems and Software (2018)

Additional AWS Resources

  1. Best Practices for Implementing Event-Driven Architectures - AWS Architecture Blog

  2. Architectural Patterns for Real-time Analytics using Amazon Kinesis - AWS Big Data Blog

  3. Building Resilient Serverless Patterns by Combining Messaging Services - AWS Compute Blog

Books and Extended Resources

  1. Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions - Hohpe & Woolf (ISBN: 978-0321200686)

  2. Building Event-Driven Microservices - Adam Bellemare, O’Reilly (2020)

  3. Domain-Driven Design: Tackling Complexity in the Heart of Software - Eric Evans (ISBN: 978-0321125217)

  4. Designing Event-Driven Systems - Ben Stopford, O’Reilly (2018) - Free ebook from Confluent

  5. Flow Architectures: The Future of Streaming and Event-Driven Integration - James Urquhart, O’Reilly (2021)

  6. Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing - Tyler Akidau, Slava Chernyak, Reuven Lax, O’Reilly (2018)

Tools and Specifications

  1. CloudEvents Specification - CNCF standard for describing event data

  2. AsyncAPI Specification - Industry standard for defining asynchronous APIs

  3. EventCatalog - Open source documentation tool for event-driven architectures