EDA for the Rest of Us: Communication Patterns

This is part 3 of the “EDA for the Rest of Us” series. If you haven’t read A Guide for the Confused and Event Design Patterns, I recommend starting there as we’ll build on those concepts.

In my previous post, we explored event design patterns - how to structure the information in your events. Those design choices directly shape how your systems interact. Event Carried State Transfer enables autonomous processing but requires careful consideration of who needs that state. Event Notification keeps events lightweight but creates dependencies on API calls. The pattern you choose fundamentally influences how services communicate.

Today, we’re exploring how these events actually move through your system. The communication patterns you choose - Publish-Subscribe, Point-to-Point, Request-Reply, and Event Streaming - determine not just the mechanics of delivery, but the fundamental coupling and interaction models between your services.

Get the combination wrong, and you’ll create architectural friction: state transfer events sent point-to-point when multiple services need them, or notification events broadcast to services that can’t fetch the details they need. Get it right, and your event design and communication patterns reinforce each other, creating systems that are both loosely coupled and highly cohesive.

Why Communication Patterns Are Critical

Communication patterns are the highways of your event-driven architecture. They determine:

  • Message Delivery Guarantees: Will every consumer definitely receive the message? In what order?
  • System Coupling: How tightly are publishers and consumers bound together?
  • Scalability Limits: Can you add new consumers without affecting existing ones?
  • Performance Characteristics: What’s the latency and throughput of your message flow?
  • Failure Modes: What happens when a consumer is offline or overwhelmed?

Get these patterns wrong, and you’ll face problems that are expensive to fix. Use request-reply for high-volume scenarios, and you’ll create bottlenecks as services wait for responses they don’t really need. Implement pub/sub without proper filtering, and services drown in irrelevant events, wasting resources processing messages they should never see. Pick the wrong streaming partition key, and related events scatter across partitions, breaking your ordering guarantees. Most painfully, these aren’t bugs you can just patch - they’re architectural decisions that require significant rework to fix.

The good news? Once you understand the core patterns and their tradeoffs, choosing the right one becomes much more straightforward.

How Event Design Meets Communication

The event design patterns from our previous post directly influence which communication patterns work best:

  • Event Notification → Often uses Publish-Subscribe for broad distribution
  • Event Carried State Transfer → May use Point-to-Point for targeted delivery or Streaming for analytics
  • Async Commands → Typically uses Point-to-Point queues
  • CDC Events → Usually flows through Streaming for ordered processing

Understanding these natural affinities helps you make coherent architectural decisions.

Pattern 1: Publish-Subscribe

Publish-Subscribe (pub/sub) is the pattern where publishers broadcast events to multiple consumers without knowing who they are. Think of it as a radio station: the DJ doesn’t know who’s listening, they just broadcast the signal and anyone with a radio can tune in.

The key characteristic of pub/sub is the decoupling between producers and consumers - producers don’t know (or care) who’s listening, and consumers can come and go without affecting producers. At its simplest, publishers send events to topics and consumers subscribe to receive all events on those topics. Modern implementations add filtering capabilities, letting consumers subscribe only to events matching specific criteria.

sequenceDiagram
    participant OS as Order Service
    participant B as Message Broker
    participant AS as Analytics Service
    participant NS as Notification Service
    participant IS as Inventory Service
    participant FS as Fraud Service

    OS->>B: Publish OrderPlaced Event
    Note over B: Event:<br/>{<br/>  "orderId": "123",<br/>  "customerId": "456",<br/>  "items": [...],<br/>  "total": 1299.99,<br/>  "shipping": "express"<br/>}

    Note over B: Topic subscribers get all events<br/>Rule-based subscribers get filtered events

    par Parallel Delivery
        B-->>AS: All order events
        Note over AS: Update metrics
    and
        B-->>NS: All order events
        Note over NS: Send confirmation
    and
        B-->>IS: All order events
        Note over IS: Reserve items
    and
        B-->>FS: Only if total > 500
        Note over FS: Run fraud checks
    end

When pub/sub shines:

  • Multiple services need to react to the same business event
  • Broadcasting Event Notifications (from my previous post) to many consumers
  • Event Carried State Transfer where multiple services need the same state snapshot
  • Broadcasting updates or notifications across your system

I’ve used pub/sub successfully in order processing where analytics needs every order for metrics, notifications needs them for customer communication, and fraud detection only cares about high-value orders. Each service subscribes according to its needs - some to the entire stream, others with specific filters.
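
As a rough sketch of what that looks like with EventBridge (the bus name, event source, and the 500-dollar threshold are illustrative, not prescriptive), the publisher emits OrderPlaced once and the fraud service's rule pattern decides what it sees:

import json
import boto3

events = boto3.client('events')

# Publisher: emit the OrderPlaced event once to a (hypothetical) custom event bus
events.put_events(
    Entries=[{
        'EventBusName': 'order-events',
        'Source': 'com.example.orders',
        'DetailType': 'OrderPlaced',
        'Detail': json.dumps({
            'orderId': '123',
            'customerId': '456',
            'total': 1299.99,
            'shipping': 'express'
        })
    }]
)

# Fraud service: event pattern attached to an EventBridge rule so the service
# only receives orders above the (illustrative) 500 threshold
fraud_rule_pattern = {
    'source': ['com.example.orders'],
    'detail-type': ['OrderPlaced'],
    'detail': {'total': [{'numeric': ['>', 500]}]}
}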

The tradeoffs:

  • No delivery guarantees to specific consumers: The publisher hands the event to the broker with no visibility into which consumers (if any) received or successfully processed it
  • Filter complexity: Sophisticated routing rules can become hard to debug
  • Schema coupling: All consumers depend on the event structure
  • Performance overhead: More consumers mean more message deliveries

Publish-Subscribe with AWS Services

AWS offers several services for implementing pub/sub, each optimized for different scenarios:

Amazon EventBridge provides advanced content-based routing with rich filtering:

  • Filtering: Complex rules with pattern matching, numeric comparisons, and IP address matching
  • Throughput: 10,000 events/second (soft limit, can be increased)
  • Targets: 20+ AWS service integrations (Lambda, SQS, Step Functions, and more) plus API destinations for external HTTP endpoints
  • Features:
    • Schema registry, event replay, archive, and event discovery
    • Event transformation before delivery
    • Cross-account and cross-region routing
    • Built-in error handling and retry logic
    • Partner integrations (Salesforce, Zendesk, PagerDuty)

Amazon SNS is the classic pub/sub service, perfect for simple topic-based broadcasting:

  • Scale: Handles millions of messages per second
  • Delivery: Push-based to SQS, Lambda, HTTP endpoints, email, SMS
  • Filtering: Basic attribute-based message filtering
  • Fan-out: Up to 12.5 million subscriptions per topic
  • Features:
    • FIFO topics for ordered delivery (300 TPS, up to 3,000 with batching)
    • Standard topics for maximum throughput
    • Dead letter queues for failed deliveries
    • Cross-region replication for global systems

For most pub/sub implementations, start with EventBridge. It provides the best balance of features, filtering capabilities, and AWS service integrations. Use SNS when you need simple topic-based fan-out at massive scale or non-AWS endpoint delivery (email, SMS).
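
If you do take the SNS route, filtering hangs off message attributes rather than the payload. A minimal sketch, assuming a hypothetical order-events topic and an existing fraud-service subscription (both ARNs are placeholders):

import json
import boto3

sns = boto3.client('sns')

# Publisher: attach message attributes so subscriptions can filter without
# parsing the payload
sns.publish(
    TopicArn='arn:aws:sns:us-east-1:123456789012:order-events',
    Message=json.dumps({'orderId': '123', 'total': 1299.99}),
    MessageAttributes={
        'total': {'DataType': 'Number', 'StringValue': '1299.99'}
    }
)

# Fraud service subscription: filter policy so only high-value orders are delivered
sns.set_subscription_attributes(
    SubscriptionArn='arn:aws:sns:us-east-1:123456789012:order-events:subscription-id',
    AttributeName='FilterPolicy',
    AttributeValue=json.dumps({'total': [{'numeric': ['>', 500]}]})
)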

Pattern 2: Point-to-Point

Point-to-Point ensures each message is processed by exactly one consumer. Unlike pub/sub where messages are broadcast, point-to-point creates a direct channel between producers and consumers. Think of it as registered mail - one sender, one recipient, with confirmation of delivery.

The defining characteristic is exclusivity: once a consumer takes a message, it’s gone from the queue. This pattern supports two delivery modes: unordered for maximum throughput where multiple workers process messages in parallel, or strictly ordered (FIFO) when sequence matters for business correctness.

sequenceDiagram
    participant P as Producer
    participant Q as Message Queue
    participant W1 as Worker 1
    participant W2 as Worker 2
    participant DB as Database

    Note over Q: Standard Queue Example
    P->>Q: Image Resize Task A
    P->>Q: Image Resize Task B
    P->>Q: Image Resize Task C

    par Parallel Processing
        W1->>Q: Poll for message
        Q-->>W1: Task A
        W1->>W1: Process image
    and
        W2->>Q: Poll for message
        Q-->>W2: Task B
        W2->>W2: Process image
    end

    Note over Q: FIFO Queue Example
    P->>Q: Deposit $1000 (seq: 1)
    P->>Q: Withdraw $500 (seq: 2)

    W1->>Q: Receive next message
    Q-->>W1: Deposit $1000
    W1->>DB: Update balance: +$1000

    W1->>Q: Receive next message
    Q-->>W1: Withdraw $500
    W1->>DB: Update balance: -$500

    Note over Q: Order preserved for<br/>business correctness

When point-to-point shines:

  • Distributing work across multiple processors (image processing, report generation)
  • Ensuring exactly-once processing for critical operations
  • Decoupling producers from consumers with reliable delivery
  • Sequential processing for operations that must happen in order (financial transactions, state machines)

I’ve used standard queues successfully for image processing workloads where users upload photos needing resizing and optimization. Multiple workers pull from the same queue, automatically scaling based on backlog. For financial systems, FIFO queues ensure transaction ordering - deposits, withdrawals, and transfers execute in the exact sequence they were submitted.
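
A minimal worker loop for the standard-queue case might look like this - the queue URL is a placeholder and resize_image stands in for your real processing:

import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/image-resize'  # placeholder

def resize_image(task_body):
    ...  # stand-in for the real image processing

while True:
    # Long polling: wait up to 20 seconds for work instead of hammering the API
    response = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20
    )
    for message in response.get('Messages', []):
        resize_image(message['Body'])
        # Delete only after successful processing; otherwise the message becomes
        # visible again once the visibility timeout expires and gets retried
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message['ReceiptHandle'])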

The tradeoffs:

  • No broadcasting: Each message goes to exactly one consumer
  • Consumer coupling: Producers must know which queue to use
  • Scaling limits: FIFO queues sacrifice throughput for ordering guarantees
  • Complexity: Managing visibility timeouts, message acknowledgments, and failure handling

Point-to-Point with AWS Services

Amazon SQS is the native AWS service for point-to-point messaging:

Standard Queues offer maximum throughput with best-effort ordering:

  • Throughput: Nearly unlimited TPS
  • Delivery: At-least-once with potential duplicates
  • Features:
    • Message size up to 256 KB (2 GB with S3)
    • Retention up to 14 days
    • Long polling for cost efficiency
    • Dead letter queues for poison messages
    • Visibility timeout up to 12 hours
    • Delay queues up to 15 minutes

FIFO Queues guarantee ordered, exactly-once processing:

  • Throughput: 300 TPS (3,000 with batching)
  • Ordering: Strict FIFO within message groups
  • Delivery: Exactly-once processing with deduplication
  • Features:
    • Message group IDs for parallel ordered streams
    • Deduplication IDs prevent duplicate sends
    • All standard queue features
    • Content-based deduplication

For new implementations, start with SQS standard queues and only enable FIFO if you need strict ordering. The throughput difference is significant, and many use cases that seem to need ordering can work with eventual consistency.
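
If you do need FIFO, the two fields that matter are the message group ID (the ordering scope) and the deduplication ID. A hedged sketch, assuming a hypothetical account-transactions.fifo queue keyed by account:

import json
import boto3

sqs = boto3.client('sqs')

sqs.send_message(
    QueueUrl='https://sqs.us-east-1.amazonaws.com/123456789012/account-transactions.fifo',  # placeholder
    MessageBody=json.dumps({'type': 'withdrawal', 'accountId': 'acct-42', 'amount': 500}),
    # Messages sharing a group ID are delivered strictly in order;
    # different groups can be processed in parallel
    MessageGroupId='acct-42',
    # Messages with the same deduplication ID within the 5-minute window are dropped
    MessageDeduplicationId='txn-2024-01-01-0007'
)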

Pattern 3: Request-Reply

Request-Reply adds synchronous-style communication to asynchronous messaging. A service sends a request message and expects a specific reply, maintaining the benefits of decoupling while enabling two-way communication. It’s like sending a letter with a return envelope - asynchronous delivery but with an expected response.

graph TB
    subgraph "Client Side"
        C[Client Service]
        RQ[Reply Queue]
    end

    subgraph "Message Infrastructure"
        REQ[Request Queue]
    end

    subgraph "Server Side"
        S[Server Service]
    end

    C -->|"1: Request + ReplyTo"| REQ
    REQ -->|"2: Process Request"| S
    S -->|"3: Send Reply"| RQ
    RQ -->|"4: Correlate Response"| C

    style REQ fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style RQ fill:#dcfce7,stroke:#16a34a,stroke-width:2px

Correlation-Based Reply

This approach uses correlation IDs to match requests with replies, allowing multiple outstanding requests without confusion.

sequenceDiagram
    participant CS as Client Service
    participant RQ as Request Queue
    participant RQR as Reply Queue
    participant PS as Processing Service

    Note over CS: Generate correlationId: abc123
    CS->>RQ: Request {<br/>  correlationId: "abc123",<br/>  replyTo: "reply-queue",<br/>  action: "calculatePrice",<br/>  data: {...}<br/>}

    Note over CS: Generate correlationId: xyz789
    CS->>RQ: Request {<br/>  correlationId: "xyz789",<br/>  replyTo: "reply-queue",<br/>  action: "checkInventory",<br/>  data: {...}<br/>}

    PS->>RQ: Poll for messages
    RQ-->>PS: Request (correlationId: abc123)
    PS->>PS: Process calculatePrice

    PS->>RQR: Reply {<br/>  correlationId: "abc123",<br/>  result: { price: 99.99 }<br/>}

    PS->>RQ: Poll for messages
    RQ-->>PS: Request (correlationId: xyz789)
    PS->>PS: Process checkInventory

    PS->>RQR: Reply {<br/>  correlationId: "xyz789",<br/>  result: { available: true }<br/>}

    CS->>RQR: Poll for replies
    RQR-->>CS: Reply (correlationId: xyz789)
    CS->>CS: Match xyz789 to request

    CS->>RQR: Poll for replies
    RQR-->>CS: Reply (correlationId: abc123)
    CS->>CS: Match abc123 to request

When correlation-based reply works well:

  • Multiple concurrent requests from the same client
  • Stateless processing services
  • Scenarios where reply order doesn’t matter
  • Integration with legacy systems expecting request-reply

I’ve used this pattern successfully for price calculation services where the frontend could request prices for multiple products simultaneously. Each request had a unique correlation ID, and responses could return in any order without confusion.
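
The client side usually reduces to a few moving parts: generate an ID, remember the pending request, send, and match replies as they arrive. A rough sketch with SQS (queue URLs are placeholders, and the in-memory dict stands in for a DynamoDB table with TTL):

import json
import uuid
import boto3

sqs = boto3.client('sqs')
REQUEST_QUEUE = 'https://sqs.us-east-1.amazonaws.com/123456789012/price-requests'  # placeholder
REPLY_QUEUE = 'https://sqs.us-east-1.amazonaws.com/123456789012/price-replies'     # placeholder

pending = {}  # correlationId -> original request (use DynamoDB with TTL in production)

def send_request(action, data):
    correlation_id = str(uuid.uuid4())
    pending[correlation_id] = {'action': action, 'data': data}
    sqs.send_message(
        QueueUrl=REQUEST_QUEUE,
        MessageBody=json.dumps({
            'correlationId': correlation_id,
            'replyTo': REPLY_QUEUE,
            'action': action,
            'data': data
        })
    )
    return correlation_id

def poll_replies():
    response = sqs.receive_message(QueueUrl=REPLY_QUEUE, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for message in response.get('Messages', []):
        reply = json.loads(message['Body'])
        request = pending.pop(reply['correlationId'], None)
        if request is not None:
            print(request['action'], '->', reply['result'])  # stand-in for real handling
        # A None here means an orphaned reply: the client already gave up on it
        sqs.delete_message(QueueUrl=REPLY_QUEUE, ReceiptHandle=message['ReceiptHandle'])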

The challenges:

  • State management: Clients must track outstanding requests
  • Timeout handling: Must implement request expiration
  • Reply queue management: Each client needs a way to receive replies
  • Orphaned replies: Replies might arrive after client gives up

Callback-Based Reply

Instead of correlation IDs, this pattern uses callback endpoints or queues specified in each request.

sequenceDiagram
    participant MS as Microservice
    participant EB as EventBridge
    participant LS as Lambda Service
    participant CQ as Callback Queue

    MS->>EB: Request Event {<br/>  action: "processDocument",<br/>  callbackQueue: "team-a-results",<br/>  documentId: "doc-123"<br/>}

    EB-->>LS: Route to processor
    LS->>LS: Process document

    LS->>CQ: Send to specified callback {<br/>  documentId: "doc-123",<br/>  status: "completed",<br/>  result: {...}<br/>}

    MS->>CQ: Poll callback queue
    CQ-->>MS: Receive result

    Note over MS: Different teams can use<br/>different callback queues

When callback-based reply shines:

  • Multi-tenant systems where each tenant has separate infrastructure
  • Scenarios with different reply handling requirements
  • Systems where reply destination varies by request type
  • Webhook-style integrations

The tradeoffs:

  • Security complexity: Must validate callback destinations
  • Infrastructure overhead: Multiple reply queues to manage
  • Error handling: Failed callbacks need retry logic
  • Discovery challenges: Services must know valid callback endpoints

Request-Reply with AWS Services

AWS doesn’t offer a dedicated request-reply service. Both patterns use the same core services but configure them differently:

Core Services:

  • SQS: Request queues and (for correlation-based) reply queues
  • DynamoDB or ElastiCache: Store request state and correlations
  • Your choice of compute: Lambda, ECS, or EC2 for processing

Pattern Differences:

  • Correlation-Based: Uses fixed reply queue(s) where consumers poll and match correlation IDs
  • Callback-Based: Each request specifies its reply destination (queue, URL, or topic)

When you might need additional services:

  • Step Functions: Handles request-reply natively with waitForTaskToken
  • API Gateway: For HTTP callback endpoints
  • EventBridge: If routing requests based on content

Key Considerations:

  • Reply queues should use long polling (20 seconds) to reduce costs
  • Set DynamoDB TTL to automatically clean up old correlations
  • Visibility timeout must exceed your processing time
  • Always implement request timeouts
  • Consider Step Functions for complex workflows despite higher cost

Most teams start with SQS + DynamoDB for simple request-reply, then evaluate Step Functions if they need complex orchestration or visual workflow tracking.
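
If you adopt Step Functions, the worker side of its callback pattern is small: the state machine's SQS integration (with .waitForTaskToken) sends a message carrying a task token, and the worker reports back with that token. A sketch, assuming the state machine puts the token under a TaskToken field in the message body and the queue URL is a placeholder:

import json
import boto3

sqs = boto3.client('sqs')
sfn = boto3.client('stepfunctions')
WORK_QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/doc-processing'  # placeholder

response = sqs.receive_message(QueueUrl=WORK_QUEUE_URL, WaitTimeSeconds=20)
for message in response.get('Messages', []):
    body = json.loads(message['Body'])
    result = {'status': 'completed'}  # stand-in for the real document processing
    # 'TaskToken' is whatever field name your state machine put the token under
    sfn.send_task_success(taskToken=body['TaskToken'], output=json.dumps(result))
    sqs.delete_message(QueueUrl=WORK_QUEUE_URL, ReceiptHandle=message['ReceiptHandle'])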

Pattern 4: Event Streaming

Event streaming maintains events in an ordered, replayable log that multiple consumers can read independently. Unlike queues where messages are deleted after processing, streams preserve events for a retention period, enabling replay and multiple processing patterns.

graph TB
    subgraph "Event Producers"
        P1[Order Service]
        P2[User Service]
        P3[Inventory Service]
    end

    subgraph "Event Stream"
        subgraph "Partition 0"
            E1[Event 1]
            E2[Event 2]
            E3[Event 3]
        end
        subgraph "Partition 1"
            E4[Event 4]
            E5[Event 5]
            E6[Event 6]
        end
        subgraph "Partition 2"
            E7[Event 7]
            E8[Event 8]
            E9[Event 9]
        end
    end

    subgraph "Consumers"
        subgraph "Real-time Analytics"
            C1[Consumer at Event 9]
        end
        subgraph "Batch Processing"
            C2[Consumer at Event 5]
        end
        subgraph "New ML Model"
            C3[Consumer at Event 1]
        end
    end

    P1 --> E3
    P1 --> E6
    P1 --> E9
    P2 --> E1
    P2 --> E4
    P2 --> E7
    P3 --> E2
    P3 --> E5
    P3 --> E8

    E9 -.->|"Latest"| C1
    E5 -.->|"Processing"| C2
    E1 -.->|"Replay from start"| C3

    style E1 fill:#f3f4f6
    style E2 fill:#f3f4f6
    style E3 fill:#f3f4f6
    style E4 fill:#e5e7eb
    style E5 fill:#e5e7eb
    style E6 fill:#d1d5db
    style E7 fill:#9ca3af
    style E8 fill:#6b7280
    style E9 fill:#374151

Key characteristics:

  • Append-only log: Events are immutable once written
  • Partitioned for scale: Events distributed across shards based on partition key
  • Independent consumers: Each tracks their own position in the stream
  • Ordered within partition: Events maintain sequence per partition
  • Replayable: Consumers can reprocess from any point in the retention window

sequenceDiagram
    participant PS as Payment Service
    participant K as Kinesis Stream
    participant RT as Real-time Fraud Detection
    participant BA as Analytics (Batch)
    participant ML as ML Training

    loop Continuous Event Flow
        PS->>K: PaymentProcessed Event
        Note over K: Append to shard<br/>Sequence: 1000
    end

    Note over RT: Reading from latest
    K-->>RT: Events 998, 999, 1000
    RT->>RT: Check for fraud patterns

    Note over BA: Reading hourly batches
    K-->>BA: Events 500-1000
    BA->>BA: Aggregate metrics

    Note over ML: Replaying from beginning
    K-->>ML: Events 1-1000
    ML->>ML: Train new model

    Note over K: Each consumer maintains<br/>independent position

When event streaming shines:

  • Multiple consumers need events at different paces (real-time, batch, replay)
  • Event sourcing or audit requirements
  • Real-time analytics and monitoring
  • Change data capture (CDC) patterns
  • Need to reprocess historical data for new features or bug fixes

In a payments platform I worked on, we used streaming to feed multiple systems: real-time fraud detection read the latest events, hourly batch jobs computed business metrics, and ML teams could replay the entire history to train new models. Each consumer moved at its own pace without affecting others.
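
On the producer side, the key decision is the partition key: everything with the same key lands on the same shard and stays ordered there. A hedged sketch (stream name and key choice are illustrative):

import json
import boto3

kinesis = boto3.client('kinesis')

def publish_payment(event):
    kinesis.put_record(
        StreamName='payment-events',  # placeholder stream name
        Data=json.dumps(event).encode('utf-8'),
        # Partition by account so one account's payments stay on the same shard,
        # in order; a low-cardinality key here creates hot shards
        PartitionKey=event['accountId']
    )

publish_payment({'accountId': 'acct-42', 'amount': 125.00, 'status': 'processed'})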

The tradeoffs:

  • Higher complexity: Managing consumer positions and shard assignments
  • Partition key critical: Poor choice creates hot shards and limits scaling
  • Cost: Provisioned shards are billed per shard-hour whether or not you use the capacity (on-demand mode bills mainly by data volume instead)
  • No built-in filtering: Consumers process all events in their shards
  • Resharding disruption: Scaling shards can temporarily affect ordering

Event Streaming with AWS Services

Amazon Kinesis Data Streams is the primary AWS streaming service:

  • Throughput: 1 MB/sec or 1,000 records/sec write capacity per shard (2 MB/sec read, shared across consumers unless you use enhanced fan-out)
  • Retention: 24 hours to 365 days
  • Ordering: Guaranteed within shard
  • Scaling: Manual or on-demand auto-scaling
  • Consumer options:
    • Kinesis Client Library (KCL) for at-least-once processing
    • Enhanced fan-out for dedicated throughput per consumer
    • Lambda integration for serverless processing
    • Kinesis Analytics for SQL-based stream processing
    • Kinesis Data Firehose for loading to S3/Redshift

DynamoDB Streams for database change streams:

  • Integration: Built into DynamoDB
  • Ordering: Per item (partition key)
  • Retention: 24 hours
  • Use cases: Triggering Lambda functions on data changes
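
As a sketch of that Lambda use case (assuming an event source mapping is already configured and the stream view includes old and new images), the handler receives batches of change records:

# Lambda handler wired to the table's stream via an event source mapping
def handler(event, context):
    for record in event['Records']:
        # eventName is INSERT, MODIFY, or REMOVE
        if record['eventName'] == 'MODIFY':
            new_image = record['dynamodb'].get('NewImage', {})
            old_image = record['dynamodb'].get('OldImage', {})
            # React to the change: sync a search index, emit a domain event, etc.
            print(old_image, '->', new_image)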

When to consider alternatives:

  • SQS FIFO: For simple ordered processing without replay needs (300 TPS limit)
  • Amazon MSK: Only if you need Kafka compatibility or specific Kafka features
  • MSK Serverless: For variable Kafka workloads without operational overhead

Choosing the right approach:

  • Kinesis Data Streams: Default choice for event streaming on AWS
  • DynamoDB Streams: When streaming DynamoDB changes
  • SQS FIFO: When you need ordering but not replay (simpler, lower cost)
  • MSK: Only for Kafka migration or Kafka-specific features

For most streaming use cases, start with Kinesis Data Streams. It provides the best balance of features, AWS integration, and operational simplicity.

Choosing the Right Pattern

Selecting the right communication pattern isn’t about finding the “best” pattern - it’s about matching patterns to your specific requirements. Here’s a framework to guide your decision:

Pattern Selection Framework

graph TD
    Start[Message Delivery Need]

    Start --> Q0{Need synchronous<br/>response?}

    Q0 -->|Yes| RR[Request-Reply<br/>→ SQS + State Store]
    Q0 -->|No| Q1{Do multiple consumers<br/>need the same message?}

    Q1 -->|Yes| Q2{Need replay<br/>capability?}
    Q1 -->|No| Q5{Need strict<br/>ordering?}

    Q2 -->|Yes| ES[Event Streaming<br/>→ Kinesis]
    Q2 -->|No| Q4{Need content-based<br/>filtering?}

    Q4 -->|Yes| CB[Publish-Subscribe<br/>→ EventBridge]
    Q4 -->|No| TB[Publish-Subscribe<br/>→ SNS]

    Q5 -->|Yes| PTP_FIFO[Point-to-Point<br/>→ SQS FIFO]
    Q5 -->|No| PTP_STD[Point-to-Point<br/>→ SQS Standard]

    style CB fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style TB fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style ES fill:#f3e8ff,stroke:#9333ea,stroke-width:2px
    style RR fill:#dcfce7,stroke:#16a34a,stroke-width:2px
    style PTP_FIFO fill:#fef3c7,stroke:#f59e0b,stroke-width:2px
    style PTP_STD fill:#fef3c7,stroke:#f59e0b,stroke-width:2px

Key Decision Factors

  • Consumer Multiplicity - Do multiple services need the same event?
    • Single consumer → Point-to-Point
    • Multiple consumers → Publish-Subscribe or Streaming
  • Ordering Requirements - Must events be processed in sequence?
    • No ordering needed → Standard queues or topics
    • Strict ordering → FIFO queues or partitioned streams
  • Replay Needs - Will you need to reprocess historical events?
    • Never → Traditional messaging
    • Sometimes → Event streaming
  • Latency Tolerance - How quickly must events be processed?
    • Real-time → Direct messaging patterns
    • Can batch → Streaming with batch consumers
  • Volume and Throughput - How many events per second?
    • < 1,000/sec → Any pattern works
    • > 10,000/sec → Consider streaming

Common Anti-Patterns to Avoid

1. Request-Reply Over Pub/Sub

graph LR
    A[Service A] -->|"Request"| T[Topic]
    T --> B[Service B]
    T --> C[Service C]
    T --> D[Service D]
    B -->|"Reply?"| T2[Reply Topic]

    style T fill:#ffcccc,stroke:#ff0000,stroke-width:2px

    Note[❌ Multiple services might reply<br/>❌ No correlation guarantee<br/>❌ Timing issues]

The Smell:

  • Publishing “GetUserDetails” or “CalculatePrice” events (these are commands, not events)
  • Consumers publishing to generic “reply” topics
  • Timeouts waiting for responses that may never come
  • Complex correlation logic trying to match requests to replies

Real Example:

# ❌ BAD: Trying to do request-reply via SNS
import json
import boto3

sns = boto3.client('sns')
sns.publish(
    TopicArn='arn:aws:sns:us-east-1:123456789012:user-requests',  # illustrative ARN
    Message=json.dumps({
        'action': 'getUserProfile',
        'userId': '123',
        'replyTopic': 'user-replies'
    })
)
# Now what? Wait for a reply? From whom? For how long?

Use Instead:

  • For true request-reply: Use SQS with correlation IDs or Step Functions
  • For commands: Use point-to-point queues (SQS)
  • For notifications: Use proper events like “UserProfileUpdated”

2. Point-to-Point for Fan-Out

graph LR
    P[Publisher] -->|"Copy 1"| Q1[Queue 1]
    P -->|"Copy 2"| Q2[Queue 2]
    P -->|"Copy 3"| Q3[Queue 3]

    Q1 --> C1[Consumer 1]
    Q2 --> C2[Consumer 2]
    Q3 --> C3[Consumer 3]

    style P fill:#ffcccc,stroke:#ff0000,stroke-width:2px

    Note[❌ Publisher knows all consumers<br/>❌ Adding consumers requires code changes<br/>❌ Tight coupling]

The Smell:

  • Publisher code with hardcoded queue URLs for each consumer
  • Loops sending the same message to multiple queues
  • Publisher deployments needed when adding new consumers
  • “Failed to send to queue X” errors when consumers change

Real Example:

# ❌ BAD: Manual fan-out creating tight coupling
import boto3

sqs = boto3.client('sqs')
order_event = create_order_event(order)  # serialized event (illustrative helper)

# Publisher knows about every consumer!
sqs.send_message(QueueUrl=INVENTORY_QUEUE, MessageBody=order_event)
sqs.send_message(QueueUrl=SHIPPING_QUEUE, MessageBody=order_event)
sqs.send_message(QueueUrl=ANALYTICS_QUEUE, MessageBody=order_event)
sqs.send_message(QueueUrl=EMAIL_QUEUE, MessageBody=order_event)
# Adding new consumer = code change + deployment

Use Instead:

  • SNS for simple topic-based fan-out (sketched below)
  • EventBridge for content-based routing
  • Let consumers subscribe themselves to topics/rules
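
The corrected version publishes once and lets the broker fan out - a sketch with SNS, where the topic ARN is a placeholder and each consumer subscribes its own queue:

import json
import boto3

sns = boto3.client('sns')

# ✅ Publish once; inventory, shipping, analytics, and email each subscribe their
# own queue to the topic, so adding a consumer needs no publisher change
order_event = {'orderId': '123', 'total': 1299.99}
sns.publish(
    TopicArn='arn:aws:sns:us-east-1:123456789012:order-events',  # placeholder
    Message=json.dumps(order_event)
)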

3. Streaming for Simple Queuing

graph LR
    P[Producer] -->|"Simple tasks"| K[Kinesis Stream]
    K --> C[Consumer]

    style K fill:#ffcccc,stroke:#ff0000,stroke-width:2px

    Note[❌ Overkill for simple work distribution<br/>❌ Unnecessary complexity<br/>❌ Higher costs]

The Smell:

  • Single consumer reading from a Kinesis stream
  • No need for replay or ordering
  • Resharding complexity for simple workloads
  • Paying for provisioned shards with low utilization

Real Example:

# ❌ BAD: Using Kinesis for simple image resize tasks
import json
import boto3

kinesis = boto3.client('kinesis')
kinesis.put_record(
    StreamName='image-resize-stream',
    Data=json.dumps({
        'imageId': '123',
        'size': 'thumbnail'
    }),
    PartitionKey='image-123'  # Why do we need partitioning?
)

# Consumer complexity for no benefit
# KCL setup, checkpointing, shard management...

Use Instead:

  • SQS Standard queue for simple work distribution (see the sketch below)
  • SQS FIFO if you need ordering (still simpler than Kinesis)
  • Reserve Kinesis for true streaming needs: replay, multiple consumers, high throughput analytics
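
For contrast, the same resize task sent to a plain SQS queue - the queue URL is a placeholder:

import json
import boto3

sqs = boto3.client('sqs')

# ✅ Same task on a plain SQS queue: no shards, no checkpointing, workers just poll
sqs.send_message(
    QueueUrl='https://sqs.us-east-1.amazonaws.com/123456789012/image-resize',  # placeholder
    MessageBody=json.dumps({'imageId': '123', 'size': 'thumbnail'})
)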

Combining Patterns for Real-World Systems

Most production systems combine multiple patterns. Here’s a real-world example from an e-commerce platform:

graph TB
    subgraph "Order Processing"
        API[API Gateway]
        OS[Order Service]
        API -->|"Request-Reply"| OS
    end

    subgraph "Event Distribution"
        EB[EventBridge]
        KIN[Kinesis Data Stream]
        OS -->|"Ordered events"| KIN
        OS -->|"Business events"| EB
    end

    subgraph "Immediate Processing"
        SQS1[Payment Queue]
        SQS2[Inventory Queue]
        EB -->|"Route by rules"| SQS1
        EB -->|"Route by rules"| SQS2
        PS[Payment Service]
        IS[Inventory Service]
        SQS1 -->|"Process"| PS
        SQS2 -->|"Process"| IS
    end

    subgraph "Stream Processing"
        KCL[Analytics Consumer]
        ML[ML Consumer]
        RT[Real-time Dashboard]
        KIN -->|"Ordered stream"| KCL
        KIN -->|"Ordered stream"| ML
        KCL -->|"Aggregated data"| RT
    end

    subgraph "Notifications"
        SNS[SNS Topic]
        EB -->|"High-value orders"| SNS
        EMAIL[Email]
        SMS[SMS]
        SNS -->|"Fan-out"| EMAIL
        SNS -->|"Fan-out"| SMS
    end

    style API fill:#f3f4f6,stroke:#374151,stroke-width:2px
    style EB fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style KIN fill:#f3e8ff,stroke:#9333ea,stroke-width:2px
    style SNS fill:#dcfce7,stroke:#16a34a,stroke-width:2px

This architecture combines:

  • Request-Reply for synchronous API responses
  • Content-based Pub/Sub for intelligent event routing
  • Point-to-Point queues for reliable task processing
  • Event Streaming for analytics and machine learning
  • Topic-based Pub/Sub for notifications

Each pattern serves a specific purpose, and together they create a robust, scalable system.

The Path Forward

Communication patterns are the nervous system of your event-driven architecture. Choose the wrong patterns, and you’ll fight against your architecture every day. Choose the right ones, and your system will feel natural and intuitive to work with.

The key lessons from years of building these systems:

  1. Start simple: Begin with the simplest pattern that meets your needs. You can always add complexity later.
  2. Design for evolution: Your communication needs will change. Design with clear boundaries so you can swap patterns without rewriting everything.
  3. Monitor everything: Track message flow, processing times, and error rates. You can’t improve what you can’t measure.
  4. Test failure modes: Every pattern fails differently. Understand and test these failures before they happen in production.

Remember: there’s no perfect pattern, only patterns that fit your specific context. The best architects aren’t those who memorize patterns, but those who understand tradeoffs and can match patterns to problems.

Next week, we’ll explore consumption patterns - how to process events at scale without overwhelming your services. We’ll cover effective filtering strategies, handling backpressure when producers outpace consumers, and scaling approaches that work with Lambda, SQS, and Kinesis. No complex infrastructure required - just practical patterns that keep your event processing running smoothly.

What communication patterns have worked well in your systems? What combinations have you found effective? What patterns have caused you pain? I’d love to hear your experiences and learn from your journey.


References

Core Communication Pattern Resources

  1. Enterprise Integration Patterns - Gregor Hohpe and Bobby Woolf’s foundational patterns

  2. Messaging Patterns Overview - Complete catalog of messaging patterns

Publish-Subscribe Pattern Resources

  1. Publish-Subscribe Channel Pattern - Enterprise Integration Patterns

  2. Amazon SNS Developer Guide - Topic-based pub/sub on AWS

  3. Amazon EventBridge User Guide - Content-based routing and filtering

Point-to-Point Pattern Resources

  1. Point-to-Point Channel Pattern - Enterprise Integration Patterns

  2. Amazon SQS Developer Guide - Queue-based messaging on AWS

  3. Amazon SQS FIFO Queues - Ordered message processing

Request-Reply Pattern Resources

  1. Request-Reply Pattern - Enterprise Integration Patterns

  2. Correlation Identifier Pattern - Matching requests with replies

  3. Implementing Request-Response Pattern - AWS implementation guide

  4. AWS Step Functions Callback Pattern - Callback tasks with SQS

Event Streaming Resources

  1. Amazon Kinesis Data Streams Developer Guide - Stream processing on AWS

  2. Amazon MSK Developer Guide - Managed Kafka on AWS

  3. DynamoDB Streams Developer Guide - Change streaming for DynamoDB

Pattern Selection and Architecture

  1. Choosing the Right Messaging Service - AWS Builder’s Library

  2. EventBridge vs SNS vs SQS - AWS comparison guide

  3. Messaging Anti-Patterns - Common mistakes to avoid

  4. Amazon SQS vs Amazon Kinesis - Detailed comparison

Books and Extended Reading

  1. Enterprise Integration Patterns - Hohpe & Woolf (ISBN: 978-0321200686)