EDA for the Rest of Us: Communication Patterns

This is part 3 of the “EDA for the Rest of Us” series. If you haven’t read A Guide for the Confused and Event Design Patterns, I recommend starting there as we’ll build on those concepts.

In my previous post, we explored event design patterns - how to structure the information in your events. Those design choices directly shape how your systems interact. Event Carried State Transfer enables autonomous processing but requires careful consideration of who needs that state. Event Notification keeps events lightweight but creates dependencies on API calls. The pattern you choose fundamentally influences how services communicate.

Today, we’re exploring how these events actually move through your system. The communication patterns you choose - Publish-Subscribe, Point-to-Point, Request-Reply, and Event Streaming - determine not just the mechanics of delivery, but the fundamental coupling and interaction models between your services.

Get the combination wrong, and you’ll create architectural friction: state transfer events sent point-to-point when multiple services need them, or notification events broadcast to services that can’t fetch the details they need. Get it right, and your event design and communication patterns reinforce each other, creating systems that are both loosely coupled and highly cohesive.

Why Communication Patterns Are Critical

Communication patterns are the highways of your event-driven architecture. They determine:

  • Message Delivery Guarantees: Will every consumer definitely receive the message? In what order?
  • System Coupling: How tightly are publishers and consumers bound together?
  • Scalability Limits: Can you add new consumers without affecting existing ones?
  • Performance Characteristics: What’s the latency and throughput of your message flow?
  • Failure Modes: What happens when a consumer is offline or overwhelmed?

Get these patterns wrong, and you’ll face problems that are expensive to fix. Use request-reply for high-volume scenarios, and you’ll create bottlenecks as services wait for responses they don’t really need. Implement pub/sub without proper filtering, and services drown in irrelevant events, wasting resources processing messages they should never see. Pick the wrong streaming partition key, and related events scatter across partitions, breaking your ordering guarantees. Most painfully, these aren’t bugs you can just patch - they’re architectural decisions that require significant rework to fix.

The good news? Once you understand the core patterns and their tradeoffs, choosing the right one becomes much more straightforward.

How Event Design Meets Communication

The event design patterns from our previous post directly influence which communication patterns work best:

  • Event Notification → Often uses Publish-Subscribe for broad distribution
  • Event Carried State Transfer → May use Point-to-Point for targeted delivery or Streaming for analytics
  • Async Commands → Typically uses Point-to-Point queues
  • CDC Events → Usually flows through Streaming for ordered processing

Understanding these natural affinities helps you make coherent architectural decisions.

Pattern 1: Publish-Subscribe

Publish-Subscribe (pub/sub) is the pattern where publishers broadcast events to multiple consumers without knowing who they are. Think of it as a radio station: the DJ doesn’t know who’s listening, they just broadcast the signal and anyone with a radio can tune in.

Here’s what makes pub/sub tick: producers and consumers are completely decoupled. Producers don’t know (or care) who’s listening, and consumers can come and go without affecting producers. At its simplest, publishers send events to topics and consumers subscribe to receive all events on those topics. Modern implementations add filtering capabilities, letting consumers subscribe only to events matching specific criteria.

sequenceDiagram
    participant OS as Order Service
    participant B as Message Broker
    participant AS as Analytics Service
    participant NS as Notification Service
    participant IS as Inventory Service
    participant FS as Fraud Service

    OS->>B: Publish OrderPlaced Event
    Note over B: Event:<br/>{<br/>  "orderId": "123",<br/>  "customerId": "456",<br/>  "items": [...],<br/>  "total": 1299.99,<br/>  "shipping": "express"<br/>}

    Note over B: Topic subscribers get all events<br/>Rule-based subscribers get filtered events

    par Parallel Delivery
        B-->>AS: All order events
        Note over AS: Update metrics
    and
        B-->>NS: All order events
        Note over NS: Send confirmation
    and
        B-->>IS: All order events
        Note over IS: Reserve items
    and
        B-->>FS: Only if total > 500
        Note over FS: Run fraud checks
    end

When pub/sub shines:

  • Multiple services need to react to the same business event
  • Broadcasting Event Notifications (from my previous post) to many consumers
  • Event Carried State Transfer where multiple services need the same state snapshot
  • Broadcasting updates or notifications across your system

I’ve used pub/sub successfully in order processing where analytics needs every order for metrics, notifications needs them for customer communication, and fraud detection only cares about high-value orders. Each service subscribes according to its needs - some to the entire stream, others with specific filters.

The tradeoffs:

The catch? You’ll never know exactly who received your events. Sure, they’re delivered, but to whom? That’s the broker’s secret. And as you add sophisticated routing rules, debugging becomes a detective story - “why didn’t the fraud service get this $10,000 order?” could lead you through a maze of filter conditions.

Schema changes are another headache. When all your consumers depend on the event structure, even small changes require careful coordination. Plus, more consumers mean more message deliveries, which can impact performance and costs.

Publish-Subscribe with AWS Services

AWS offers several services for implementing pub/sub, each optimized for different scenarios:

Amazon EventBridge provides advanced content-based routing with rich filtering:

  • Filtering: Complex rules with pattern matching, numeric comparisons, and IP address matching
  • Throughput: 10,000 events/second (soft limit, can be increased)
  • Targets: 20+ AWS service integrations plus Lambda, SQS, and API destinations
  • Features: Schema registry, event replay, archive, and event discovery
    • Event transformation before delivery
    • Cross-account and cross-region routing
    • Built-in error handling and retry logic
    • Partner integrations (Salesforce, Zendesk, PagerDuty)

Amazon SNS is the classic pub/sub service, perfect for simple topic-based broadcasting:

  • Scale: Handles millions of messages per second
  • Delivery: Push-based to SQS, Lambda, HTTP endpoints, email, SMS
  • Filtering: Basic attribute-based message filtering
  • Fan-out: Up to 12.5 million subscriptions per topic
    • FIFO topics for ordered delivery (300 TPS, up to 3,000 with batching)
    • Standard topics for maximum throughput
    • Dead letter queues for failed deliveries
    • Cross-region replication for global systems

For most pub/sub implementations, start with EventBridge. It provides the best balance of features, filtering capabilities, and AWS service integrations. Use SNS when you need simple topic-based fan-out at massive scale or non-AWS endpoint delivery (email, SMS).
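As a concrete illustration of content-based filtering, the fraud-service rule from the diagram ("only if total > 500") would be an EventBridge event pattern with a numeric filter. The sketch below implements a tiny local matcher for that pattern shape - illustrative only, not EventBridge's actual matching engine:

```python
# Sketch of EventBridge-style content filtering (illustrative local matcher,
# not the real engine). Field names mirror the OrderPlaced event above.

fraud_rule = {
    "detail": {
        "total": [{"numeric": [">", 500]}]  # EventBridge numeric filter syntax
    }
}

def matches(pattern: dict, event: dict) -> bool:
    """Return True if the event satisfies every field in the pattern."""
    for key, condition in pattern.items():
        if isinstance(condition, dict):
            # Nested pattern: recurse into the matching sub-object
            if not isinstance(event.get(key), dict):
                return False
            if not matches(condition, event[key]):
                return False
        else:
            # A list of allowed values or operator clauses
            value = event.get(key)
            ok = False
            for clause in condition:
                if isinstance(clause, dict) and "numeric" in clause:
                    op, threshold = clause["numeric"]
                    if op == ">" and isinstance(value, (int, float)) and value > threshold:
                        ok = True
                elif clause == value:
                    ok = True
            if not ok:
                return False
    return True

high_value = {"detail": {"orderId": "123", "total": 1299.99}}
low_value = {"detail": {"orderId": "124", "total": 49.99}}

print(matches(fraud_rule, high_value))  # True  -> routed to the fraud service
print(matches(fraud_rule, low_value))   # False -> filtered out
```

The real rule would be attached to an EventBridge bus with `put_rule`, but the filtering semantics are the important part here: consumers declare what they care about, and the broker does the routing.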

Pattern 2: Point-to-Point

Point-to-Point ensures each message is processed by exactly one consumer. Unlike pub/sub where messages are broadcast, point-to-point creates a direct channel between producers and consumers. Think of it as registered mail - one sender, one recipient, with confirmation of delivery.

What really defines this pattern is exclusivity: once a consumer claims a message, no other consumer sees it, and successful processing removes it from the queue for good. This pattern supports two delivery modes: unordered for maximum throughput where multiple workers process messages in parallel, or strictly ordered (FIFO) when sequence matters for business correctness.

sequenceDiagram
    participant P as Producer
    participant Q as Message Queue
    participant W1 as Worker 1
    participant W2 as Worker 2
    participant DB as Database

    Note over Q: Standard Queue Example
    P->>Q: Image Resize Task A
    P->>Q: Image Resize Task B
    P->>Q: Image Resize Task C

    par Parallel Processing
        W1->>Q: Poll for message
        Q-->>W1: Task A
        W1->>W1: Process image
    and
        W2->>Q: Poll for message
        Q-->>W2: Task B
        W2->>W2: Process image
    end

    Note over Q: FIFO Queue Example
    P->>Q: Deposit $1000 (seq: 1)
    P->>Q: Withdraw $500 (seq: 2)

    W1->>Q: Receive next message
    Q-->>W1: Deposit $1000
    W1->>DB: Update balance: +$1000

    W1->>Q: Receive next message
    Q-->>W1: Withdraw $500
    W1->>DB: Update balance: -$500

    Note over Q: Order preserved for<br/>business correctness

When point-to-point shines:

  • Distributing work across multiple processors (image processing, report generation)
  • Ensuring exactly-once processing for critical operations
  • Decoupling producers from consumers with reliable delivery
  • Sequential processing for operations that must happen in order (financial transactions, state machines)

I’ve used standard queues successfully for image processing workloads where users upload photos needing resizing and optimization. Multiple workers pull from the same queue, automatically scaling based on backlog. For financial systems, FIFO queues ensure transaction ordering - deposits, withdrawals, and transfers execute in the exact sequence they were submitted.

The tradeoffs:

Here’s the thing: point-to-point isn’t for broadcasting. Each message has one destination, period. This means producers need to know which queue to use, creating some coupling. FIFO queues add another wrinkle - they sacrifice throughput for ordering guarantees, typically capping at 300 TPS.

The complexity lies in the details. Visibility timeouts, message acknowledgments, and failure handling all need careful tuning. Set your visibility timeout too short, and messages reappear while still being processed. Too long, and failed messages sit idle.
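To make the visibility-timeout failure mode concrete, here's a toy in-memory model (my own sketch, not SQS): a received message is hidden for the timeout window and reappears if it isn't deleted in time.

```python
# Toy model of an SQS-style queue with visibility timeouts (illustrative only).
class ToyQueue:
    def __init__(self, visibility_timeout: float):
        self.visibility_timeout = visibility_timeout
        self.messages = {}         # message_id -> body
        self.invisible_until = {}  # message_id -> timestamp when visible again

    def send(self, message_id: str, body: str):
        self.messages[message_id] = body

    def receive(self, now: float):
        """Return the first visible message and hide it for the timeout window."""
        for message_id, body in self.messages.items():
            if self.invisible_until.get(message_id, 0) <= now:
                self.invisible_until[message_id] = now + self.visibility_timeout
                return message_id, body
        return None

    def delete(self, message_id: str):
        """Acknowledge successful processing: the message is gone for good."""
        self.messages.pop(message_id, None)

q = ToyQueue(visibility_timeout=30)
q.send("m1", "resize image 123")

first = q.receive(now=0)         # worker takes the message
too_early = q.receive(now=10)    # still invisible: no redelivery
redelivered = q.receive(now=45)  # timeout elapsed without delete: it reappears

print(first, too_early, redelivered)
```

The slow-worker failure mode falls straight out of this model: if processing takes longer than the visibility timeout and the worker hasn't called `delete`, a second worker receives the same message.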

Point-to-Point with AWS Services

Amazon SQS is the native AWS service for point-to-point messaging:

Standard Queues offer maximum throughput with best-effort ordering:

  • Throughput: Nearly unlimited TPS
  • Delivery: At-least-once with potential duplicates
  • Features:
    • Message size up to 256 KB (2 GB with S3)
    • Retention up to 14 days
    • Long polling for cost efficiency
    • Dead letter queues for poison messages
    • Visibility timeout up to 12 hours
    • Delay queues up to 15 minutes

FIFO Queues guarantee ordered, exactly-once processing:

  • Throughput: 300 TPS (3,000 with batching)
  • Ordering: Strict FIFO within message groups
  • Delivery: Exactly-once processing with deduplication
  • Features:
    • Message group IDs for parallel ordered streams
    • Deduplication IDs prevent duplicate sends
    • All standard queue features
    • Content-based deduplication
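Message group IDs are worth a quick sketch. Within one group, messages come out in order; separate groups act as independent ordered streams that different workers can drain in parallel. A toy model (illustrative only - the group IDs here are made up):

```python
from collections import defaultdict

# Toy model of FIFO message groups: order is guaranteed within a group,
# while separate groups can be consumed in parallel (illustrative only).
groups = defaultdict(list)

def send(group_id: str, body: str):
    groups[group_id].append(body)

def next_message(group_id: str):
    """Pop the oldest message in the group, preserving per-group order."""
    return groups[group_id].pop(0) if groups[group_id] else None

send("account-456", "Deposit $1000")
send("account-456", "Withdraw $500")
send("account-789", "Deposit $50")

# Within account-456, order is preserved:
print(next_message("account-456"))  # Deposit $1000
# account-789 can be drained by another worker at the same time:
print(next_message("account-789"))  # Deposit $50
print(next_message("account-456"))  # Withdraw $500
```

This is why a good message group ID (per account, per order) matters: it gives you the ordering you need without serializing the whole queue behind a single group.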

For new implementations, start with SQS standard queues and only enable FIFO if you need strict ordering. The throughput difference is significant, and many use cases that seem to need ordering can work with eventual consistency.
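One consequence of at-least-once delivery is that standard-queue consumers should be idempotent. A minimal sketch, using an in-memory set as a stand-in for a durable store such as a DynamoDB table with a conditional write:

```python
# Idempotent consumer sketch for at-least-once delivery (illustrative only).
processed_ids = set()  # stand-in for a durable store such as DynamoDB
results = []

def handle(message: dict):
    """Process a message exactly once even if the queue redelivers it."""
    message_id = message["id"]
    if message_id in processed_ids:
        return "skipped-duplicate"
    processed_ids.add(message_id)
    results.append(message["body"])  # the real side effect goes here
    return "processed"

# The same message arriving twice (an at-least-once redelivery):
print(handle({"id": "m1", "body": "charge card"}))  # processed
print(handle({"id": "m1", "body": "charge card"}))  # skipped-duplicate
print(results)  # the side effect ran once
```

With this in place, duplicates become a non-event, which is usually cheaper than paying the FIFO throughput penalty just to avoid them.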

Pattern 3: Request-Reply

Request-Reply adds synchronous-style communication to asynchronous messaging. A service sends a request message and expects a specific reply, maintaining the benefits of decoupling while enabling two-way communication. It’s like sending a letter with a return envelope - asynchronous delivery but with an expected response.

graph TB
    subgraph "Client Side"
        C[Client Service]
        RQ[Reply Queue]
    end

    subgraph "Message Infrastructure"
        REQ[Request Queue]
    end

    subgraph "Server Side"
        S[Server Service]
    end

    C -->|"1: Request + ReplyTo"| REQ
    REQ -->|"2: Process Request"| S
    S -->|"3: Send Reply"| RQ
    RQ -->|"4: Correlate Response"| C

    style REQ fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style RQ fill:#dcfce7,stroke:#16a34a,stroke-width:2px

Correlation-Based Reply

This approach uses correlation IDs to match requests with replies, allowing multiple outstanding requests without confusion.

sequenceDiagram
    participant CS as Client Service
    participant RQ as Request Queue
    participant RQR as Reply Queue
    participant PS as Processing Service

    Note over CS: Generate correlationId: abc123
    CS->>RQ: Request {<br/>  correlationId: "abc123",<br/>  replyTo: "reply-queue",<br/>  action: "calculatePrice",<br/>  data: {...}<br/>}

    Note over CS: Generate correlationId: xyz789
    CS->>RQ: Request {<br/>  correlationId: "xyz789",<br/>  replyTo: "reply-queue",<br/>  action: "checkInventory",<br/>  data: {...}<br/>}

    PS->>RQ: Poll for messages
    RQ-->>PS: Request (correlationId: abc123)
    PS->>PS: Process calculatePrice

    PS->>RQR: Reply {<br/>  correlationId: "abc123",<br/>  result: { price: 99.99 }<br/>}

    PS->>RQ: Poll for messages
    RQ-->>PS: Request (correlationId: xyz789)
    PS->>PS: Process checkInventory

    PS->>RQR: Reply {<br/>  correlationId: "xyz789",<br/>  result: { available: true }<br/>}

    CS->>RQR: Poll for replies
    RQR-->>CS: Reply (correlationId: xyz789)
    CS->>CS: Match xyz789 to request

    CS->>RQR: Poll for replies
    RQR-->>CS: Reply (correlationId: abc123)
    CS->>CS: Match abc123 to request

When correlation-based reply works well:

  • Multiple concurrent requests from the same client
  • Stateless processing services
  • Scenarios where reply order doesn’t matter
  • Integration with legacy systems expecting request-reply

I’ve used this pattern successfully for price calculation services where the frontend could request prices for multiple products simultaneously. Each request had a unique correlation ID, and responses could return in any order without confusion.

The challenges:

Managing correlation-based replies gets messy fast. Clients need to track every outstanding request, implement timeouts for responses that never arrive, and handle orphaned replies when they give up waiting. Each client also needs its own reply queue or a way to filter shared queues.
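That client-side bookkeeping might look something like this sketch - a pending-request map keyed by correlation ID, with a deadline per request and a drop path for orphaned replies (names and timeout values are illustrative):

```python
import uuid

# Client-side correlation tracking sketch (illustrative only).
pending = {}  # correlation_id -> {"action": ..., "deadline": ...}

def send_request(action: str, now: float, timeout: float = 30.0) -> str:
    """Record an outstanding request; the real client would also publish it."""
    correlation_id = str(uuid.uuid4())
    pending[correlation_id] = {"action": action, "deadline": now + timeout}
    # ...here the real client would send to the request queue with
    # correlationId and replyTo set, as in the diagram above...
    return correlation_id

def handle_reply(correlation_id: str, result, now: float):
    """Match a reply to its request, dropping orphaned or timed-out replies."""
    request = pending.pop(correlation_id, None)
    if request is None or now > request["deadline"]:
        return None  # orphaned or too late: drop it
    return (request["action"], result)

cid = send_request("calculatePrice", now=0)
print(handle_reply(cid, {"price": 99.99}, now=5))
# ('calculatePrice', {'price': 99.99})
print(handle_reply("unknown-id", {"price": 1}, now=5))  # None (orphaned)
```

In production the `pending` map usually lives in DynamoDB with a TTL, so crashed clients don't leak correlations forever.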

Callback-Based Reply

Instead of correlation IDs, this pattern uses callback endpoints or queues specified in each request.

sequenceDiagram
    participant MS as Microservice
    participant EB as EventBridge
    participant LS as Lambda Service
    participant CQ as Callback Queue

    MS->>EB: Request Event {<br/>  action: "processDocument",<br/>  callbackQueue: "team-a-results",<br/>  documentId: "doc-123"<br/>}

    EB-->>LS: Route to processor
    LS->>LS: Process document

    LS->>CQ: Send to specified callback {<br/>  documentId: "doc-123",<br/>  status: "completed",<br/>  result: {...}<br/>}

    MS->>CQ: Poll callback queue
    CQ-->>MS: Receive result

    Note over MS: Different teams can use<br/>different callback queues

When callback-based reply shines:

  • Multi-tenant systems where each tenant has separate infrastructure
  • Scenarios with different reply handling requirements
  • Systems where reply destination varies by request type
  • Webhook-style integrations

The tradeoffs:

Callback patterns introduce their own complexity. You must validate every callback destination (a security nightmare if you don't), manage multiple reply endpoints, and handle failed callbacks gracefully. Service discovery becomes crucial: how does the processor know which callbacks are valid?

Request-Reply with AWS Services

AWS doesn’t offer a dedicated request-reply service. Both patterns use the same core services but configure them differently:

Core Services:

  • SQS: Request queues and (for correlation-based) reply queues
  • DynamoDB or ElastiCache: Store request state and correlations
  • Your choice of compute: Lambda, ECS, or EC2 for processing

Pattern Differences:

  • Correlation-Based: Uses fixed reply queue(s) where consumers poll and match correlation IDs
  • Callback-Based: Each request specifies its reply destination (queue, URL, or topic)

When you might need additional services:

  • Step Functions: Handles request-reply natively with waitForTaskToken
  • API Gateway: For HTTP callback endpoints
  • EventBridge: If routing requests based on content

What works in practice:

Reply queues should use long polling (20 seconds) to reduce costs - no point in hammering empty queues. Set DynamoDB TTL to automatically clean up old correlations, because they will accumulate. Your visibility timeout must exceed processing time with buffer, or you’ll get duplicate processing. And always, always implement request timeouts. Messages will get lost.

Most teams start with SQS + DynamoDB for simple request-reply, then evaluate Step Functions if they need complex orchestration or visual workflow tracking.

Pattern 4: Event Streaming

Event streaming maintains events in an ordered, replayable log that multiple consumers can read independently. Unlike queues where messages are deleted after processing, streams preserve events for a retention period, enabling replay and multiple processing patterns.

graph TB
    subgraph "Event Producers"
        P1[Order Service]
        P2[User Service]
        P3[Inventory Service]
    end

    subgraph "Event Stream"
        subgraph "Partition 0"
            E1[Event 1]
            E2[Event 2]
            E3[Event 3]
        end
        subgraph "Partition 1"
            E4[Event 4]
            E5[Event 5]
            E6[Event 6]
        end
        subgraph "Partition 2"
            E7[Event 7]
            E8[Event 8]
            E9[Event 9]
        end
    end

    subgraph "Consumers"
        subgraph "Real-time Analytics"
            C1[Consumer at Event 9]
        end
        subgraph "Batch Processing"
            C2[Consumer at Event 5]
        end
        subgraph "New ML Model"
            C3[Consumer at Event 1]
        end
    end

    P1 --> E3
    P1 --> E6
    P1 --> E9
    P2 --> E1
    P2 --> E4
    P2 --> E7
    P3 --> E2
    P3 --> E5
    P3 --> E8

    E9 -.->|"Latest"| C1
    E5 -.->|"Processing"| C2
    E1 -.->|"Replay from start"| C3

    style E1 fill:#f3f4f6
    style E2 fill:#f3f4f6
    style E3 fill:#f3f4f6
    style E4 fill:#e5e7eb
    style E5 fill:#e5e7eb
    style E6 fill:#d1d5db
    style E7 fill:#9ca3af
    style E8 fill:#6b7280
    style E9 fill:#374151

Here’s what streaming brings to the table: events are immutable once written, distributed across partitions for scale, ordered within each partition, and available for the entire retention window. Each consumer tracks its own position independently - your real-time dashboard reads the latest while your ML pipeline replays last month’s data.

sequenceDiagram
    participant PS as Payment Service
    participant K as Kinesis Stream
    participant RT as Real-time Fraud Detection
    participant BA as Analytics (Batch)
    participant ML as ML Training

    loop Continuous Event Flow
        PS->>K: PaymentProcessed Event
        Note over K: Append to shard<br/>Sequence: 1000
    end

    Note over RT: Reading from latest
    K-->>RT: Events 998, 999, 1000
    RT->>RT: Check for fraud patterns

    Note over BA: Reading hourly batches
    K-->>BA: Events 500-1000
    BA->>BA: Aggregate metrics

    Note over ML: Replaying from beginning
    K-->>ML: Events 1-1000
    ML->>ML: Train new model

    Note over K: Each consumer maintains<br/>independent position

When event streaming shines:

  • Multiple consumers need events at different paces (real-time, batch, replay)
  • Event sourcing or audit requirements
  • Real-time analytics and monitoring
  • Change data capture (CDC) patterns
  • Need to reprocess historical data for new features or bug fixes

In a payments platform I worked on, we used streaming to feed multiple systems: real-time fraud detection read the latest events, hourly batch jobs computed business metrics, and ML teams could replay the entire history to train new models. Each consumer moved at its own pace without affecting others.
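The property doing the work here is that consuming never removes events - each reader just advances its own offset. A toy append-only log makes this concrete (illustrative only):

```python
# Toy append-only log with independent consumer offsets (illustrative only).
log = []      # the stream: events are appended, never deleted
offsets = {}  # consumer name -> next index to read

def publish(event: str):
    log.append(event)

def read(consumer: str, max_events: int = 10):
    """Each consumer reads from its own offset; reading never removes events."""
    start = offsets.get(consumer, 0)
    batch = log[start:start + max_events]
    offsets[consumer] = start + len(batch)
    return batch

for i in range(1, 6):
    publish(f"PaymentProcessed-{i}")

print(read("realtime-fraud"))             # all five events
print(read("ml-training", max_events=2))  # replays from the start independently
print(read("realtime-fraud"))             # nothing new yet for this consumer
```

Contrast this with the queue model: here a brand-new consumer can start at offset zero and see the full history, which is exactly what made the ML replay scenario above possible.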

The tradeoffs:

Streaming isn’t free complexity. Managing consumer positions, handling shard assignments, choosing the right partition key (get it wrong and you’ll create hot shards that kill performance) - it all adds up. You pay for provisioned capacity whether you use it or not. There’s no built-in filtering either - consumers process everything in their shards. And resharding? That temporarily breaks ordering guarantees.
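The hot-shard problem comes straight from how keys map to shards: Kinesis hashes the partition key (with MD5) to choose a shard, so a low-cardinality or skewed key funnels traffic onto one shard. A simplified sketch - real Kinesis maps the hash into per-shard key ranges rather than taking a modulo:

```python
import hashlib
from collections import Counter

def shard_for(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard. Kinesis hashes keys with MD5 into a
    128-bit range; the modulo over shard count here is a simplification."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Good key: per-customer -> events for one customer stay ordered on one
# shard, and traffic spreads across all shards
per_customer = Counter(shard_for(f"customer-{i}", 4) for i in range(10_000))

# Bad key: one dominant value -> a single shard takes all the traffic
skewed = Counter(shard_for("big-tenant", 4) for _ in range(10_000))

print(dict(per_customer))  # roughly even across 4 shards
print(dict(skewed))        # one hot shard with everything
```

The same hash also explains the ordering guarantee: identical keys always land on the same shard, so "all events for customer X" stay in sequence - as long as the key has enough cardinality to spread the load.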

Event Streaming with AWS Services

Amazon Kinesis Data Streams is the primary AWS streaming service:

  • Throughput: 1 MB/sec or 1,000 records/sec per shard
  • Retention: 24 hours to 365 days
  • Ordering: Guaranteed within shard
  • Scaling: Manual or on-demand auto-scaling
  • Consumer options:
    • Kinesis Client Library (KCL) for at-least-once processing
    • Enhanced fan-out for dedicated throughput per consumer
    • Lambda integration for serverless processing
    • Kinesis Analytics for SQL-based stream processing
    • Kinesis Data Firehose for loading to S3/Redshift

DynamoDB Streams for database change streams:

  • Integration: Built into DynamoDB
  • Ordering: Per item (partition key)
  • Retention: 24 hours
  • Use cases: Triggering Lambda functions on data changes

When to consider alternatives:

  • SQS FIFO: For simple ordered processing without replay needs (300 TPS limit)
  • Amazon MSK: Only if you need Kafka compatibility or specific Kafka features
  • MSK Serverless: For variable Kafka workloads without operational overhead

My take on choosing:

Start with Kinesis Data Streams for event streaming on AWS. It provides the best balance of features, AWS integration, and operational simplicity. Use DynamoDB Streams when you’re already using DynamoDB and need change events. Consider SQS FIFO when you need ordering but not replay - it’s simpler and cheaper. Only reach for MSK if you’re migrating from Kafka or absolutely need Kafka-specific features.

Choosing the Right Pattern

Selecting the right communication pattern isn’t about finding the “best” pattern - it’s about matching patterns to your specific requirements. Here’s a framework to guide your decision:

Pattern Selection Framework

graph TD
    Start[Message Delivery Need]

    Start --> Q0{Need synchronous<br/>response?}

    Q0 -->|Yes| RR[Request-Reply<br/>→ SQS + State Store]
    Q0 -->|No| Q1{Do multiple consumers<br/>need the same message?}

    Q1 -->|Yes| Q2{Need replay<br/>capability?}
    Q1 -->|No| Q5{Need strict<br/>ordering?}

    Q2 -->|Yes| ES[Event Streaming<br/>→ Kinesis]
    Q2 -->|No| Q4{Need content-based<br/>filtering?}

    Q4 -->|Yes| CB[Publish-Subscribe<br/>→ EventBridge]
    Q4 -->|No| TB[Publish-Subscribe<br/>→ SNS]

    Q5 -->|Yes| PTP_FIFO[Point-to-Point<br/>→ SQS FIFO]
    Q5 -->|No| PTP_STD[Point-to-Point<br/>→ SQS Standard]

    style CB fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style TB fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style ES fill:#f3e8ff,stroke:#9333ea,stroke-width:2px
    style RR fill:#dcfce7,stroke:#16a34a,stroke-width:2px
    style PTP_FIFO fill:#fef3c7,stroke:#f59e0b,stroke-width:2px
    style PTP_STD fill:#fef3c7,stroke:#f59e0b,stroke-width:2px

Key Decision Factors

  • Consumer Multiplicity: Do multiple services need the same event?
    • Single consumer → Point-to-Point
    • Multiple consumers → Publish-Subscribe or Streaming
  • Ordering Requirements: Must events be processed in sequence?
    • No ordering needed → Standard queues or topics
    • Strict ordering → FIFO queues or partitioned streams
  • Replay Needs: Will you need to reprocess historical events?
    • Never → Traditional messaging
    • Sometimes → Event streaming
  • Latency Tolerance: How quickly must events be processed?
    • Real-time → Direct messaging patterns
    • Can batch → Streaming with batch consumers
  • Volume and Throughput: How many events per second?
    • < 1,000/sec → Any pattern works
    • > 10,000/sec → Consider streaming
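These decision factors can be collapsed into a small function mirroring the flowchart earlier in this section - useful as a starting point, not a substitute for judgment:

```python
def choose_pattern(needs_response: bool, multiple_consumers: bool,
                   needs_replay: bool, needs_filtering: bool,
                   needs_ordering: bool) -> str:
    """Walk the same questions as the selection flowchart, in order."""
    if needs_response:
        return "Request-Reply (SQS + state store)"
    if multiple_consumers:
        if needs_replay:
            return "Event Streaming (Kinesis)"
        if needs_filtering:
            return "Publish-Subscribe (EventBridge)"
        return "Publish-Subscribe (SNS)"
    if needs_ordering:
        return "Point-to-Point (SQS FIFO)"
    return "Point-to-Point (SQS Standard)"

# Multiple consumers, no replay, content-based filtering:
print(choose_pattern(False, True, False, True, False))
# Publish-Subscribe (EventBridge)

# Single consumer, strict ordering:
print(choose_pattern(False, False, False, False, True))
# Point-to-Point (SQS FIFO)
```

The order of the checks matters: the synchronous-response question comes first because it overrides everything else, just as in the flowchart.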

Common Anti-Patterns to Avoid

1. Request-Reply Over Pub/Sub

graph LR
    A[Service A] -->|"Request"| T[Topic]
    T --> B[Service B]
    T --> C[Service C]
    T --> D[Service D]
    B -->|"Reply?"| T2[Reply Topic]

    style T fill:#ffcccc,stroke:#ff0000,stroke-width:2px

    Note[❌ Multiple services might reply<br/>❌ No correlation guarantee<br/>❌ Timing issues]

The Smell:

  • Publishing “GetUserDetails” or “CalculatePrice” events (these are commands, not events)
  • Consumers publishing to generic “reply” topics
  • Timeouts waiting for responses that may never come
  • Complex correlation logic trying to match requests to replies

Real Example:

# ❌ BAD: Trying to do request-reply via SNS
sns.publish(
    TopicArn='user-requests',
    Message=json.dumps({
        'action': 'getUserProfile',
        'userId': '123',
        'replyTopic': 'user-replies'
    })
)
# Now what? Wait for a reply? From whom? For how long?

Use Instead:

  • For true request-reply: Use SQS with correlation IDs or Step Functions
  • For commands: Use point-to-point queues (SQS)
  • For notifications: Use proper events like “UserProfileUpdated”

2. Point-to-Point for Fan-Out

graph LR
    P[Publisher] -->|"Copy 1"| Q1[Queue 1]
    P -->|"Copy 2"| Q2[Queue 2]
    P -->|"Copy 3"| Q3[Queue 3]

    Q1 --> C1[Consumer 1]
    Q2 --> C2[Consumer 2]
    Q3 --> C3[Consumer 3]

    style P fill:#ffcccc,stroke:#ff0000,stroke-width:2px

    Note[❌ Publisher knows all consumers<br/>❌ Adding consumers requires code changes<br/>❌ Tight coupling]

The Smell:

  • Publisher code with hardcoded queue URLs for each consumer
  • Loops sending the same message to multiple queues
  • Publisher deployments needed when adding new consumers
  • “Failed to send to queue X” errors when consumers change

Real Example:

# ❌ BAD: Manual fan-out creating tight coupling
order_event = create_order_event(order)

# Publisher knows about every consumer!
sqs.send_message(QueueUrl=INVENTORY_QUEUE, MessageBody=order_event)
sqs.send_message(QueueUrl=SHIPPING_QUEUE, MessageBody=order_event)
sqs.send_message(QueueUrl=ANALYTICS_QUEUE, MessageBody=order_event)
sqs.send_message(QueueUrl=EMAIL_QUEUE, MessageBody=order_event)
# Adding new consumer = code change + deployment

Use Instead:

  • SNS for simple topic-based fan-out
  • EventBridge for content-based routing
  • Let consumers subscribe themselves to topics/rules

3. Streaming for Simple Queuing

graph LR
    P[Producer] -->|"Simple tasks"| K[Kinesis Stream]
    K --> C[Consumer]

    style K fill:#ffcccc,stroke:#ff0000,stroke-width:2px

    Note[❌ Overkill for simple work distribution<br/>❌ Unnecessary complexity<br/>❌ Higher costs]

The Smell:

  • Single consumer reading from a Kinesis stream
  • No need for replay or ordering
  • Resharding complexity for simple workloads
  • Paying for provisioned shards with low utilization

Real Example:

# ❌ BAD: Using Kinesis for simple image resize tasks
kinesis.put_record(
    StreamName='image-resize-stream',
    Data=json.dumps({
        'imageId': '123',
        'size': 'thumbnail'
    }),
    PartitionKey='image-123'  # Why do we need partitioning?
)

# Consumer complexity for no benefit
# KCL setup, checkpointing, shard management...

Use Instead:

  • SQS Standard queue for simple work distribution
  • SQS FIFO if you need ordering (still simpler than Kinesis)
  • Reserve Kinesis for true streaming needs: replay, multiple consumers, high throughput analytics

Combining Patterns for Real-World Systems

Most production systems combine multiple patterns. Here’s a real-world example from an e-commerce platform:

graph TB
    subgraph "Order Processing"
        API[API Gateway]
        OS[Order Service]
        API -->|"Request-Reply"| OS
    end

    subgraph "Event Distribution"
        EB[EventBridge]
        KIN[Kinesis Data Stream]
        OS -->|"Ordered events"| KIN
        OS -->|"Business events"| EB
    end

    subgraph "Immediate Processing"
        SQS1[Payment Queue]
        SQS2[Inventory Queue]
        EB -->|"Route by rules"| SQS1
        EB -->|"Route by rules"| SQS2
        PS[Payment Service]
        IS[Inventory Service]
        SQS1 -->|"Process"| PS
        SQS2 -->|"Process"| IS
    end

    subgraph "Stream Processing"
        KCL[Analytics Consumer]
        ML[ML Consumer]
        RT[Real-time Dashboard]
        KIN -->|"Ordered stream"| KCL
        KIN -->|"Ordered stream"| ML
        KCL -->|"Aggregated data"| RT
    end

    subgraph "Notifications"
        SNS[SNS Topic]
        EB -->|"High-value orders"| SNS
        EMAIL[Email]
        SMS[SMS]
        SNS -->|"Fan-out"| EMAIL
        SNS -->|"Fan-out"| SMS
    end

    style API fill:#f3f4f6,stroke:#374151,stroke-width:2px
    style EB fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style KIN fill:#f3e8ff,stroke:#9333ea,stroke-width:2px
    style SNS fill:#dcfce7,stroke:#16a34a,stroke-width:2px

This architecture combines:

  • Request-Reply for synchronous API responses
  • Content-based Pub/Sub for intelligent event routing
  • Point-to-Point queues for reliable task processing
  • Event Streaming for analytics and machine learning
  • Topic-based Pub/Sub for notifications

Each pattern serves a specific purpose, and together they create a resilient, scalable system.

The Path Forward

Communication patterns are the nervous system of your event-driven architecture. Choose the wrong patterns, and you’ll fight against your architecture every day. Choose the right ones, and your system will feel natural and intuitive to work with.

The real lessons from years of building these systems:

  1. Start simple: Begin with the simplest pattern that meets your needs. You can always add complexity later.
  2. Design for evolution: Your communication needs will change. Design with clear boundaries so you can swap patterns without rewriting everything.
  3. Monitor everything: Track message flow, processing times, and error rates. You can’t improve what you can’t measure.
  4. Test failure modes: Every pattern fails differently. Understand and test these failures before they happen in production.

Remember: there’s no perfect pattern, only patterns that fit your specific context. The best architects aren’t those who memorize patterns, but those who understand tradeoffs and can match patterns to problems.

Next week, we’ll explore consumption patterns - how to process events at scale without overwhelming your services. We’ll cover effective filtering strategies, handling backpressure when producers outpace consumers, and scaling approaches that work with Lambda, SQS, and Kinesis. No complex infrastructure required - just practical patterns that keep your event processing running smoothly.

What communication patterns have worked well in your systems? What combinations have you found effective? What patterns have caused you pain? I’d love to hear your experiences and learn from your journey.


References

Core Communication Pattern Resources

  1. Enterprise Integration Patterns - Gregor Hohpe and Bobby Woolf’s foundational patterns

  2. Messaging Patterns Overview - Complete catalog of messaging patterns

Publish-Subscribe Pattern Resources

  1. Publish-Subscribe Channel Pattern - Enterprise Integration Patterns

  2. Amazon SNS Developer Guide - Topic-based pub/sub on AWS

  3. Amazon EventBridge User Guide - Content-based routing and filtering

Point-to-Point Pattern Resources

  1. Point-to-Point Channel Pattern - Enterprise Integration Patterns

  2. Amazon SQS Developer Guide - Queue-based messaging on AWS

  3. Amazon SQS FIFO Queues - Ordered message processing

Request-Reply Pattern Resources

  1. Request-Reply Pattern - Enterprise Integration Patterns

  2. Correlation Identifier Pattern - Matching requests with replies

  3. Implementing Request-Response Pattern - AWS implementation guide

  4. AWS Step Functions Callback Pattern - Callback tasks with SQS

Event Streaming Resources

  1. Amazon Kinesis Data Streams Developer Guide - Stream processing on AWS

  2. Amazon MSK Developer Guide - Managed Kafka on AWS

  3. DynamoDB Streams Developer Guide - Change streaming for DynamoDB

Pattern Selection and Architecture

  1. Choosing the Right Messaging Service - AWS Builder’s Library

  2. EventBridge vs SNS vs SQS - AWS comparison guide

  3. Messaging Anti-Patterns - Common mistakes to avoid

  4. Amazon SQS vs Amazon Kinesis - Detailed comparison

Books and Extended Reading

  1. Enterprise Integration Patterns - Hohpe & Woolf (ISBN: 978-0321200686)