In Part 1, we covered the fundamental tradeoffs. In Part 2, we built GitHub’s backend. Now let’s talk about what happens when you’re running this in production.

These are the sharp edges that separate successful DynamoDB implementations from disasters. Hot partitions that throttle your entire application. Migrations that take days and cost thousands. Debugging sessions where you can’t figure out why your query returns nothing.

Let’s dig into the problems nobody mentions in the tutorials.

Hot Partitions: The Silent Killer

DynamoDB’s performance promise is simple: consistent single-digit millisecond latency at any scale. But there’s a catch—that’s per partition. If all your traffic hits one partition, you’re toast.

The Problem

Each partition in DynamoDB can handle:

  • 3,000 read capacity units per second
  • 1,000 write capacity units per second

Exceed these limits and DynamoDB starts throttling. Your requests fail with ProvisionedThroughputExceededException. Your application slows down. Users complain.
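Raising the SDK's retry settings buys you a little slack while you fix the key design, but it's a band-aid, not a cure. A minimal sketch of a more forgiving client setup (the client wiring here is illustrative; in practice the table setup from Part 2 would own this):

import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient } from '@aws-sdk/lib-dynamodb';

// 'adaptive' retry mode backs off client-side when it detects throttling,
// instead of immediately re-hitting the hot partition
const dynamoDBClient = new DynamoDBClient({
  maxAttempts: 10,       // SDK default is 3
  retryMode: 'adaptive'
});

export const documentClient = DynamoDBDocumentClient.from(dynamoDBClient);

The real fix is spreading the traffic, which is what the rest of this section is about.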

The Pattern That Kills

// ❌ DEATH: All events go to one partition
async function recordEvent(eventType: string, eventData: any) {
  await EventEntity.build(PutItemCommand)
    .item({
      pk: 'EVENTS',  // Every event hits the same partition!
      sk: `${Date.now()}#${eventData.id}`,
      eventType,
      ...eventData
    })
    .send();
}

// At 2,000 events/second: works fine
// At 5,000 events/second: throttling starts
// At 10,000 events/second: complete failure

The problem? Every single event write goes to the same partition (EVENTS). You hit the 1,000 WCU/sec limit and everything breaks.

The Solution: Sharding

Distribute your writes across multiple partitions:

// ✅ LIFE: Distribute across partitions
async function recordEvent(eventType: string, eventData: any) {
  // Rule of thumb: expected writes per second / 1000 = minimum shards
  const expectedWritesPerSecond = 5000;
  const shardsNeeded = Math.ceil(expectedWritesPerSecond / 1000);
  const shardId = Math.floor(Math.random() * shardsNeeded);

  await EventEntity.build(PutItemCommand)
    .item({
      // Each shard is a separate partition
      pk: `EVENTS#${new Date().toISOString().slice(0, 10)}#SHARD${shardId}`,
      sk: `${Date.now()}#${eventData.id}`,
      eventType,
      ...eventData
    })
    .send();
}

Now your 5,000 writes/second are spread across 5 partitions. Each partition handles 1,000 writes/second—well within limits.

Reading from Sharded Data

The tradeoff? Reading requires fan-out queries:

async function getEventsForDate(date: string) {
  const shardsNeeded = 5;  // Must match write sharding

  // Query each shard in parallel
  const promises = [];
  for (let i = 0; i < shardsNeeded; i++) {
    promises.push(
      GitHubTable.build(QueryCommand)
        .query({
          partition: `EVENTS#${date}#SHARD${i}`
        })
        .send()
    );
  }

  const results = await Promise.all(promises);

  // Merge and sort results client-side
  const allEvents = results.flatMap(r => r.Items || []);
  return allEvents.sort((a, b) =>
    b.sk.localeCompare(a.sk)  // Sort by timestamp
  );
}

Five queries instead of one. But five queries that work beat one query that throttles.

Time-Based Hot Partitions

Another common mistake—time-series data without sharding:

// ❌ BAD: All users' activity for a day in one partition
const ActivityEntity = new Entity({
  name: 'Activity',
  table: GitHubTable,
  schema: item({
    userId: string().required(),
    date: string().required(),  // YYYY-MM-DD
    timestamp: string().required(),
    activityType: string().required()
  }).and((_schema) => ({
    pk: string().key().link<typeof _schema>(
      ({ date }) => `ACTIVITY#${date}`  // Hot partition!
    ),
    sk: string().key().link<typeof _schema>(
      ({ timestamp, userId }) => `${timestamp}#${userId}`
    )
  }))
} as const);

// ✅ GOOD: Partition by user AND month
const ActivityEntity = new Entity({
  name: 'Activity',
  table: GitHubTable,
  schema: item({
    userId: string().required(),
    yearMonth: string().required(),  // YYYY-MM
    timestamp: string().required(),
    activityType: string().required()
  }).and((_schema) => ({
    pk: string().key().link<typeof _schema>(
      ({ userId, yearMonth }) => `USER#${userId}#${yearMonth}`
    ),
    sk: string().key().link<typeof _schema>(
      ({ timestamp }) => timestamp
    )
  }))
} as const);

Now each user’s activity is in their own partition. No hot spots.
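The read side stays simple too: one user's activity for one month is a single-partition query, in the same style as the queries above. A quick sketch, assuming the corrected ActivityEntity is registered on GitHubTable:

// Fetch one user's activity for a month - one partition, no fan-out needed
async function getUserActivity(userId: string, yearMonth: string) {
  const response = await GitHubTable.build(QueryCommand)
    .entities(ActivityEntity)
    .query({
      partition: `USER#${userId}#${yearMonth}`
    })
    .send();

  return response.Items || [];
}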

The Migration Problem

Let’s be honest: migrations in single table design are brutal. This is one of the genuine tradeoffs you’re making for performance at scale.

Why Migrations Are Hard

In a relational database, adding a column is simple:

ALTER TABLE issues ADD COLUMN priority VARCHAR(20);

Done. Existing rows get NULL for the new column. New rows can set it.

In DynamoDB? You have three options, none of them great.

Option 1: Add a GSI (If You Have Room)

You only get 20 GSIs per table (the default quota). If you haven’t used them all:

// New requirement: query issues by priority
const IssueEntity = new Entity({
  name: 'Issue',
  table: GitHubTable,
  schema: item({
    owner: string().required().key(),
    repo_name: string().required().key(),
    issue_number: number().required().key(),
    title: string().required(),
    status: string().required(),
    priority: string().optional()  // New field
  }).and((_schema) => ({
    // Existing keys...
    PK: string().key().link<typeof _schema>(
      ({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
    ),
    SK: string().key().link<typeof _schema>(
      ({ issue_number }) => `ISSUE#${String(issue_number).padStart(6, '0')}`
    ),
    // New GSI for priority queries
    GSI5PK: string().link<typeof _schema>(
      ({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
    ),
    GSI5SK: string().link<typeof _schema>(
      ({ priority, issue_number }) =>
        priority
          ? `PRIORITY#${priority}#${String(issue_number).padStart(6, '0')}`
          : undefined  // Sparse index - only items with priority
    )
  }))
} as const);

This works but:

  • You’re burning one of your 20 GSI slots
  • GSI storage costs money
  • New items work immediately, old items won’t appear in the GSI until updated
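Once GSI5 is declared on the table, the new access pattern is an ordinary index query. A sketch of listing a repo's issues by priority (GSI5 itself is assumed to be added to GitHubTable's index definitions):

// Query the sparse GSI5 - old issues without a priority simply won't appear yet
async function getIssuesByPriority(owner: string, repoName: string, priority: string) {
  const response = await GitHubTable.build(QueryCommand)
    .entities(IssueEntity)
    .query({
      index: 'GSI5',
      partition: `REPO#${owner}#${repoName}`,
      range: { beginsWith: `PRIORITY#${priority}#` }
    })
    .send();

  return response.Items || [];
}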

Option 2: Full Table Scan and Update (Painful)

For existing items to appear in the new GSI, you need to update them:

import { ScanCommand, UpdateItemCommand } from 'dynamodb-toolbox';

async function migrateIssuePriority() {
  console.log('Starting migration...');

  let lastEvaluatedKey = undefined;
  let processedCount = 0;
  let errorCount = 0;

  do {
    // Scan the table
    const response = await GitHubTable.build(ScanCommand)
      .options({
        exclusiveStartKey: lastEvaluatedKey
      })
      .send();

    // Process each item
    for (const item of response.Items || []) {
      if (item.entity === 'Issue' && !item.priority) {
        try {
          // Set default priority based on labels
          const priority = item.labels?.has('critical')
            ? 'high'
            : 'medium';

          await IssueEntity.build(UpdateItemCommand)
            .key({
              owner: item.owner,
              repo_name: item.repo_name,
              issue_number: item.issue_number
            })
            .operations({
              set: {
                priority,
                GSI5PK: `REPO#${item.owner}#${item.repo_name}`,
                GSI5SK: `PRIORITY#${priority}#${String(item.issue_number).padStart(6, '0')}`
              }
            })
            .send();

          processedCount++;

          if (processedCount % 100 === 0) {
            console.log(`Processed ${processedCount} issues...`);
          }
        } catch (error) {
          console.error(`Failed to update issue ${item.issue_number}:`, error);
          errorCount++;
        }
      }
    }

    lastEvaluatedKey = response.LastEvaluatedKey;
  } while (lastEvaluatedKey);

  console.log(`Migration complete. Processed: ${processedCount}, Errors: ${errorCount}`);
}

This approach:

  • Scans your entire table (expensive!)
  • Takes hours or days for large tables (a parallel scan, sketched below, can shorten this)
  • Consumes read and write capacity
  • Can cost thousands of dollars
  • No rollback if something breaks
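The time problem has one partial mitigation: DynamoDB's parallel scan. Splitting the scan into segments lets several workers walk different slices of the table at once. A sketch using the plain document client, since the segment options aren't shown in the toolbox examples above (GITHUB_TABLE_NAME is a placeholder):

import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, ScanCommand } from '@aws-sdk/lib-dynamodb';

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE_NAME = process.env.GITHUB_TABLE_NAME!;  // placeholder

// Each worker scans one segment of the keyspace
async function migrateSegment(segment: number, totalSegments: number) {
  let lastEvaluatedKey: Record<string, any> | undefined;

  do {
    const page = await docClient.send(new ScanCommand({
      TableName: TABLE_NAME,
      Segment: segment,              // this worker's slice
      TotalSegments: totalSegments,  // total number of workers
      ExclusiveStartKey: lastEvaluatedKey
    }));

    for (const item of page.Items ?? []) {
      // ...apply the same per-item update logic as migrateIssuePriority() above...
    }

    lastEvaluatedKey = page.LastEvaluatedKey;
  } while (lastEvaluatedKey);
}

// Four workers finish roughly 4x faster:
// await Promise.all([0, 1, 2, 3].map(s => migrateSegment(s, 4)));

The flip side: more segments means proportionally more consumed read capacity, so watch your provisioned throughput (or your on-demand bill) while it runs.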

Option 3: Dual-Write and Lazy Migration

The safest but most complex approach:

class IssueRepository {
  // Write to both old and new formats
  async createIssue(data: any) {
    const priority = this.calculatePriority(data);

    await IssueEntity.build(PutItemCommand)
      .item({
        ...data,
        priority,  // New field
        // GSI5 keys populated for new items
        GSI5PK: `REPO#${data.owner}#${data.repo_name}`,
        GSI5SK: `PRIORITY#${priority}#${String(data.issue_number).padStart(6, '0')}`
      })
      .send();
  }

  // Migrate on read
  async getIssue(owner: string, repoName: string, issueNumber: number) {
    const issue = await IssueEntity.build(GetItemCommand)
      .key({ owner, repo_name: repoName, issue_number: issueNumber })
      .send();

    // Migrate if missing new attributes
    if (issue.Item && !issue.Item.priority) {
      const priority = this.calculatePriority(issue.Item);

      await IssueEntity.build(UpdateItemCommand)
        .key({ owner, repo_name: repoName, issue_number: issueNumber })
        .operations({
          set: {
            priority,
            GSI5PK: `REPO#${owner}#${repoName}`,
            GSI5SK: `PRIORITY#${priority}#${String(issueNumber).padStart(6, '0')}`
          }
        })
        .send();

      // Return updated item
      issue.Item.priority = priority;
    }

    return issue.Item;
  }

  private calculatePriority(issue: any): string {
    if (issue.labels?.has('critical')) return 'high';
    if (issue.labels?.has('bug')) return 'medium';
    return 'low';
  }
}

This approach:

  • Writes are more complex (but no downtime)
  • Reads migrate items lazily (gradual migration)
  • Eventually all items get migrated
  • Can run for weeks/months alongside old code

The Reality Check

Here’s what nobody tells you:

  1. Cost: Full table scans of a 1TB table can cost thousands
  2. Time: Updating millions of items takes days
  3. Risk: No atomic rollback—bugs corrupt data
  4. Primary Key Limitation: You CANNOT change partition/sort keys. Ever. New table + full data copy required (sketched below).

This is the real tradeoff of single table design. Plan your keys carefully upfront because changing them later is incredibly painful.
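For reference, the "new table + full data copy" path looks roughly like this. A minimal sketch with the plain document client; the rebuildKeys callback stands in for whatever your new key schema needs, and a real migration would also use DynamoDB Streams or dual writes to catch items that change mid-copy:

import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, ScanCommand, BatchWriteCommand } from '@aws-sdk/lib-dynamodb';

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Copy every item into a new table, recomputing the keys along the way
async function copyToNewTable(
  oldTable: string,
  newTable: string,
  rebuildKeys: (item: Record<string, any>) => Record<string, any>
) {
  let lastEvaluatedKey: Record<string, any> | undefined;

  do {
    const page = await docClient.send(new ScanCommand({
      TableName: oldTable,
      ExclusiveStartKey: lastEvaluatedKey
    }));

    const putRequests = (page.Items ?? []).map(item => ({
      PutRequest: { Item: rebuildKeys(item) }
    }));

    // BatchWriteItem accepts at most 25 items per request;
    // production code must also retry any UnprocessedItems in the response
    for (let i = 0; i < putRequests.length; i += 25) {
      await docClient.send(new BatchWriteCommand({
        RequestItems: { [newTable]: putRequests.slice(i, i + 25) }
      }));
    }

    lastEvaluatedKey = page.LastEvaluatedKey;
  } while (lastEvaluatedKey);
}

At larger scale, DynamoDB's Export to S3 and Import from S3 can avoid the scan's capacity cost, with a transform step in between, but the shape of the work is the same.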

Understanding Eventual Consistency

DynamoDB’s consistency model is simple but unforgiving: main table reads can be strongly consistent, but Global Secondary Indexes (GSIs) are always eventually consistent. Miss this and you’ll ship bugs.

The Problem

Here’s the classic mistake that looks fine in development but breaks in production:

// ❌ BUG: Create and immediately query via GSI
async function createAndListRepos(owner: string, repoName: string) {
  // 1. Create a repository
  await RepoEntity.build(PutItemCommand)
    .item({
      owner,
      repo_name: repoName,
      description: 'New repository',
      is_private: false
    })
    .send();

  // 2. Immediately query all repos for this account via GSI3
  const repos = await GitHubTable.build(QueryCommand)
    .entities(RepoEntity)
    .query({
      index: 'GSI3',
      partition: `ACCOUNT#${owner}`,
      range: { lt: 'ACCOUNT#' }
    })
    .send();

  // ⚠️ The new repo might not appear in results!
  // GSI updates happen asynchronously (typically < 1 second)
  return repos.Items;
}

The write succeeds. The query succeeds. But the item isn’t there. In development with low traffic, this usually works (< 100ms lag). In production with high write volume, GSI lag can exceed 1 second.

Why This Happens

When you write to DynamoDB:

  1. Main table write: Synchronous, immediately consistent
  2. GSI update: Asynchronous, eventually consistent (typically 100ms-1s)

Your code doesn’t wait for step 2. The GSI update is happening, but your query might run before it completes.

Pattern 1: Read-After-Write via Main Table

When you need immediate consistency after a write, query the main table:

// ✅ CORRECT: Use main table for immediate consistency
async function createAndVerifyRepo(owner: string, repoName: string) {
  // Create the repository
  await RepoEntity.build(PutItemCommand)
    .item({
      owner,
      repo_name: repoName,
      description: 'New repository',
      is_private: false
    })
    .send();

  // Get directly from main table (strongly consistent)
  const repo = await RepoEntity.build(GetItemCommand)
    .key({ owner, repo_name: repoName })
    .send();

  // This WILL return the item immediately
  return repo.Item;
}

Pattern 2: Optimistic UI Updates

For user-facing operations, show the change immediately and sync in the background:

// ✅ CORRECT: Optimistic update pattern
async function starRepo(userName: string, repoOwner: string, repoName: string) {
  // Create the star
  await StarEntity.build(PutItemCommand)
    .item({
      username: userName,
      repo_owner: repoOwner,
      repo_name: repoName
    })
    .send();

  // Return success immediately - don't wait for GSI
  return {
    success: true,
    // Client shows star immediately
    // Background sync will confirm later
  };
}

// Separate query for "starred repos list" that tolerates slight lag
async function getStarredRepos(userName: string) {
  const response = await GitHubTable.build(QueryCommand)
    .entities(StarEntity)
    .query({
      partition: `ACCOUNT#${userName}`,
      range: { beginsWith: 'STAR#' }
    })
    .send();

  // This might not include the most recent star
  // That's okay - it'll appear within 1 second
  return response.Items || [];
}

Pattern 3: Retry with Backoff

For critical operations where you MUST confirm the item appears in a GSI:

// ✅ CORRECT: Retry until GSI catches up
async function waitForGSI<T>(
  queryFn: () => Promise<T[]>,
  validateFn: (items: T[]) => boolean,
  maxAttempts: number = 5
): Promise<T[]> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const items = await queryFn();

    if (validateFn(items)) {
      return items;
    }

    if (attempt < maxAttempts) {
      // Exponential backoff: 50ms, 100ms, 200ms, 400ms
      await new Promise(resolve =>
        setTimeout(resolve, 50 * Math.pow(2, attempt - 1))
      );
    }
  }

  throw new Error('GSI consistency timeout - item not found after retries');
}

// Usage
async function createIssueAndListByStatus(
  owner: string,
  repoName: string,
  title: string
) {
  const issue = await IssueEntity.build(PutItemCommand)
    .item({
      owner,
      repo_name: repoName,
      issue_number: 1,
      title,
      status: 'open',
      author: 'maintainer'
    })
    .send();

  // Wait for issue to appear in GSI4 (status index)
  const openIssues = await waitForGSI(
    () => GitHubTable.build(QueryCommand)
      .entities(IssueEntity)
      .query({
        index: 'GSI4',
        partition: `REPO#${owner}#${repoName}`,
        range: { beginsWith: 'ISSUE#OPEN#' }
      })
      .send()
      .then(r => r.Items || []),
    (items) => items.some(i => i.issue_number === 1),
    5
  );

  return openIssues;
}

Pattern 4: Timestamp-Based Deduplication

Handle updates that might arrive out of order:

// ✅ CORRECT: Use timestamps to resolve conflicts
async function updateIssueStatus(
  owner: string,
  repoName: string,
  issueNumber: number,
  newStatus: 'open' | 'closed'
) {
  const now = Date.now();

  await IssueEntity.build(UpdateItemCommand)
    .key({ owner, repo_name: repoName, issue_number: issueNumber })
    .operations({
      set: {
        status: newStatus,
        updated_at: now,
        // GSI4 keys recalculated automatically via .link()
      }
    })
    .options({
      // Only update if our timestamp is newer
      condition: {
        or: [
          { attr: 'updated_at', exists: false },
          { attr: 'updated_at', lt: now }
        ]
      }
    })
    .send();
}

When to Use Strongly Consistent Reads

Main table queries can be strongly consistent, but it costs more:

// Standard: Eventually consistent (default, cheaper)
const issue = await IssueEntity.build(GetItemCommand)
  .key({ owner: 'aws', repo_name: 'toolkit', issue_number: 42 })
  .send();

// Strongly consistent (2x read capacity, slower)
const issueStrong = await IssueEntity.build(GetItemCommand)
  .key({ owner: 'aws', repo_name: 'toolkit', issue_number: 42 })
  .options({ consistent: true })
  .send();

Use strongly consistent reads when:

  • Financial transactions (payments, credits)
  • Access control decisions (permissions checks)
  • Critical business logic that can’t tolerate stale data

Skip it for:

  • User-facing lists (slight lag is fine)
  • Analytics and dashboards
  • Non-critical reads where 2x cost isn’t worth it
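One more caveat: the consistent option only applies to the main table. DynamoDB rejects strongly consistent reads against a GSI, so you can't pay your way out of GSI lag:

// ❌ Not possible: strongly consistent reads aren't supported on GSIs.
// This fails with a validation error instead of returning fresher data.
await GitHubTable.build(QueryCommand)
  .entities(RepoEntity)
  .query({
    index: 'GSI3',
    partition: 'ACCOUNT#aws'
  })
  .options({ consistent: true })
  .send();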

Testing for Eventual Consistency

Simulate GSI lag in your tests:

// Test helper: Simulate eventual consistency
async function createWithDelay<T>(
  entity: any,
  item: T,
  delayMs: number = 100
): Promise<void> {
  await entity.build(PutItemCommand).item(item).send();

  // Simulate GSI lag
  await new Promise(resolve => setTimeout(resolve, delayMs));
}

// Integration test
test('handles GSI eventual consistency', async () => {
  // Create repo with simulated delay
  await createWithDelay(RepoEntity, {
    owner: 'test-user',
    repo_name: 'test-repo',
    is_private: false
  }, 500);

  // GSI query might fail without retry logic - reuse the waitForGSI helper from above
  const repos = await waitForGSI(
    () => listReposByAccount('test-user'),
    (items) => items.length === 1
  );

  expect(repos).toHaveLength(1);
});

The Reality

Eventual consistency isn’t a bug—it’s the tradeoff DynamoDB makes for its performance and availability guarantees. Your code needs to handle it:

  • Use main table queries when you need immediate consistency
  • Use optimistic updates for better user experience
  • Add retries for critical GSI queries
  • Accept slight lag for non-critical reads

Get this wrong and your application will have subtle, hard-to-reproduce bugs. Get it right and your users won’t even notice the delay.

Cost Implications You Need to Know

Single table design isn’t always cheaper. In fact, it can be MORE expensive if you’re not careful.

Where Costs Spiral

1. Denormalization = Larger Items

// Normalized (SQL mindset)
{
  "userId": "user-123"  // 8 bytes
}

// Denormalized (DynamoDB mindset)
{
  "user": {
    "id": "user-123",
    "name": "John Doe",
    "email": "[email protected]",
    "avatarUrl": "https://...",
    "reputation": 1523
  }  // 200+ bytes
}

Denormalization trades storage costs for query performance. Sometimes that’s worth it. Sometimes it’s not.

2. Multiple GSIs = Multiple Writes

Every write to your main table also writes to EVERY GSI. With 5 GSIs, one put operation = 6 writes = 6x the cost.

// This single put triggers writes to:
// - Main table
// - GSI1
// - GSI2
// - GSI3
// - GSI4
await RepoEntity.build(PutItemCommand)
  .item({
    owner: 'aws',
    repo_name: 'toolkit',
    description: 'A toolkit',
    is_private: false
  })
  .send();

3. GSI Projections Double Your Storage

// ❌ EXPENSIVE: ALL projection
const ExpensiveTable = new Table({
  indexes: {
    GSI1: {
      type: 'global',
      partitionKey: { name: 'GSI1PK', type: 'string' },
      sortKey: { name: 'GSI1SK', type: 'string' },
      projection: 'all'  // Every attribute duplicated!
    }
  }
});

// ✅ CHEAPER: Only project what you need
const EfficientTable = new Table({
  indexes: {
    GSI1: {
      type: 'global',
      partitionKey: { name: 'GSI1PK', type: 'string' },
      sortKey: { name: 'GSI1SK', type: 'string' },
      projection: 'keys_only'  // Minimal storage
    }
  }
});

With projection: 'all', you’re storing every attribute twice—once in the main table, once in the GSI. Double the storage = double the cost.

Cost Comparison

For a medium-scale app (100K users, 1M requests/day):

| Aspect      | Multiple Tables | Single Table (Optimized) | Single Table (Naive) |
| ----------- | --------------- | ------------------------ | -------------------- |
| Storage     | $50/month       | $75/month                | $150/month           |
| Write costs | $100/month      | $150/month               | $500/month           |
| Read costs  | $200/month      | $100/month               | $100/month           |
| GSI costs   | N/A             | $50/month                | $300/month           |
| Total       | $350/month      | $375/month               | $1,050/month         |

Single table CAN be cheaper (fewer round-trips), but only if you’re careful with:

  • GSI count (use sparingly)
  • GSI projections (keys_only when possible)
  • Denormalization (only when it serves a query)

Advanced Query Patterns

Some patterns that come up in production:

Batch Operations

import { BatchGetCommand, BatchWriteCommand } from 'dynamodb-toolbox';

// Fetch multiple repos efficiently
async function getMultipleRepos(repos: Array<{owner: string, name: string}>) {
  const response = await RepoEntity.build(BatchGetCommand)
    .keys(repos.map(r => ({ owner: r.owner, repo_name: r.name })))
    .send();

  return response.Items;
}

// Batch writes across entities
async function batchOperations() {
  await GitHubTable.build(BatchWriteCommand)
    .addPutRequest(RepoEntity, {
      item: {
        owner: 'aws',
        repo_name: 'new-tool',
        description: 'A new tool'
      }
    })
    .addPutRequest(IssueEntity, {
      item: {
        owner: 'aws',
        repo_name: 'new-tool',
        issue_number: 1,
        title: 'Initial setup',
        status: 'open',
        author: 'maintainer'
      }
    })
    .send();
}

Conditional Writes

import { TransactWriteCommand, DynamoDBToolboxError } from 'dynamodb-toolbox';

async function starRepository(userName: string, repoOwner: string, repoName: string) {
  try {
    await GitHubTable.build(TransactWriteCommand)
      .addWrite(
        StarEntity.build(PutItemCommand)
          .item({
            username: userName,
            repo_owner: repoOwner,
            repo_name: repoName
          })
          .options({
            condition: {
              attr: 'PK',
              exists: false  // Prevent duplicate stars
            }
          })
      )
      .send();

    return { success: true };
  } catch (error: any) {
    if (error instanceof DynamoDBToolboxError) {
      return { success: false, reason: 'Validation error' };
    }
    if (error.code === 'TransactionCanceledException') {
      return { success: false, reason: 'Already starred' };
    }
    throw error;
  }
}

The Scan That Wasn’t Planned For

// What happens when you need a pattern you didn't design for
async function findReposByLanguage(language: string) {
  console.warn('⚠️  Scanning entire table - this is expensive!');

  // This examines EVERY item in your table
  // At 1GB: slow but works
  // At 1TB: timeout, massive cost, angry users
  const response = await GitHubTable.build(ScanCommand)
    .options({
      filters: {
        attr: 'language',
        eq: language
      }
    })
    .send();

  // Filter by entity type in memory
  return response.Items?.filter(item =>
    item.entity === 'Repository' && item.language === language
  ) || [];
}

If you find yourself scanning regularly, you didn’t plan your access patterns properly. Add a GSI or reconsider your design.
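The fix follows the same recipe as the priority migration earlier: sparse GSI keys populated via .link() (say, a hypothetical GSI6 with GSI6PK set to LANGUAGE#<language> on repository items), declared on the table, and the scan collapses into a query:

// ✅ With a language GSI in place, the scan becomes a single-partition query
// (GSI6 is hypothetical - it needs the same .link() treatment as GSI5 above)
async function findReposByLanguageIndexed(language: string) {
  const response = await GitHubTable.build(QueryCommand)
    .entities(RepoEntity)
    .query({
      index: 'GSI6',
      partition: `LANGUAGE#${language}`
    })
    .send();

  return response.Items || [];
}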

The Bottom Line

After building GitHub’s backend and being honest about the production realities, here’s my final framework:

When Single Table Design Works

  • Access patterns are stable - Requirements won’t change every sprint
  • Performance is critical - Sub-10ms queries at scale matter
  • Team is committed - Everyone understands the patterns
  • You’ve planned for hot partitions - Sharding strategies are in place
  • Migration strategy exists - You know how you’ll handle schema changes

When to Walk Away

  • You’re constantly scanning - If you can’t design efficient queries, wrong tool
  • Requirements change weekly - Flexibility matters more than performance
  • Team is struggling - Better to use tools people understand
  • Migration is frequent - Schema evolution is too painful
  • Cost is spiraling - Naive single table can be more expensive than multiple tables

The Philosophical Truth

“Single table design is learnable, but it will always feel slightly alien compared to relational modeling. That’s not a bug—it’s the tradeoff you’re making for predictable performance at scale.”

Here’s what I’ve learned after years of working with DynamoDB:

The Good:

  • Predictable performance at any scale
  • Lower read costs (fewer round-trips)
  • Forces you to think about access patterns upfront
  • Works brilliantly when done right

The Bad:

  • Migrations are genuinely painful
  • New developers struggle with the patterns
  • Easy to create hot partitions by accident
  • Debugging is harder than SQL

The Reality:

  • Use PostgreSQL for 90% of applications
  • Use DynamoDB when you need its specific guarantees
  • Use the hybrid approach (DynamoDB + data warehouse) when you need both performance and flexibility
  • Never use DynamoDB just because it’s “web scale”

If you’ve made it through all three parts, you now understand single table design better than most engineers using DynamoDB in production. You know the patterns that work, the gotchas that kill projects, and most importantly—when to use this approach and when to walk away.

Series Complete

Thanks for following along! If you found this series helpful, I’d love to hear about your experiences with DynamoDB, both successes and failures. The best way to learn is by sharing war stories.

