In Part 1, we covered why DynamoDB forces you to think differently—you design for access patterns, not flexibility. Now let’s put that into practice by building something real: GitHub’s backend.
This implementation is based on the GitHub metadata example from The DynamoDB Book by Alex DeBrie—widely regarded as the definitive guide to DynamoDB single table design. Alex’s book walks through this example using raw AWS SDK code. Here, I’m adapting those same battle-tested patterns to show how they work with DynamoDB Toolbox v2, along with some variations in the design.
We’re modeling the whole data structure—repositories, users, organizations, issues, pull requests, and stars. All the relationships you’d normally handle with JOINs and foreign keys, except we’re doing it in a single DynamoDB table. These aren’t toy patterns; this is how you’d actually build something like GitHub’s API with DynamoDB.
If you’re serious about learning DynamoDB, buy Alex’s book. It’s the best money you’ll spend on understanding this database. What I’m showing here is how to implement those concepts with modern tooling and type-safe code you can actually run.
Table of Contents
- Why We Need Better Tooling
- The Modeling Process
- Setting Up DynamoDB Toolbox
- Implementing the Data Model
- Query Patterns That Actually Work
- Working Example
- What’s Next
Why We Need Better Tooling
Before we dive into the implementation, let me show you why raw DynamoDB code becomes unmaintainable fast. Here’s creating a simple repository:
// Raw DynamoDB SDK - this is what you're avoiding
const putRepo = {
TableName: 'GitHub',
Item: {
pk: { S: 'REPO#aws#dynamodb-toolbox' },
sk: { S: 'REPO#aws#dynamodb-toolbox' },
GSI1PK: { S: 'REPO#aws#dynamodb-toolbox' },
GSI1SK: { S: 'REPO#aws#dynamodb-toolbox' },
GSI2PK: { S: 'ACCOUNT#aws' },
GSI2SK: { S: '#2024-12-24T10:30:00Z' },
entityType: { S: 'Repo' },
owner: { S: 'aws' },
name: { S: 'dynamodb-toolbox' },
description: { S: 'A set of tools for working with DynamoDB' },
stars: { N: '1234' },
createdAt: { S: '2024-01-15T08:00:00Z' },
updatedAt: { S: '2024-12-24T10:30:00Z' }
}
};
Every attribute needs type annotations ({ S: value }, { N: value }). Keys are manually computed. There’s no type safety. Change one key format and you break queries across your codebase. Teams end up building their own abstraction layers, each slightly different.
DynamoDB Toolbox solves this. It’s not an ORM—it doesn’t try to make DynamoDB look like SQL. Instead, it provides a type-safe way to define entities while embracing DynamoDB’s patterns.
The Modeling Process
Remember from Part 1—you can’t just wing it with DynamoDB. You need a disciplined process.
Step 1: Create an Entity-Relationship Diagram
Yes, you still create an ERD like with relational databases. But these entities become different item types within your single table:
erDiagram
User ||--o{ Repo : owns
Organization ||--o{ Repo : owns
User ||--o{ Organization : memberOf
Repo ||--o{ Issue : contains
Repo ||--o{ PullRequest : contains
Issue ||--o{ IssueComment : has
PullRequest ||--o{ PRComment : has
User ||--o{ Reaction : creates
Issue ||--o{ Reaction : receives
PullRequest ||--o{ Reaction : receives
IssueComment ||--o{ Reaction : receives
PRComment ||--o{ Reaction : receives
User ||--o{ Star : stars
Repo ||--o{ Star : receivedBy
Repo ||--o{ Fork : forkedFrom
Repo ||--o{ Fork : forkedTo
Step 2: List Every Access Pattern
This is the step that doesn’t exist in relational modeling. You enumerate every way your application will access data:
| Entity | Access Pattern | Query Type |
|---|---|---|
| Repository | Get a repository by owner and name | Direct get |
| | List all repositories for an account (user or org) | Query GSI3 |
| | List repositories sorted by update time | Query GSI3 with sort |
| | Find forks of a repository | Query GSI2 |
| Issue | Get an issue by repo and number | Direct get |
| | List all issues for a repository | Query GSI1 |
| | List open issues for a repository (newest first) | Query GSI4 with beginsWith |
| | List closed issues for a repository | Query GSI4 with beginsWith |
| Star | List repositories a user has starred | Query main table |
| | List users who starred a repository | Query GSI1 |
| | Check if user has starred a repo | Direct get |
Write these down. Be specific. This list becomes your contract.
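One way to make that contract explicit is to write it down as code before touching any key design. The interface and method names below are mine, not from Alex's book or any library; it's just the access-pattern table restated as the data-access surface the rest of the app has to go through:

```typescript
// Placeholder result types, for illustration only
type Repo = Record<string, unknown>;
type Issue = Record<string, unknown>;
type Star = Record<string, unknown>;
type Fork = Record<string, unknown>;

// Hypothetical contract derived from the access-pattern table above.
// Every method maps 1:1 to a row; if a new feature can't be expressed here,
// revisit the key design before writing any queries.
interface GitHubDataAccess {
  // Repository patterns
  getRepo(owner: string, name: string): Promise<Repo | undefined>;          // direct get
  listReposForAccount(account: string): Promise<Repo[]>;                    // GSI3
  listReposByUpdateTime(account: string): Promise<Repo[]>;                  // GSI3, sorted
  listForks(owner: string, name: string): Promise<Fork[]>;                  // GSI2

  // Issue patterns
  getIssue(owner: string, name: string, num: number): Promise<Issue | undefined>; // direct get
  listIssues(owner: string, name: string): Promise<Issue[]>;                // GSI1
  listOpenIssues(owner: string, name: string): Promise<Issue[]>;            // GSI4
  listClosedIssues(owner: string, name: string): Promise<Issue[]>;          // GSI4

  // Star patterns
  listStarredRepos(username: string): Promise<Star[]>;                      // main table
  listStargazers(owner: string, name: string): Promise<Star[]>;             // GSI1
  hasStarred(username: string, owner: string, name: string): Promise<boolean>;
}
```

If a pattern isn't in the list, it wasn't planned for, and that will show up as a painful migration later.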
Step 3: Design Your Keys
Now comes the interesting part. You organize data to enable your access patterns. Each entity gets its own unique partition key for direct access, and we use Global Secondary Indexes (GSIs) to enable different query patterns.
Here’s our key design:
| Entity | Partition Key (PK) | Sort Key (SK) | Why This Design |
|---|---|---|---|
| Repo | `REPO#<owner>#<name>` | `REPO#<owner>#<name>` | Unique identifier |
| Issue | `ISSUE#<owner>#<name>#<padded_number>` | `ISSUE#<owner>#<name>#<padded_number>` | Unique identifier, enables direct access |
| PR | `PR#<owner>#<name>#<padded_number>` | `PR#<owner>#<name>#<padded_number>` | Unique identifier, enables direct access |
| IssueComment | `ISSUECOMMENT#<owner>#<name>#<issue_number>` | `ISSUECOMMENT#<comment_id>` | Groups comments by issue |
| PRComment | `PRCOMMENT#<owner>#<name>#<pr_number>` | `PRCOMMENT#<comment_id>` | Groups comments by PR |
| Reaction | `<type>REACTION#<owner>#<name>#<target_id>#<user>` | `<type>REACTION#<owner>#<name>#<target_id>#<user>` | Unique per user reaction |
| User | `ACCOUNT#<username>` | `ACCOUNT#<username>` | Unique identifier |
| Org | `ACCOUNT#<orgname>` | `ACCOUNT#<orgname>` | Unique identifier |
| Membership | `ACCOUNT#<orgname>` | `MEMBERSHIP#<username>` | Lives with org, lists members |
| Gist | `ACCOUNT#<username>` | `GIST#<gist_id>` | Lives with user, lists gists |
| Fork | `REPO#<original_owner>#<name>` | `FORK#<fork_owner>` | Lives with original repo |
Each entity has its own partition key for efficient direct lookups. We use GSIs to enable collection queries like “list all issues for a repo” or “list all repos for an account”.
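One detail worth calling out: the `<padded_number>` segments exist because DynamoDB compares string sort keys lexicographically, not numerically, so raw issue numbers would sort in the wrong order. A quick illustration of why the zero-padding matters:

```typescript
const pad = (n: number) => String(n).padStart(8, '0');

// Raw numeric strings sort lexicographically, which breaks numeric order:
const raw = ['2', '10', '100'].sort();        // ['10', '100', '2']

// Zero-padded strings keep lexicographic order aligned with numeric order:
const padded = [2, 10, 100].map(pad).sort();  // ['00000002', '00000010', '00000100']

console.log(raw, padded);
```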
Step 4: Add Global Secondary Indexes
Your primary key design handles many patterns, but not all. GSIs give you alternative ways to query your data. The trick? Overload them just like your main table:
| GSI | Purpose | Example Entities | Example Query |
|---|---|---|---|
| GSI1 | Entity collections by parent | Issues by repo, PRs by repo, Apps by account | “List all issues for this repo” |
| GSI2 | Repo lookups and forks | Repo self-reference, Forks by original repo | “Find all forks of this repo” |
| GSI3 | Account queries with timestamps | Repos by account (sorted by update time), Account lookups | “List repos owned by aws, newest first” |
| GSI4 | Status-based filtering | Open/Closed issues and PRs | “Show open issues for this repo, newest first” |
Setting Up DynamoDB Toolbox
Let’s start with the foundation. First, install the dependencies:
npm install dynamodb-toolbox @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb
Now define the table structure:
import { Table } from 'dynamodb-toolbox/table';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient } from '@aws-sdk/lib-dynamodb';
// Define the table structure (without client initially)
const GitHubTable = new Table({
name: 'GitHubTable',
partitionKey: { name: 'PK', type: 'string' },
sortKey: { name: 'SK', type: 'string' },
indexes: {
GSI1: {
type: 'global',
partitionKey: { name: 'GSI1PK', type: 'string' },
sortKey: { name: 'GSI1SK', type: 'string' }
},
GSI2: {
type: 'global',
partitionKey: { name: 'GSI2PK', type: 'string' },
sortKey: { name: 'GSI2SK', type: 'string' }
},
GSI3: {
type: 'global',
partitionKey: { name: 'GSI3PK', type: 'string' },
sortKey: { name: 'GSI3SK', type: 'string' }
},
GSI4: {
type: 'global',
partitionKey: { name: 'GSI4PK', type: 'string' },
sortKey: { name: 'GSI4SK', type: 'string' }
}
}
});
// Initialize with DynamoDB client
const client = new DynamoDBClient({});
GitHubTable.documentClient = DynamoDBDocumentClient.from(client, {
marshallOptions: {
removeUndefinedValues: true,
convertEmptyValues: false
}
});
Notice the generic names: PK, SK, GSI1PK, etc. This is the single table design pattern—one physical schema supports multiple logical entity types.
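One thing DynamoDB Toolbox does not do is provision infrastructure: the physical table and its GSIs must already exist in AWS. In a real project you would define them with CloudFormation, CDK, or Terraform; for local experiments, a minimal sketch with the AWS SDK's `CreateTableCommand` looks like this (the table name mirrors the definition above, and on-demand billing is my assumption):

```typescript
import { DynamoDBClient, CreateTableCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});

await client.send(
  new CreateTableCommand({
    TableName: 'GitHubTable',
    BillingMode: 'PAY_PER_REQUEST', // on-demand: no capacity planning for a demo
    AttributeDefinitions: [
      { AttributeName: 'PK', AttributeType: 'S' },
      { AttributeName: 'SK', AttributeType: 'S' },
      { AttributeName: 'GSI1PK', AttributeType: 'S' },
      { AttributeName: 'GSI1SK', AttributeType: 'S' },
      { AttributeName: 'GSI2PK', AttributeType: 'S' },
      { AttributeName: 'GSI2SK', AttributeType: 'S' },
      { AttributeName: 'GSI3PK', AttributeType: 'S' },
      { AttributeName: 'GSI3SK', AttributeType: 'S' },
      { AttributeName: 'GSI4PK', AttributeType: 'S' },
      { AttributeName: 'GSI4SK', AttributeType: 'S' },
    ],
    KeySchema: [
      { AttributeName: 'PK', KeyType: 'HASH' },
      { AttributeName: 'SK', KeyType: 'RANGE' },
    ],
    GlobalSecondaryIndexes: ['GSI1', 'GSI2', 'GSI3', 'GSI4'].map((name) => ({
      IndexName: name,
      KeySchema: [
        { AttributeName: `${name}PK`, KeyType: 'HASH' as const },
        { AttributeName: `${name}SK`, KeyType: 'RANGE' as const },
      ],
      // Project all attributes so GSI queries return full items
      Projection: { ProjectionType: 'ALL' as const },
    })),
  })
);
```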
Implementing the Data Model
Now let’s implement our entities. DynamoDB Toolbox v2 uses a clean schema API with linked keys.
Repository Entity
import { Entity } from 'dynamodb-toolbox/entity';
import { item } from 'dynamodb-toolbox/schema/item';
import { string } from 'dynamodb-toolbox/schema/string';
import { number } from 'dynamodb-toolbox/schema/number';
import { boolean } from 'dynamodb-toolbox/schema/boolean';
const RepoEntity = new Entity({
name: 'Repository',
table: GitHubTable,
schema: item({
// Business attributes
owner: string()
.required()
.validate((value: string) => /^[a-zA-Z0-9_-]+$/.test(value))
.key(),
repo_name: string()
.required()
.validate((value: string) => /^[a-zA-Z0-9_.-]+$/.test(value))
.key(),
description: string().optional(),
is_private: boolean().required().default(false),
language: string().optional()
}).and((_schema) => ({
// DynamoDB keys - automatically computed from business attributes
PK: string()
.key()
.link<typeof _schema>(
({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
),
SK: string()
.key()
.link<typeof _schema>(
({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
),
// GSI1: Repo self-reference
GSI1PK: string().link<typeof _schema>(
({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
),
GSI1SK: string().link<typeof _schema>(
({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
),
// GSI2: Repo self-reference
GSI2PK: string().link<typeof _schema>(
({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
),
GSI2SK: string().link<typeof _schema>(
({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
),
// GSI3: Query repos by account with timestamp ordering
GSI3PK: string().link<typeof _schema>(
({ owner }) => `ACCOUNT#${owner}`
),
// '#' prefix keeps GSI3SK consistent with the repo-listing query later (beginsWith: '#')
GSI3SK: string()
.default(() => `#${new Date().toISOString()}`)
.savedAs('GSI3SK')
}))
} as const);
Look at what this gives you:
- Type safety: TypeScript knows exactly what fields are required
- Linked keys: Keys are automatically computed using `.link()`
- Automatic timestamps: GSI3SK timestamp set via `.default()` for sorting
- Validation: Regex validation on owner and repo names
- Clean API: No manual type conversions
Note: In our implementation, created and modified timestamps are managed by the Entity layer (e.g., RepositoryEntity), not in the DynamoDB schema. This keeps business logic separate from storage concerns.
Understanding the Schema API
The .and() pattern separates business logic from DynamoDB internals:
schema: item({
// Your business domain - what your app cares about
owner: string().required().key(),
repo_name: string().required().key(),
description: string().optional()
}).and((_schema) => ({
// DynamoDB keys - infrastructure concerns
PK: string().key().link<typeof _schema>(
({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
)
}))
The _schema type reference makes .link() type-safe. Change your business attributes and TypeScript will catch broken key computations.
Issue Entity with Smart Status Filtering
Here’s where it gets interesting. We want to efficiently query issues by status (open/closed):
import { set } from 'dynamodb-toolbox/schema/set';
const IssueEntity = new Entity({
name: 'Issue',
table: GitHubTable,
schema: item({
owner: string()
.required()
.validate((value: string) => /^[a-zA-Z0-9_-]+$/.test(value))
.key(),
repo_name: string()
.required()
.validate((value: string) => /^[a-zA-Z0-9_.-]+$/.test(value))
.key(),
issue_number: number().required().key(),
title: string()
.required()
.validate((value: string) => value.length <= 255),
body: string().optional(),
status: string().required().default('open'),
author: string().required(),
assignees: set(string()).optional(),
labels: set(string()).optional()
}).and((_schema) => ({
// Main table: unique PK per issue
PK: string()
.key()
.link<typeof _schema>(
({ owner, repo_name, issue_number }) =>
`ISSUE#${owner}#${repo_name}#${String(issue_number).padStart(8, '0')}`
),
SK: string()
.key()
.link<typeof _schema>(
({ owner, repo_name, issue_number }) =>
`ISSUE#${owner}#${repo_name}#${String(issue_number).padStart(8, '0')}`
),
// GSI1: List all issues for a repo
GSI1PK: string().link<typeof _schema>(
({ owner, repo_name }) => `ISSUE#${owner}#${repo_name}`
),
GSI1SK: string().link<typeof _schema>(
({ issue_number }) => `ISSUE#${String(issue_number).padStart(8, '0')}`
),
// GSI4: Optimized for status queries with newest-first sorting
GSI4PK: string().link<typeof _schema>(
({ owner, repo_name }) => `ISSUE#${owner}#${repo_name}`
),
GSI4SK: string().link<typeof _schema>(({ issue_number, status }) => {
if (status === 'open') {
// Reverse numbering: higher numbers come first
const reverseNumber = String(99999999 - issue_number).padStart(8, '0');
return `ISSUE#OPEN#${reverseNumber}`;
}
// '#' prefix: closed issues land in a separate key range ('#' sorts before letters)
const paddedNumber = String(issue_number).padStart(8, '0');
return `#ISSUE#CLOSED#${paddedNumber}`;
})
}))
} as const);
The GSI4SK pattern is clever:
- Open issues use reverse numbering so newer issues appear first
- The `#` prefix puts closed issues in a separate key range (`#` sorts before letters), so they never bleed into open-issue queries
- One GSI serves both statuses: `beginsWith('ISSUE#OPEN#')` returns open issues, `beginsWith('#ISSUE#CLOSED#')` returns closed ones
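To see the ordering concretely, here are the GSI4SK values a few issues would produce. The helper below simply mirrors the `.link()` logic above; the issue numbers are made up:

```typescript
// Mirrors the GSI4SK link() logic from the entity, for illustration only
const gsi4sk = (issueNumber: number, status: 'open' | 'closed') =>
  status === 'open'
    ? `ISSUE#OPEN#${String(99999999 - issueNumber).padStart(8, '0')}`
    : `#ISSUE#CLOSED#${String(issueNumber).padStart(8, '0')}`;

gsi4sk(42, 'open');   // 'ISSUE#OPEN#99999957'  -> sorts before issue 7, so newer issues come first
gsi4sk(7, 'open');    // 'ISSUE#OPEN#99999992'
gsi4sk(3, 'closed');  // '#ISSUE#CLOSED#00000003' -> separate '#' range, never matched by OPEN queries
```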
User and Organization Entities
Both are “accounts” in GitHub’s model:
const UserEntity = new Entity({
name: 'User',
table: GitHubTable,
schema: item({
username: string()
.required()
.validate((value: string) => /^[a-zA-Z0-9_-]+$/.test(value))
.key(),
email: string()
.required()
.validate((value: string) => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value)),
bio: string().optional(),
payment_plan_id: string().optional()
}).and((_schema) => ({
PK: string()
.key()
.link<typeof _schema>(({ username }) => `ACCOUNT#${username}`),
SK: string()
.key()
.link<typeof _schema>(({ username }) => `ACCOUNT#${username}`),
GSI1PK: string().link<typeof _schema>(
({ username }) => `ACCOUNT#${username}`
),
GSI1SK: string().link<typeof _schema>(
({ username }) => `ACCOUNT#${username}`
)
}))
} as const);
const OrganizationEntity = new Entity({
name: 'Organization',
table: GitHubTable,
schema: item({
org_name: string()
.required()
.validate((value: string) => /^[a-zA-Z0-9_-]+$/.test(value))
.key(),
description: string().optional(),
payment_plan_id: string().optional()
}).and((_schema) => ({
PK: string()
.key()
.link<typeof _schema>(({ org_name }) => `ACCOUNT#${org_name}`),
SK: string()
.key()
.link<typeof _schema>(({ org_name }) => `ACCOUNT#${org_name}`),
GSI1PK: string().link<typeof _schema>(
({ org_name }) => `ACCOUNT#${org_name}`
),
GSI1SK: string().link<typeof _schema>(
({ org_name }) => `ACCOUNT#${org_name}`
)
}))
} as const);
Notice they share the same key pattern (ACCOUNT#<name>). This is intentional—repos don’t care if their owner is a user or an org.
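Because users and orgs share the same key space, a single main-table query can resolve an account without knowing its type up front. Here's a sketch, assuming the table and entities defined above; the `getAccount` helper is mine, and the `beginsWith: 'ACCOUNT#'` range simply pins the query to the account item itself (memberships, gists, and stars in the same partition use different SK prefixes):

```typescript
import { QueryCommand } from 'dynamodb-toolbox';

async function getAccount(name: string) {
  // Both entity types live under PK = SK = ACCOUNT#<name>, so one query resolves either
  const response = await GitHubTable.build(QueryCommand)
    .entities(UserEntity, OrganizationEntity)
    .query({
      partition: `ACCOUNT#${name}`,
      range: { beginsWith: 'ACCOUNT#' }
    })
    .send();

  // Whichever entity matched (if any) comes back parsed against its own schema
  return response.Items?.[0];
}
```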
Star Entity - The Many-to-Many Pattern
Stars represent a many-to-many relationship. We use the adjacency list pattern:
const StarEntity = new Entity({
name: 'Star',
table: GitHubTable,
schema: item({
user_name: string().required().key(),
repo_owner: string().required().key(),
repo_name: string().required().key(),
starred_at: string()
.default(() => new Date().toISOString())
.savedAs('starred_at')
}).and((_schema) => ({
// Direction 1: User -> Repos they've starred
PK: string()
.key()
.link<typeof _schema>(({ user_name }) => `ACCOUNT#${user_name}`),
SK: string()
.key()
.link<typeof _schema>(
({ repo_owner, repo_name, starred_at }) =>
`STAR#${repo_owner}#${repo_name}#${starred_at}`
),
// Direction 2: Repo -> Users who starred it (via GSI)
GSI1PK: string().link<typeof _schema>(
({ repo_owner, repo_name }) => `REPO#${repo_owner}#${repo_name}`
),
GSI1SK: string().link<typeof _schema>(
({ user_name, starred_at }) => `STAR#${user_name}#${starred_at}`
)
}))
} as const);
The beauty: query in either direction efficiently. Want repos a user starred? Query the main table. Want users who starred a repo? Query GSI1.
Query Patterns That Actually Work
Now for the payoff. Let’s implement GitHub’s actual access patterns.
Creating Items
import { PutItemCommand } from 'dynamodb-toolbox';
// Create a repository - keys computed automatically
await RepoEntity.build(PutItemCommand)
.item({
owner: 'aws',
repo_name: 'dynamodb-toolbox',
description: 'A set of tools for working with DynamoDB',
is_private: false,
language: 'TypeScript'
})
.send();
// Create an issue - lives with its repo
await IssueEntity.build(PutItemCommand)
.item({
owner: 'aws',
repo_name: 'dynamodb-toolbox',
issue_number: 42,
title: 'Add TypeScript support',
status: 'open',
author: 'developer123'
})
.send();
No manual key generation. No type conversions. Just business logic.
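The "Direct get" rows from the access-pattern table work the same way: because PK and SK are linked to the key attributes, you pass only business identifiers and the storage keys are derived for you. A sketch, assuming the entities defined earlier:

```typescript
import { GetItemCommand } from 'dynamodb-toolbox';

// Get a repository by owner and name
const { Item: repo } = await RepoEntity.build(GetItemCommand)
  .key({ owner: 'aws', repo_name: 'dynamodb-toolbox' })
  .send();

// Get an issue by repo and number - padding and key prefixes are handled by the links
const { Item: issue } = await IssueEntity.build(GetItemCommand)
  .key({ owner: 'aws', repo_name: 'dynamodb-toolbox', issue_number: 42 })
  .send();

console.log(repo?.description, issue?.title);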
Listing Issues for a Repository
With our design, each entity has its own partition key. To list all issues for a repo, we use GSI1:
import { QueryCommand } from 'dynamodb-toolbox';
async function getIssuesForRepo(owner: string, repoName: string) {
// Query GSI1 to find all issues for this repo
const response = await GitHubTable.build(QueryCommand)
.entities(IssueEntity) // Type-safe entity filtering
.query({
index: 'GSI1',
partition: `ISSUE#${owner}#${repoName}`
})
.options({
limit: 50
})
.send();
return response.Items || [];
}
// Usage
const issues = await getIssuesForRepo('aws', 'dynamodb-toolbox');
console.log(`Found ${issues.length} issues`);
The .entities() method provides type safety and automatic entity parsing. TypeScript knows exactly what fields are available on each issue.
Querying by Status
async function getOpenIssues(owner: string, repoName: string) {
const response = await GitHubTable.build(QueryCommand)
.entities(IssueEntity) // Type-safe entity filtering
.query({
index: 'GSI4',
partition: `ISSUE#${owner}#${repoName}`,
range: {
beginsWith: 'ISSUE#OPEN#'
}
})
.options({
limit: 20
})
.send();
return response.Items || [];
}
The reverse numbering in GSI4SK means you get the most recent open issues first. The .entities() method ensures type safety and automatic entity parsing.
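The closed-issue pattern from the access-pattern table is the mirror image; with the same imports and entities as above, only the key prefix changes:

```typescript
async function getClosedIssues(owner: string, repoName: string) {
  const response = await GitHubTable.build(QueryCommand)
    .entities(IssueEntity)
    .query({
      index: 'GSI4',
      partition: `ISSUE#${owner}#${repoName}`,
      range: {
        beginsWith: '#ISSUE#CLOSED#' // the '#' range that holds closed issues
      }
    })
    .options({
      limit: 20
    })
    .send();

  return response.Items || [];
}
```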
Many-to-Many Queries
async function getUserStarredRepos(username: string) {
// Direction 1: User -> Repos
const response = await GitHubTable.build(QueryCommand)
.query({
partition: `ACCOUNT#${username}`,
range: {
beginsWith: 'STAR#'
}
})
.send();
return response.Items;
}
async function getRepoStargazers(owner: string, repoName: string) {
// Direction 2: Repo -> Users (via GSI1)
const response = await GitHubTable.build(QueryCommand)
.query({
index: 'GSI1',
partition: `REPO#${owner}#${repoName}`,
range: {
beginsWith: 'STAR#'
}
})
.send();
return response.Items;
}
Both directions work efficiently because we designed for them upfront.
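The remaining star pattern, "check if a user has starred a repo", also comes off the main table. Because the Star item's SK embeds the starred_at timestamp, with this key design it isn't a plain get; a narrow beginsWith query with a limit of 1 does the job. A sketch (the `hasStarred` helper name is mine):

```typescript
async function hasStarred(username: string, owner: string, repoName: string) {
  const response = await GitHubTable.build(QueryCommand)
    .entities(StarEntity)
    .query({
      partition: `ACCOUNT#${username}`,
      // SK is STAR#<owner>#<repo>#<starred_at>, so prefix-match up to the timestamp
      range: { beginsWith: `STAR#${owner}#${repoName}#` }
    })
    .options({
      limit: 1
    })
    .send();

  return (response.Items?.length ?? 0) > 0;
}
```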
Pagination Done Right
async function listReposPaginated(accountName: string, pageSize: number = 20) {
const response = await GitHubTable.build(QueryCommand)
.entities(RepoEntity)
.query({
index: 'GSI3',
partition: `ACCOUNT#${accountName}`,
range: { beginsWith: '#' } // Only repos (timestamps start with #)
})
.options({
limit: pageSize,
reverse: true // Newest repos first (by GSI3SK timestamp)
})
.send();
return {
items: response.Items || [],
// URL-safe base64 encode the continuation token
nextPageToken: response.LastEvaluatedKey
? encodeURIComponent(btoa(JSON.stringify(response.LastEvaluatedKey)))
: undefined
};
}
// Next page
async function getNextPage(accountName: string, pageToken: string) {
const lastKey = JSON.parse(atob(decodeURIComponent(pageToken)));
const response = await GitHubTable.build(QueryCommand)
.entities(RepoEntity)
.query({
index: 'GSI3',
partition: `ACCOUNT#${accountName}`,
range: { beginsWith: '#' }
})
.options({
exclusiveStartKey: lastKey,
reverse: true
})
.send();
return response.Items || [];
}
URL-safe base64 encoding keeps the continuation token opaque and safe to pass in query parameters. It does not prevent tampering, since a client can decode and edit it; sign or encrypt the token if that matters for your API.
What Real Items Look Like
To make this concrete, here’s what’s actually stored in the table:
// Item 1: Repository
{
"PK": "REPO#aws#dynamodb-toolbox",
"SK": "REPO#aws#dynamodb-toolbox",
"entity": "Repository",
"owner": "aws",
"repo_name": "dynamodb-toolbox",
"description": "Toolbox for DynamoDB",
"is_private": false,
"language": "TypeScript",
"GSI1PK": "REPO#aws#dynamodb-toolbox",
"GSI1SK": "REPO#aws#dynamodb-toolbox",
"GSI2PK": "REPO#aws#dynamodb-toolbox",
"GSI2SK": "REPO#aws#dynamodb-toolbox",
"GSI3PK": "ACCOUNT#aws",
"GSI3SK": "#2024-01-15T08:00:00.000Z"
}
// Item 2: Issue (separate partition key)
{
"PK": "ISSUE#aws#dynamodb-toolbox#00000042",
"SK": "ISSUE#aws#dynamodb-toolbox#00000042",
"entity": "Issue",
"owner": "aws",
"repo_name": "dynamodb-toolbox",
"issue_number": 42,
"title": "Add TypeScript support",
"body": "We should add full TypeScript type definitions",
"status": "open",
"author": "developer123",
"labels": ["enhancement", "typescript"],
"GSI1PK": "ISSUE#aws#dynamodb-toolbox",
"GSI1SK": "ISSUE#00000042",
"GSI4PK": "ISSUE#aws#dynamodb-toolbox",
"GSI4SK": "ISSUE#OPEN#99999958"
}
// Item 3: Star (many-to-many relationship)
{
"PK": "ACCOUNT#john",
"SK": "STAR#aws#dynamodb-toolbox#2024-12-01T10:00:00Z",
"entity": "Star",
"user_name": "john",
"repo_owner": "aws",
"repo_name": "dynamodb-toolbox",
"starred_at": "2024-12-01T10:00:00Z",
"GSI1PK": "REPO#aws#dynamodb-toolbox",
"GSI1SK": "STAR#john#2024-12-01T10:00:00Z"
}
Notice how each entity has its own unique partition key. The repository is REPO#aws#dynamodb-toolbox while the issue is ISSUE#aws#dynamodb-toolbox#00000042. We use GSIs to query collections—for example, GSI1 lets us list all issues for a repo by querying with GSI1PK = ISSUE#aws#dynamodb-toolbox.
Working Example
All the code from this post is available in a working repository: github-ddb
The repository includes:
- Complete entity definitions with DynamoDB Toolbox v2
- Working query examples for all access patterns
- Type-safe implementations
- Tests demonstrating the patterns in action
Clone it, run the examples, and experiment with the patterns. Seeing the code run makes the concepts click.
What’s Next
You’ve now seen single table design in action. We’ve implemented GitHub’s core data model with proper access patterns, type safety, and clean code.
But we’re not done. In Part 3, we’ll tackle the hard problems:
- Hot partitions and how to prevent them
- Migrations when requirements change
- Debugging your overloaded table
- Cost optimization strategies
- When to walk away from single table design
These are the production realities that separate successful DynamoDB implementations from disasters. See you in Part 3.
Continue to Part 3: Advanced Patterns and Production Realities →
Series Navigation
- Part 1: When DynamoDB Stops Being Simple
- Part 2: Building GitHub’s Backend in DynamoDB
- Part 3: Advanced Patterns and Production Realities
Additional Resources
Code Examples
- github-ddb Repository - Working implementation of all patterns from this post