In Part 1, we covered why DynamoDB forces you to think differently—you design for access patterns, not flexibility. Now let’s put that into practice by building something real: GitHub’s backend.
This implementation is based on the GitHub metadata example from The DynamoDB Book by Alex DeBrie—widely regarded as the definitive guide to DynamoDB single table design. Alex’s book walks through this example using raw AWS SDK code. Here, I’m adapting those same battle-tested patterns to show how they work with DynamoDB Toolbox v2, along with some variations in the design.
We’re modeling the whole data structure—repositories, users, organizations, issues, pull requests, and stars. All the relationships you’d normally handle with JOINs and foreign keys, except we’re doing it in a single DynamoDB table. These aren’t toy patterns; this is how you’d actually build something like GitHub’s API with DynamoDB.
If you’re serious about learning DynamoDB, buy Alex’s book. It’s the best money you’ll spend on understanding this database. What I’m showing here is how to implement those concepts with modern tooling and type-safe code you can actually run.
Table of Contents
- Why We Need Better Tooling
- The Modeling Process
- Setting Up DynamoDB Toolbox
- Implementing the Data Model
- Query Patterns That Actually Work
- Working Example
- What’s Next
Why We Need Better Tooling
Before we dive into the implementation, let me show you why raw DynamoDB code becomes unmaintainable fast. Here’s creating a simple repository:
// Raw DynamoDB SDK - this is what you're avoiding
const putRepo = {
TableName: 'GitHub',
Item: {
pk: { S: 'REPO#aws#dynamodb-toolbox' },
sk: { S: 'REPO#aws#dynamodb-toolbox' },
GSI1PK: { S: 'REPO#aws#dynamodb-toolbox' },
GSI1SK: { S: 'REPO#aws#dynamodb-toolbox' },
GSI2PK: { S: 'ACCOUNT#aws' },
GSI2SK: { S: '#2024-12-24T10:30:00Z' },
entityType: { S: 'Repo' },
owner: { S: 'aws' },
name: { S: 'dynamodb-toolbox' },
description: { S: 'A set of tools for working with DynamoDB' },
stars: { N: '1234' },
createdAt: { S: '2024-01-15T08:00:00Z' },
updatedAt: { S: '2024-12-24T10:30:00Z' }
}
};
Every attribute needs type annotations ({ S: value }, { N: value }). Keys are manually computed. There’s no type safety. Change one key format and you break queries across your codebase. Teams end up building their own abstraction layers, each slightly different.
DynamoDB Toolbox solves this. It’s not an ORM—it doesn’t try to make DynamoDB look like SQL. Instead, it provides a type-safe way to define entities while embracing DynamoDB’s patterns.
The Modeling Process
Remember from Part 1—you can’t just wing it with DynamoDB. You need a disciplined process.
Step 1: Create an Entity-Relationship Diagram
Yes, you still create an ERD like with relational databases. But these entities become different item types within your single table:
erDiagram
User ||--o{ Repo : owns
Organization ||--o{ Repo : owns
User ||--o{ Organization : memberOf
Repo ||--o{ Issue : contains
Repo ||--o{ PullRequest : contains
Issue ||--o{ IssueComment : has
PullRequest ||--o{ PRComment : has
User ||--o{ Reaction : creates
Issue ||--o{ Reaction : receives
PullRequest ||--o{ Reaction : receives
IssueComment ||--o{ Reaction : receives
PRComment ||--o{ Reaction : receives
User ||--o{ Star : stars
Repo ||--o{ Star : receivedBy
Repo ||--o{ Fork : forkedFrom
Repo ||--o{ Fork : forkedTo
Step 2: List Every Access Pattern
This is the step that doesn’t exist in relational modeling. You enumerate every way your application will access data:
| Entity | Access Pattern | Query Type |
|---|---|---|
| Repository | Get a repository by owner and name | Direct get |
| | List all repositories for an account (user or org) | Query GSI3 |
| | List repositories sorted by update time | Query GSI3 with sort |
| | Find forks of a repository | Query GSI2 |
| Issue | Get an issue by repo and number | Direct get |
| | List all issues for a repository | Query GSI1 |
| | List open issues for a repository (newest first) | Query GSI4 with beginsWith |
| | List closed issues for a repository | Query GSI4 with beginsWith |
| Star | List repositories a user has starred | Query main table |
| | List users who starred a repository | Query GSI1 |
| | Check if user has starred a repo | Direct get |
Write these down. Be specific. This list becomes your contract.
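One way to make that contract explicit is to write it down as code before touching any key design. The interface and method names below are mine, not from Alex's book or any library; it's just the access-pattern table restated as the data-access surface the rest of the app has to go through:

```typescript
// Placeholder result types, for illustration only
type Repo = Record<string, unknown>;
type Issue = Record<string, unknown>;
type Star = Record<string, unknown>;
type Fork = Record<string, unknown>;

// Hypothetical contract derived from the access-pattern table above.
// Every method maps 1:1 to a row; if a new feature can't be expressed here,
// revisit the key design before writing any queries.
interface GitHubDataAccess {
  // Repository patterns
  getRepo(owner: string, name: string): Promise<Repo | undefined>;          // direct get
  listReposForAccount(account: string): Promise<Repo[]>;                    // GSI3
  listReposByUpdateTime(account: string): Promise<Repo[]>;                  // GSI3, sorted
  listForks(owner: string, name: string): Promise<Fork[]>;                  // GSI2

  // Issue patterns
  getIssue(owner: string, name: string, num: number): Promise<Issue | undefined>; // direct get
  listIssues(owner: string, name: string): Promise<Issue[]>;                // GSI1
  listOpenIssues(owner: string, name: string): Promise<Issue[]>;            // GSI4
  listClosedIssues(owner: string, name: string): Promise<Issue[]>;          // GSI4

  // Star patterns
  listStarredRepos(username: string): Promise<Star[]>;                      // main table
  listStargazers(owner: string, name: string): Promise<Star[]>;             // GSI1
  hasStarred(username: string, owner: string, name: string): Promise<boolean>;
}
```

If a pattern isn't in the list, it wasn't planned for, and that will show up as a painful migration later.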
Step 3: Design Your Keys
Now comes the interesting part. You organize data to enable your access patterns. Each entity gets its own unique partition key for direct access, and we use Global Secondary Indexes (GSIs) to enable different query patterns.
Here’s our key design:
| Entity | Partition Key (PK) | Sort Key (SK) | Why This Design |
|---|---|---|---|
| Repo | `REPO#<owner>#<name>` | `REPO#<owner>#<name>` | Unique identifier |
| Issue | `ISSUE#<owner>#<name>#<padded_number>` | `ISSUE#<owner>#<name>#<padded_number>` | Unique identifier, enables direct access |
| PR | `PR#<owner>#<name>#<padded_number>` | `PR#<owner>#<name>#<padded_number>` | Unique identifier, enables direct access |
| IssueComment | `ISSUECOMMENT#<owner>#<name>#<issue_number>` | `ISSUECOMMENT#<comment_id>` | Groups comments by issue |
| PRComment | `PRCOMMENT#<owner>#<name>#<pr_number>` | `PRCOMMENT#<comment_id>` | Groups comments by PR |
| Reaction | `<type>REACTION#<owner>#<name>#<target_id>#<user>` | `<type>REACTION#<owner>#<name>#<target_id>#<user>` | Unique per user reaction |
| User | `ACCOUNT#<username>` | `ACCOUNT#<username>` | Unique identifier |
| Org | `ACCOUNT#<orgname>` | `ACCOUNT#<orgname>` | Unique identifier |
| Membership | `ACCOUNT#<orgname>` | `MEMBERSHIP#<username>` | Lives with org, lists members |
| Gist | `ACCOUNT#<username>` | `GIST#<gist_id>` | Lives with user, lists gists |
| Fork | `REPO#<original_owner>#<name>` | `FORK#<fork_owner>` | Lives with original repo |
Each entity has its own partition key for efficient direct lookups. We use GSIs to enable collection queries like “list all issues for a repo” or “list all repos for an account”.
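One detail worth calling out: the `<padded_number>` segments exist because DynamoDB compares string sort keys lexicographically, not numerically, so raw issue numbers would sort in the wrong order. A quick illustration of why the zero-padding matters:

```typescript
const pad = (n: number) => String(n).padStart(8, '0');

// Raw numeric strings sort lexicographically, which breaks numeric order:
const raw = ['2', '10', '100'].sort();        // ['10', '100', '2']

// Zero-padded strings keep lexicographic order aligned with numeric order:
const padded = [2, 10, 100].map(pad).sort();  // ['00000002', '00000010', '00000100']

console.log(raw, padded);
```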
Step 4: Add Global Secondary Indexes
Your primary key design handles many patterns, but not all. GSIs give you alternative ways to query your data. The trick? Overload them just like your main table:
| GSI | Purpose | Example Entities | Example Query |
|---|---|---|---|
| GSI1 | Entity collections by parent | Issues by repo, PRs by repo, Apps by account | “List all issues for this repo” |
| GSI2 | Repo lookups and forks | Repo self-reference, Forks by original repo | “Find all forks of this repo” |
| GSI3 | Account queries with timestamps | Repos by account (sorted by update time), Account lookups | “List repos owned by aws, newest first” |
| GSI4 | Status-based filtering | Open/Closed issues and PRs | “Show open issues for this repo, newest first” |
Setting Up DynamoDB Toolbox
Let’s start with the foundation. First, install the dependencies:
npm install dynamodb-toolbox @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb
Now define the table structure:
import { Table } from 'dynamodb-toolbox/table';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient } from '@aws-sdk/lib-dynamodb';
// Define the table structure (without client initially)
const GitHubTable = new Table({
name: 'GitHubTable',
partitionKey: { name: 'PK', type: 'string' },
sortKey: { name: 'SK', type: 'string' },
indexes: {
GSI1: {
type: 'global',
partitionKey: { name: 'GSI1PK', type: 'string' },
sortKey: { name: 'GSI1SK', type: 'string' }
},
GSI2: {
type: 'global',
partitionKey: { name: 'GSI2PK', type: 'string' },
sortKey: { name: 'GSI2SK', type: 'string' }
},
GSI3: {
type: 'global',
partitionKey: { name: 'GSI3PK', type: 'string' },
sortKey: { name: 'GSI3SK', type: 'string' }
},
GSI4: {
type: 'global',
partitionKey: { name: 'GSI4PK', type: 'string' },
sortKey: { name: 'GSI4SK', type: 'string' }
}
}
});
// Initialize with DynamoDB client
const client = new DynamoDBClient({});
GitHubTable.documentClient = DynamoDBDocumentClient.from(client, {
marshallOptions: {
removeUndefinedValues: true,
convertEmptyValues: false
}
});
Notice the generic names: PK, SK, GSI1PK, etc. This is the single table design pattern—one physical schema supports multiple logical entity types.
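One thing DynamoDB Toolbox does not do is provision infrastructure: the physical table and its GSIs must already exist in AWS. In a real project you would define them with CloudFormation, CDK, or Terraform; for local experiments, a minimal sketch with the AWS SDK's `CreateTableCommand` looks like this (the table name mirrors the definition above, and on-demand billing is my assumption):

```typescript
import { DynamoDBClient, CreateTableCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});

await client.send(
  new CreateTableCommand({
    TableName: 'GitHubTable',
    BillingMode: 'PAY_PER_REQUEST', // on-demand: no capacity planning for a demo
    AttributeDefinitions: [
      { AttributeName: 'PK', AttributeType: 'S' },
      { AttributeName: 'SK', AttributeType: 'S' },
      { AttributeName: 'GSI1PK', AttributeType: 'S' },
      { AttributeName: 'GSI1SK', AttributeType: 'S' },
      { AttributeName: 'GSI2PK', AttributeType: 'S' },
      { AttributeName: 'GSI2SK', AttributeType: 'S' },
      { AttributeName: 'GSI3PK', AttributeType: 'S' },
      { AttributeName: 'GSI3SK', AttributeType: 'S' },
      { AttributeName: 'GSI4PK', AttributeType: 'S' },
      { AttributeName: 'GSI4SK', AttributeType: 'S' },
    ],
    KeySchema: [
      { AttributeName: 'PK', KeyType: 'HASH' },
      { AttributeName: 'SK', KeyType: 'RANGE' },
    ],
    GlobalSecondaryIndexes: ['GSI1', 'GSI2', 'GSI3', 'GSI4'].map((name) => ({
      IndexName: name,
      KeySchema: [
        { AttributeName: `${name}PK`, KeyType: 'HASH' as const },
        { AttributeName: `${name}SK`, KeyType: 'RANGE' as const },
      ],
      // Project all attributes so GSI queries return full items
      Projection: { ProjectionType: 'ALL' as const },
    })),
  })
);
```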
Implementing the Data Model
Now let’s implement our entities. DynamoDB Toolbox v2 uses a clean schema API with linked keys.
Repository Entity
import { Entity } from 'dynamodb-toolbox/entity';
import { item } from 'dynamodb-toolbox/schema/item';
import { string } from 'dynamodb-toolbox/schema/string';
import { number } from 'dynamodb-toolbox/schema/number';
import { boolean } from 'dynamodb-toolbox/schema/boolean';
const RepoEntity = new Entity({
name: 'Repository',
table: GitHubTable,
schema: item({
// Business attributes
owner: string()
.required()
.validate((value: string) => /^[a-zA-Z0-9_-]+$/.test(value))
.key(),
repo_name: string()
.required()
.validate((value: string) => /^[a-zA-Z0-9_.-]+$/.test(value))
.key(),
description: string().optional(),
is_private: boolean().required().default(false),
language: string().optional()
}).and((_schema) => ({
// DynamoDB keys - automatically computed from business attributes
PK: string()
.key()
.link<typeof _schema>(
({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
),
SK: string()
.key()
.link<typeof _schema>(
({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
),
// GSI1: Repo self-reference
GSI1PK: string().link<typeof _schema>(
({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
),
GSI1SK: string().link<typeof _schema>(
({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
),
// GSI2: Repo self-reference
GSI2PK: string().link<typeof _schema>(
({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
),
GSI2SK: string().link<typeof _schema>(
({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
),
// GSI3: Query repos by account with timestamp ordering
GSI3PK: string().link<typeof _schema>(
({ owner }) => `ACCOUNT#${owner}`
),
// '#' prefix keeps GSI3SK consistent with the repo-listing query later (beginsWith: '#')
GSI3SK: string()
.default(() => `#${new Date().toISOString()}`)
.savedAs('GSI3SK')
}))
} as const);
Look at what this gives you:
- Type safety: TypeScript knows exactly what fields are required
- Linked keys: Keys are automatically computed using `.link()`
- Automatic timestamps: GSI3SK timestamp set via `.default()` for sorting
- Validation: Regex validation on owner and repo names
- Clean API: No manual type conversions
Note: In our implementation, created and modified timestamps are managed by the Entity layer (e.g., RepositoryEntity), not in the DynamoDB schema. This keeps business logic separate from storage concerns.
Understanding the Schema API
The .and() pattern separates business logic from DynamoDB internals:
schema: item({
// Your business domain - what your app cares about
owner: string().required().key(),
repo_name: string().required().key(),
description: string().optional()
}).and((_schema) => ({
// DynamoDB keys - infrastructure concerns
PK: string().key().link<typeof _schema>(
({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
)
}))
The _schema type reference makes .link() type-safe. Change your business attributes and TypeScript will catch broken key computations.
Issue Entity with Smart Status Filtering
Here’s where it gets interesting. We want to efficiently query issues by status (open/closed):
import { set } from 'dynamodb-toolbox/schema/set';
const IssueEntity = new Entity({
name: 'Issue',
table: GitHubTable,
schema: item({
owner: string()
.required()
.validate((value: string) => /^[a-zA-Z0-9_-]+$/.test(value))
.key(),
repo_name: string()
.required()
.validate((value: string) => /^[a-zA-Z0-9_.-]+$/.test(value))
.key(),
issue_number: number().required().key(),
title: string()
.required()
.validate((value: string) => value.length <= 255),
body: string().optional(),
status: string().required().default('open'),
author: string().required(),
assignees: set(string()).optional(),
labels: set(string()).optional()
}).and((_schema) => ({
// Main table: unique PK per issue
PK: string()
.key()
.link<typeof _schema>(
({ owner, repo_name, issue_number }) =>
`ISSUE#${owner}#${repo_name}#${String(issue_number).padStart(8, '0')}`
),
SK: string()
.key()
.link<typeof _schema>(
({ owner, repo_name, issue_number }) =>
`ISSUE#${owner}#${repo_name}#${String(issue_number).padStart(8, '0')}`
),
// GSI1: List all issues for a repo
GSI1PK: string().link<typeof _schema>(
({ owner, repo_name }) => `ISSUE#${owner}#${repo_name}`
),
GSI1SK: string().link<typeof _schema>(
({ issue_number }) => `ISSUE#${String(issue_number).padStart(8, '0')}`
),
// GSI4: Optimized for status queries with newest-first sorting
GSI4PK: string().link<typeof _schema>(
({ owner, repo_name }) => `ISSUE#${owner}#${repo_name}`
),
GSI4SK: string().link<typeof _schema>(({ issue_number, status }) => {
if (status === 'open') {
// Reverse numbering: higher numbers come first
const reverseNumber = String(99999999 - issue_number).padStart(8, '0');
return `ISSUE#OPEN#${reverseNumber}`;
}
// '#' prefix: closed issues land in a separate key range ('#' sorts before letters)
const paddedNumber = String(issue_number).padStart(8, '0');
return `#ISSUE#CLOSED#${paddedNumber}`;
})
}))
} as const);
The GSI4SK pattern is clever:
- Open issues use reverse numbering so newer issues appear first
- The `#` prefix puts closed issues in a separate key range (`#` sorts before letters), so they never bleed into open-issue queries
- One GSI serves both statuses: `beginsWith('ISSUE#OPEN#')` returns open issues, `beginsWith('#ISSUE#CLOSED#')` returns closed ones
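To see the ordering concretely, here are the GSI4SK values a few issues would produce. The helper below simply mirrors the `.link()` logic above; the issue numbers are made up:

```typescript
// Mirrors the GSI4SK link() logic from the entity, for illustration only
const gsi4sk = (issueNumber: number, status: 'open' | 'closed') =>
  status === 'open'
    ? `ISSUE#OPEN#${String(99999999 - issueNumber).padStart(8, '0')}`
    : `#ISSUE#CLOSED#${String(issueNumber).padStart(8, '0')}`;

gsi4sk(42, 'open');   // 'ISSUE#OPEN#99999957'  -> sorts before issue 7, so newer issues come first
gsi4sk(7, 'open');    // 'ISSUE#OPEN#99999992'
gsi4sk(3, 'closed');  // '#ISSUE#CLOSED#00000003' -> separate '#' range, never matched by OPEN queries
```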
User and Organization Entities
Both are “accounts” in GitHub’s model:
const UserEntity = new Entity({
name: 'User',
table: GitHubTable,
schema: item({
username: string()
.required()
.validate((value: string) => /^[a-zA-Z0-9_-]+$/.test(value))
.key(),
email: string()
.required()
.validate((value: string) => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value)),
bio: string().optional(),
payment_plan_id: string().optional()
}).and((_schema) => ({
PK: string()
.key()
.link<typeof _schema>(({ username }) => `ACCOUNT#${username}`),
SK: string()
.key()
.link<typeof _schema>(({ username }) => `ACCOUNT#${username}`),
GSI1PK: string().link<typeof _schema>(
({ username }) => `ACCOUNT#${username}`
),
GSI1SK: string().link<typeof _schema>(
({ username }) => `ACCOUNT#${username}`
)
}))
} as const);
const OrganizationEntity = new Entity({
name: 'Organization',
table: GitHubTable,
schema: item({
org_name: string()
.required()
.validate((value: string) => /^[a-zA-Z0-9_-]+$/.test(value))
.key(),
description: string().optional(),
payment_plan_id: string().optional()
}).and((_schema) => ({
PK: string()
.key()
.link<typeof _schema>(({ org_name }) => `ACCOUNT#${org_name}`),
SK: string()
.key()
.link<typeof _schema>(({ org_name }) => `ACCOUNT#${org_name}`),
GSI1PK: string().link<typeof _schema>(
({ org_name }) => `ACCOUNT#${org_name}`
),
GSI1SK: string().link<typeof _schema>(
({ org_name }) => `ACCOUNT#${org_name}`
)
}))
} as const);
Notice they share the same key pattern (ACCOUNT#<name>). This is intentional—repos don’t care if their owner is a user or an org.
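Because users and orgs share the same key space, a single main-table query can resolve an account without knowing its type up front. Here's a sketch, assuming the table and entities defined above; the `getAccount` helper is mine, and the `beginsWith: 'ACCOUNT#'` range simply pins the query to the account item itself (memberships, gists, and stars in the same partition use different SK prefixes):

```typescript
import { QueryCommand } from 'dynamodb-toolbox';

async function getAccount(name: string) {
  // Both entity types live under PK = SK = ACCOUNT#<name>, so one query resolves either
  const response = await GitHubTable.build(QueryCommand)
    .entities(UserEntity, OrganizationEntity)
    .query({
      partition: `ACCOUNT#${name}`,
      range: { beginsWith: 'ACCOUNT#' }
    })
    .send();

  // Whichever entity matched (if any) comes back parsed against its own schema
  return response.Items?.[0];
}
```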
Star Entity - The Many-to-Many Pattern
Stars represent a many-to-many relationship. We use the adjacency list pattern:
const StarEntity = new Entity({
name: 'Star',
table: GitHubTable,
schema: item({
user_name: string().required().key(),
repo_owner: string().required().key(),
repo_name: string().required().key(),
starred_at: string()
.default(() => new Date().toISOString())
.savedAs('starred_at')
}).and((_schema) => ({
// Direction 1: User -> Repos they've starred
PK: string()
.key()
.link<typeof _schema>(({ user_name }) => `ACCOUNT#${user_name}`),
SK: string()
.key()
.link<typeof _schema>(
({ repo_owner, repo_name, starred_at }) =>
`STAR#${repo_owner}#${repo_name}#${starred_at}`
),
// Direction 2: Repo -> Users who starred it (via GSI)
GSI1PK: string().link<typeof _schema>(
({ repo_owner, repo_name }) => `REPO#${repo_owner}#${repo_name}`
),
GSI1SK: string().link<typeof _schema>(
({ user_name, starred_at }) => `STAR#${user_name}#${starred_at}`
)
}))
} as const);
The beauty: query in either direction efficiently. Want repos a user starred? Query the main table. Want users who starred a repo? Query GSI1.
Query Patterns That Actually Work
Now for the payoff. Let’s implement GitHub’s actual access patterns.
Creating Items
import { PutItemCommand } from 'dynamodb-toolbox';
// Create a repository - keys computed automatically
await RepoEntity.build(PutItemCommand)
.item({
owner: 'aws',
repo_name: 'dynamodb-toolbox',
description: 'A set of tools for working with DynamoDB',
is_private: false,
language: 'TypeScript'
})
.send();
// Create an issue - lives with its repo
await IssueEntity.build(PutItemCommand)
.item({
owner: 'aws',
repo_name: 'dynamodb-toolbox',
issue_number: 42,
title: 'Add TypeScript support',
status: 'open',
author: 'developer123'
})
.send();
No manual key generation. No type conversions. Just business logic.
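The "Direct get" rows from the access-pattern table work the same way: because PK and SK are linked to the key attributes, you pass only business identifiers and the storage keys are derived for you. A sketch, assuming the entities defined earlier:

```typescript
import { GetItemCommand } from 'dynamodb-toolbox';

// Get a repository by owner and name
const { Item: repo } = await RepoEntity.build(GetItemCommand)
  .key({ owner: 'aws', repo_name: 'dynamodb-toolbox' })
  .send();

// Get an issue by repo and number - padding and key prefixes are handled by the links
const { Item: issue } = await IssueEntity.build(GetItemCommand)
  .key({ owner: 'aws', repo_name: 'dynamodb-toolbox', issue_number: 42 })
  .send();

console.log(repo?.description, issue?.title);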
Listing Issues for a Repository
With our design, each entity has its own partition key. To list all issues for a repo, we use GSI1:
import { QueryCommand } from 'dynamodb-toolbox';
async function getIssuesForRepo(owner: string, repoName: string) {
// Query GSI1 to find all issues for this repo
const response = await GitHubTable.build(QueryCommand)
.entities(IssueEntity) // Type-safe entity filtering
.query({
index: 'GSI1',
partition: `ISSUE#${owner}#${repoName}`
})
.options({
limit: 50
})
.send();
return response.Items || [];
}
// Usage
const issues = await getIssuesForRepo('aws', 'dynamodb-toolbox');
console.log(`Found ${issues.length} issues`);
The .entities() method provides type safety and automatic entity parsing. TypeScript knows exactly what fields are available on each issue.
Querying by Status
async function getOpenIssues(owner: string, repoName: string) {
const response = await GitHubTable.build(QueryCommand)
.entities(IssueEntity) // Type-safe entity filtering
.query({
index: 'GSI4',
partition: `ISSUE#${owner}#${repoName}`,
range: {
beginsWith: 'ISSUE#OPEN#'
}
})
.options({
limit: 20
})
.send();
return response.Items || [];
}
The reverse numbering in GSI4SK means you get the most recent open issues first. The .entities() method ensures type safety and automatic entity parsing.
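The closed-issue pattern from the access-pattern table is the mirror image; with the same imports and entities as above, only the key prefix changes:

```typescript
async function getClosedIssues(owner: string, repoName: string) {
  const response = await GitHubTable.build(QueryCommand)
    .entities(IssueEntity)
    .query({
      index: 'GSI4',
      partition: `ISSUE#${owner}#${repoName}`,
      range: {
        beginsWith: '#ISSUE#CLOSED#' // the '#' range that holds closed issues
      }
    })
    .options({
      limit: 20
    })
    .send();

  return response.Items || [];
}
```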
Many-to-Many Queries
async function getUserStarredRepos(username: string) {
// Direction 1: User -> Repos
const response = await GitHubTable.build(QueryCommand)
.query({
partition: `ACCOUNT#${username}`,
range: {
beginsWith: 'STAR#'
}
})
.send();
return response.Items;
}
async function getRepoStargazers(owner: string, repoName: string) {
// Direction 2: Repo -> Users (via GSI1)
const response = await GitHubTable.build(QueryCommand)
.query({
index: 'GSI1',
partition: `REPO#${owner}#${repoName}`,
range: {
beginsWith: 'STAR#'
}
})
.send();
return response.Items;
}
Both directions work efficiently because we designed for them upfront.
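The remaining star pattern, "check if a user has starred a repo", also comes off the main table. Because the Star item's SK embeds the starred_at timestamp, with this key design it isn't a plain get; a narrow beginsWith query with a limit of 1 does the job. A sketch (the `hasStarred` helper name is mine):

```typescript
async function hasStarred(username: string, owner: string, repoName: string) {
  const response = await GitHubTable.build(QueryCommand)
    .entities(StarEntity)
    .query({
      partition: `ACCOUNT#${username}`,
      // SK is STAR#<owner>#<repo>#<starred_at>, so prefix-match up to the timestamp
      range: { beginsWith: `STAR#${owner}#${repoName}#` }
    })
    .options({
      limit: 1
    })
    .send();

  return (response.Items?.length ?? 0) > 0;
}
```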
Pagination Done Right
async function listReposPaginated(accountName: string, pageSize: number = 20) {
const response = await GitHubTable.build(QueryCommand)
.entities(RepoEntity)
.query({
index: 'GSI3',
partition: `ACCOUNT#${accountName}`,
range: { beginsWith: '#' } // Only repos (timestamps start with #)
})
.options({
limit: pageSize,
reverse: true // Newest repos first (by GSI3SK timestamp)
})
.send();
return {
items: response.Items || [],
// URL-safe base64 encode the continuation token
nextPageToken: response.LastEvaluatedKey
? encodeURIComponent(btoa(JSON.stringify(response.LastEvaluatedKey)))
: undefined
};
}
// Next page
async function getNextPage(accountName: string, pageToken: string) {
const lastKey = JSON.parse(atob(decodeURIComponent(pageToken)));
const response = await GitHubTable.build(QueryCommand)
.entities(RepoEntity)
.query({
index: 'GSI3',
partition: `ACCOUNT#${accountName}`,
range: { beginsWith: '#' }
})
.options({
exclusiveStartKey: lastKey,
reverse: true
})
.send();
return response.Items || [];
}
URL-safe base64 encoding keeps the continuation token opaque and safe to pass in query parameters. It does not prevent tampering, since a client can decode and edit it; sign or encrypt the token if that matters for your API.
What Real Items Look Like
To make this concrete, here’s what’s actually stored in the table:
// Item 1: Repository
{
"PK": "REPO#aws#dynamodb-toolbox",
"SK": "REPO#aws#dynamodb-toolbox",
"entity": "Repository",
"owner": "aws",
"repo_name": "dynamodb-toolbox",
"description": "Toolbox for DynamoDB",
"is_private": false,
"language": "TypeScript",
"GSI1PK": "REPO#aws#dynamodb-toolbox",
"GSI1SK": "REPO#aws#dynamodb-toolbox",
"GSI2PK": "REPO#aws#dynamodb-toolbox",
"GSI2SK": "REPO#aws#dynamodb-toolbox",
"GSI3PK": "ACCOUNT#aws",
"GSI3SK": "#2024-01-15T08:00:00.000Z"
}
// Item 2: Issue (separate partition key)
{
"PK": "ISSUE#aws#dynamodb-toolbox#00000042",
"SK": "ISSUE#aws#dynamodb-toolbox#00000042",
"entity": "Issue",
"owner": "aws",
"repo_name": "dynamodb-toolbox",
"issue_number": 42,
"title": "Add TypeScript support",
"body": "We should add full TypeScript type definitions",
"status": "open",
"author": "developer123",
"labels": ["enhancement", "typescript"],
"GSI1PK": "ISSUE#aws#dynamodb-toolbox",
"GSI1SK": "ISSUE#00000042",
"GSI4PK": "ISSUE#aws#dynamodb-toolbox",
"GSI4SK": "ISSUE#OPEN#99999958"
}
// Item 3: Star (many-to-many relationship)
{
"PK": "ACCOUNT#john",
"SK": "STAR#aws#dynamodb-toolbox#2024-12-01T10:00:00Z",
"entity": "Star",
"user_name": "john",
"repo_owner": "aws",
"repo_name": "dynamodb-toolbox",
"starred_at": "2024-12-01T10:00:00Z",
"GSI1PK": "REPO#aws#dynamodb-toolbox",
"GSI1SK": "STAR#john#2024-12-01T10:00:00Z"
}
Notice how each entity has its own unique partition key. The repository is REPO#aws#dynamodb-toolbox while the issue is ISSUE#aws#dynamodb-toolbox#00000042. We use GSIs to query collections—for example, GSI1 lets us list all issues for a repo by querying with GSI1PK = ISSUE#aws#dynamodb-toolbox.
Working Example
All the code from this post is available in a working repository: github-ddb
The repository includes:
- Complete entity definitions with DynamoDB Toolbox v2
- Working query examples for all access patterns
- Type-safe implementations
- Tests demonstrating the patterns in action
Clone it, run the examples, and experiment with the patterns. Seeing the code run makes the concepts click.
What’s Next
You’ve now seen single table design in action. We’ve implemented GitHub’s core data model with proper access patterns, type safety, and clean code.
But we’re not done. In Part 3, we’ll tackle the hard problems:
- Hot partitions and how to prevent them
- Migrations when requirements change
- Debugging your overloaded table
- Cost optimization strategies
- When to walk away from single table design
These are the production realities that separate successful DynamoDB implementations from disasters. See you in Part 3.
Continue to Part 3: Advanced Patterns and Production Realities →
Series Navigation
- Part 1: When DynamoDB Stops Being Simple
- Part 2: Building GitHub’s Backend in DynamoDB
- Part 3: Advanced Patterns and Production Realities
Additional Resources
Code Examples
- github-ddb Repository - Working implementation of all patterns from this post