In Part 1, we covered why DynamoDB forces you to think differently—you design for access patterns, not flexibility. Now let’s put that into practice by building something real: GitHub’s backend.

We’re talking about the whole data model—repositories, users, organizations, issues, pull requests, comments, and stars. All the relationships you’d normally handle with JOINs and foreign keys. Except we’re doing it in a single DynamoDB table.

This isn’t a toy example. These are the exact patterns you’d use in production.

Why We Need Better Tooling

Before we dive into the implementation, let me show you why raw DynamoDB code becomes unmaintainable fast. Here’s creating a simple repository:

// Raw DynamoDB SDK - this is what you're avoiding
const putRepo = {
  TableName: 'GitHub',
  Item: {
    PK: { S: 'REPO#aws#dynamodb-toolbox' },
    SK: { S: 'REPO#aws#dynamodb-toolbox' },
    GSI1PK: { S: 'REPO#aws#dynamodb-toolbox' },
    GSI1SK: { S: 'REPO#aws#dynamodb-toolbox' },
    GSI3PK: { S: 'ACCOUNT#aws' },
    GSI3SK: { S: '2024-12-24T10:30:00Z' },
    entityType: { S: 'Repo' },
    owner: { S: 'aws' },
    name: { S: 'dynamodb-toolbox' },
    description: { S: 'A set of tools for working with DynamoDB' },
    stars: { N: '1234' },
    createdAt: { S: '2024-01-15T08:00:00Z' },
    updatedAt: { S: '2024-12-24T10:30:00Z' }
  }
};

Every attribute needs type annotations ({ S: value }, { N: value }). Keys are manually computed. There’s no type safety. Change one key format and you break queries across your codebase. Teams end up building their own abstraction layers, each slightly different.

DynamoDB Toolbox solves this. It’s not an ORM—it doesn’t try to make DynamoDB look like SQL. Instead, it provides a type-safe way to define entities while embracing DynamoDB’s patterns.

The Modeling Process

Remember from Part 1: you can't just wing it with DynamoDB. You need a disciplined process.

Step 1: Create an Entity-Relationship Diagram

Yes, you still create an ERD, just as you would for a relational database. But these entities become different item types within your single table:

erDiagram
    User ||--o{ Repo : owns
    Organization ||--o{ Repo : owns
    User ||--o{ Organization : memberOf
    Repo ||--o{ Issue : contains
    Repo ||--o{ PullRequest : contains
    Issue ||--o{ IssueComment : has
    PullRequest ||--o{ PRComment : has
    User ||--o{ Reaction : creates
    Issue ||--o{ Reaction : receives
    PullRequest ||--o{ Reaction : receives
    IssueComment ||--o{ Reaction : receives
    PRComment ||--o{ Reaction : receives
    User ||--o{ Star : stars
    Repo ||--o{ Star : receivedBy
    Repo ||--o{ Fork : forkedFrom
    Repo ||--o{ Fork : forkedTo

Step 2: List Every Access Pattern

This is the step that doesn’t exist in relational modeling. You enumerate every way your application will access data:

| Entity | Access Pattern | Query Type |
| --- | --- | --- |
| Repository | Get a repository by owner and name | Direct get |
| Repository | List all repositories for an account (user or org) | Query GSI3 |
| Repository | List repositories sorted by creation date | Query GSI3, reverse order |
| Repository | Get repository with recent issues and PRs | Query main table (item collection) |
| Issue | Get an issue by repo and number | Direct get |
| Issue | List issues for a repository (with status filter) | Query GSI4 |
| Issue | List recent open issues for a repository | Query GSI4 with beginsWith |
| Star | List repositories a user has starred | Query main table |
| Star | List users who starred a repository | Query GSI1 |
| Star | Check if user has starred a repo | Query main table with beginsWith |

Write these down. Be specific. This list becomes your contract.

Step 3: Design Your Keys

Now comes the interesting part. You organize data so items needed together live together. In DynamoDB, items with the same partition key are stored together and can be retrieved with a single query.

Here’s our key design:

| Entity | Partition Key (PK) | Sort Key (SK) | Why This Design |
| --- | --- | --- | --- |
| Repo | `REPO#<owner>#<name>` | `REPO#<owner>#<name>` | Unique identifier |
| Issue | `REPO#<owner>#<name>` | `ISSUE#<number>` | Lives with parent repo |
| PR | `REPO#<owner>#<name>` | `PR#<number>` | Lives with parent repo |
| IssueComment | `REPO#<owner>#<name>` | `ISSUE#<number>#COMMENT#<id>` | Lives with issue and repo |
| PRComment | `REPO#<owner>#<name>` | `PR#<number>#COMMENT#<id>` | Lives with PR and repo |
| Reaction | `REPO#<owner>#<name>` | `REACTION#<type>#<target>#<user>#<emoji>` | Lives with repo, targets content |
| User | `ACCOUNT#<username>` | `ACCOUNT#<username>` | Unique identifier |
| Org | `ACCOUNT#<orgname>` | `ACCOUNT#<orgname>` | Unique identifier |
| Star | `ACCOUNT#<username>` | `STAR#<owner>#<name>#<date>` | User's starred repos |
| Fork | `REPO#<owner>#<name>` | `FORK#<fork_owner>` | Original repo's forks |

The magic: a repository, its issues, and its pull requests all share the same partition key. One query returns everything.

Step 4: Add Global Secondary Indexes

Your primary key design handles many patterns, but not all. GSIs give you alternative ways to query your data. The trick? Overload them just like your main table:

| GSI | Purpose | Example Query |
| --- | --- | --- |
| GSI1 | Pull request and star queries (overloaded) | "List PRs for this repo", "List users who starred this repo" |
| GSI2 | Repo self-reference | Reserved for future access patterns |
| GSI3 | Repos by account + timestamp | "List all repos owned by aws, sorted by creation date" |
| GSI4 | Issues/PRs by status | "Open issues for this repo" |

Setting Up DynamoDB Toolbox

Let’s start with the foundation. First, install the dependencies:

npm install dynamodb-toolbox @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb

Now define the table structure:

import { Table } from 'dynamodb-toolbox/table';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient } from '@aws-sdk/lib-dynamodb';

// Define the table structure (without client initially)
const GitHubTable = new Table({
  name: 'GitHubTable',
  partitionKey: { name: 'PK', type: 'string' },
  sortKey: { name: 'SK', type: 'string' },
  indexes: {
    GSI1: {
      type: 'global',
      partitionKey: { name: 'GSI1PK', type: 'string' },
      sortKey: { name: 'GSI1SK', type: 'string' }
    },
    GSI2: {
      type: 'global',
      partitionKey: { name: 'GSI2PK', type: 'string' },
      sortKey: { name: 'GSI2SK', type: 'string' }
    },
    GSI3: {
      type: 'global',
      partitionKey: { name: 'GSI3PK', type: 'string' },
      sortKey: { name: 'GSI3SK', type: 'string' }
    },
    GSI4: {
      type: 'global',
      partitionKey: { name: 'GSI4PK', type: 'string' },
      sortKey: { name: 'GSI4SK', type: 'string' }
    }
  }
});

// Initialize with DynamoDB client
const client = new DynamoDBClient({});
GitHubTable.documentClient = DynamoDBDocumentClient.from(client, {
  marshallOptions: {
    removeUndefinedValues: true,
    convertEmptyValues: false
  }
});

Notice the generic names: PK, SK, GSI1PK, etc. This is the single table design pattern—one physical schema supports multiple logical entity types.
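One thing the Table definition does not do is create the physical table: the GSIs must already exist in DynamoDB before you can query them. In production you'd typically provision them with CDK or CloudFormation, but here's a sketch of the CreateTable input those four overloaded indexes imply (the `createTableInput` object and PAY_PER_REQUEST billing are assumptions for this post, not toolbox output):

```typescript
// Sketch: CreateTable input matching the Table definition above. Pass this
// to CreateTableCommand from @aws-sdk/client-dynamodb, or translate it into
// your CDK/CloudFormation template. Every GSI key attribute must be declared
// in AttributeDefinitions, even though items carry many other attributes.
const gsiNames = ['GSI1', 'GSI2', 'GSI3', 'GSI4'];

const createTableInput = {
  TableName: 'GitHubTable',
  BillingMode: 'PAY_PER_REQUEST',
  AttributeDefinitions: [
    { AttributeName: 'PK', AttributeType: 'S' },
    { AttributeName: 'SK', AttributeType: 'S' },
    ...gsiNames.flatMap((g) => [
      { AttributeName: `${g}PK`, AttributeType: 'S' },
      { AttributeName: `${g}SK`, AttributeType: 'S' }
    ])
  ],
  KeySchema: [
    { AttributeName: 'PK', KeyType: 'HASH' },
    { AttributeName: 'SK', KeyType: 'RANGE' }
  ],
  GlobalSecondaryIndexes: gsiNames.map((g) => ({
    IndexName: g,
    KeySchema: [
      { AttributeName: `${g}PK`, KeyType: 'HASH' },
      { AttributeName: `${g}SK`, KeyType: 'RANGE' }
    ],
    Projection: { ProjectionType: 'ALL' }
  }))
};
```

ALL projections keep the examples simple; in a real system you'd weigh projecting fewer attributes against the extra reads.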

Implementing the Data Model

Now let’s implement our entities. DynamoDB Toolbox v2 uses a clean schema API with linked keys.

Repository Entity

import { Entity } from 'dynamodb-toolbox/entity';
import { item } from 'dynamodb-toolbox/schema/item';
import { string } from 'dynamodb-toolbox/schema/string';
import { number } from 'dynamodb-toolbox/schema/number';
import { boolean } from 'dynamodb-toolbox/schema/boolean';

const RepoEntity = new Entity({
  name: 'Repository',
  table: GitHubTable,
  schema: item({
    // Business attributes
    owner: string()
      .required()
      .validate((value: string) => /^[a-zA-Z0-9_-]+$/.test(value))
      .key(),
    repo_name: string()
      .required()
      .validate((value: string) => /^[a-zA-Z0-9_.-]+$/.test(value))
      .key(),
    description: string().optional(),
    is_private: boolean().required().default(false),
    language: string().optional()
  }).and((_schema) => ({
    // DynamoDB keys - automatically computed from business attributes
    PK: string()
      .key()
      .link<typeof _schema>(
        ({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
      ),
    SK: string()
      .key()
      .link<typeof _schema>(
        ({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
      ),
    // GSI1: Repo self-reference
    GSI1PK: string().link<typeof _schema>(
      ({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
    ),
    GSI1SK: string().link<typeof _schema>(
      ({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
    ),
    // GSI2: Repo self-reference
    GSI2PK: string().link<typeof _schema>(
      ({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
    ),
    GSI2SK: string().link<typeof _schema>(
      ({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
    ),
    // GSI3: Query repos by account with timestamp ordering
    GSI3PK: string().link<typeof _schema>(
      ({ owner }) => `ACCOUNT#${owner}`
    ),
    GSI3SK: string()
      .default(() => new Date().toISOString())
      .savedAs('GSI3SK')
  }))
} as const);

Look at what this gives you:

  • Type safety: TypeScript knows exactly what fields are required
  • Linked keys: Keys are automatically computed using .link()
  • Automatic timestamps: GSI3SK timestamp set via .default() for sorting
  • Validation: Regex validation on owner and repo names
  • Clean API: No manual type conversions

Note: In our implementation, created and modified timestamps are managed at the application layer (a service wrapping RepoEntity), not in the DynamoDB schema. This keeps business logic separate from storage concerns.

Understanding the Schema API

The .and() pattern separates business logic from DynamoDB internals:

schema: item({
  // Your business domain - what your app cares about
  owner: string().required().key(),
  repo_name: string().required().key(),
  description: string().optional()
}).and((_schema) => ({
  // DynamoDB keys - infrastructure concerns
  PK: string().key().link<typeof _schema>(
    ({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
  )
}))

The _schema type reference makes .link() type-safe. Change your business attributes and TypeScript will catch broken key computations.
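Under the hood, each .link() is just string templating over the business attributes. Here's the same computation as a standalone sketch (repoKeys is a hypothetical helper, not part of DynamoDB Toolbox), so you can see exactly what lands in each key attribute:

```typescript
// Standalone sketch of the key derivation that .link() performs for the
// Repository entity. repoKeys is illustrative only; in the real code the
// toolbox computes these values whenever you put or update an item.
interface RepoKeyInput {
  owner: string;
  repo_name: string;
}

function repoKeys({ owner, repo_name }: RepoKeyInput) {
  const id = `REPO#${owner}#${repo_name}`;
  return {
    PK: id,
    SK: id, // self-reference: the repo is the root of its item collection
    GSI3PK: `ACCOUNT#${owner}` // groups all repos under their owning account
  };
}

const keys = repoKeys({ owner: 'aws', repo_name: 'dynamodb-toolbox' });
// keys.PK === 'REPO#aws#dynamodb-toolbox'
// keys.GSI3PK === 'ACCOUNT#aws'
```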

Issue Entity with Smart Status Filtering

Here’s where it gets interesting. We want to efficiently query issues by status (open/closed):

import { set } from 'dynamodb-toolbox/schema/set';

const IssueEntity = new Entity({
  name: 'Issue',
  table: GitHubTable,
  schema: item({
    owner: string()
      .required()
      .validate((value: string) => /^[a-zA-Z0-9_-]+$/.test(value))
      .key(),
    repo_name: string()
      .required()
      .validate((value: string) => /^[a-zA-Z0-9_.-]+$/.test(value))
      .key(),
    issue_number: number().required().key(),
    title: string()
      .required()
      .validate((value: string) => value.length <= 255),
    body: string().optional(),
    status: string().required().default('open'),
    author: string().required(),
    assignees: set(string()).optional(),
    labels: set(string()).optional()
  }).and((_schema) => ({
    // Lives with parent repo
    PK: string()
      .key()
      .link<typeof _schema>(
        ({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
      ),
    // Padded for consistent sorting
    SK: string()
      .key()
      .link<typeof _schema>(
        ({ issue_number }) => `ISSUE#${String(issue_number).padStart(6, '0')}`
      ),
    // GSI4: Optimized for status queries
    GSI4PK: string().link<typeof _schema>(
      ({ owner, repo_name }) => `REPO#${owner}#${repo_name}`
    ),
    GSI4SK: string().link<typeof _schema>(({ issue_number, status }) => {
      if (status === 'open') {
        // Reverse numbering: higher (newer) issue numbers sort first
        const reverseNumber = String(999999 - issue_number).padStart(6, '0');
        return `ISSUE#OPEN#${reverseNumber}`;
      }
      // '#' sorts before letters in ASCII, so closed issues occupy a
      // separate key range from the ISSUE#OPEN# prefix
      const paddedNumber = String(issue_number).padStart(6, '0');
      return `#ISSUE#CLOSED#${paddedNumber}`;
    })
  }))
} as const);

The GSI4SK pattern is clever:

  • Open issues use reverse numbering so newer issues appear first
  • The # prefix puts closed issues in their own key range (in ASCII, # sorts before letters), keeping them cleanly separated from open issues
  • A beginsWith condition on either prefix returns just that status, already in the right order
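Because DynamoDB compares string sort keys byte by byte, these ordering claims can be verified directly. Here's the same GSI4SK computation as a standalone function (issueGsi4Sk is an illustrative name, mirroring the .link() logic above):

```typescript
// The GSI4SK computation outside the entity definition, so the sort
// behavior is easy to check. For these ASCII keys, DynamoDB's byte-order
// comparison is ordinary lexicographic string comparison.
function issueGsi4Sk(issueNumber: number, status: string): string {
  if (status === 'open') {
    // Higher (newer) issue numbers map to smaller strings, so they sort first
    return `ISSUE#OPEN#${String(999999 - issueNumber).padStart(6, '0')}`;
  }
  // '#' (0x23) sorts before 'I' (0x49): closed issues get their own range
  return `#ISSUE#CLOSED#${String(issueNumber).padStart(6, '0')}`;
}

// Newer open issue sorts before older open issue
issueGsi4Sk(43, 'open') < issueGsi4Sk(42, 'open'); // true
// Closed issues live under a distinct beginsWith prefix
issueGsi4Sk(42, 'closed').startsWith('#ISSUE#CLOSED#'); // true
```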

User and Organization Entities

Both are “accounts” in GitHub’s model:

const UserEntity = new Entity({
  name: 'User',
  table: GitHubTable,
  schema: item({
    username: string()
      .required()
      .validate((value: string) => /^[a-zA-Z0-9_-]+$/.test(value))
      .key(),
    email: string()
      .required()
      .validate((value: string) => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value)),
    bio: string().optional(),
    payment_plan_id: string().optional()
  }).and((_schema) => ({
    PK: string()
      .key()
      .link<typeof _schema>(({ username }) => `ACCOUNT#${username}`),
    SK: string()
      .key()
      .link<typeof _schema>(({ username }) => `ACCOUNT#${username}`),
    GSI1PK: string().link<typeof _schema>(
      ({ username }) => `ACCOUNT#${username}`
    ),
    GSI1SK: string().link<typeof _schema>(
      ({ username }) => `ACCOUNT#${username}`
    )
  }))
} as const);

const OrganizationEntity = new Entity({
  name: 'Organization',
  table: GitHubTable,
  schema: item({
    org_name: string()
      .required()
      .validate((value: string) => /^[a-zA-Z0-9_-]+$/.test(value))
      .key(),
    description: string().optional(),
    payment_plan_id: string().optional()
  }).and((_schema) => ({
    PK: string()
      .key()
      .link<typeof _schema>(({ org_name }) => `ACCOUNT#${org_name}`),
    SK: string()
      .key()
      .link<typeof _schema>(({ org_name }) => `ACCOUNT#${org_name}`),
    GSI1PK: string().link<typeof _schema>(
      ({ org_name }) => `ACCOUNT#${org_name}`
    ),
    GSI1SK: string().link<typeof _schema>(
      ({ org_name }) => `ACCOUNT#${org_name}`
    )
  }))
} as const);

Notice they share the same key pattern (ACCOUNT#<name>). This is intentional—repos don’t care if their owner is a user or an org.

Star Entity - The Many-to-Many Pattern

Stars represent a many-to-many relationship. We use the adjacency list pattern:

const StarEntity = new Entity({
  name: 'Star',
  table: GitHubTable,
  schema: item({
    user_name: string().required().key(),
    repo_owner: string().required().key(),
    repo_name: string().required().key(),
    starred_at: string()
      .default(() => new Date().toISOString())
      .savedAs('starred_at')
  }).and((_schema) => ({
    // Direction 1: User -> Repos they've starred
    PK: string()
      .key()
      .link<typeof _schema>(({ user_name }) => `ACCOUNT#${user_name}`),
    SK: string()
      .key()
      .link<typeof _schema>(
        ({ repo_owner, repo_name, starred_at }) =>
          `STAR#${repo_owner}#${repo_name}#${starred_at}`
      ),
    // Direction 2: Repo -> Users who starred it (via GSI)
    GSI1PK: string().link<typeof _schema>(
      ({ repo_owner, repo_name }) => `REPO#${repo_owner}#${repo_name}`
    ),
    GSI1SK: string().link<typeof _schema>(
      ({ user_name, starred_at }) => `STAR#${user_name}#${starred_at}`
    )
  }))
} as const);

The beauty: query in either direction efficiently. Want repos a user starred? Query the main table. Want users who starred a repo? Query GSI1.

Query Patterns That Actually Work

Now for the payoff. Let’s implement GitHub’s actual access patterns.

Creating Items

import { PutItemCommand } from 'dynamodb-toolbox';

// Create a repository - keys computed automatically
await RepoEntity.build(PutItemCommand)
  .item({
    owner: 'aws',
    repo_name: 'dynamodb-toolbox',
    description: 'A set of tools for working with DynamoDB',
    is_private: false,
    language: 'TypeScript'
  })
  .send();

// Create an issue - lives with its repo
await IssueEntity.build(PutItemCommand)
  .item({
    owner: 'aws',
    repo_name: 'dynamodb-toolbox',
    issue_number: 42,
    title: 'Add TypeScript support',
    status: 'open',
    author: 'developer123'
  })
  .send();

No manual key generation. No type conversions. Just business logic.

The Collection Fetch Pattern

This is where single table design shines—fetching heterogeneous items in one query:

import { QueryCommand } from 'dynamodb-toolbox';

async function getRepoWithActivity(owner: string, name: string) {
  // Query for all items with this partition key
  const response = await GitHubTable.build(QueryCommand)
    .query({
      partition: `REPO#${owner}#${name}`
    })
    .options({
      limit: 50
    })
    .send();

  const items = response.Items || [];

  // Parse different entity types from raw DynamoDB items
  const repo = items.find(item => item.entity === 'Repository');

  const issues = items
    .filter(item => item.entity === 'Issue')
    .map(item => ({
      ...item,
      issueNumber: parseInt(item.SK.replace(/^ISSUE#0*/, ''), 10)
    }));

  const pullRequests = items
    .filter(item => item.entity === 'PullRequest')
    .map(item => ({
      ...item,
      prNumber: parseInt(item.SK.replace(/^PR#0*/, ''), 10)
    }));

  return {
    repo,
    issues,
    pullRequests,
    totalItems: items.length
  };
}

// Usage
const result = await getRepoWithActivity('aws', 'dynamodb-toolbox');
console.log(`Found ${result.issues.length} issues`);

One query. One network request. Everything you need.
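The inline sort-key parsing above is worth factoring out once you have several entity types. A small sketch (parseNumberFromSk is a hypothetical helper for this post):

```typescript
// Hypothetical helper factoring out the sort-key parsing used above:
// strip a prefix like 'ISSUE#' or 'PR#' and parse the zero-padded number.
function parseNumberFromSk(sk: string, prefix: string): number {
  const digits = sk.slice(prefix.length); // e.g. '000042'
  return parseInt(digits, 10);            // parseInt tolerates leading zeros
}

parseNumberFromSk('ISSUE#000042', 'ISSUE#'); // 42
parseNumberFromSk('PR#000007', 'PR#');       // 7
```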

Querying by Status

async function getOpenIssues(owner: string, repoName: string) {
  const response = await GitHubTable.build(QueryCommand)
    .entities(IssueEntity)  // Type-safe entity filtering
    .query({
      index: 'GSI4',
      partition: `REPO#${owner}#${repoName}`,
      range: {
        beginsWith: 'ISSUE#OPEN#'
      }
    })
    .options({
      limit: 20
    })
    .send();

  return response.Items || [];
}

The reverse numbering in GSI4SK means you get the most recent open issues first. The .entities() method ensures type safety and automatic entity parsing.

Many-to-Many Queries

async function getUserStarredRepos(username: string) {
  // Direction 1: User -> Repos
  const response = await GitHubTable.build(QueryCommand)
    .query({
      partition: `ACCOUNT#${username}`,
      range: {
        beginsWith: 'STAR#'
      }
    })
    .send();

  return response.Items;
}

async function getRepoStargazers(owner: string, repoName: string) {
  // Direction 2: Repo -> Users (via GSI1)
  const response = await GitHubTable.build(QueryCommand)
    .query({
      index: 'GSI1',
      partition: `REPO#${owner}#${repoName}`,
      range: {
        beginsWith: 'STAR#'
      }
    })
    .send();

  return response.Items;
}

Both directions work efficiently because we designed for them upfront.
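One related pattern from Step 2, "check if a user has starred a repo", deserves a note. Because the Star sort key embeds starred_at, you can't do a true GetItem without knowing the exact timestamp; a narrow Query with a beginsWith condition is the practical equivalent. A hedged sketch (starSkPrefix is a hypothetical helper, not part of the library):

```typescript
// 'Has this user starred this repo?' The Star SK embeds the timestamp, so a
// direct GetItem would require knowing starred_at exactly. Instead, query
// the user's partition with a beginsWith prefix, which matches at most one
// star per repo. The trailing '#' prevents 'dynamodb-toolbox' from also
// matching 'dynamodb-toolbox-extras'.
function starSkPrefix(repoOwner: string, repoName: string): string {
  return `STAR#${repoOwner}#${repoName}#`;
}

// Usage sketch with the entities defined above:
//
// const response = await GitHubTable.build(QueryCommand)
//   .query({
//     partition: `ACCOUNT#${username}`,
//     range: { beginsWith: starSkPrefix(repoOwner, repoName) }
//   })
//   .options({ limit: 1 })
//   .send();
// const hasStarred = (response.Items ?? []).length > 0;
```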

Pagination Done Right

async function listReposPaginated(accountName: string, pageSize: number = 20) {
  const response = await GitHubTable.build(QueryCommand)
    .entities(RepoEntity)
    .query({
      index: 'GSI3',
      partition: `ACCOUNT#${accountName}`,
      range: { lt: 'ACCOUNT#' }  // Only repos (timestamps < "ACCOUNT#")
    })
    .options({
      limit: pageSize,
      reverse: true  // Newest repos first
    })
    .send();

  return {
    items: response.Items || [],
    // URL-safe base64 encode the continuation token
    nextPageToken: response.LastEvaluatedKey
      ? encodeURIComponent(btoa(JSON.stringify(response.LastEvaluatedKey)))
      : undefined
  };
}

// Next page
async function getNextPage(accountName: string, pageToken: string) {
  const lastKey = JSON.parse(atob(decodeURIComponent(pageToken)));

  const response = await GitHubTable.build(QueryCommand)
    .entities(RepoEntity)
    .query({
      index: 'GSI3',
      partition: `ACCOUNT#${accountName}`,
      range: { lt: 'ACCOUNT#' }
    })
    .options({
      exclusiveStartKey: lastKey,
      reverse: true
    })
    .send();

  return response.Items || [];
}

Base64-encoding the LastEvaluatedKey gives clients an opaque token that survives query parameters. Note that encoding is not security: a determined client can still decode and modify the token, so sign or encrypt it if tampering matters.
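The token handling is easy to factor into a pair of helpers. This sketch uses Node's Buffer rather than btoa/atob so it works on runtimes without the WHATWG globals (encodePageToken/decodePageToken are hypothetical names for this post):

```typescript
// Pagination-token helpers matching the encoding used above. Buffer is a
// Node built-in; swap in btoa/atob for browser code.
function encodePageToken(lastEvaluatedKey: Record<string, unknown>): string {
  const json = JSON.stringify(lastEvaluatedKey);
  return encodeURIComponent(Buffer.from(json, 'utf8').toString('base64'));
}

function decodePageToken(token: string): Record<string, unknown> {
  const json = Buffer.from(decodeURIComponent(token), 'base64').toString('utf8');
  return JSON.parse(json);
}

const lastKey = { PK: 'ACCOUNT#aws', GSI3SK: '2024-01-15T08:00:00.000Z' };
const roundTripped = decodePageToken(encodePageToken(lastKey));
// roundTripped deep-equals lastKey
```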

What Real Items Look Like

To make this concrete, here’s what’s actually stored in the table:

// Item 1: Repository
{
  "PK": "REPO#aws#dynamodb-toolbox",
  "SK": "REPO#aws#dynamodb-toolbox",
  "entity": "Repository",
  "owner": "aws",
  "repo_name": "dynamodb-toolbox",
  "description": "Toolbox for DynamoDB",
  "is_private": false,
  "language": "TypeScript",
  "GSI1PK": "REPO#aws#dynamodb-toolbox",
  "GSI1SK": "REPO#aws#dynamodb-toolbox",
  "GSI2PK": "REPO#aws#dynamodb-toolbox",
  "GSI2SK": "REPO#aws#dynamodb-toolbox",
  "GSI3PK": "ACCOUNT#aws",
  "GSI3SK": "2024-01-15T08:00:00.000Z"
}

// Item 2: Issue (same partition key!)
{
  "PK": "REPO#aws#dynamodb-toolbox",
  "SK": "ISSUE#000042",
  "entity": "Issue",
  "owner": "aws",
  "repo_name": "dynamodb-toolbox",
  "issue_number": 42,
  "title": "Add TypeScript support",
  "body": "We should add full TypeScript type definitions",
  "status": "open",
  "author": "developer123",
  "labels": ["enhancement", "typescript"],
  "GSI4PK": "REPO#aws#dynamodb-toolbox",
  "GSI4SK": "ISSUE#OPEN#999957"
}

// Item 3: Star (many-to-many relationship)
{
  "PK": "ACCOUNT#john",
  "SK": "STAR#aws#dynamodb-toolbox#2024-12-01T10:00:00Z",
  "entity": "Star",
  "user_name": "john",
  "repo_owner": "aws",
  "repo_name": "dynamodb-toolbox",
  "starred_at": "2024-12-01T10:00:00Z",
  "GSI1PK": "REPO#aws#dynamodb-toolbox",
  "GSI1SK": "STAR#john#2024-12-01T10:00:00Z"
}

Notice how the repository and its issue share the same partition key (REPO#aws#dynamodb-toolbox). That’s the item collection that makes single-query fetches possible.

Working Example

All the code from this post is available in a working repository: github-ddb

The repository includes:

  • Complete entity definitions with DynamoDB Toolbox v2
  • Working query examples for all access patterns
  • Type-safe implementations
  • Tests demonstrating the patterns in action

Clone it, run the examples, and experiment with the patterns. Seeing the code run makes the concepts click.

What’s Next

You’ve now seen single table design in action. We’ve implemented GitHub’s core data model with proper access patterns, type safety, and clean code.

But we’re not done. In Part 3, we’ll tackle the hard problems:

  • Hot partitions and how to prevent them
  • Migrations when requirements change
  • Debugging your overloaded table
  • Cost optimization strategies
  • When to walk away from single table design

These are the production realities that separate successful DynamoDB implementations from disasters. See you in Part 3.

Continue to Part 3: Advanced Patterns and Production Realities →

