Verizon

A Python implementation of Git's core — object store, commits, branches, staging, diff, log — from scratch. Binary-compatible with Git's object format. Built to understand Git, not memorize its commands.

What It Implements

The add, commit, log, branch, checkout, diff, and status commands, using only the Python standard library.

Architecture

Three key pieces:

1. ObjectStore — SHA-1 hashes, zlib compression, content-addressable blobs/trees/commits/tags matching Git's exact binary format.

2. Commit — serialization/deserialization matching Git's format precisely; parent pointers enable DAG traversal for log and diff.

3. Refs — branches are files containing commit SHAs; HEAD is a file pointing to either a branch name (attached) or a commit SHA directly (detached).

What Building It Taught

Three things only became obvious after implementing them:

  • Trees are recursive Merkle structures — each directory is a tree object containing blob SHAs and sub-tree SHAs. Content-addressable all the way down.
  • Detached HEAD is trivial: HEAD is a file, and detached just means it contains a commit SHA instead of a branch name. Commits made while detached become unreachable once you check out something else (unless a branch or tag points at them), but they aren't deleted until GC runs.
  • Diff is harder than expected — recursive tree traversal with edge cases for renames, type changes, and files only in one tree.
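
The Merkle structure and the tree diff can be illustrated together. The sketch below is an assumption-laden simplification, not the project's code: it keeps objects in an in-memory dict instead of on disk, treats every file as mode `100644`, and the names `hash_object`, `write_tree`, `read_tree`, and `diff_trees` are hypothetical. Renames are not detected; a renamed file shows up as a delete plus an add.

```python
import hashlib


def hash_object(kind: str, data: bytes, store: dict) -> str:
    """Hash data with Git's header and keep it in an in-memory store."""
    payload = f"{kind} {len(data)}".encode() + b"\x00" + data
    sha = hashlib.sha1(payload).hexdigest()
    store[sha] = (kind, data)
    return sha


def write_tree(node: dict, store: dict) -> str:
    """node maps names to bytes (a file) or to a nested dict (a directory)."""
    entries = []
    for name, child in node.items():
        if isinstance(child, dict):
            # Directories sort as if their name ended with "/".
            entries.append((name + "/", b"40000", name, write_tree(child, store)))
        else:
            entries.append((name, b"100644", name, hash_object("blob", child, store)))
    entries.sort(key=lambda e: e[0].encode())
    # Each entry is "<mode> <name>\0" followed by the raw 20-byte SHA.
    body = b"".join(
        mode + b" " + name.encode() + b"\x00" + bytes.fromhex(sha)
        for _, mode, name, sha in entries
    )
    return hash_object("tree", body, store)


def read_tree(sha: str, store: dict) -> dict:
    """Parse a tree body into {name: (mode, sha)}."""
    _, body = store[sha]
    entries, i = {}, 0
    while i < len(body):
        sp = body.index(b" ", i)
        nul = body.index(b"\x00", sp)
        entries[body[sp + 1:nul].decode()] = (body[i:sp].decode(), body[nul + 1:nul + 21].hex())
        i = nul + 21
    return entries


def diff_trees(old, new, store: dict, prefix: str = "") -> list:
    """Return (status, path) pairs: A added, D deleted, M modified."""
    a = read_tree(old, store) if old else {}
    b = read_tree(new, store) if new else {}
    changes = []
    for name in sorted(a.keys() | b.keys()):
        ea, eb = a.get(name), b.get(name)
        if ea == eb:
            continue  # same mode and SHA: the whole subtree is unchanged
        path = prefix + name
        a_dir = ea is not None and ea[0] == "40000"
        b_dir = eb is not None and eb[0] == "40000"
        if a_dir or b_dir:  # recurse; a missing side is an empty tree
            changes += diff_trees(ea[1] if a_dir else None,
                                  eb[1] if b_dir else None, store, path + "/")
        if ea and not a_dir and eb and not b_dir:
            changes.append(("M", path))
        elif ea and not a_dir:
            changes.append(("D", path))  # also covers file replaced by directory
        elif eb and not b_dir:
            changes.append(("A", path))  # also covers directory replaced by file
    return changes
```

The early `continue` is where content addressing pays off: if two directory entries carry the same SHA, nothing beneath them can differ, so the diff never descends into unchanged subtrees.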

> "You don't understand something until you've built it."

After this, rebase, force push, and merge conflicts stopped being magic — they're all traversals over a DAG of immutable objects.
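
To make the DAG view concrete, here is a small sketch over an in-memory graph. It assumes commits are given as a dict mapping each commit ID to its list of parent IDs; the names `ancestors`, `unreachable`, and `common_ancestor` are hypothetical, and `common_ancestor` is a simplified stand-in for Git's merge-base computation rather than a reproduction of it.

```python
from collections import deque


def ancestors(commits: dict, start: str) -> list:
    """Every commit reachable from start, in breadth-first order (a basic log)."""
    seen, order, queue = set(), [], deque([start])
    while queue:
        sha = queue.popleft()
        if sha in seen:
            continue  # merges can reach the same commit along two paths
        seen.add(sha)
        order.append(sha)
        queue.extend(commits[sha])
    return order


def unreachable(commits: dict, refs: dict) -> set:
    """Commits that no ref can reach: what garbage collection may delete."""
    live = set()
    for tip in refs.values():
        live.update(ancestors(commits, tip))
    return set(commits) - live


def common_ancestor(commits: dict, a: str, b: str):
    """First commit found walking back from b that a can also reach."""
    reachable_from_a = set(ancestors(commits, a))
    for sha in ancestors(commits, b):
        if sha in reachable_from_a:
            return sha
    return None


# c1 <- c2 <- c3 (main), plus x committed on top of c2 while detached.
commits = {"c1": [], "c2": ["c1"], "c3": ["c2"], "x": ["c2"]}
refs = {"main": "c3"}
print(ancestors(commits, "c3"))               # ['c3', 'c2', 'c1']
print(unreachable(commits, refs))             # {'x'}
print(common_ancestor(commits, "c3", "x"))    # c2
```

In this picture, the commit `x` made on a detached HEAD is still in the store but nothing points at it, and a rebase or merge starts by finding where two histories last shared a commit.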

Sources

Evidence

Linked source: GitHub Repo: verizon

Linked source: Website Source: blog / verizon_blog