revisions.py exists because I needed the site to track changes itself.
This file stores a history for each entry in SQLite. It computes a fingerprint from the entry's created date, title, and body text. When that fingerprint changes, a new row is written. The row records the hash, timestamp, word count, word delta, worked hours, and a short summary.
That data is read by the rest of the system. Templates read it to show hashes and drift. The index page reads it to show recent changes. Taxonomy reads it to attach revision state to listings. Nothing else rereads Markdown to figure out what changed.
There is also a cache table that stores the latest known values per entry. It holds the most recent hash, timestamp, title, and GUID. Other code uses this table to answer questions like which entry changed last or which hash is current without touching the filesystem.
When the database has a value, the rest of the code uses it. No other part of the site tries to infer change state from file timestamps or rebuild history on its own.
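Reads against the cache stay that simple. A minimal sketch, assuming the articles columns described above (slug, title, created, last_hash, last_timestamp, guid); the sample rows are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute(
    "CREATE TABLE articles (slug TEXT PRIMARY KEY, title TEXT, "
    "created TEXT, last_hash TEXT, last_timestamp TEXT, guid TEXT)"
)
cur.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("first-post", "First Post", "2023-01-01",
         "a1b2c3d", "2024-01-05T10:00:00", "g1"),
        ("second-post", "Second Post", "2023-02-01",
         "e4f5a6b", "2024-03-12T09:30:00", "g2"),
    ],
)

# "Which entry changed last, and what is its current hash?"
# One indexed lookup, no filesystem access.
cur.execute(
    "SELECT slug, last_hash FROM articles ORDER BY last_timestamp DESC LIMIT 1"
)
row = cur.fetchone()
```

ISO-8601 timestamps sort correctly as plain strings, which is why a text column is enough here.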
How it works
Not every edit creates a revision.
The fingerprint is built from three fields: the created date, the title, and the body text. Each line of the body text is stripped of leading and trailing whitespace, and empty lines are dropped. If those three inputs stay the same, the fingerprint stays the same.
    import hashlib

    def compute_fingerprint(md_fp: str) -> str:
        parsed = parse_frontmatter(md_fp)
        frontmatter = parsed.get("frontmatter", {})
        content = parsed.get("content", "")
        created = str(frontmatter.get("created", ""))
        title = str(frontmatter.get("title", ""))
        normalized_body = "\n".join(
            line.strip() for line in content.splitlines() if line.strip()
        )
        raw = f"{created}|{title}|{normalized_body}"
        sha = hashlib.sha1(raw.encode("utf-8")).hexdigest()
        return sha[:7]
That hash is the revision identifier. When it changes, a new row is written. When it does not change, nothing is recorded, even if the file timestamp changed.
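The gate itself reduces to a one-line comparison. A minimal sketch of that check, assuming the caller has already looked up the cached hash from the articles table; needs_revision is a hypothetical helper name, not from revisions.py:

```python
from typing import Optional

# Hypothetical helper: decide whether an edit warrants a new commit row.
# last_hash is the fingerprint cached in the articles table, or None
# when the entry has never been recorded.
def needs_revision(current_fingerprint: str, last_hash: Optional[str]) -> bool:
    # A new entry or a changed fingerprint gets a row; anything else,
    # including a touched file with identical content, is ignored.
    return last_hash is None or current_fingerprint != last_hash
```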
When a revision is written, a row is added to the commits table. This table stores the full history. Each row includes the hash, slug, timestamp, parent hash, word count, word delta, worked hours, and summary.
    cur.execute(
        """
        INSERT OR IGNORE INTO commits
        (hash, slug, timestamp, summary, parent_hash, title, created, word_count, word_delta, worked_hours, meta_json)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """,
        (...),
    )
At the same time, the articles table is updated. This table holds only the most recent values for each entry.
    cur.execute(
        """
        INSERT INTO articles (slug, title, created, last_hash, last_timestamp, guid)
        VALUES (?, ?, ?, ?, ?, ?)
        ON CONFLICT(slug) DO UPDATE SET
            title=excluded.title,
            created=excluded.created,
            last_hash=excluded.last_hash,
            last_timestamp=excluded.last_timestamp,
            guid=COALESCE(articles.guid, excluded.guid)
        """,
        (...),
    )
Other code reads from this table instead of scanning files. It is used for listings, recent changes, and per-entry state.
Each entry also has a GUID. The slug can change. The title can change. The GUID does not.
    def get_entry_guid(slug: str) -> str:
        cur.execute("SELECT guid FROM articles WHERE slug = ?", (slug,))
        ...
        guid = uuid.uuid4().hex
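A fuller sketch of that lookup-or-create pattern, assuming a row for the slug already exists; the UPDATE that persists a freshly minted GUID is an assumption about the elided part, and the cursor is passed in rather than shared module state:

```python
import sqlite3
import uuid

def get_entry_guid(cur: sqlite3.Cursor, slug: str) -> str:
    cur.execute("SELECT guid FROM articles WHERE slug = ?", (slug,))
    row = cur.fetchone()
    if row and row[0]:
        return row[0]          # reuse the stable identifier
    guid = uuid.uuid4().hex    # mint one only when none exists yet
    # Assumed persistence step: write the new GUID back to the cache row
    # so every later call returns the same value.
    cur.execute("UPDATE articles SET guid = ? WHERE slug = ?", (guid, slug))
    return guid
```

Calling it twice for the same slug returns the same GUID, which is the whole point: renames and retitles leave the identifier untouched.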
Seeding existing content
seed_database() runs once to populate the database from existing files.
It walks the Markdown files on disk. For each file, it reads frontmatter, takes the created date, computes a fingerprint, and inserts a commit row using that timestamp. It also writes the corresponding row to the articles table.
After this runs, the database contains one commit per entry and a populated cache. No generate step is required.
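The pass described above can be sketched end to end. parse_frontmatter here is a tiny stand-in for the site's real parser (it handles only simple key: value headers), the slug is assumed to be the file name, and the column lists are trimmed to what the sketch needs:

```python
import hashlib
import sqlite3
from pathlib import Path

def parse_frontmatter(md_fp):
    # Stand-in parser: "---" delimited header of "key: value" lines.
    text = Path(md_fp).read_text(encoding="utf-8")
    frontmatter, content = {}, text
    if text.startswith("---"):
        header, _, content = text[3:].partition("---")
        for line in header.strip().splitlines():
            key, _, value = line.partition(":")
            frontmatter[key.strip()] = value.strip()
    return {"frontmatter": frontmatter, "content": content}

def seed_database(content_dir: str, cur: sqlite3.Cursor) -> int:
    seeded = 0
    for md_fp in sorted(Path(content_dir).glob("*.md")):
        parsed = parse_frontmatter(md_fp)
        fm = parsed["frontmatter"]
        # Same normalization as the fingerprint rule above.
        body = "\n".join(
            line.strip()
            for line in parsed["content"].splitlines()
            if line.strip()
        )
        raw = f"{fm.get('created', '')}|{fm.get('title', '')}|{body}"
        sha = hashlib.sha1(raw.encode("utf-8")).hexdigest()[:7]
        # One commit per entry, timestamped with the created date,
        # plus the matching cache row (column lists trimmed for the sketch).
        cur.execute(
            "INSERT OR IGNORE INTO commits (hash, slug, timestamp) "
            "VALUES (?, ?, ?)",
            (sha, md_fp.stem, fm.get("created", "")),
        )
        cur.execute(
            "INSERT OR REPLACE INTO articles "
            "(slug, title, created, last_hash, last_timestamp) "
            "VALUES (?, ?, ?, ?, ?)",
            (md_fp.stem, fm.get("title", ""), fm.get("created", ""),
             sha, fm.get("created", "")),
        )
        seeded += 1
    return seeded
```

Because the commit insert uses INSERT OR IGNORE on the hash, running the seed twice does not duplicate history.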