revisions.py exists because I needed the site to track changes itself.
This file stores a history for each entry in SQLite. It computes a fingerprint from the entry's created date, title, and body text. When that fingerprint changes, a new row is written. The row records the hash, timestamp, word count, word delta, worked hours, and a short summary.
That data is read by the rest of the system. Templates read it to show hashes and drift. The index page reads it to show recent changes. Taxonomy reads it to attach revision state to listings. Nothing else rereads Markdown to figure out what changed.
There is also a cache table that stores the latest known values per entry. It holds the most recent hash, timestamp, title, and GUID. Other code uses this table to answer questions like which entry changed last or which hash is current without touching the filesystem.
When the database has a value, the rest of the code uses it. No other part of the site tries to infer change state from file timestamps or rebuild history on its own.
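Reads against the cache stay that simple. A minimal sketch, assuming the articles columns described above (slug, title, created, last_hash, last_timestamp, guid); the sample rows are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute(
    "CREATE TABLE articles (slug TEXT PRIMARY KEY, title TEXT, "
    "created TEXT, last_hash TEXT, last_timestamp TEXT, guid TEXT)"
)
cur.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("first-post", "First Post", "2023-01-01",
         "a1b2c3d", "2024-01-05T10:00:00", "g1"),
        ("second-post", "Second Post", "2023-02-01",
         "e4f5a6b", "2024-03-12T09:30:00", "g2"),
    ],
)

# "Which entry changed last, and what is its current hash?"
# One indexed lookup, no filesystem access.
cur.execute(
    "SELECT slug, last_hash FROM articles ORDER BY last_timestamp DESC LIMIT 1"
)
row = cur.fetchone()
```

ISO-8601 timestamps sort correctly as plain strings, which is why a text column is enough here.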
How it works
Not every edit creates a revision.
The fingerprint is built from three fields: the created date, the title, and the body text. Each line of the body text is stripped of leading and trailing whitespace, and empty lines are dropped. If those three inputs stay the same, the fingerprint stays the same.
    import hashlib

    def compute_fingerprint(md_fp: str) -> str:
        parsed = parse_frontmatter(md_fp)
        frontmatter = parsed.get("frontmatter", {})
        content = parsed.get("content", "")
        created = str(frontmatter.get("created", ""))
        title = str(frontmatter.get("title", ""))
        normalized_body = "\n".join(
            line.strip() for line in content.splitlines() if line.strip()
        )
        raw = f"{created}|{title}|{normalized_body}"
        sha = hashlib.sha1(raw.encode("utf-8")).hexdigest()
        return sha[:7]
That hash is the revision identifier. When it changes, a new row is written. When it does not change, nothing is recorded, even if the file timestamp changed.
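The gate itself reduces to a one-line comparison. A minimal sketch of that check, assuming the caller has already looked up the cached hash from the articles table; needs_revision is a hypothetical helper name, not from revisions.py:

```python
from typing import Optional

# Hypothetical helper: decide whether an edit warrants a new commit row.
# last_hash is the fingerprint cached in the articles table, or None
# when the entry has never been recorded.
def needs_revision(current_fingerprint: str, last_hash: Optional[str]) -> bool:
    # A new entry or a changed fingerprint gets a row; anything else,
    # including a touched file with identical content, is ignored.
    return last_hash is None or current_fingerprint != last_hash
```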
When a revision is written, a row is added to the commits table. This table stores the full history. Each row includes the hash, slug, timestamp, parent hash, word count, word delta, worked hours, and summary.
    cur.execute(
        """
        INSERT OR IGNORE INTO commits
        (hash, slug, timestamp, summary, parent_hash, title, created, word_count, word_delta, worked_hours, meta_json)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """,
        (...),
    )
At the same time, the articles table is updated. This table holds only the most recent values for each entry.
    cur.execute(
        """
        INSERT INTO articles (slug, title, created, last_hash, last_timestamp, guid)
        VALUES (?, ?, ?, ?, ?, ?)
        ON CONFLICT(slug) DO UPDATE SET
            title=excluded.title,
            created=excluded.created,
            last_hash=excluded.last_hash,
            last_timestamp=excluded.last_timestamp,
            guid=COALESCE(articles.guid, excluded.guid)
        """,
        (...),
    )
Other code reads from this table instead of scanning files. It is used for listings, recent changes, and per-entry state.
Each entry also has a GUID. The slug can change. The title can change. The GUID does not.
    def get_entry_guid(slug: str) -> str:
        cur.execute("SELECT guid FROM articles WHERE slug = ?", (slug,))
        ...
        guid = uuid.uuid4().hex
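A fuller sketch of that lookup-or-create pattern, assuming a row for the slug already exists; the UPDATE that persists a freshly minted GUID is an assumption about the elided part, and the cursor is passed in rather than shared module state:

```python
import sqlite3
import uuid

def get_entry_guid(cur: sqlite3.Cursor, slug: str) -> str:
    cur.execute("SELECT guid FROM articles WHERE slug = ?", (slug,))
    row = cur.fetchone()
    if row and row[0]:
        return row[0]          # reuse the stable identifier
    guid = uuid.uuid4().hex    # mint one only when none exists yet
    # Assumed persistence step: write the new GUID back to the cache row
    # so every later call returns the same value.
    cur.execute("UPDATE articles SET guid = ? WHERE slug = ?", (guid, slug))
    return guid
```

Calling it twice for the same slug returns the same GUID, which is the whole point: renames and retitles leave the identifier untouched.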
Seeding existing content
seed_database() runs once to populate the database from existing files.
It walks the Markdown files on disk. For each file, it reads frontmatter, takes the created date, computes a fingerprint, and inserts a commit row using that timestamp. It also writes the corresponding row to the articles table.
After this runs, the database contains one commit per entry and a populated cache. No generate step is required.
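The pass described above can be sketched end to end. parse_frontmatter here is a tiny stand-in for the site's real parser (it handles only simple key: value headers), the slug is assumed to be the file name, and the column lists are trimmed to what the sketch needs:

```python
import hashlib
import sqlite3
from pathlib import Path

def parse_frontmatter(md_fp):
    # Stand-in parser: "---" delimited header of "key: value" lines.
    text = Path(md_fp).read_text(encoding="utf-8")
    frontmatter, content = {}, text
    if text.startswith("---"):
        header, _, content = text[3:].partition("---")
        for line in header.strip().splitlines():
            key, _, value = line.partition(":")
            frontmatter[key.strip()] = value.strip()
    return {"frontmatter": frontmatter, "content": content}

def seed_database(content_dir: str, cur: sqlite3.Cursor) -> int:
    seeded = 0
    for md_fp in sorted(Path(content_dir).glob("*.md")):
        parsed = parse_frontmatter(md_fp)
        fm = parsed["frontmatter"]
        # Same normalization as the fingerprint rule above.
        body = "\n".join(
            line.strip()
            for line in parsed["content"].splitlines()
            if line.strip()
        )
        raw = f"{fm.get('created', '')}|{fm.get('title', '')}|{body}"
        sha = hashlib.sha1(raw.encode("utf-8")).hexdigest()[:7]
        # One commit per entry, timestamped with the created date,
        # plus the matching cache row (column lists trimmed for the sketch).
        cur.execute(
            "INSERT OR IGNORE INTO commits (hash, slug, timestamp) "
            "VALUES (?, ?, ?)",
            (sha, md_fp.stem, fm.get("created", "")),
        )
        cur.execute(
            "INSERT OR REPLACE INTO articles "
            "(slug, title, created, last_hash, last_timestamp) "
            "VALUES (?, ?, ?, ?, ?)",
            (md_fp.stem, fm.get("title", ""), fm.get("created", ""),
             sha, fm.get("created", "")),
        )
        seeded += 1
    return seeded
```

Because the commit insert uses INSERT OR IGNORE on the hash, running the seed twice does not duplicate history.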