This is a snapshot of what I think about writing and reading digital prose, on technical topics, with other people. Stuff like docs, a collaborative wiki, a knowledge base, or a blog.
In this post I will argue for
I will refer to source markup as the content, and to its visual rendering as the presentation. In WYSIWYG editors, like Notion or Google Docs, content and presentation are the same.
I believe this split is a useful tool. For instance, mathematics typeset in TeX is easy to edit and reproduce, while a graphical equation editor is much harder to use. I admit the split can make text harder to edit, especially for non-technical users. I believe the benefits in reproducibility and automation outweigh these costs.
I would like a repository of knowledge, developed collaboratively over many years, with evolving needs and conventions, that’s always readable as raw content and as “trivial” presentation, but continuously upgrades its preferred presentation.
I believe CommonMark is the only Markdown flavor likely to survive with no changes for a long time. There is also too much Markdown already written for Markdown to disappear altogether.
Most existing Markdown authoring frameworks, such as Hugo, Zola, or Docusaurus, depend on some combination of extensions like front matters, React components, or shortcodes. Some even require a special directory structure. Migrating between these frameworks can be hard. The framework shouldn’t matter that much in my opinion.
Configuration should be minimal. It’s easier to get started, and it’s harder to break. Eleventy does this well.
If storage is not an issue, record milestone releases in various formats like HTML, PDF or PNG.
I would like a fast, fuzzy search over
Depending on size, search could run in the browser, rg and fzf on a local clone, or online with an external index.
Directory trees as URL paths are hard to maintain because each document must have exactly one place. It may be time-consuming to place a file well, and the file may fit elsewhere over time, leading to stale links or stale structure.
/where/do/i/put/this
Tag systems, like those in social networks, enable somewhat natural indexing of topics. I’m not convinced yet they are useful for docs.
Consider reusing _italic_ and/or __bold__. They already represent emphasis in Markdown, and can be filtered, searched, or indexed. I can mention them in content
Using _words_ as _tags_.
or hide them in HTML comments
<!-- _tags_ _words_ -->
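A minimal sketch of how such tags could be collected for filtering or indexing. It assumes a hypothetical convention where every _italic_ or __bold__ span counts as a tag; the regular expression and the `extract_tags` name are illustrative, and a real implementation would use a CommonMark parser instead of a regex.

```python
import re

# Assumed convention: any _italic_ or __bold__ span is a tag.
# The delimiters must not touch word characters on the outside,
# so identifiers like snake_case_name are skipped.
EMPHASIS = re.compile(r"(?<!\w)(_{1,2})([^_\s](?:[^_]*[^_\s])?)\1(?!\w)")


def extract_tags(markdown: str) -> list[str]:
    """Return lowercased tags in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in EMPHASIS.finditer(markdown):
        seen.setdefault(match.group(2).lower(), None)
    return list(seen)


print(extract_tags("Using _words_ as _tags_.\n<!-- _tags_ __Search__ -->"))
# prints ['words', 'tags', 'search']
```

Because the comment is scanned like any other text, tags hidden in HTML comments are indexed the same way as tags mentioned in content.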
Git can record granular history of content and presentation. A tidy commit history is useful in finding authors (who to ask about some piece of knowledge). A version of docs is identified by a Git commit.
The first commit should be a readable version, and all subsequent commits should be readable. When making edits, try to keep the scope narrow.
Use git rebase -i to refine local history. Refactors are good, but make sure it’s possible to filter these commits out, e.g. for git blame, for instance with a refactor(scope) header.
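As a sketch of that filtering, the function below picks refactor commits out of `git log --format='%H %s'` output. The header pattern (`refactor:` or `refactor(scope):`) is an assumed convention, and the hashes in `sample_log` are made up for illustration.

```python
import re

# Assumed header convention: "refactor: ..." or "refactor(scope): ...".
REFACTOR_HEADER = re.compile(r"^refactor(\([^)]*\))?!?:")


def refactor_revs(log_output: str) -> list[str]:
    """Pick hashes of refactor commits from `git log --format='%H %s'` output."""
    revs = []
    for line in log_output.splitlines():
        commit_hash, _, subject = line.partition(" ")
        if REFACTOR_HEADER.match(subject):
            revs.append(commit_hash)
    return revs


# Made-up log lines, in the shape produced by `git log --format='%H %s'`.
sample_log = "\n".join([
    "1111111111111111111111111111111111111111 docs(search): describe fuzzy search",
    "2222222222222222222222222222222222222222 refactor(links): move URLs to references",
    "3333333333333333333333333333333333333333 refactor: reflow to semantic line breaks",
])
print("\n".join(refactor_revs(sample_log)))
```

The printed hashes can be saved to a file and passed to `git blame --ignore-revs-file`, so blame skips over the refactors and points at the commit that last changed the meaning of a line.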
For displaying different versions there are a few options to consider
docs.example.org/a/b/c
docs.example.org/v1/a/b/c
docs.example.org/a/b/c?v=1
If content is managed as plain text files, any local text editor or IDE can be used for editing. Users can configure linters, formatters, autocomplete, snippets, macros and motions like textobjects or leaping.
A browser-first environment can be very useful.
Collaborating on prose, like on code, can happen asynchronously or synchronously. I believe there should be a process for both, but asynchronous should be the default.
Patches should be submitted and reviewed by individuals. Review commonly happens on forges like GitHub or GitLab, where the threads are not recorded in the Git repository. For reproducibility, I kind of wish it stayed in Git. There are solutions like Gerrit or git-appraise that manage reviews as Git refs through git-notes.
For live pairing, a real-time collaboration tool is useful.
The term was borrowed from the word lint, the tiny bits of fiber and fluff shed by clothing, as the command he wrote would act like a lint trap in a clothes dryer, capturing waste fibers while leaving whole fabrics intact.
Like software, prose can be linted at the pre-commit stage, in CI/CD, or live inside an editor.
This includes CommonMark formatters, spelling and grammar checkers, and even readability analysis.
Also known as semantic breaks, ventilated prose or visual-syntactic text formatting. Markup languages like HTML, Markdown, or TeX rewrite single new-lines to spaces by default. This allows the writer to split lines semantically.
If there be any truth in the remark,
the crisis at which we are arrived
may with propriety be regarded as the era
in which that decision is to be made;
and a wrong election of the part we shall act may,
in this view, deserve to be considered
as the general misfortune of mankind.
This convention encourages short, well-punctuated sentences, and was shown to increase reading comprehension and reduce eyestrain. By limiting unrelated changes, lines remain meaningful for longer. Lines integrate well with other line-oriented software like Vim.
Reflowing to wrap at fixed column width or single-line paragraphs, albeit more common in practice, both break default Git (hunk) diffs.
Either way, consider a better diffing algorithm, like [Difftastic] or git diff --word-diff.
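For text that was written as single-line paragraphs, a formatter can approximate semantic breaks. The sketch below is a crude heuristic, not how a careful writer would split lines: it breaks after every comma, semicolon, colon, and sentence end, and it will wrongly split abbreviations such as "e.g.". The `semantic_breaks` name is hypothetical.

```python
import re

# Break at whitespace that follows sentence-ending or clause punctuation.
BREAK_AFTER = re.compile(r"(?<=[.!?;:,])\s+")


def semantic_breaks(paragraph: str) -> str:
    """Collapse all whitespace, then put each clause on its own line."""
    text = " ".join(paragraph.split())
    return "\n".join(BREAK_AFTER.split(text))


print(semantic_breaks(
    "If there be any truth in the remark, the crisis at which we are arrived "
    "may with propriety be regarded as the era in which that decision is to be made; "
    "and a wrong election of the part we shall act may, in this view, "
    "deserve to be considered as the general misfortune of mankind."
))
```

Since single newlines render as spaces, running such a formatter changes the content but not the presentation.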
Reference links move URLs out of the way. This is great for readability and Git.
Wrap a [link] in square brackets.
[Capitalisation] doesn't matter
and [spaces are allowed].
[link]: https://kszk.eu
[capitalisation]: /assets/example.pdf
[spaces are allowed]: /link/to/somewhere
When several definitions share a label, the first one takes precedence. That is, if I add a reference in content, it overrides any definition that comes later. This means I can safely append arbitrary references, for instance one for every page in some directory.
If a reference is missing, CommonMark renders it like [this]. It’s fairly readable, similar to IETF RFCs.
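A minimal sketch of why appending is safe. CommonMark gives precedence to the first of several matching definitions, so definitions appended after the page only act as fallbacks. The definition syntax is simplified here (no titles, no multi-line forms), and `collect_references` and the `/fallback` destination are hypothetical.

```python
import re

# A link reference definition, simplified: "[label]: destination".
DEFINITION = re.compile(r"^\[([^\]]+)\]:\s*(\S+)", re.MULTILINE)


def normalize(label: str) -> str:
    """Approximate label matching: case-insensitive, whitespace collapsed."""
    return " ".join(label.casefold().split())


def collect_references(markdown: str) -> dict[str, str]:
    """Map labels to destinations; the first definition of a label wins."""
    references: dict[str, str] = {}
    for label, destination in DEFINITION.findall(markdown):
        references.setdefault(normalize(label), destination)
    return references


page = "See [Link] and [this].\n\n[link]: https://kszk.eu\n"
# References appended by tooling, e.g. one per page in some directory.
appended = "[link]: /fallback\n[spaces are allowed]: /link/to/somewhere\n"

references = collect_references(page + appended)
print(references)
```

Here `[Link]` keeps the destination defined on the page, the appended `/fallback` is ignored, and `[this]` has no definition, so it stays in the text as bracketed words.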
A syntax extension should only be considered if the presentation of its source remains readable in reference CommonMark.
That is, it should only improve the presentation, but not change the content significantly.
Metadata is often stored in a custom YAML front matter. It’s hard to read in CommonMark. Additionally, schemas can be inconsistent, complicating upgrades or migrations to other frameworks.
---
title: Some title
slug: some-title
author: Bob Smith
date: 2023-10-10
---
# Some title
I, Bob, wrote this in October.
For titles, I believe the first top heading should be treated as the document title. Slugs should be derived from titles. Timestamps should come from Git.
I believe all metadata that’s not in content, should be tracked entirely in Git. It may be slow on big old repositories, as git blame goes through every commit. At such scale, one can use incremental builds.
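A small sketch of deriving the title and slug from content instead of front matter. The slug rules (lowercase ASCII, hyphen-separated) are an assumption, and the function names are hypothetical.

```python
import re
import unicodedata
from typing import Optional


def title_of(markdown: str) -> Optional[str]:
    """Return the text of the first top-level ATX heading, if any."""
    match = re.search(r"^# +(.+?)\s*$", markdown, re.MULTILINE)
    return match.group(1) if match else None


def slugify(title: str) -> str:
    """Lowercase, strip accents, and join alphanumeric runs with hyphens."""
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    )
    return "-".join(re.findall(r"[a-z0-9]+", ascii_title.lower()))


document = "# Some title\nI, Bob, wrote this in October.\n"
title = title_of(document)
print(title, "->", slugify(title))
# prints Some title -> some-title
```

The timestamp can be read from history in the same spirit, for example with `git log -1 --format=%cI -- <path>` for the last change to a file.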
Raw TeX source code, often between dollar signs, is arguably very readable to its intended audience, as most readers are also writers in this case.
For consumption, it’s still very useful to render. One might stare at a proof for a long time, working with the same symbols on a piece of paper.
Let ${X_i}$ be a collection of groups
indexed by a directed set $I$.
For $i<j$ let
$\pi^{j \to i} \colon X_j \to X_i$
be a homomorphism such that
$\pi^{i\to i}$ is identity
and if $i<j<k$ then
$\pi^{j\to i}\circ \pi^{k\to j}=\pi^{k\to i}$.
Table presentation is unreadable if the syntax is not supported, tables are hard to edit without editor assist, and moving whitespace pollutes Git diffs.
On the other hand, they look good in code, and may not change much. They are of course excellent for comparisons, before and after, paper results, etc.
As a compromise, I could write them in fenced blocks, agree on a type like table, and render them in presentation.
|One|Two|Three|
|---|---|-----|
| 1 | 2 | 3 |
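A sketch of the rendering side of that compromise: once the renderer finds a fenced block of the agreed type, it can parse the body into rows and emit whatever table markup it prefers. The parser below is deliberately naive (no escaped pipes, no alignment handling), and `parse_table` is a hypothetical name.

```python
def parse_table(source: str) -> tuple[list[str], list[list[str]]]:
    """Split a pipe table into a header row and body rows."""
    rows = []
    for line in source.strip().splitlines():
        # Drop the outer pipes, then split the remaining cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        rows.append(cells)
    header, delimiter, *body = rows
    # The second line must be the delimiter row of dashes (colons mark alignment).
    if not all(cell and set(cell) <= set("-:") for cell in delimiter):
        raise ValueError("second line is not a delimiter row")
    return header, body


source = """
|One|Two|Three|
|---|---|-----|
| 1 | 2 | 3 |
"""
print(parse_table(source))
# prints (['One', 'Two', 'Three'], [['1', '2', '3']])
```

Without presentation support the block still shows as preformatted text, which keeps the columns aligned.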
Footnotes and sidenotes add a side-channel for communication. Paradoxically, this can be good for linearity and focus, as it signals that they are less important and can be deferred.
Here[^1] is a footnote.
[^1]: This is a footnote.
Reference links already can kinda do this. We can specify a page fragment as the sidenote identifier, and the content can go into the link title.
Look [here].
[here]: #some-id "This
will show up
only as alt."
Another form of “communication sidechannel” is the in-text “notification” paragraph. Again, many Markdown implementations have their own way of doing this.
:::warning
This may render with a ⚠️
:::
or
[!WARNING]
Another one of those
A more CommonMark-friendly way could be to put it in a comment
<!-- ⚠️ One more warning -->
and render it in presentation. No special presentation, no admonition. To always be seen, some people use blockquotes
> ⚠️ Third warning
which renders like this
⚠️ Third warning
They nest, and they look good in plaintext and CommonMark. Unicode is everywhere now. I like this.
Unfortunately, if it renders into <blockquote> this is not really semantically correct, and may be problematic for accessibility.
There are so many other things that can break in our fallback render scenario; it may be okay to assume some intent in presentation.
If possible, I would use the comment trick instead, unless the admonition must always be seen, in which case perhaps it shouldn’t be an admonition?
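A minimal sketch of the comment trick on the presentation side. It assumes a convention where an HTML comment that starts with the warning sign is an admonition. Rendering into `<aside class="warning">` is one possible choice, not something the source prescribes.

```python
import html
import re

WARNING_SIGN = "\u26a0\ufe0f"  # the warning emoji, with its variation selector

# Assumed convention: a comment that starts with the warning sign is a warning.
WARNING_COMMENT = re.compile(r"<!--\s*\u26a0\ufe0f?\s*(.*?)\s*-->", re.DOTALL)


def render_warnings(markdown: str) -> str:
    """Replace warning comments with a visible <aside> element."""
    def to_aside(match: re.Match) -> str:
        text = html.escape(match.group(1))
        return f'<aside class="warning">{WARNING_SIGN} {text}</aside>'
    return WARNING_COMMENT.sub(to_aside, markdown)


print(render_warnings("<!-- " + WARNING_SIGN + " One more warning -->"))
```

Other comments pass through untouched, so a plain CommonMark renderer and this one only differ in whether the warning is visible.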
A snippet of Mermaid or Graphviz source may replace itself with its SVG render.
This is a bit different to TeX rendering, as nobody can imagine these from source.
While content would be as unreadable as , it’s another language to depend on, not a feature that ships with the browser.
On the other hand, images are hard to edit, one needs to know how to generate a new one, so there is an indirect dependency there as well.
It’s easiest to sketch a raster. I can also draw and edit SVGs in Graphite or https://draw.io.
There are extensions like citeproc but I like to instead use reference links.
They can link to the publisher’s website, a DOI or arXiv, or internal notes.
Quantisation tends to outperform pruning.
See [kuzmin23] for details.
[kuzmin23]: https://arxiv.org/pdf/2307.02973.pdf
The links can be specified in-text, or auto-generated from a BibTeX file or pages in /bibliography.
[angelopoulos22]: /bibliography/angelopoulos22.html
Depending on context, [kuzmin23] can be rendered in presentation as
Kuzmin et al. (2023)
(Kuzmin et al., 2023)
Kuzmin, Nagel, van Baalen, Behboodi, Blankevoort (2023)
Without presentation support, it’s still readable. Square brackets in text are already in use as punctuation to isolate text from its surroundings.
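The three renderings of [kuzmin23] above can be sketched as plain formatting functions over bibliography metadata. The `Entry` shape and the rule of abbreviating three or more authors to "et al." are assumptions; in practice the metadata would come from the BibTeX file or the /bibliography pages.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    authors: list[str]  # family names only
    year: int


def names(entry: Entry) -> str:
    """Abbreviate three or more authors to 'First et al.'."""
    if len(entry.authors) >= 3:
        return f"{entry.authors[0]} et al."
    return " and ".join(entry.authors)


def textual(entry: Entry) -> str:
    return f"{names(entry)} ({entry.year})"


def parenthetical(entry: Entry) -> str:
    return f"({names(entry)}, {entry.year})"


def full(entry: Entry) -> str:
    return f"{', '.join(entry.authors)} ({entry.year})"


kuzmin23 = Entry(["Kuzmin", "Nagel", "van Baalen", "Behboodi", "Blankevoort"], 2023)
for render in (textual, parenthetical, full):
    print(render(kuzmin23))
```

The content stays a plain `[kuzmin23]` reference link in every case; only the presentation decides which form to show.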
It seems that serious typography shouldn’t target HTML, but I believe adding much new syntax to content can be bad for reproducibility and writing experience.
Otherwise, rewriting some quotes and semantic CSS can improve presentation without needing changes to content. Practical Typography is a great resource for rules. See Pollen for details on limitations of web publishing.
A model of how docs are consumed, and how they should be structured.

how-to for problem solving
references for looking up information
tutorials for learning new things
explanations for narrow depth
A few more categories could work for modelling how docs are produced
drafts for early iterations and experiments
bibliography for notes on publications
changelog as described in hsiao_2023 for accounting for major changes to the project; adrs or rfcs could work here as well
A model of knowledge creation from nonaka_1994. SECI stands for four subsequent phases of knowledge.

Knowledge is refined as it cycles between tacit and explicit. Docs as code map well to this framework
Editor assist can speed up “E”. Much in “C” can be automated. “I” and “S” are not controlled.
naur_85 argues that programming is more about development of a collaborative insight/theory than writing down the program itself.
Tools for thought.
hillmer_2016 proposed to focus on the intersection of
They also suggest collecting some metrics to
Architectural Decision Records are notes on design decisions that significantly alter architecture.
A sequence of ADRs can also be great documentation, useful for onboarding new people to the project. For that purpose, consider a single ARCHITECTURE.md.