Wires & Books

Some things I learned while binding Fialleril’s “everything I have ever learned”

Apr 19, 2023
straylight press fanbinding

Completion Date: March 2023
Author: fialleril
Work: everything I have ever learned
Fandom: Star Wars Original Trilogy
Printing: HP Color Laserjet MFP M283cdw / Hammermill Fore Multi-purpose paper 75 gsm
Binding: Single-signature sewn (5.5 in x 8.5 in)
Font: Domitian by Daniel Benjamin Miller
Software: LuaLaTeX (memoir class for interior, bookcover for cover)

Having successfully hand-bound one copy, I decided to make more copies to send to friends. I picked a lighter-weight paper this time (75 gsm instead of 120 gsm). I also managed to fold a little more crisply, sew a little more tightly, and so on.

I made five copies in this printing: one for my shelf, one for the author if they ever return to fandom, and three for friends.

I also began thinking about metadata for fan-bound editions. An industry-published book will have an ISBN or perhaps some other identifying number, along with cataloging data for the relevant depository library. There is no such system for fanbinding. Of course, I can blog about an edition I make and I could use the URL for the blog post as an identifier, but URLs can break. I might wipe my blog and restart it, or forget to renew the domain, or any number of other unfortunate things could happen.

More importantly, blog post URLs carry semantic content. Consider the URL /2022/12/fialleril-everything-i-have-ever-learned. It carries the year and month in which the post was published, and a “slug” based on the post’s title. This makes it long, perhaps too long to expect a user to type. It also contains characters that are easily confused with each other, such as i and l. URLs of this nature work well for linking from one web page to another, but they are poorly suited to the responsibilities of a stable long-term identifier.

Academic articles often have an identifier such as a “DOI”. Such an identifier can assist cross-referencing and make it easier to find articles. For example, doi:10.3983/twc.2022.2107 is the identifier for an article titled “Fan binding as a method of fan work preservation”. The “DOI” system is not accessible for fanbinding usage; participation is aimed at academic institutions and journals, and is not free of charge.

However, reading about the “DOI” system led me to another kind of identifier: Archival Resource Keys, or ARKs. The ARK system is intended to provide persistent identifiers for a wide variety of artifacts, without being limited to specific kinds of use. I spoke with some people who work with ARKs and started sketching out a means to use ARKs as fanbinding identifiers, and I almost started building a proof of concept.

And then I stopped myself. I have enough going on in my life right now; the last thing I need is to put together yet another project that might be used only by myself, but which needs to stay stable and online for years on end. And it turns out I can get 90% of the benefit of an ARK right here on my blog.

ARKs have a well-defined structure, including a namespace, a “shoulder” (essentially a sub-namespace), and a check digit. In a few dozen lines of Python, I wrote a pseudo-ARK generator which uses this same structure to generate an id that looks like https://wiresandbooks.com/straylight/7fd129cr. Such an id is short enough to put in the printed metadata of a fanbound edition. The opaque identifier portion uses a restricted alphabet (digits plus a handful of consonants, omitting vowels and the letter l) to reduce typos. And the final character of straylight/7fd129cr is a check character computed from everything before it, so many typos can be detected.

So, once I produce a particular edition, I just need to make a page with the relevant metadata, publish it to my blog, and then manually trigger the Internet Archive to take a snapshot. Even if my blog were to become unavailable, the Internet Archive is likely to remain available indefinitely. In this way, I ensure the long-term availability of my metadata.
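The snapshot step could also be scripted rather than done by hand. The following is only a sketch: it assumes the Wayback Machine's public "Save Page Now" address (https://web.archive.org/save/ followed by the page URL) accepts a plain GET request, which may be rate-limited or change over time. The function names and the User-Agent string are made up for illustration.

```python
from urllib.request import Request, urlopen

# Assumption: the public "Save Page Now" endpoint takes the target URL
# appended directly to this prefix.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_page_now_url(page_url: str) -> str:
    """Build the address that asks the Wayback Machine to snapshot page_url."""
    return SAVE_ENDPOINT + page_url


def request_snapshot(page_url: str, timeout: float = 60.0) -> int:
    """Ask for a snapshot and return the HTTP status code of the response."""
    request = Request(
        save_page_now_url(page_url),
        # Hypothetical User-Agent, used only to identify the script.
        headers={"User-Agent": "fanbinding-metadata-snapshot/0.1"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Only prints the address; call request_snapshot() to actually trigger it.
    print(save_page_now_url("https://wiresandbooks.com/straylight/7fd129cr"))
```

Checking the resulting snapshot in a browser afterwards is still worthwhile, since a successful response does not by itself prove the page was captured as intended.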

import secrets
import string
import tomllib  # standard library as of Python 3.11

import hyperlink  # https://pypi.org/project/hyperlink/

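# The "betanumeric" alphabet used by Noid: digits plus consonants, with vowels and "l" left out.
BETANUMERIC = "0123456789bcdfghjkmnpqrstvwxz"
# Map each character to its ordinal value in the alphabet.
CHECK_VALUE = {c: i for i, c in enumerate(BETANUMERIC)}

def noid_check_digit(noid: str) -> str:
    """Return the check character for an id that does not yet include one."""
    # Follows the algorithm given in https://metacpan.org/dist/Noid/view/noid
    # The algorithm is only meant for ids shorter than the alphabet.
    if len(noid) >= len(BETANUMERIC):
        raise ValueError("noid is too long for the check digit algorithm")
    # Characters outside the alphabet (such as "/") count as zero.
    total = sum(
        CHECK_VALUE.get(char, 0) * pos for pos, char in enumerate(noid, start=1)
    )
    remainder = total % len(BETANUMERIC)
    return BETANUMERIC[remainder]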

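def mint(shoulder: str, template: str = "eeddee") -> str:
    """Return the shoulder plus one random character per template letter.

    "e" picks a betanumeric character and "d" picks a digit.
    """
    if len(shoulder) == 0:
        raise ValueError("Missing shoulder")
    if shoulder[-1] not in string.digits:
        raise ValueError("Shoulder must end with a digit")
    accum = [shoulder]
    # A trailing "k" asks for a check character, which main() appends separately.
    template = template.rstrip("k")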
    for kind in template:
        match kind:
            case "e":
                accum.append(secrets.choice(BETANUMERIC))
            case "d":
                accum.append(secrets.choice(string.digits))
            case _:
                raise ValueError(f"Unexpected value {kind!r} in template")
    return "".join(accum)

def main():
    with open("config.toml", "rb") as config_file:
        config = tomllib.load(config_file)
    baseurl = hyperlink.parse(config["baseurl"])
    components = ["straylight", mint(shoulder="7")]
    # This isn't a real ARK; it uses 'straylight' instead of a NAAN.
    # But baby steps.
    pseudoark = "/".join(components)
    check_digit = noid_check_digit(pseudoark)
    pseudoark += check_digit
    print(baseurl.click(pseudoark).to_text())

if __name__ == "__main__":
    main()
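
The generator has a natural counterpart: checking an id that someone has typed in from a printed copy. Here is a minimal sketch of that check, assuming the id is the path portion (such as straylight/7fd129cr) with the check character last. The function names expected_check_char and is_valid_pseudoark are mine for illustration and are not part of Noid.

```python
BETANUMERIC = "0123456789bcdfghjkmnpqrstvwxz"
CHECK_VALUE = {c: i for i, c in enumerate(BETANUMERIC)}


def expected_check_char(body: str) -> str:
    """Compute the check character for an id without its final character."""
    # Each character's alphabet value is weighted by its 1-based position;
    # characters outside the alphabet count as zero.
    total = sum(
        CHECK_VALUE.get(char, 0) * pos for pos, char in enumerate(body, start=1)
    )
    return BETANUMERIC[total % len(BETANUMERIC)]


def is_valid_pseudoark(identifier: str) -> bool:
    """Return True if the last character matches the computed check character."""
    if len(identifier) < 2:
        return False
    body, check = identifier[:-1], identifier[-1]
    return expected_check_char(body) == check


if __name__ == "__main__":
    print(is_valid_pseudoark("straylight/7fd129cr"))  # True
    print(is_valid_pseudoark("straylight/7df129cr"))  # False: f and d swapped
    print(is_valid_pseudoark("straylight/7fd128cr"))  # False: 9 mistyped as 8
```

For the sample id, the weighted sum over straylight/7fd129c is 1443, and 1443 mod 29 is 22, which is the position of r in the alphabet. One limitation of this scheme: letters outside the alphabet, such as the a in straylight, contribute nothing to the sum, so a typo that swaps one such letter for another goes unnoticed.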