How I built this blog

Is it gauche to write a meta post about the blog in the year of our Linux 2022?


This post is part sharing how I set this up to possibly inspire others, and part documentation for when I inevitably need to do all this again -- which will be immediately after I've forgotten what I did.


Pelican

I had previous experience with Pelican and ended up really loving it for a lot of reasons.

Of these, being able to use my editor of choice over a WYSIWYG[1] is probably the most important to me. I use vim extensively, but that's a post for another time.

So why Pelican? Why not Hugo or Ghost or any of the other billion static site generators out there? Despite my previous experience with Pelican, I briefly evaluated these and decided that being able to fall back on my Python knowledge when all else fails was pretty invaluable, rather than first spending time figuring out how to read errors in a language I'm not as proficient in and then addressing them.

Pelican also has a bevy of themes and plugins that are a pip install away, and I use a few of them.

Of these, I ended up vendoring (the practice of explicitly including the source of a dependency within the project code base) simple-footnotes to address an outstanding issue that was bothering me. No one other than me will likely ever see hidden and draft content, but it was extremely disorienting when working with them locally. Would I have been able to do that with Hugo or Jekyll or Ghost? Probably, but definitely not anywhere near as easily. Besides, I'm already riding the CSS struggle bus; there's no need to make things extra difficult for myself.

Edit 2023-11-19: I've actually ditched all of these
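
For the record, getting a plugin in place looked roughly like this before I ditched them -- a sketch, with the package name and source path being illustrative rather than copied from my repo:

# most plugins really are just a pip install away (package name illustrative)
pip install pelican-simple-footnotes

# "vendoring" instead means copying the plugin source into the blog repo
# so local fixes travel with the site (source path is hypothetical)
mkdir -p plugins
cp -r ~/src/pelican-simple-footnotes/pelican/plugins/simple_footnotes plugins/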

I work with Pelican in two primary environments: locally and on "the build server."

Locally

First, locally I use a virtual environment to hold onto my Pelican install, and when writing content, adjusting the theme, editing settings, or working with a plugin, I run Pelican in an "interactive" way:

pelican --listen --autoreload --relative-urls
# or more tersely:
pelican -rl --relative-urls

This configures Pelican to build the site and serve it via an HTTP server. Additionally, any change in the content directory, the theme directory, or the Pelican configuration file causes Pelican to rebuild the site. Finally, --relative-urls causes all content to be loaded from localhost rather than the domain it is ultimately deployed to. This allows a very tight feedback loop of edit, refresh, review.
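
For completeness, the one-time local setup is roughly the following -- a sketch; the virtual environment name and the markdown extra are my assumptions, not copied from the repo:

# create and activate a virtual environment, then install pelican into it
python3 -m venv .venv
source .venv/bin/activate
pip install "pelican[markdown]"

# then the edit/refresh/review loop from above
pelican -rl --relative-urls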

Build and deploy

After I'm satisfied, I commit the article (or theme, plugin, etc.) changes and open a PR against a private repository[2] to run a test build, ensuring any changes I've made will actually build. This also lets me review the content before actually publishing it, and possibly even lets others play editor and provide feedback on a post.

For builds, I use a docker container that I build locally and push to an ECR repository within my AWS account. I chose this route because I can ensure that the versions of packages I'm using remain consistent not only between local and remote but also between builds, no matter how far apart they are. It's also a nice little time save on builds, since I only need to pull the image down rather than run pip in the github actions.
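
The image build-and-push itself is nothing fancy -- roughly the following, where the repository path and tag are illustrative choices rather than the real ones:

# log in to ECR, build the image with pinned dependencies, and push it up
aws ecr get-login-password | docker login --password-stdin -u AWS "${IMAGE_REPO}"
docker build -t "${IMAGE_REPO}/blog-build:latest" .
docker push "${IMAGE_REPO}/blog-build:latest"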

Speaking of github actions, building and publishing the blog is just:

name: Publish
on:
  push:

jobs:
  publish-blog:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: build
        run: |
          ./build.sh
        env:
          IMAGE_REPO: ${{ secrets.IMAGE_REPO }}
          BASEIMAGE: ${{ secrets.BASEIMAGE }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DEFAULT_REGION: us-east-1
      - name: publish
        if: github.ref == 'refs/heads/main'
        run: |
          ./publish.sh --delete
        env:
          BLOGBUCKET: ${{ secrets.BLOGBUCKET }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DEFAULT_REGION: us-east-1

It's not completely necessary for everything to be stored in secrets, but it's convenient to have them configured centrally rather than relying on my infamous spelling abilities[3] -- the less I need to rely on myself writing something consistently, the better.

The only thing I find particularly noteworthy is the condition on the publish step, which causes it to only run on pushes to the main branch. There are other ways of working out this logic, but I opted for a single workflow that only sometimes pushes content out to the destination.

The two scripts the build runs are relatively small. Even though they could be inlined, I prefer to keep them as separate scripts so I can execute them locally when experimenting with the build.

#!/usr/bin/env bash
# build.sh
set -e
set -x

# always clean up: drop the pulled image and the registry login,
# whether the build succeeded or crashed partway through
function cleanup() {
    set +e
    docker image rm "${BASEIMAGE}"
    docker logout "${IMAGE_REPO}"
}

trap cleanup EXIT

# log in to ECR, then run the build image with the repo mounted as the working directory
aws ecr get-login-password | docker login --password-stdin -u AWS "${IMAGE_REPO}"
docker run -w /tmp/build -v "$(pwd):/tmp/build" --rm "${BASEIMAGE}" publish

I consider the trap call to be the most important part of the script. trap is a bash builtin that allows handling signals sent to the script -- in this case, the pseudo-signal EXIT. Combined with set -e[4], this allows a "crash fast, but don't leave a mess" error handling strategy. It looks like there's three times as much error handling code as actual build-the-site code, but EXIT triggers whenever the script exits, including by reaching the end successfully.

#!/usr/bin/env bash
# publish.sh
# sync the built site to the bucket; any extra arguments (e.g. --delete) pass through to aws
aws s3 sync "$@" ./output "s3://${BLOGBUCKET}"

This small piece runs only on merges into the main branch and is the actual publish step that pushes site changes out to production -- in this case, an S3 bucket. Passing "$@" to the sync command allows me to experiment locally without needing to edit the script every time. It also allows me to pass --delete during the publish step to ensure that files I want removed actually are, but if I'm running locally and accidentally hit my prod bucket then I don't nuke everything.
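
Because those extra arguments pass straight through to aws s3 sync, a local dry run looks something like this (the bucket name is a placeholder):

# see what would change without touching anything
BLOGBUCKET=my-scratch-bucket ./publish.sh --dryrun

# what CI passes on main so files removed from the source get pruned from the bucket
BLOGBUCKET=my-scratch-bucket ./publish.sh --delete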

Production

S3 might be the deployment target, and while it is possible to serve a static website directly from an S3 bucket, doing so does not support HTTPS, and you might notice the little lock next to the domain[5]. You probably also noticed that this site isn't at an S3 URL -- something like etcsudontersblog.s3.us-west-2.amazonaws.com.

It is also important to note that I am not running a publicly accessible bucket in website mode. It's just an S3 bucket that only I and cloudfront can access. Even if you had the actual S3 bucket name, you'd just get the awful XML "I'm sorry Dave, I'm afraid I can't do that" error.

Accomplishing this took several steps in AWS after setting up the S3 bucket.

First was creating a hosted zone within Route53. I won't dive too far into details, but a hosted zone is analogous to a zone file[6]. Even though I had a solid idea of what subdomains I wanted at this point, I didn't set them up yet since I didn't have a cloudfront distribution set up to point the records at.
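
If you'd rather do this step from the CLI than the console, creating the zone is a single call -- a sketch, with the domain as a placeholder; the caller reference just has to be unique:

# create the hosted zone that will hold the site's records
aws route53 create-hosted-zone \
  --name example.com \
  --caller-reference "blog-$(date +%s)"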

The next immediate step for me here was heading over to my domain registrar -- Namecheap -- and setting AWS's nameservers as the ones responsible for advertising my records.

After the hosted zone was set up, the next thing I did was set up a certificate for the domain. With AWS, you can use AWS Certificate Manager -- ACM -- to generate a certificate signed by amazon and have the DNS verification records set on Route53 automatically. If you're following along, don't do what I did, which was explicitly naming the subdomains; instead, request a wildcard certificate for subdomains so that when you add more subdomains you don't have to request a new certificate. Just a little tip.
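
Done from the CLI, the wildcard version of that request looks roughly like this (the domain is a placeholder; note that certificates used by cloudfront have to live in us-east-1):

# one certificate covering the apex and any future subdomains
aws acm request-certificate \
  --region us-east-1 \
  --domain-name "example.com" \
  --subject-alternative-names "*.example.com" \
  --validation-method DNS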

Now that I had a certificate, I could finally set up a cloudfront distribution to serve the content and, more importantly, to attach the certificate to. There are several important configurations on the cloudfront distribution.

The first is attaching alternate domain names. When operating within AWS's walled garden, this allows setting a special Route53 record called an alias that points at the cloudfront distribution. A certificate covering all listed alternate domains must also be supplied to prove ownership of the domains.

Next was configuring Cloudfront to access the private bucket via an "Origin Access Identity", which is fancy talk for a special identity attached to the distribution that the S3 bucket's access policy can grant read access to.
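
The end result is a bucket policy that names the OAI as the only principal allowed to read objects -- roughly the following, with the bucket name and OAI id as placeholders:

# grant the distribution's OAI read access to the bucket contents
aws s3api put-bucket-policy --bucket my-blog-bucket --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowCloudFrontOAIRead",
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity E2EXAMPLE12345"
    },
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-blog-bucket/*"
  }]
}'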

After the distribution was created -- but not necessarily enabled, these things take a long time to completely spin up -- I headed back to Route53 to create records to make everything nice and tidy for people to access.
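
Pointing a domain at the distribution is a single alias record; from the CLI it looks roughly like this. The zone id, domain, and distribution hostname are placeholders, while Z2FDTNDATAQYW2 is the fixed hosted zone id that cloudfront alias targets always use:

# UPSERT an alias A record that resolves the apex to the distribution
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0123456789EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "example.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "Z2FDTNDATAQYW2",
          "DNSName": "d111111abcdef8.cloudfront.net",
          "EvaluateTargetHealth": false
        }
      }
    }]
  }'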

Bonus Round

There's a few other bits I set up that aren't necessary but make my life a little better.

I set up a billing alert for AWS to email me when I reach certain spending thresholds; for me, that's 50 and 200 dollars. I don't anticipate this site alone ever costing more than around $10, but when I start setting up more things in my account this'll give me peace of mind that I'm not about to miss a mortgage payment.
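
For reference, the scripted equivalent of such an alert is roughly a cloudwatch alarm on the EstimatedCharges billing metric -- a sketch; the alarm name, SNS topic, and account id are placeholders, and billing metrics only exist in us-east-1:

# email (via an SNS topic) when the estimated monthly bill crosses $50
aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name billing-over-50 \
  --namespace AWS/Billing \
  --metric-name EstimatedCharges \
  --dimensions Name=Currency,Value=USD \
  --statistic Maximum \
  --period 21600 \
  --evaluation-periods 1 \
  --threshold 50 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts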

I also added a github action that allows me to smash up the cloudfront cache in case I want to get something published now and not wait for the caches to update themselves. It is possible to exclude certain pages from cloudfront caching, but sometimes it's also useful to just nuke the whole cache and start over fresh.
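
Under the hood that kind of action is essentially one aws CLI call (the distribution id here is a placeholder):

# invalidate everything so the next request pulls fresh content from the bucket
aws cloudfront create-invalidation \
  --distribution-id EDFDVBD6EXAMPLE \
  --paths "/*"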

Finally, the thing that motivated actually doing something with the domains I owned: I set up a google workspace so I could have an email address that isn't just an @gmail and instead receive email at an address tied to a domain I own.

And because there is the /etc/sudonters github organization, I decided to verify the domain there as well, despite the fact that I'm just using the free tier for organizations, which doesn't confer the verified badge.

Both google workspaces and github verified domains use DNS records to verify ownership: Workspaces gives MX records along with a challenge record, and github uses TXT records as the challenge.
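
Once the records have propagated, it's easy to sanity check them before hitting the verify buttons (the domain is a placeholder):

# confirm the mail and challenge records are actually being served
dig +short example.com MX
dig +short example.com TXT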


  1. WYSIWYG, I say "wizzy-wig" 

  2. I'm currently considering how to make it non-private, but it having access to my AWS account makes me cautious on this 

  3. I am awful at spelling, so much that my poor spelling has literally disrupted production environments 

  4. if any command in the script returns a non-zero exit code, immediately exit the script 

  5. unless your browser renders favicons there in which case it's a broken lock, not confusing at all 

  6. explain like I'm 5: a zone file is a collection of records belonging to a parent domain