Notes on my home lab
4x Tiny/Micro/Mini running kubernetes, 1 control node. Several Raspberry Pi 3Bs provide DNS, DHCP, routing, and ingress from the home network. Still evaluating NAS options, but settled on TrueNAS Scale with redundant 10TB WD Reds. Also: source-controlled configuration.
Published at
Hardware
- 4x Lenovo ThinkCentre, i5-6500T, 16GB RAM, 256GB SSD
- TP-Link TL-SG1024
- A floating number of Raspberry Pi 3B+s
- 2x 24-port patch panel
- 1x i5-6600, 16GB RAM, 2x 10TB WD Red, 4TB WD Blue
- 2x APC UPS; one's a big power strip, one's got a fancy display
I don't think there's anything really exciting here. The i5-6600 used to be my personal computer until a friend rehomed a replacement to me, which saved me a ton of money. Anyway, more on all of this in a bit. I have some thoughts on how I decided to home all of this -- and some more that's not in use.
All of this lives on a lack stack rack.[1] Ikea is cheap crap, but you don't get to appreciate the quality of cheap crap until you're attempting to mount a 24 port switch into thin air. I personally wouldn't recommend this unless you really need a 19 inch rack as cheap as possible. It's around 9U of storage, and if you can get things to stay put it's serviceable. I found just running bolts through the gaping emptiness in the legs[2] to be about the sturdiest I could make it. Because of this my plan to stack the pair and make the most out of 18U fell through, off, somewhere.[3]
I'm not completely down on it. For one, I have 9ish U of storage for sixty dollars. The pair I have stacked up are actually connected with L brackets so there's no slipping around. I was originally using the bottom table as a stand for my tower, UPS, and the "fuck I need some hardware for CKA prep" edition of the lab, so this basically got a fancy hat. It was also nice to ditch the under-shelf, which had been slowly sagging from the DAS I had on it; that let me make use of some taller storage and give awkward things like the paper shredder a better place to live than under my desk. The fancy hat gives the printer a great place to live but unfortunately also makes a great "stuff and crap" catcher.
I give the lack rack two big thumbs down. I can't even make a pun about how I found it "lack"ing because maybe that could be really good. All around a subpar experience.
More fun is the "WiFi shrine" that several of the raspberry pis and the unifi access point are mounted on. The Pis are in the four corners and the access point is in the top center. Behind them are a Netgear GS105 and a little power strip. The initial test fit was fine, but I do need to build a sturdier bracket before I actually mount it above my desk.
Hosts
There's eight primary hosts in the home lab. Three of them are RasPi 3B+s; four are the TMM boxes running Kubernetes; the last is the i5-6600 NAS.[4] The raspberry pis run essential network services. "Wombat" handles routing and WAN access. "Raccoon" handles DNS and DHCP. "Cardinal" handles ingress from our UniFi networks and does its best as a print server.
Gateway/Router
You might wonder, "Why is a raspberry pi handling WAN access?" It's because I only have wifi for internet access in my office. This is primarily an issue for me because the TMMs don't have wifi. This was the first thing that was definitely the "lab" portion of home lab. It's just iptables, wifi credentials, and plugging the other end into a switch. iptables was quick and easy to set up and only took me a little bit of reading to get the barebones of what I wanted working. This is not the fullest featured gateway, but it lets the lab reach the places it needs to.
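For flavor, the core of that setup is only a handful of rules. A minimal sketch, assuming wlan0 faces the house wifi and eth0 faces the lab switch (the interface names are assumptions, not what Wombat actually calls them):

```sh
# Hypothetical sketch of the NAT/forwarding rules on a gateway like "Wombat".
# wlan0 = house wifi (WAN side), eth0 = lab switch (LAN side).

# Let the kernel forward packets between interfaces
sysctl -w net.ipv4.ip_forward=1

# Masquerade lab traffic out the wifi interface
iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE

# Allow return traffic back in, and anything lab -> wifi
iptables -A FORWARD -i wlan0 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -i eth0 -o wlan0 -j ACCEPT
```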
Building a router like this was fun.[5] I won't claim to know enough to guide someone else through it. Instead, I recommend Carla Schroder's Linux Cookbook, 2nd Edition. Even though I ended up picking it up after configuring this router, I read through it and the text is very hands on, very informative, and touches on newer tools than iptables. The final chapter uses the built-up knowledge to set up a raspberry pi as a router. Since reading through felt like only a third of the experience, I'm going to build a replacement gateway with the book.
DNS/DHCP
"Raccoon" is the primary DNS and DHCP server for the lab. I probably shouldn't be running this off something whose storage is an SD card. That's also not my primary reference for their configuration, so I figure I eventually learn a pretty cheap lesson and then figure out a new toy that solves the problem.
There's three primary DNS domains and one DHCP domain:
- burrow.lab.sudonters.com is the lab's "actual" DNS and how hosts address one another
- sdntrs.quest is for non-host services inside the lab
- brrws.xyz is for broader home use
I chose bind9 and isc-dhcp-server for these because it's what I already knew. These weren't hard requirements for the kubernetes cluster setup, but I'd rather deal with hostnames than IP addresses when configuring things. isc-dhcp-server is super end of life, but I also didn't want to learn Kea on the fly. Some things I planned out well and other things I didn't; this is one of the latter. As for bind9, it's fine to manage manually until you need to edit zone files by hand, which sucks, and I need to find a better way to deal with this.
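For a sense of scale, the zone data involved is tiny. A minimal sketch of a zone like burrow.lab.sudonters.com; the host names, addresses, and serial here are illustrative, not my actual records:

```sh
# Hypothetical zone file for the lab's host domain.
cat > /etc/bind/zones/db.burrow.lab.sudonters.com <<'EOF'
$TTL 3600
@   IN SOA  raccoon.burrow.lab.sudonters.com. admin.burrow.lab.sudonters.com. (
            2024010101 ; serial
            3600       ; refresh
            900        ; retry
            604800     ; expire
            300 )      ; negative cache TTL
    IN NS   raccoon.burrow.lab.sudonters.com.

raccoon  IN A  10.31.0.2
wombat   IN A  10.31.0.1
kube01   IN A  10.31.0.11
EOF

# Sanity check before reloading bind
named-checkzone burrow.lab.sudonters.com /etc/bind/zones/db.burrow.lab.sudonters.com
```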
"Public" Loadbalancer
Finally, "Cardinal." This is the latest addition and serves as ingress from our existing UniFi network into the lab. It's like the opposite set up of "Womabt", traffic comes in on its wifi and enters the lab via ethernet. Getting the unifi-lab routing figured out has been a little frustrating, I think more due to my local VPN client than I want to admit.9 I also attempted to double dip on this host with CUPS but the routing issues and something unhappy between CUPS and my personal have put that on hold for now.
TMMs
I think the article linked at the top pitches the idea of using TMM compute. Mine are stacked in pairs, which comes out to about 2U of height. They're not loud, which is important because my office is our primary guest room in addition to being the server room.
Previously, two of these served as the hardware for some CKA labs I was doing. What I primarily learned from that experience is that I want to manually install kubernetes as little as possible. It's not particularly difficult, other than the computer expecting me to spell everything correctly all at once, but it is something I needed a checklist with some quick notes to get done properly. Because of this I played with modifying an ubuntu server ISO. I don't recall the particular tool I used, but this was actually pretty easy. I didn't have to spend much time figuring out how to make the ISO do everything kubernetes expected, and it got me most of the way there.[10] This is easily the best decision I made when setting up the current lab. I think this would've been a win for me even if all it did was apt-get install ... all the kubernetes stuff once.
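For reference, the per-node prep a customized image like this bakes in boils down to a few commands. A rough sketch, assuming kubeadm-style installs and that the kubernetes apt repo is already configured (package versions and repo setup are omitted, and none of this is pulled from my actual image):

```sh
# Hypothetical node prep; assumes the kubernetes apt repo is configured.

# Kubernetes requires swap to be off; disable it now and permanently
swapoff -a
sed -i '/\sswap\s/ s/^/#/' /etc/fstab

# Kernel modules and sysctls that kubeadm preflight checks expect
modprobe overlay
modprobe br_netfilter
cat > /etc/sysctl.d/99-kubernetes.conf <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system

# Container runtime plus the kubeadm/kubelet/kubectl trio
apt-get update
apt-get install -y containerd kubeadm kubelet kubectl
apt-mark hold kubeadm kubelet kubectl
```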
The installation of these images was still manual, however, and something I can definitely handle in a better fashion. Building my own images and PXE booting the kubernetes nodes is a very long term goal for me.
As for the hardware itself, I was able to get all four of these for around five hundred dollars second hand off ebay. Before ordering these, I had also been considering a single beefy tower to run kubernetes in VMs on. However, the TMMs being less expensive, plus the small form factor, quietness, and actual distribution of compute, was too good of a combo to pass up.
NAS
This is the part where I get back to the computer I received from a friend. I found the bottleneck to actually making use of my cluster was having to use local storage for anything persistent. Local storage is fine for some stuff, but not if you want your pods to migrate to other nodes. Put another way, I don't want my bookmarking service to stop working when and only when I update kube02. That defeats the purpose of having actually distributed compute. So I started evaluating NAS offerings.
My first opinion is holy shit most of these vendor boxes cost a lot of money. I bought a Terramaster 4 bay DAS a while ago and foolishly thought, "how much more expensive could a NAS be? Ten dollars?" All I really want is a box that I can throw big drives and ethernet at so someone else on the network can get an SMB share or something. The appeal of also using one of these devices as a host for VMs, containers, and in some cases Kubernetes isn't lost on me, but outside of something like a media server that's just not something I want to do.
The only spare compute I had on hand were 3B+ Pis, so I threw OpenMediaVault on one and attached an HDD enclosure via USB. That's not really an ideal NAS setup, but I was more interested in evaluating OMV itself than anything else. That relationship was actually quite short because of a combination of OMV's WebUI error handling and something Firefox doesn't one hundred percent like about my lab's DNS. I managed to get some SMB shares working but it was frustrating.[6]
I went back to comparing vendor hardware again, weighing buying into Terramaster because of the DAS against just going with a Synology. If I had kept going down this path, I do think I would've stuck with Terramaster because it looked like more bang for my buck on the things I cared about, especially the models with dual 2.5GbE NICs.[7]
The NAS I've ended up with is actually my former personal tower, the aforementioned i5-6600, running TrueNAS Scale. After shuffling drives between the i5-6600, my new personal, and the DAS, I was able to retire the DAS[8] for now. The NAS ended up with both of the 10TB Western Digital Reds and the 4TB Western Digital Blue, and retained its SSD boot drive. The Blue probably isn't the best idea, but that's a lesson I'm willing to learn. It's serving kubernetes NFS mounts for a few things right now, nothing so important that a loss hurts. The 10TBs are mirrored and will be used for home network storage. I'm planning on migrating the kubernetes data shares onto something more reliable in the mid term.
Setting up the drive pools and turning them into NFS and SMB shares has been easy. I ran into issues when attempting to use their "app store," though. This looked like a DNS issue and behaved exactly like a DNS issue, except it was Wombat's wifi radio taking a nap. After that, I was able to download their supported services. Under the hood, this looks like a combination of helm charts and some TrueNAS-specific information used to generate the web UI. This was only frustrating because I'd rather look at what helm and kubernetes are yelling about directly than find where it lives in a UI. Adjusting some values made it all happy. Currently, this is hosting a barren Plex. It'll end up with a few more things, like MinIO, that make the most sense to house as close to the drives as possible.
This tower also has a GPU, a lower end nvidia from eightish years ago, and a secondary NIC from an older, used-for-parts tower. Plex is happy to use the GPU for transcoding, and the second ethernet port lets me separate the NAS and the few other services it runs at the IP level.
So far I'm happy with TrueNAS Scale, and not needing to spend more on compute hardware means more drives to throw at this box. And one of the things I'm most excited about is being able to bring back my "cloud" retropi setup, which was a pair of retropis that mounted the same SMB share for games, saved states, and other files retropi needed.
Network
There's six subnets that the lab ends up dealing with. Three of them are UniFi managed networks and the only real concern is which network is allowed to see what. Guest can see guest stuff, home can see home stuff, lab can see everything.
The lab itself maintains three subnets:
- 10.31.0.0/16, which are lab hosts
- 10.4.20.0/24, which are non-host services in the lab
- 10.60.22.0/24, which are services the lab is providing to the home network
DHCP only hands out addresses for 10.31.0.0/16, while metallb is responsible for the other two. These three ranges line up with the burrow.lab.sudonters.com, sdntrs.quest, and brrws.xyz domains I mentioned earlier.
Routing between the UniFi and lab networks has been very hit or miss for me. This is definitely due to my relative inexperience with network management. Adding extra fun to this are local VPN clients mucking with routing tables in different ways. But when it works, it works, and I get some dopamine when I can talk to lab services from my phone on the patio.
Kubernetes Services
The cluster provides external access primarily via metallb. I encountered this in passing at work and it stayed in the back of my mind. This project fulfills kubernetes LoadBalancer services with virtual IP addresses; this installation is running in Layer2/ARP mode, but there is an additional BGP mode.
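A minimal sketch of what that Layer2 configuration looks like with the service ranges from the network section; the pool names are made up, and this assumes the current metallb.io/v1beta1 CRDs:

```sh
# Hypothetical metallb Layer2 configuration; pool names are made up,
# the ranges are the lab's service subnets from above.
kubectl apply -f - <<'EOF'
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lab-services
  namespace: metallb-system
spec:
  addresses:
    - 10.4.20.0/24
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: home-services
  namespace: metallb-system
spec:
  addresses:
    - 10.60.22.0/24
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - lab-services
    - home-services
EOF
```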
The primary client of metallb is ingress-nginx. I chose to have one installation per external domain rather than attaching a load balancer to each external subnet. My primary notes here are to remember to update the IngressClass details in all of the places, including the leader election locks, and to be sure you're not looking at F5's nginx ingress documentation by accident. Each ingress has a paired instance of external-dns configured to update its relevant zone; the note again is, when running multiple instances, to be sure both get updated with different names, locks, etc.
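A sketch of what one of those per-domain installs might look like via the ingress-nginx helm chart; the release, namespace, and class names are made up, and pinning its Service to a metallb pool via annotation is an assumption about how you'd wire the two together:

```sh
# Hypothetical second ingress-nginx install for one external domain.
# Assumes the ingress-nginx helm repo has already been added.
helm upgrade --install ingress-home ingress-nginx/ingress-nginx \
  --namespace ingress-home --create-namespace \
  --values - <<'EOF'
controller:
  ingressClassResource:
    name: home
    controllerValue: k8s.io/ingress-nginx-home
  electionID: ingress-home-leader
  service:
    annotations:
      metallb.universe.tf/address-pool: home-services
EOF
```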
The Gateway API has had my eye for a while. It's a welcome improvement on the design of Ingress. The added support for non HTTP traffic is also important because I want to run secondary DNS servers in the cluster and this lets me add UDP DNS and figure out DNS-over-HTTPS later.
The cluster also provides TLS, just for cluster hosted services for now, via Let's Encrypt certificates. This works via the DNS-01 challenge. In the lab cluster I run cert-manager, which has AWS credentials that can update Route53 public zones. When an ingress is created and requests TLS, the records for the challenges are resolved in public DNS and the certificates are issued internally. Having TLS that doesn't anger browsers is really nice.
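A minimal sketch of a cert-manager ClusterIssuer for this kind of setup; the email, secret names, and region are placeholders, and the zone selector is only illustrative:

```sh
# Hypothetical ClusterIssuer for DNS-01 via Route53; values are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-dns01-account-key
    solvers:
      - selector:
          dnsZones:
            - sdntrs.quest
            - brrws.xyz
        dns01:
          route53:
            region: us-east-1
            accessKeyID: AKIA...
            secretAccessKeySecretRef:
              name: route53-credentials
              key: secret-access-key
EOF
```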
And for persistent volumes, I have both rancher/local-path-provisioner and kubernetes-sigs/nfs-subdir-external-provisioner deployed. Local path volumes aren't something I particularly want to rely on, but they're better than no persistent storage. The NFS driver was a little more troublesome and I unsatisfactorily resolved the issue with chmod a+rw.
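For completeness, a sketch of deploying the NFS provisioner against the TrueNAS export via its helm chart; the server address, export path, and storage class name are all made up:

```sh
# Hypothetical install of the NFS provisioner pointed at the TrueNAS box.
helm repo add nfs-subdir-external-provisioner \
  https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm upgrade --install nfs-provisioner \
  nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --namespace nfs-provisioner --create-namespace \
  --set nfs.server=10.31.0.50 \
  --set nfs.path=/mnt/tank/kubernetes \
  --set storageClass.name=nfs-subdir
```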
Lab Services
There's currently only a few externally accessible services running in the lab. Plex is available from the NAS. The cluster hosts:
- Shaarli, a link/note aggregation service
- Heimdall, a home lab landing page
- Firefly-III, some home financial service we were checking out
- Overseerr, a plex request manager
I don't have much to say here since it's just a few things. I plan on adding secondary DNS servers on each external subnet, mostly so I don't need to route DNS from all over the place into the lab. The primary server will still be Raccoon for convenience. Expanding the services around Plex is also planned; having a request manager is great, but it would be nicer to have it handle requests itself.
Other plans include Home Assistant, and git and OCI repositories.
Management
I've currently thrown together my own workflow for managing these hosts and services. I keep DNS zone files, DHCP configuration, iptables rules, etc. in directories that are "zones"; each zone has all of the elements necessary to configure bind, dhcp, and routing for its subnets. An additional set of directories defines host-specific configuration: what services a host should run, static IPs, and so on. Finally, a systemd unit registered to run before networking comes up takes all of this and creates working directories with the zone files, dhcp configuration fragments, etc. that are relevant to the host. Each of these services has its own systemd unit, provided over the ones the package manager sets up. The zone and host files are distributed over SSH.
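As a rough sketch of the "before networking comes up" piece, something shaped like this unit; the unit name and assembly script are hypothetical stand-ins for my actual setup:

```sh
# Hypothetical unit that assembles this host's zone/host config fragments
# before networking starts; the unit name and script path are made up.
cat > /etc/systemd/system/lab-config.service <<'EOF'
[Unit]
Description=Assemble zone and host configuration for this machine
DefaultDependencies=no
After=local-fs.target
Before=network-pre.target
Wants=network-pre.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/sbin/assemble-lab-config

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable lab-config.service
```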
It's not bad and at least the bulk of it has ended up source controlled. It's probably more worth my time to brush up on ansible than keep building this. In addition to ansible, I also want to explore using Flux or Argo for automation.
For kubernetes resources I've been making a shift towards using kustomize over helm whenever possible. I've been at "We deserve better than templated yaml" for a while now. Kustomize's emphasis on declarative management is generally a lower brain tax than helm's templating system. The resource and component system has some bumps but is more intuitive than Helm's subcharts. Writing just yaml instead of templated yaml means the usual tooling like yaml schemas and formatters works. However, there's no good answer for anything that could be dynamic, for example building a commit-specific image and including it in the kustomization.
Either we end up using kustomize to procedurally edit a file,
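something roughly in this shape, with a hypothetical image name and overlay path:

```sh
# Hypothetical: pin the deployment to the image built for this commit.
# kustomize edit rewrites kustomization.yaml in place.
cd deploy/overlays/lab
kustomize edit set image \
  registry.example.com/bookmarks=registry.example.com/bookmarks:"$(git rev-parse --short HEAD)"
```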
...generate the yaml via a script,
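for example a small wrapper along these lines, again with made-up names and paths:

```sh
#!/usr/bin/env bash
# Hypothetical wrapper that renders a commit-specific patch, then builds
# and applies the kustomization. Assumes the overlay's kustomization.yaml
# lists image-patch.yaml under patches.
set -euo pipefail

tag="$(git rev-parse --short HEAD)"
overlay=deploy/overlays/lab

cat > "$overlay/image-patch.yaml" <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bookmarks
spec:
  template:
    spec:
      containers:
        - name: bookmarks
          image: registry.example.com/bookmarks:$tag
EOF

kubectl kustomize "$overlay" | kubectl apply -f -
```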
And I've seen creative usage of configMapGenerator.
I understand kustomize's desire to avoid incorporating a templating engine or handling things like shell-style variable expansion. At the same time, needing to run scripts to generate parts of the full kustomization seems very at odds with the promise of kubectl kustomize | kubectl apply -f -. It would've been interesting to see the confusing vars block repurposed to support pulling environment variables and allowing replacements to use it as a source. Plugins look to be the primary way to address this, and distributing them is mostly the same problem as scripts.
Regardless, kustomize is another huge win for me.
1. I opted for the "enterprise" edition by choosing the coffee table rather than the side table.
2. Honestly this might be unfair to gaping emptiness and yawning voids.
3. I also attempted to buy some 9U rails to give the equipment and shelves something hardier to grasp, but this made the space between the legs way too tight. Worse, when I went to toss the rails in my spare parts toolbox I found a set of rails from when I was convinced my R710 would be fine mounted on the first table I got some years ago.
4. And in storage bind them? I need better jokes.
5. Forgetting about sleepy lil wifi chips was not.
6. The short of it is the webui aggressively polls for information on CPU usage, including on pages where it isn't shown. However, if any request fails, including these polls, the UI punts to an intimidating SOFTWARE ERROR page. I'm not sure of the frequency, but Firefox would occasionally become very upset about calling OMV via hostname and the connection would fail. So on one hand, I can't say I really fault OMV for punting to an error page when there's a connection failure. On the other hand, convenience background polling shouldn't trigger these kinds of error handling responses. I also didn't like venturing to their github repo and being directed to a web forum to file bugs -- maybe that's a me problem but I also have too many different notification sources to keep track of already.
7. This is more forward looking than immediate benefit tbh.
8. RIP Rae Dunn DAS.
9. Which means I probably want to set up a wireguard host sooner rather than later.
10. Basically anything that wasn't immediately obvious or answered by the first two results of the internet question machine I chose to do manually. This ended up being very little; I only really recall permanently disabling swap.