Wednesday, March 1, 2023

Rancher Suite K8S Adventure - Chapter 013 - Small Harvester Cluster Design

A travelogue of converting from OpenStack to SUSE's Rancher Suite for K8S, including RKE2, Harvester, kubectl, and helm.

Harvester-Small is a micro test cluster for HCI experiments. I know from experience with VMware that these servers overheat and get squirrely under high load, but they're more than adequate for testing, experimentation, and educational purposes. I believe a "bursty" low-average workload would be fine on this hardware; probably OK to stream music or whatever.

Hardware

The Harvester-Small cluster is composed of three 2016 Intel NUC6i3SYH mini PCs that were formerly small FreeNAS (now TrueNAS) servers for the former VMware cluster, each with a newly installed 1TB NVMe drive for Longhorn storage.

https://www.intel.com/content/www/us/en/support/products/89189/intel-nuc/intel-nuc-kits/intel-nuc-kit-with-6th-generation-intel-core-processors/intel-nuc-kit-nuc6i3syh.html

These mini PCs had been upgraded to what seemed like a lot of RAM at the time, 32 GB, back when I was running a complete set of VMware 5.x on them. I don't keep up with desktop PCs, so I don't know if 32 GB of RAM is considered a lot in 2023. Probably, yeah.

Four Intel NUC problems, mostly solved, or at least worked around:

  1. A peculiarity of the on-board video hardware, or maybe of VMware ESXi, is that at some point in the last decade I had to plug in fake HDMI dongles for the devices to boot. Booting would fail if no HDMI was attached, so you can (or used to be able to) buy a little dongle plug that pretends to be an HDMI monitor for just this purpose. The aggravation level of a server that refuses to boot unless it's plugged into a working HDMI monitor is immense. I don't recall if this problem went away with ESXi version 4 (or 5), with a BIOS upgrade, or with some unexpected BIOS setting; regardless, I have a pile of HDMI simulation dongles no longer needed for successful boot.
  2. A peculiarity of the 2010's NUC case design is that the ethernet port was toleranced with somewhere around zero to negative millimeters of manufacturing clearance, so depending on the exact shape and the momentary angle and tension of the ethernet cable, the ethernet sometimes behaves as if unplugged. It's very aggravating for a server to appear to have a "burned out" ethernet port with a cable plugged in and latched when it's really just a cable compatibility issue. Ironically, the higher quality cables, especially stiff shielded patch cables, are less reliable here, while the cheaper, more flexible, usually less reliable cables are more reliable when plugged into the back of a '16 NUC, perhaps because their flexibility keeps the plug from fighting with the side of the case being too close to the port.
  3. Another fascinating problem with mid-2010s NUCs is that they only boot GPT partition tables via manual selection from the "F10" boot menu; plain old MBR partition tables boot perfectly normally. Naturally I found this out by doing a perfectly successful installation of Harvester using GPT, which could only boot with manual intervention on every single boot of this model of NUC, so I had the privilege of repeating the entire Harvester installation process with MBR boot records, which work perfectly on every reboot. (There is a small sketch after this list for checking which partition scheme a disk ended up with.)
  4. The final problem with mid-2010s NUCs is that USB booting is a bit twitchy and I don't have a full mental model of the problem, so USB booting is never 100% reliable, although it's probably over 90%. It seems unlikely to be a hardware issue, since it affects multiple NUCs and multiple USB drives at random. Generally, trying the "F10" manual boot menu works often enough, whereas relying on various BIOS "boot orders" so that the machine automatically boots off the USB only when it's inserted generally does not work, or results in much headache. In summary: configure the BIOS to boot off the SDA drive and only the SDA drive all the time, and when installing Harvester from a USB boot drive just accept that you'll be hitting F10 at power up, manually selecting the USB for booting, and sometimes trying a couple of times.
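
Since the GPT-versus-MBR distinction bit me in item 3, here is a minimal sketch for checking which partition scheme a disk actually ended up with. It assumes Linux, 512-byte sectors, and read access to the raw device (so run it as root); the device path is just an example.

  # check_ptable.py - report whether a disk uses GPT or a plain MBR partition table
  # Assumes 512-byte sectors and read access to the raw device (run as root).
  import sys

  def partition_scheme(device="/dev/sda"):
      with open(device, "rb") as disk:
          sector0 = disk.read(512)   # MBR lives in the first sector
          sector1 = disk.read(512)   # GPT header lives in the second sector (LBA 1)
      if sector0[510:512] != b"\x55\xaa":
          return "no valid MBR signature (blank or unusual disk)"
      if sector1[0:8] == b"EFI PART":
          return "GPT (protective MBR plus GPT header)"
      return "plain MBR"

  if __name__ == "__main__":
      dev = sys.argv[1] if len(sys.argv) > 1 else "/dev/sda"
      print(dev, "->", partition_scheme(dev))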

IP addressing and DNS

This specific data is not useful to anyone else, but it demonstrates one working organization strategy and explains a few things about Harvester in general.

10.10.20.80 VIP harvester-small.cedar.mulhollon.com
10.10.20.81 NUC1 harvester-small-1.cedar.mulhollon.com
10.10.20.82 NUC2 harvester-small-2.cedar.mulhollon.com
10.10.20.83 NUC3 harvester-small-3.cedar.mulhollon.com

I think a good hostname strategy is cluster name, dash, number, with the cluster's VIP at what would be "dash zero" for the cluster, as seen above. At least, this is how I set it up in NetBox IPAM and AD DNS.
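
As a sanity check on the naming scheme, a quick sketch like the following confirms that forward DNS resolves each name to the intended address. The hostnames and IPs are the ones listed above; socket.gethostbyname just does ordinary A-record lookups, so run it from a machine that uses the AD DNS servers.

  # check_dns.py - verify the harvester-small naming scheme resolves as intended
  import socket

  EXPECTED = {
      "harvester-small.cedar.mulhollon.com":   "10.10.20.80",  # cluster VIP ("dash zero")
      "harvester-small-1.cedar.mulhollon.com": "10.10.20.81",
      "harvester-small-2.cedar.mulhollon.com": "10.10.20.82",
      "harvester-small-3.cedar.mulhollon.com": "10.10.20.83",
  }

  for name, ip in EXPECTED.items():
      try:
          answer = socket.gethostbyname(name)
      except socket.gaierror:
          print(f"{name}: does not resolve")
          continue
      status = "OK" if answer == ip else f"MISMATCH (got {answer})"
      print(f"{name}: {status}")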

Note I'm not attempting the adventure of VLANs. I had plenty of experience with that back in the VMware days, when it was naturally assumed every decent hypervisor host would have six to eight ethernet interfaces, so I simulated that with VLANs, and back when vSAN couldn't encrypt data (and everyone felt that was normal, LOL) you had to segregate vSAN storage traffic off to a "private" VLAN no other machines had access to. Well, times change, and the stereotype of K8S and cloudiness is "just abstract it all away at the K8S level and let the cloud take care of itself," so one ethernet is enough. Maybe in the Harvester-Large cluster I will experiment with VLANs anyway, just for fun.

Note that Rancher does its cluster load balancing by setting up multiple A or CNAME records for a hostname that point to the cluster members, whereas Harvester clusters set up a VIP. I don't know if it's a "real VIP" following full RFC 3768 the way a router's ethernet port would follow the standard, or if it's some software thing that works well enough. I suppose a one-liner explanation of real VRRP is that it's something like anycast or multicast, but for one "normal" IP address instead of an address residing in the special multicast IP ranges (none of this is technically true, but it's close enough to get the flavor of what it does across).
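
Whatever the VIP implementation turns out to be underneath, the practical test is the same: the management interface should keep answering on the VIP no matter which node currently holds it. A rough sketch, assuming the Harvester UI is listening on port 443 at the VIP from the table above (and not bothering to verify the lab cluster's self-signed certificate):

  # check_vip.py - poke the cluster VIP to see if something is answering on 443
  import socket
  import ssl

  VIP = "10.10.20.80"   # harvester-small VIP from the table above

  def vip_answers(host=VIP, port=443, timeout=5):
      context = ssl.create_default_context()
      context.check_hostname = False            # self-signed cert on a lab cluster
      context.verify_mode = ssl.CERT_NONE
      try:
          with socket.create_connection((host, port), timeout=timeout) as sock:
              with context.wrap_socket(sock, server_hostname=host) as tls:
                  return f"TLS handshake OK, negotiated {tls.version()}"
      except OSError as err:
          return f"no answer: {err}"

  if __name__ == "__main__":
      print(VIP, "->", vip_answers())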
