Thursday, March 2, 2023

Rancher Suite K8S Adventure - Chapter 014 - Small Harvester Cluster First Node Install

A travelogue of converting from OpenStack to SUSE's Rancher Suite for K8S, including RKE2, Harvester, kubectl, and helm.

As mentioned in previous posts, Netboot.XYZ and PXE boot have worked flawlessly on this LAN for network installation for many years.  It's been a long time since I installed an OS using an old-fashioned USB stick or (GASP!) an optical CD-ROM.  I go back far enough that I installed Linux off floppy disks in '94, which took about one box of disks for SLS Linux.  Anyway, combine the reliability and ease of use of network installs with USB booting being flaky and unreliable on mid-2010s Intel NUC hardware, and obviously I was going to try to network install Harvester.  That turned out to be a huge mistake for not-entirely-documented reasons.  So before explaining how to successfully USB install Harvester, I'll explain how to unsuccessfully network install Harvester.

How Not to Install Harvester

Netboot.xyz has a net install menu option for Harvester; on my not-recently-updated Netboot that was an ancient 1.0 version.  I used the netboot.xyz web UI to upgrade its boot menus online to the latest collection and reloaded PXEBoot, so now I have the latest netboot.xyz, which as of today is version 2.0.66, and that recent version CLAIMS it can net install Harvester v1.1.1. Cool! Initially, installing via netboot.xyz and PXEBoot seemed to work.  In fact, it installs VERY quickly.

However, after installation and its first reboot, the console sits at "Setting up Harvester" forever and never completes the setup.

Logging into the console to debug, I watched 'journalctl -f'; all is well until it falls into a permanently repeating failure loop on bootstrapmanifests/rancherd.yaml.  Trying to figure out what went wrong with that YAML file, I ran "kubectl get pods -A", which shows harvester-cluster-repo stuck in ImagePullBackOff mode.  Huh?

Apparently the pod harvester-cluster-repo-<random> is only available from the ISO image as a dynamic artifact of the build process and NOT from download, by design and intention.  The USB installer has an extra partition (or something) full of container images used to "prime the pump," so to speak, for air-gapped installs, or just to make installs faster.  There is some complicated PXE documentation online for Harvester to work around this, but the simpler setup in Netboot.xyz will not provide the "pump priming" collection of containers.  Every container EXCEPT the harvester-cluster-repo container is obtainable online, so installation appears to work but first boot fails, as per:

https://github.com/harvester/harvester/issues/2651

https://github.com/harvester/harvester/issues/2670

So that's just lovely.

I opened a bug against Netboot.xyz to document that the Harvester menu option doesn't actually work by default at this time:

https://github.com/netbootxyz/netboot.xyz/issues/1203

It's one of those inter-project compatibility problems where in some sense it's not really either side's fault, but it's their fault together, so it might take a while to get fixed LOL.

This is why, in the end, I had to install Harvester with old-fashioned USB media instead of PXE boot.

How to Install Harvester

  • The Harvester 1.1.1 installer starts with the question "Create a new" or "Join an existing" cluster. I created a new cluster on harvester-small-1. 
  • Next it asks for the install target. I'm installing Harvester itself to sda and saving the new 1 TB nvme0n1 for storage, using the NON-DEFAULT MBR partition table as per the previous discussion of some Intel NUC BIOS issues. 
  • Then it asks where to store VM data; that's going to be nvme0n1, the brand-new 1 TB drive.
  • The hostname for harvester-small-1 is the short hostname; it is not, AFAIK, the FQDN.
  • The Management NIC is the only possible option because this NUC only has one NIC.
  • I'm not using VLANs at this time on Harvester (did a lot of that in my VMware and OpenStack days LOL).
  • Bond mode is irrelevant for a one NIC device although it'll be a lot more fun another day on the SuperMicro SYS-E200-8D with their LAG bonded 10G ethernet ports.
  • I am setting the network addressing to static, not DHCP (LOL).
  • The IPv4 Address is asking for a CIDR like 10.10.20.81/16.
  • The DNS server setting needs commas between DNS servers.
  • My VIP mode is Static and for this cluster it will be 10.10.20.80 and I have the domain name harvester-small.cedar.mulhollon.com pointing to that, and that https URL will be the web interface for the cluster.
  • This is NOT a high-security installation, and as such my Cluster Token for the harvester-small cluster is harvester-small.  Needless to say, this cluster is firewalled off from the internet; this is not a public cloud cluster, LOL.
  • The Password is my usual admin LAN password.
  • My NTP server is currently 10.10.5.2 which is a hardware NTP server.
  • It will take a while to install, after which it auto-reboots and continues installation. Sometimes it boots back into the USB installer instead (but not every time); removing the USB stick and rebooting will complete the install process.
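The static-addressing answers above are worth sanity-checking before typing them in, since the installer gives little feedback. Here's a minimal sketch using Python's ipaddress module, assuming my example values (node 10.10.20.81/16, VIP 10.10.20.80); the second DNS server shown is a hypothetical placeholder, not my actual config:

```python
import ipaddress

# Node address in CIDR form, as the installer's IPv4 Address prompt expects.
node = ipaddress.ip_interface("10.10.20.81/16")

# Static VIP that will serve the cluster's https web interface.
vip = ipaddress.ip_address("10.10.20.80")

# The VIP should live on the same subnet as the node so the
# management URL is reachable without extra routing.
assert vip in node.network

# DNS servers are entered comma-separated at the installer prompt;
# 8.8.8.8 here is just a stand-in second server for illustration.
dns_servers = "10.10.5.2,8.8.8.8".split(",")

print(node.network)         # 10.10.0.0/16
print(vip in node.network)  # True
print(dns_servers)          # ['10.10.5.2', '8.8.8.8']
```

Nothing fancy, but catching a VIP-on-the-wrong-subnet typo here beats re-running the installer.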

After rebooting into the OS, hitting F12 on the console status window toggles between the console and a shell. I entered the password, ran top, and watched for a while. It's VERY busy setting up K8S and all kinds of "Harvester Stuff".  Running journalctl -u rancherd -f is pretty interesting to watch as the cluster comes up.  After it sets up RKE you can log out, log back in, and kubectl will work; it's entertaining to watch that too.  You don't have to actually "DO" anything, you can just patiently wait, but it's fun to watch the logs and stuff inside.

Eventually, in perhaps ten minutes, the new cluster is "Green" and "Ready" status on the console.  Whoo Hoo!

Initial web login to the VIP address as an https URL asks you to set a new password for the admin user. The process will fail later unless the password is over 12 characters (thanks for not telling us in advance, only after entering a pword, LOL). Also agree to the Terms and Conditions, click continue, etc.
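Since the UI only complains after the fact, here's the password rule as I ran into it, as a trivial sketch; length is the only check I hit, and Harvester's actual validation may include more than this:

```python
def password_long_enough(password: str) -> bool:
    """Approximation of the admin-password rule I hit on first login:
    over 12 characters.  There may be additional checks (character
    classes, etc.) that this sketch does not cover."""
    return len(password) > 12

print(password_long_enough("harvester-small"))  # 15 chars -> True
print(password_long_enough("tooshortpw"))       # 10 chars -> False
```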

There's a whole lot of nothing going on after installing only one node, but at least no errors or problems are reported.  I immediately note that approximately 4 out of 4 of my cores are already reserved, LOL; I will have to examine that situation. Actual use, as opposed to reservation, is of course practically zero.  Luckily v1.1.1 makes overcommit provisioning a little easier than in the past.  Reportedly Harvester's demands have been getting higher over time, and I may not be able to keep running Harvester on this hardware; I might need something a little more lightweight. But for experimentation and education with HCI it should be good enough.

I will do a post later on, taking a tour of the Harvester UI, connecting the cluster to Rancher and looking at it in Rancher, etc.  However, for today, the first node of the new cluster is up and all is well.
