Friday, March 10, 2023

Rancher Suite K8S Adventure - Chapter 020 - Prepare Terraform for Harvester

A travelogue of converting from OpenStack to Suse's Rancher Suite for K8S including RKE2, Harvester, kubectl, helm.

I don't like manually configuring things.  I like IaC with templates stored in a nice Git repo: deployments are faster, there are fewer human errors, and it's just all around better than mousing and typing a virtual infrastructure into existence.  So today we prepare Terraform to work with Harvester, but first, some work with multiple-cluster kubeconfig files.

Multiple Cluster Kubectl

Start automating by configuring kubectl to talk to multiple clusters.

Reference:

https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/

The good news is it's pretty easy to configure multiple clusters into separate kubectl contexts.  The bad news is it's just as easy to select a different context at runtime; in fact it's so easy to switch contexts that there have been multiple headline news stories about devops engineers who thought they were permanently erasing their test cluster deployment, only to rapidly discover they were actually in their production context, resulting in some amazing headlines about outages and deleted data.  So keep your wits about you and be careful.  I will set up multiple contexts some other time.
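
For reference, switching contexts is only a couple of kubectl one-liners, which is exactly why it's so easy to do in the wrong terminal.  A minimal sketch, assuming contexts named "rancher" and "harvester" already exist in the merged config (those names are placeholders, not anything configured above):

kubectl config get-contexts           # list every context kubectl knows about
kubectl config current-context        # sanity check BEFORE doing anything destructive
kubectl config use-context harvester  # switch the default context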

One of the cultural oddities of the K8S community is they like to call the kubectl config file by the generic phrase "your kubeconfig file".  What makes that odd is most installs do not have a file named kubeconfig or .kubeconfig or kubeconfig.conf or anything like that.  On my Ubuntu system, kubectl's config file, aka the "kubeconfig file", lives at ~/.kube/config.

I will usually be working with Harvester, so in my ~/.kube directory I keep YAML files named rancher.yaml and harvester.yaml, and I simply copy whichever one I need over ~/.kube/config.

In summary, make certain that running "kubectl get nodes" displays the correct cluster... 
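
Here is roughly what that workflow looks like in practice, as a sketch using the rancher.yaml and harvester.yaml file names from above:

cp ~/.kube/harvester.yaml ~/.kube/config            # make Harvester the active cluster
kubectl get nodes                                   # should list the Harvester nodes, not Rancher's
KUBECONFIG=~/.kube/rancher.yaml kubectl get nodes   # one-off query without copying anything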

Terraform

https://developer.hashicorp.com/terraform

Terraform is similar in concept to CloudFormation from AWS or Heat templates from OpenStack.  You write your infrastructure as source code, run the template, and Terraform gradually converges the cloud to match it.  Not a script so much as a specification.

Install Terraform

I should have installed Terraform back when I was installing support software like kubectl and helm.  Better late than never...

https://developer.hashicorp.com/terraform/downloads

https://www.hashicorp.com/official-packaging-guide

The exact version of the Ubuntu package I'm installing is 1.3.9 as seen at

https://releases.hashicorp.com/terraform/

And I'm doing an "apt-mark hold" on it to make sure it's not accidentally upgraded.

Here is a link to the GitLab repo directory for the Ansible terraform role:

https://gitlab.com/SpringCitySolutionsLLC/ansible/-/tree/master/roles/terraform

If you look at the Ansible task file named packages.yml, it installs some boring required packages first, deletes the repo key if it's too old, downloads a fresh copy of the repo key if it's not already present, gpg-dearmors the key into apt format, adds the local copy of the repo key to apt's list of known good keys, installs the sources.list file for the repo, does an apt-get update, takes terraform out of "hold" state, installs terraform version 1.3.9, and finally places terraform back on "hold" so it's not magically upgraded to the latest version (1.4 or 1.5 or something by now).  Glad I don't have to do that manually by hand on every machine on my LAN, LOL.
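
For anyone stuck doing it by hand anyway, the role boils down to roughly the following, assuming HashiCorp's documented apt repo layout; double check the key URL and the exact package version string against the official packaging guide linked above:

wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | \
  sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt-get update
apt-cache madison terraform               # confirm the exact version string, e.g. 1.3.9-1
sudo apt-mark unhold terraform            # in case it was already pinned
sudo apt-get install -y terraform=1.3.9-1
sudo apt-mark hold terraform              # keep apt from "helpfully" upgrading it later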

Simply add "- terraform" to a machine's Ansible playbook, then run "ansible-playbook --tags terraform playbooks/someHostname.yml" and it works.  Ansible is super cool!

As of the time this blog was written, "terraform --version" looks like this:
vince@ubuntu:~$ terraform --version
Terraform v1.3.9
on linux_amd64
vince@ubuntu:~$ 
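
With the binary installed, pointing Terraform at Harvester is mostly a matter of handing the provider a kubeconfig for the Harvester cluster (the harvester.yaml I keep in ~/.kube).  A minimal sketch, assuming the kubeconfig argument and provider source documented on the registry page in the references below; the version pin is a placeholder, so use whatever is current:

cat > provider.tf <<'EOF'
terraform {
  required_providers {
    harvester = {
      source  = "harvester/harvester"
      version = "0.6.2"   # placeholder, pin to the current release
    }
  }
}

provider "harvester" {
  # kubeconfig for the Harvester cluster, not the Rancher one
  kubeconfig = "/home/vince/.kube/harvester.yaml"
}
EOF
terraform init    # downloads the provider plugin and verifies the config parses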

References

https://www.suse.com/c/rancher_blog/managing-harvester-with-terraform/

https://docs.harvesterhci.io/v1.1/terraform/

https://github.com/harvester/terraform-provider-harvester

https://registry.terraform.io/providers/harvester/harvester/latest


Thursday, March 9, 2023

Rancher Suite K8S Adventure - Chapter 019 - Provision a VM in Harvester

A travelogue of converting from OpenStack to Suse's Rancher Suite for K8S including RKE2, Harvester, kubectl, helm.

A large proportion of blog posts / YouTube videos covering Rancher seem to halt after the UI is working.  I intend to extend past that point and document actually using the system.  Today we provision a VM.  Personally I live the hybrid life: I provision a VM in AWS, VMware, OpenStack, or now Harvester, then I use Ansible to automatically integrate the VM into my existing systems WRT Active Directory SSO, logging, Zabbix and Metricbeat monitoring, NTP, etc.  Per that operational style, the goal or demarcation point of provisioning a VM is successfully SSHing in as the Ansible user; beyond that point Ansible takes over configuration.

As a general observation, over the years the workload has steadily become containerized, such that in the long run I don't think I'll have many non-infrastructure VMs.  I will likely continue to have multiple VMs for DCs, DNS, DHCP, and maybe a few other tasks, but the workload always moves slowly toward containers.  That should work very well with an HCI solution such as Harvester.  Use case drives the requirements; I will have to bridge the DHCP server interface directly onto the LAN, for example, which was "easy" in OpenStack.

Rancher Project vs K8S Namespace

https://ranchermanager.docs.rancher.com/pages-for-subheaders/manage-projects

https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/manage-namespaces

Projects are a 'new' Rancher concept wedged between existing K8S clusters and existing K8S namespaces.  Clusters contain projects, which contain namespaces.  My plan is to use projects in Rancher similar to how I used projects in OpenStack, so I will configure "infrastructure", "server", "iot", "enduser" and similar project names.

Note that projects are a part of a cluster; project "infrastructure" on cluster "harvester-small" is independent of project "infrastructure" on cluster "harvester-large".

I intend to create roughly one namespace per hostname or system as appropriate.  Everything about DHCP as a system, would live in the DHCP namespace in the infrastructure project in my Harvester cluster.

Create a project

Log into Rancher, select virtualization and the harvester-small cluster, hamburger menu "Projects/Namespaces", "Create Project"

I named today's test project "experiment", and added "admin" as a project owner, so there are two owners, user "vince" (me) and the "admin" user.

This is where you would set project resource quotas and limits, if you were planning to use any (which I am not, at this time).

Create a namespace

Log into Rancher, select virtualization and the harvester-small cluster, hamburger menu "Projects/Namespaces", in the "experiment" project area click "Create Namespace".

I named my namespace "vm".  This is where you can set container level resource limits.  I don't intend at this time to set any limits to this namespace.

Create a VM storage class

I don't need 3 replicas of a test volume on a 3 node cluster, 2 replicas should be fine for my testing.

https://docs.harvesterhci.io/v1.1/advanced/storageclass

Log into Rancher, select virtualization and the harvester-small cluster, click "Advanced", then "Storage Classes".

Note the default SC is "harvester-longhorn" and it keeps 3 replicas.

Click 3-dots, "clone", give it a name "experimental", change the number of replicas to 2, click "create".
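
Under the hood that clone is just another Longhorn-backed StorageClass with a different replica count.  A rough YAML equivalent, as a sketch assuming the Longhorn provisioner name and parameters Harvester normally generates, with a placeholder class name so it doesn't collide with the UI clone; verify the parameter names with "kubectl get storageclass experimental -o yaml" rather than trusting this:

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: experimental-cli        # placeholder name for illustration
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"         # the whole point: two copies instead of three
  staleReplicaTimeout: "30"
  migratable: "true"
EOF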

Upload an image into the cluster

https://docs.harvesterhci.io/v1.1/upload-image

Let's use Ubuntu's 20.04 focal-server-cloudimg-amd64.img file; I'm using version 20230215 from:

http://cloud-images.ubuntu.com/focal/20230215/

Note this is exactly the same image I run on OpenStack.  The similarly named "-disk-kvm" optional image has a console compatibility issue with OpenStack, such that if you want a working console you can't use "-disk-kvm".  For consistency's sake I will use the same image on Harvester.  This console works fine in both OpenStack and Harvester.

First we try uploading a previously downloaded cloud image.  Spoiler alert: this does not work.  Log into Rancher, select virtualization and the harvester-small cluster, click "Images" then "Create".  I'm naming my upload "Ubuntu 20.04 20230215".  Then select the img file and upload it.  It will take a while to leave "uploading" status.  Eventually it failed with "Timeout waiting for the datasource file processing begin".  Well OK, let's retry.  That fails again.  There is an old Harvester bug report on GitHub about this error message that is closed, but obviously file uploads still don't work:

https://github.com/harvester/harvester/issues/1415

OK then, I will upload by providing a URL instead of the download/upload dance.  "Copy Link Address" from the Ubuntu cloud-images page, and do a URL upload directly into Harvester.  The image download via URL succeeded after about three minutes.  Cool.

For now, under "Storage" I am using the default "harvester-longhorn" Storage Class, which makes triple copies of the image; that seems excessive for an internet download, but on the other hand I think VMs may instantiate faster if a copy is local to the hardware.  I will experiment later with creating an image storage class that only keeps one replica to save disk space.  After all, I only need read access at the moment of instantiating a new VM, which is pretty rare, and if I lose the image I can re-download a new copy from the internet in less than three minutes...
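
For later automation it is worth knowing the URL download is also expressible as a Harvester VirtualMachineImage object.  A sketch, assuming the harvesterhci.io/v1beta1 fields displayName, sourceType, and url; verify the exact field names against the YAML of the image the UI just created before relying on this:

kubectl apply -f - <<'EOF'
apiVersion: harvesterhci.io/v1beta1
kind: VirtualMachineImage
metadata:
  name: ubuntu-2004-20230215
  namespace: default
spec:
  displayName: "Ubuntu 20.04 20230215"
  sourceType: download
  url: http://cloud-images.ubuntu.com/focal/20230215/focal-server-cloudimg-amd64.img
EOF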

Upload your SSH keys

https://docs.harvesterhci.io/v1.1/vm/create-vm

I have a regular daily use SSO Active Directory user with SSH keys AND an ansible user with SSH keys, so for experiments (like today) I would provision a VM with my "vince" user SSH keys but I would provision a real VM with the "ansible" user SSH keys.  So I have two sets of keys to upload although today we're only using the "vince" ssh key.

Log into Rancher, select virtualization and the harvester-small cluster, click "Advanced" then "SSH Keys" then "Create".  Note that you can put SSH keys into specific namespaces although I'm using "default".  "Read from a File" worked for me.

Repeat the above for the "ansible" user, which we won't be using today but will use sooner or later.

Create a VM Network

By default, if you try to create a VM, the VM will be placed in network "management Network" which is the internal system network, and you'll get some crazy inaccessible 10.50.x.y address that only exists inside the cluster.  So you need to configure a "VM Network" that bridges over to the ethernet port, connecting the VM to the real world LAN.

Log into Rancher, select virtualization and the harvester-small cluster, click "Networks" then "VM Networks" and Create.

I named my network "untagged" and left it in the "default" namespace for general use.

It's a type "UntaggedNetwork" (hence the name "untagged") and it's on the "Cluster Network" "mgmt", which is the ethernet port of my Harvester nodes.

Cloud-init

https://docs.harvesterhci.io/v1.1/vm/create-vm#cloud-init

https://cloudinit.readthedocs.io/en/latest/

Note OpenStack does both config-drive and metadata-service based cloud-init, but Harvester AFAIK only does drive-based cloud-init.

One cool feature of Harvester's cloud-init is the SSH keys list seems to support injecting multiple keys into .ssh/authorized_keys.  "Back in the old days" with OpenStack, key injection only supported one key, so being able to inject multiple keys is kind of cool.

So, lose some features, gain some features.

When creating a VM, the cloud-init config can be manually modified in "Advanced Options" "Cloud Config".

In theory it would be possible to set the serial console password for the ubuntu user or set the VM hostname, but in practice it was not possible.

The link for network config options in the Rancher UI is dead.  

https://cloudinit.readthedocs.io/en/latest/topics/network-config-format-v1.html

The proper link seems to be

https://cloudinit.readthedocs.io/en/latest/reference/network-config-format-v1.html

I submitted a bug report to Harvester at

https://github.com/harvester/harvester/issues/3528

Note this bug was fixed and closed out on March 1st, so if you update your Harvester after that date it's probably good now.  I write these blog posts some weeks in advance...

If you provide no Network Data, you get a DHCP address, which is acceptable for testing but not useful for production.  Let's design a Network Data for a statically assigned IP address.

version: 1
config:
  - type: physical
    name: enp1s0
    subnets:
      - type: static
        address: 10.10.202.202/16
        gateway: 10.10.1.1
        dns_nameservers:
          - 10.10.7.3
          - 10.10.7.4
        dns_search:
          - cedar.mulhollon.com

cloud-init is an incredibly expensive piece of software to use, because the only feedback you'll get upon any error is a boot log message: "Invalid cloud-config provided: Please run 'sudo cloud-init schema --system' to see the schema errors."  Of course it's impossible to run that, because if cloud-init doesn't work all user access is blocked: no SSH keys are set and no passwords are set for the serial console.  So best of luck finding your typo or indentation error, LOL.
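
One partial escape hatch: cloud-init installs fine on a workstation, so the schema check can be run against the User Data before it ever goes anywhere near Harvester.  A sketch, assuming a reasonably recent cloud-init (older releases spell the subcommand "cloud-init devel schema") and a hypothetical user-data.yaml holding the Cloud Config text:

sudo apt-get install -y cloud-init               # on the workstation, not the VM
cat > user-data.yaml <<'EOF'
#cloud-config
ssh_authorized_keys:
  - ssh-ed25519 AAAA... vince@example            # placeholder key
EOF
cloud-init schema --config-file user-data.yaml   # prints a pass/fail plus the actual schema errors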

As per the above, I was unable to use cloud-init to set the 'ubuntu' user password for the serial console, or to set the VM hostname.  Perhaps there is an incompatibility in the password-hash CLI program, or the requirements have changed in an undocumented manner; I also attempted to use the plain_text_password option, which failed too.  The online docs that are guaranteed to work with Ubuntu 14 (support ended in 2019) carry a disclaimer that those instructions are known not to work on newer installs.  Various attempts to set the hostname silently failed; it was impossible to get a working output from "hostname -f".  I tried both the "helper" functions in cloud-init and "manually" running various command line commands via runcmd, and nothing worked.  Sometimes automation that saves seconds takes too many hours to set up, especially when it has an awful UI.  So just log in as "ubuntu" over SSH, set a password for the serial console manually, and set the hostname manually.  Unfortunate, but necessary.

Create a VM

https://docs.harvesterhci.io/v1.1/vm/create-vm

Log into Rancher, select virtualization and the harvester-small cluster, click "Virtual Machines" then "Create".

The selected namespace will be "default".  I will change the NS to "vm", which is part of the "experiment" project.  How does one know "vm" is part of the "experiment" project in the Create VM UI?  That's an excellent question.  For production deployment I intend to use a naming strategy where, if the project name is "projectname", then the NS name will be "projectname-NS" rather than just "NS".  Part of the design advantage of having projects was to permit the re-use of NS names across multiple projects, so this naming strategy negates one of the original design goals, but if it's unusable, then whatever, do what works as a pragmatic strategy.

I named the test VM "test".

In the "Basics" tab I will provide 1 CPU, 4 GB ram, and the "vince" SSH key.  As per above, "real" production would use the "ansible" key for automated provisioning but using the "vince" key makes it easy to simply log in as "ubuntu@someaddrs" from my "vince" account.

In the "Volumes" tab I select the Ubuntu image as my 10 GB disk image.  I see in the "Type" dropdown there's an option to add a cdrom, so I could install from cdrom if I don't have a ready to use cloud image.  Fun as a virtualized bare metal install would be, which I've done many a time on VMware and OpenStack, in today's experiment, we'll use Ubuntu's cloud image.

In the "Networks" tab the default is type "masquerade" on the management Network.  I will change the type to bridge as I will eventually be using this to host DHCP servers, among other things.  Also I have not experimented with filtering (if any) on a masquerade type network.  The "Network" has to be changed from the internal "management Network", to "default/untagged".

In "Node Scheduling" I do not intend to lock down to any specific node.  Its interesting to look at the rule engine for scheduling.  I could set a key to force certain workloads to certain nodes.   There does not seem to be a facility like "affinity" or "anti-affinity" rules like in VMware, which is too bad.

In "Advanced Options" I see the OS Type was autodetected as "Ubuntu", cool.  See the above cloud-init section to cut and paste in the User Data and Network Data.

Click "Create" and wait.  The UI stopped for a minute or two but stabilized rapidly...

First Five minutes with a new VM

The "test" VM is in status "running" on node "harvester-small-2".

In Rancher the operational tasks for a VM are in the "three dots" menu.  Start, stop, reboot, snapshot, migrate, etc.

First thing I looked at was the logs.  Note these are "Harvester" logs not VM logs.  Lots of lines to research later on.  Main thing I notice is every five seconds I see this log message:

"{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='vm_test' uuid=feec480d-31a1-59fa-9199-a330c83aa404 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6382","subcomponent":"libvirt","thread":"30","timestamp":"2023-02-22T20:11:36.552000Z"}"

The timestamp cannot be cut and pasted from a log message, which is annoying.

There is a "console" drop down for webvnc or serial console emulation.  Both seem to work well with this Ubuntu image.  Note that the "ubuntu" does not have a password set, have to configure that after logging in via SSH.

Troubleshooting Lore

Here's some troubleshooting lore, some of which might even be true.

It's possible to wedge yourself into a situation where you can't log in at a static IP address and can't log in on the serial console because cloud-init isn't working.  Seems frustrating.  The solution was to boot up with DHCP, sudo passwd ubuntu, verify the serial console is working, THEN mess around with the cloud-init network config while logged in over the serial console, trying to get static IP addresses working.

I believe network data and/or the user data might only be read on first boot unless something like sudo cloud-init clean is run.
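
If that lore is correct, forcing cloud-init to re-read its data looks something like this from the VM's console; a sketch, and the clean subcommand's flags may vary a bit between cloud-init versions:

sudo cloud-init clean --logs   # forget the previous run (and its logs)
sudo reboot                    # user data and network data get re-applied on the next boot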

Wednesday, March 8, 2023

Rancher Suite K8S Adventure - Chapter 018 - Tour Harvester Cluster inside Rancher

A travelogue of converting from OpenStack to Suse's Rancher Suite for K8S including RKE2, Harvester, kubectl, helm.

This is similar to Chapter 016 where we toured the Harvester UI directly.  Today we compare the differences between a Harvester cluster as seen in Rancher vs the direct Harvester UI.

Log in to Rancher and select "Virtualization Management" from the hamburger menu.  "harvester-small" is the only HCI cluster at this time, click it.

Speed

The first thing I notice overall is the Rancher web UI is dramatically faster.

Multiuser RBAC Auth

Note that the "direct" web UI for Harvester has exactly one user, admin, with superuser privs. The Rancher interface could have multiple users, perhaps fifty, accessing multiple clusters, perhaps ten, with extensive RBAC options for each cluster. In rancher I use admin only to set things up, then I add a user for myself "vince". The Rancher UI for the cluster has an addition left side menu option "Cluster Members" and as an admin user I click "Add" and add myself as a cluster owner. The option for "Custom" permissions provides fine grained roles for cluster access.

Namespaces and Projects

The Rancher "Projects/Namespaces" menu corresponds to the Harvester "Namespaces" menu.  Note that Harvester has no direct concept of Rancher Projects, obviously.  Rancher automatically comes with two projects, "Default" which is prepopulated with Harvester's "Default" namespace, and "Not in a Project" (well, "not in a project" is not really a project, but whatever) and that project is prepopulated with the "harvester-public" namespace.  Note that you can click-thru a namespace in Rancher, such as "harvester-public" and see its resources, it has configmaps and secrets and vm templates and stuff like that.  However in the Harvester web UI you can not click thru and look at the stuff in a namespace.  Probably the weirdest difference I can find between Namespace UI elements is Rancher does not display a "Download YAML" button for a namespace until you checkmark at least one namespace, whereas the Harvester UI displays a grayed out "Download YAML" button until a checkbox for a NS is clicked.  So don't panic if you can't find the YAML download in Rancher, just remember to select a NS first before the button will appear...

Versions

Probably the funniest minor difference is the lower right corner of the screen reports the Rancher version on Rancher and the Harvester version on Harvester.  Conceptually I initially expected the Rancher screen to display the Harvester version when I clicked thru into the Harvester cluster.

Aside from the above differences, the UIs are more or less identical and going forward I will always use the Rancher web UI to control Harvester, although I'll keep Harvester in mind for emergency type access, perhaps if Rancher crashes or something like that.

Tuesday, March 7, 2023

Rancher Suite K8S Adventure - Chapter 017 - Connect Harvester to Rancher

A travelogue of converting from OpenStack to Suse's Rancher Suite for K8S including RKE2, Harvester, kubectl, helm.

Today we connect the Harvester cluster to the Rancher manager.  It's a short, simple task.

https://docs.harvesterhci.io/v1.1/rancher/rancher-integration

https://ranchermanager.docs.rancher.com/integrations-in-rancher/harvester

Harvester clusters are imported on the Virtualization Management page in Rancher.

"Import Existing"

Cluster name "harvester-small" haven't run into any problems with cluster names containing a dash... at least not yet.

I add myself and admin as role "cluster admin" in the "member roles" section.

Rancher will provide a registration link and instructions.

Switch over to a Harvester UI web page, "Advanced" then "Settings"

"Edit Settings" for "cluster-registration-url" then copy and paste the Rancher-provided URL into Harvester.  Click "Save".

Around five minutes later the harvester-small appears in Rancher as an "Active" cluster.

As I understand it, the Harvester node driver was integrated into Rancher a long time ago and does not need to be added.

Tomorrow we look at the Web UIs in Rancher and Harvester now that they're linked.

Monday, March 6, 2023

Rancher Suite K8S Adventure - Chapter 016 - Tour Harvester UI

A travelogue of converting from OpenStack to Suse's Rancher Suite for K8S including RKE2, Harvester, kubectl, helm.

Today will be a quick tour of the Harvester v1.1.1 web UI.  Not a deep dive just some familiarization with the UI.

Note the console password is my admin password but that password is separate from the overall cluster password configured in the web UI.

To my modest annoyance, the web UI for Harvester is not compatible with the web browser on my Android phone, can't even get past initial login page, too much Javascript, probably.

Dashboard

"Dashboard" - Most of the time you just need to glance here.  For some odd reason the graphs started crashing almost immediately.

Hosts

"Hosts" - Reserved vs Used line graphs for CPU, memory and storage for each host in the cluster.  Can click thru to an individual host, and that is where you'd enable maint mode on one host or take a look at Ksmtuned statistics.  Note that you add additional disks for Longhorn storage in "Hosts" not "Volumes" just an interesting trivia point.

https://docs.harvesterhci.io/v1.1/host/

Virtual Machines

"Virtual Machines" - Note that you'll want to configure your SSH keys elsewhere before creating a VM.  Note that you create backups in "Virtual Machines", configure your S3 or NFS backup store in "Advanced" and examine actual backups in "Backup & Snapshot".

Volumes

"Volumes" - This is your window into Longhorn.  Note that you want to configure Storage Classes before configuring individual Volumes and SCs are configured in "Advanced" "Storage Classes".  Note that the default "harvester-longhorn" SC creates triple replicas which is rough on a three host machine.  You can't change the default SC its Helm managed, but you can clone and edit, although we're getting way off topic of a UI tour...

Images

"Images" - No images are loaded by default (kind of expected to see a SUSE here).  Here's a link explaining how to upload images.

https://docs.harvesterhci.io/v1.1/upload-image

Namespaces

"Namespaces" - Harvester doesn't really have docs for Namespaces, so here's a k8s link about namespaces in a generic sense:

https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/

Networks

"Networks" - There's cluster networks:

https://docs.harvesterhci.io/v1.1/networking/clusternetwork

and there's VM networks:

https://docs.harvesterhci.io/v1.1/networking/harvester-network

We'll talk about that later in more detail.  It's a simplification, but you can pretty much put your stuff under the "mgmt" cluster network by creating an untagged VM network, especially if you only have one ethernet port and are not using VLANs.

Backup & Snapshot

"Backup & Snapshot" - actually has three components not two, VM backups, VM snapshots, and Volume snapshots.

VM backup and snapshot docs:

https://docs.harvesterhci.io/v1.1/vm/backup-restore

vs volume snapshot docs:

https://docs.harvesterhci.io/v1.1/volume/volume-snapshots

For people who've never used VMware or OpenStack or even LVM, a snapshot is a backup that hasn't been saved anywhere else, and it's useful because you can restore it.  Or conceptually, a backup IS a snapshot; it's just that you're saving a copy of it somewhere off-disk for safekeeping.  It's interesting how the concept of backup fractured into those two terms maybe 20 years ago; you'd think the idea would go back much further, but it doesn't, and it's pretty recent in the history of "making backups", which must go all the way back to unit record keeping equipment a century (or more) ago.

Monitoring & Logging

"Monitoring & Logging" - doesn't do much out of the box, we will return to this topic in more detail once we have something to monitor and record.

Monitoring reference docs:

https://docs.harvesterhci.io/v1.1/monitoring/

Logging reference docs:

https://docs.harvesterhci.io/v1.1/logging/

Advanced

"Advanced" - All kinds of stuff from above that would be considered "set it and forget it" or set it one time at initial setup, at least optimistically.

This completes a very fast tour of the Harvester UI.

Friday, March 3, 2023

Rancher Suite K8S Adventure - Chapter 015 - Small Harvester Cluster Additional Node Install

A travelogue of converting from OpenStack to Suse's Rancher Suite for K8S including RKE2, Harvester, kubectl, helm.

After installing the first cluster node, subsequent additional nodes are calm and anti-climactic.  Which is nice, once in a while.

BIOS config is the same for all nodes.  The rechargeable BIOS battery in node harvester-small-3 was dead, which was quite annoying, but it held charge long enough after charging overnight.

  • Install of an additional Harvester 1.1.1 node starts with an install question "Create a new" or "Join an Existing". Selected Join Existing on harvester-small-2. 
  • Next it asks for install target, I'm installing Harvester itself to the sda and saving the new 1TB nvme0n1 for storage, must use MBR partition table. 
  • Then it asks where to store VM data, that's going to be nvme0n1.
  • The hostname for harvester-small-2 is the short hostname; it is not, AFAIK, the FQDN.
  • The Management NIC is the only possible option because this NUC only has one NIC.
  • I'm not using VLANs at this time on Harvester-Small cluster.
  • Bond mode is irrelevant for a single NIC device.
  • I am setting the network addressing static not DHCP.
  • The IPv4 Address is asking for a CIDR like 10.10.20.82/16.
  • The DNS server setting needs commas between DNS servers.
  • It will ask for the existing cluster VIP address which is 10.10.20.80 for registration with the existing cluster.
  • It will also ask for the cluster token which is harvester-small.
  • The console Password is my usual admin LAN password.
  • My NTP server is currently 10.10.5.2 which is a hardware NTP server.
  • It will take a while to install and auto-reboot, after which it magically adds itself to the cluster with no human intervention required, very cool!
  • If I log into the cluster web UI using the admin password (which is separate from the console password, though coincidentally mine happen to match; your two passwords might differ), then in the web UI I see my new nodes up and running.

Next blog posts on the roadmap are a tour of the Harvester cluster UI, then connection of Rancher to Harvester, then tour the Harvester cluster inside the Rancher UI, then experiment with provisioning some stuff.

Thursday, March 2, 2023

Rancher Suite K8S Adventure - Chapter 014 - Small Harvester Cluster First Node Install

A travelogue of converting from OpenStack to Suse's Rancher Suite for K8S including RKE2, Harvester, kubectl, helm.

As mentioned in the previous posts, Netboot.XYZ and PXEBoot have been working flawlessly for many years on this LAN for network installation.  It's been a long time since I installed an OS using an old fashioned USB stick or (GASP!) an optical cdrom.  I go back far enough that I installed Linux off floppy disks back in '94, which took about one box of disks for SLS Linux.  Anyway, combine the reliability and ease of use of network installs with USB booting being flaky and unreliable on mid-2010s Intel NUC hardware, and obviously I'm going to try to network install Harvester.  That turned out to be a huge mistake for not-entirely-documented reasons.  So before explaining how to successfully USB install Harvester, I'll explain how to unsuccessfully network install Harvester.

How Not to Install Harvester

Netboot.xyz has a net install menu option for Harvester; on my "not recently updated" Netboot that's an ancient 1.0 version.  I used the netboot.xyz web UI to upgrade its boot menus online to the latest collection and reloaded PXEBoot, so now I have the latest netboot.xyz, which as of today is version 2.0.66, and that recent version CLAIMS it can net install Harvester v1.1.1.  Cool!  Initially, trying to install via netboot.xyz and PXEBoot seemed to work.  In fact, it installs VERY quickly.

However, after installation and its first reboot, the console sits in "Setting up Harvester" forever and never completes the setup.

Logging into the console and debugging, I watched 'journalctl -f'; all is well until it gets into a permanently repeating failure loop for bootstrapmanifests/rancherd.yaml.  Trying to figure out what went wrong with that yaml file, I ran "kubectl get pods -A", which shows harvester-cluster-repo is in ImagePullBackOff state.  Huh?

Apparently the pod harvester-cluster-repo-random is only available from the cdrom ISO image, as a dynamic artifact of the build process, and NOT from download, by design and intention.  The USB installer has an extra partition or something full of containers used to "prime the pump", so to speak, for air gapped installs or just to make installs faster.  There is some complicated PXE documentation online for Harvester to work around this, but the simpler setup in Netboot.xyz will not provide the "pump priming" collection of containers, and every container EXCEPT the harvester-cluster-repo container is obtainable online, so installation appears to work but first boot fails as per:

https://github.com/harvester/harvester/issues/2651

https://github.com/harvester/harvester/issues/2670

So that's just lovely.

I opened a bug on Netboot.xyz to document that the harvester menu option doesn't actually work by default at this time:

https://github.com/netbootxyz/netboot.xyz/issues/1203

It's one of those inter-project compatibility problems where in some sense it's not really either side's fault, but it's their fault together, so it might take a while to get fixed, LOL.

This is why in the end I had to install Harvester with old fashioned USB media instead of PXEboot.

How to Install Harvester

  • The install of Harvester 1.1.1 starts with an install question "Create a new" or "Join an Existing". Created a new cluster on harvester-small-1. 
  • Next it asks for install target, I'm installing Harvester itself to the sda and saving the new 1TB nvme0n1 for storage, using the NON-DEFAULT MBR partition table as per previous discussion of some Intel NUC BIOS issues. 
  • Then it asks where to store VM data, that's going to be nvme0n1 the brand new 1 TB.
  • The hostname for harvester-small-1 is the short hostname; it is not, AFAIK, the FQDN.
  • The Management NIC is the only possible option because this NUC only has one NIC.
  • I'm not using VLANs at this time on Harvester (did a lot of that in my VMware and OpenStack days LOL).
  • Bond mode is irrelevant for a one NIC device although it'll be a lot more fun another day on the SuperMicro SYS-E200-8D with their LAG bonded 10G ethernet ports.
  • I am setting the network addressing static not DHCP (LOL).
  • The IPv4 Address is asking for a CIDR like 10.10.20.81/16.
  • The DNS server setting needs commas between DNS servers.
  • My VIP mode is Static and for this cluster it will be 10.10.20.80 and I have the domain name harvester-small.cedar.mulhollon.com pointing to that, and that https URL will be the web interface for the cluster.
  • This is NOT a high security installation, and as such my Cluster Token for the harvester-small cluster is harvester-small.  Needless to say this cluster is firewalled off from the internet; this is not a public cloud cluster, LOL.
  • The Password is my usual admin LAN password.
  • My NTP server is currently 10.10.5.2 which is a hardware NTP server.
  • It will take a while to install, after which it auto-reboots and continues installation, or sometimes it boots back into the USB (but not all the time), in which case removing the USB stick and rebooting will complete the install process.

After rebooting into the OS, hitting F12 on the console status window will toggle between the status console and a shell.  I entered the password, ran top, and watched for a while.  It's VERY busy setting up K8S and all kinds of "Harvester Stuff".  Running journalctl -u rancherd -f is pretty interesting to watch as the cluster comes up.  After it sets up RKE you can log out and log back in and kubectl will work, and it's entertaining to watch that too.  You don't have to actually "DO" anything, you can just patiently wait, but it's fun to watch the logs and stuff inside.
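
For reference, the console spelunking described above boils down to a handful of commands once F12 drops you into the shell; nothing here is required, it is purely for entertainment while the node sets itself up:

top                            # watch the node chew through the K8S and "Harvester Stuff" setup
journalctl -u rancherd -f      # follow the cluster bootstrap log
kubectl get pods -A            # works once RKE2 is up; watch the pods come alive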

Eventually, in perhaps ten minutes, the new cluster is "Green" and "Ready" status on the console.  Whoo Hoo!

Initial web login to the VIP address as an https URL asks you to set a new password for the admin user.  The process will fail later unless the password is over 12 characters (thanks for not telling us that in advance, only after entering a password, LOL).  Also agree to the Terms and Conditions, click continue, etc.

There's a whole lot of nothing going on after installing only one node, but at least no errors or problems are reported.  I immediately note that approximately 4 out of 4 of my cores are already reserved, LOL; I will have to examine that situation.  Actual use, as opposed to reservation, is of course practically zero.  Luckily v1.1.1 makes overcommit provisioning a little easier than in the past.  Reportedly Harvester's demands have been getting higher over time, and I may not be able to keep running Harvester on this hardware (might need something a little more lightweight), but for experimentation and education with HCI it should be good enough.

I will do a post later on, taking a tour of the Harvester UI, connecting the cluster to Rancher and looking at it in Rancher, etc.  However, for today, the first node of the new cluster is up and all is well.

Wednesday, March 1, 2023

Rancher Suite K8S Adventure - Chapter 013 - Small Harvester Cluster Design

A travelogue of converting from OpenStack to Suse's Rancher Suite for K8S including RKE2, Harvester, kubectl, helm.

Harvester-Small is a micro test cluster for HCI experiments.  I know from experience with VMware that these servers overheat and get squirrely under high load, but they're more than adequate for testing, experimentation, and educational purposes.  I believe a "bursty" low-average workload would be fine on this hardware, probably OK to stream music or whatever.

Hardware

The Harvester-Small cluster is composed of three 2016-model Intel NUC6i3SYH mini PCs, formerly used as small FreeNAS (now TrueNAS) NUC servers for the former VMware cluster, each with a newly installed 1TB NVMe drive for Longhorn storage.

https://www.intel.com/content/www/us/en/support/products/89189/intel-nuc/intel-nuc-kits/intel-nuc-kit-with-6th-generation-intel-core-processors/intel-nuc-kit-nuc6i3syh.html

These mini PCs had been upgraded to what seemed like a lot of RAM at the time, 32 gigs, back when I was running a complete VMware 5.x stack on them.  I don't keep up with desktop PCs, so I don't know if 32 GB of RAM is considered a lot in 2023.  Probably, yeah.

Four Intel NUC problems, mostly solved, or worked around anyway:

  1. A peculiarity of the on-board video hardware, or maybe VMware ESXi, is that at some point in the last decade I had to plug in fake HDMI dongles for the device to boot.  Something about boot failing if no HDMI was attached, so you can (or used to be able to) buy a little dongle plug that pretends to be an HDMI monitor for just this purpose.  The aggravation level of a server that refuses to boot unless it's plugged into a working HDMI monitor is immense.  I don't recall if this problem went away with ESXi version 4 (or 5), with a BIOS upgrade, or with some unexpected BIOS setting; regardless, I have a pile of HDMI simulation dongles no longer needed for successful boot.
  2. A peculiarity of 2010s NUC case design is that the ethernet port was toleranced with somewhere around zero to negative millimeters of manufacturing clearance, so depending on the exact shape and momentary angle and tension of the ethernet cable, the ethernet sometimes behaves as if unplugged.  It's very aggravating for a server to appear to have a "burned out" ethernet port with a cable plugged in and latched, when it's really just a cable compatibility issue.  Generally the higher quality cables, especially shielded patch cables, are less reliable, and the cheaper, more flexible, usually less reliable cables are, when plugged into the back of a '16 NUC, ironically more reliable, perhaps because they are more flexible and thus do not interfere with the side of the case being too close to the cable plug.
  3. Another fascinating problem with mid 2010s NUCs is they only support GPT partition table booting via manual selection using the "F10" boot menu. Plain old MBR partition tables boot perfectly normally. Naturally I found this out by doing a perfectly successful installation of Harvester using GPT, which can only boot with manual intervention on every single boot of this model of NUC, so I had the privilege of repeating the entire Harvester installation process, now with MBR boot records which work perfectly upon every reboot.
  4. The final problem with mid-2010s NUCs is that USB booting is a bit twitchy, and I don't have a full mental model of the problem, so USB booting is never 100% reliable, although it's probably over 90%.  It can't be a hardware issue unless it affects multiple NUCs and multiple USB drives randomly.  Generally, trying the "F10" manual boot menu seems to work often enough, whereas relying on various BIOS "boot orders" of various devices, such that it would automatically boot off the USB only when it's inserted, generally does not work or results in much headache.  In summary, configure the BIOS to boot off the SDA drive and only the SDA all the time, and when trying to install Harvester using a USB boot drive just accept that you'll be hitting F10 at power up and manually selecting the USB for booting, and sometimes trying it a couple of times.

IP addressing and DNS

This specific data is not useful to anyone else, but it does demonstrate one working organization strategy and explain a few things about Harvester in general.

10.10.20.80 VIP harvester-small.cedar.mulhollon.com
10.10.20.81 NUC1 harvester-small-1.cedar.mulhollon.com
10.10.20.82 NUC2 harvester-small-2.cedar.mulhollon.com
10.10.20.83 NUC3 harvester-small-3.cedar.mulhollon.com

I think a good host name strategy is cluster name dash number and put the cluster's VIP at what would be "dash zero" for the cluster, as seen above.  At least this is how I set it up in Netbox IPAM and AD DNS.

Note I'm not attempting the adventure of VLANs.  I had plenty of experience with that back in the VMware days, when it was naturally assumed every decent hypervisor host would have six to eight ethernet interfaces (so I simulated that with VLANs), and when vSAN couldn't encrypt data (and everyone felt that was normal, LOL), so you had to segregate vSAN storage traffic onto a 'private' VLAN no other machines had access to.  Well, times change, and the stereotype of K8S and cloudiness is "just abstract it all away at the K8S level and let the cloud take care of itself", so one ethernet port is enough.  Maybe, in the Harvester-Large cluster, I will experiment with VLANs anyway just for fun.

Note that Rancher does its cluster LB by setting up multiple A or CNAME records for a hostname that points to the cluster members, whereas Harvester clusters set up a VIP.  I don't know if it's a "real VIP" following full RFC 3768 the way a router ethernet port would follow the standard, or if it's some software thing that kinda works well enough.  I suppose a one-liner explanation of the real VRRP protocol is that it's something like anycast or multicast, but for one "normal" IP address instead of residing in special multicast IP ranges (none of this is technically true, but it's close enough to get the flavor of what it does across).