
Thursday, March 7, 2024

Why Kubernetes Takes a Long Time


The Problem

Let's test something simple in Kubernetes on a fresh "bare-metal" style RKE2 cluster (the nodes run under Proxmox, so there is no cloud provider handing out load balancers), and deploy the classic intro app "numbers" from the book "Kubernetes in a Month of Lunches".  Other simple test apps, such as the "google-samples/hello-app" application, behave identically for the purposes of this blog post.

If you look at the YAML files, you'll see a "kind: Service" with "type: LoadBalancer" in its spec and some port info.  After an apparently successful application deployment, if you run "kubectl get svc numbers-web" you will see a TYPE of LoadBalancer with an EXTERNAL-IP listed as "<pending>"; it will never exit the pending state, and the service will be inaccessible from the outside world.
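
For reference, the interesting part of such a Service looks roughly like this (a minimal sketch; the selector and port numbers are illustrative rather than copied from the book's manifest):

apiVersion: v1
kind: Service
metadata:
  name: numbers-web
spec:
  type: LoadBalancer        # the part that needs an external load balancer
  selector:
    app: numbers-web        # illustrative label selector
  ports:
    - port: 8080            # externally visible port
      targetPort: 80        # illustrative container port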

NodePorts do work out of the box with no extra software and no extra configuration, but you don't have to be limited to NodePorts forever.

The Solution

Kubernetes is a container orchestrator and it is willing to cooperate with external load-balancing systems, but it does not implement a load balancer.

That's OK.

If K8S can virtualize anything, why not virtualize its external load balancer?  This sounds circular, like a VMware cluster getting its DHCP addresses from a DHCP server running inside that same cluster, but it's not as bad an idea as it sounds: if the cluster is impaired or down badly enough that the LB isn't working, the app probably isn't working either, so nothing is lost.

We can install MetalLB in Kubernetes, which implements a virtual external load balancer system. https://metallb.universe.tf/

The Implementation

  1. Let's read about how to install MetalLB.  https://metallb.universe.tf/installation/  I see we are strongly encouraged to use IPVS instead of iptables.
  2. Research why we're encouraged to use IPVS instead of iptables.  I found https://www.tigera.io/blog/comparing-kube-proxy-modes-iptables-or-ipvs/ which explains that IPVS scales roughly O(1) while the iptables mode scales roughly O(n) as the number of services grows.  OK, we have to use IPVS, an in-kernel load balancer that kube-proxy can use as its backend, working alongside MetalLB.  Additionally, the K8S docs discussing kube-proxy are at https://kubernetes.io/docs/reference/networking/virtual-ips/
  3. Next, research IPVS.  Aggravating that every Google search for IPVS is autocorrected to IPv6, EVERY TIME.  Found http://www.linuxvirtualserver.org/software/ipvs.html
  4. Will this work with RKE2?  It's reported that both iptables and IPVS work fine with Calico.  RKE2 runs Canal by default, which is Flannel between nodes and Calico for network policies, so I guess it's OK?  https://docs.rke2.io/install/network_options
  5. Time to set up IPVS on all RKE2 nodes.  The usual song and dance with automation: set up the first node completely manually, capture it in Ansible, test on the second node, then roll out slowly and carefully.  The first IPVS setup step is to install ipvsadm so I can examine the operation of the overall IPVS system: "apt install ipvsadm".  There isn't much to test in this step; success is running "ipvsadm" and seeing nothing weird.
  6. IPVS needs a kernel module, so without rebooting, modprobe the kernel ip_vs module, then try "ipvsadm" again, then if it works, create a /etc/modules-load.d/ip_vs.conf file to automatically load the ip_vs module during node reboots.
  7. Finally, add the IPVS config for kube-proxy to the end of the RKE2 config.yaml: use kube-proxy-arg to select ipvs mode, and note that IPVS needs strict ARP.  (A minimal sketch of this stanza appears right after this list.)
  8. After a node reboot, RKE2 should have kube-proxy running in IPVS mode.  Success looks like "ipvsadm" outputting sane-looking mappings, and "ps aux | grep kube-proxy" showing the options --proxy-mode=ipvs and --ipvs-strict-arp=true.  None of this manual work was straightforward, and it took some time to nail down.
  9. Set up automation in Ansible to roll out to the second node.  This was pretty uneventful and the branch merge on Gitlab can be seen here: https://gitlab.com/SpringCitySolutionsLLC/ansible/-/commit/65445fd473e5421461c4e20ae5d6b0fe1fe28dc4
  10. Finally, complete the IPVS conversion by rolling out and testing each node in the RKE2 cluster.  The first node done manually with a lot of experimentation took about half a day, the second took an hour, and the remaining nodes took a couple minutes each.  Cool, I have an RKE2 cluster running kube-proxy in IPVS mode, exactly what I wanted.
  11. Do I run MetalLB in BGP or L2 mode?  https://metallb.universe.tf/concepts/  I don't have my BGP router set up so it has to be L2 for now.  In the long run, I plan to set up BGP but I can spare a /24 for L2 right now.  Note that dual-stack IPv4 and IPv6, which I plan to eventually use, requires FRR-mode BGP connections, which is a problem for future-me, not today.
  12. Allocate some IP space in my IPAM.  I use Netbox as an IPAM.  Reserve an unused VLAN and allocate the L2 and future BGP prefixes.  I decided to use IPv4 only for now, in the 150 subnet of my RFC1918 address space (the 10.10.150.0/24 pool shown below); I will add IPv6 "later".  I do almost all of my Netbox configuration automatically via Ansible, which has a great plugin for Netbox.  Ansible's Netbox integration can be seen at https://netbox-ansible-collection.readthedocs.io/en/latest/ and the Ansible branch merge to allocate IP space looks like this: https://gitlab.com/SpringCitySolutionsLLC/ansible/-/commit/1d9a1e6298ce6f041ab4e98ad374850faf4a1412
  13. It is time to actually install MetalLB.  I use Rancher to wrangle my K8S clusters; it's a nice web UI, although I could do all the Helm work with a couple of lines of CLI.  Log into Rancher, select the RKE cluster, "Apps", "Charts", search for metallb and click on it, "Install", set "Install into Project" to "System", "Next", "Install", and watch the logs.  It'll sit in Pending-Install for a while.
  14. Verify the operation of MetalLB.  "kubectl get all --namespace metallb-system" should display reasonable output.  In Rancher, under the "RKE" cluster, "Apps", "Installed Apps", the metallb-system namespace should contain a metallb entry with reasonable status results.
  15. Configure an IPAddressPool for MetalLB as per the IPAM allocation in Netbox.  Here is a link to the docs for IPAddressPools: https://metallb.universe.tf/apis/#ipaddresspool Currently, I only have a "l2-pool" but I will eventually have to add a "bgp-pool".
  16. Configure an L2Advertisement for MetalLB to use the IPAddressPool above.  Here is a link to the docs for L2Advertisements: https://metallb.universe.tf/apis/#l2advertisement  Currently, my "default" L2Advertisement advertises "l2-pool"; it will probably switch over to "bgp-pool" after I get BGP working.
  17. Try provisioning an application using a Service type LoadBalancer.  I used numbers-web as per the intro.  In the CLI, "kubectl get svc numbers-web" should show a TYPE "LoadBalancer" and an "EXTERNAL-IP" in your L2 IPAM allocation, and even list the PORT(S) mapping.
  18. Check operation in Rancher.  "RKE", "Service Discovery", "Services", click through on numbers-web; the top of the page should show a "Load Balancer" IP address, the "Recent Events" tab should show nodeAssigned and IPAllocated events, and the "Ports" tab should tell you the ports in use.
  19. Test in a web browser from the desktop.  Remember that the numbers-web app runs on port 8080 not the default 80.
  20. You can specify statically assigned IP addresses using a custom annotation described at: https://metallb.universe.tf/usage/#requesting-specific-ips  This is useful because I can add DNS entries in Active Directory using Ansible pointing to addresses of my choice.  (A sketch of such an annotated Service appears after the YAML examples below.)
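
As promised in step 7, here is a minimal sketch of the kube-proxy stanza appended to the end of the RKE2 config.yaml (normally /etc/rancher/rke2/config.yaml); the flag values are the same ones step 8 checks for, and the /etc/modules-load.d/ip_vs.conf file from step 6 contains just the single line "ip_vs".

kube-proxy-arg:
  - proxy-mode=ipvs          # run kube-proxy with its in-kernel IPVS backend
  - ipvs-strict-arp=true     # strict ARP, which MetalLB needs when kube-proxy uses IPVS
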
For reference, a bare-bones ipaddresspool.yaml looks like this:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: l2-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.10.150.0/24

And an equally bare-bones l2advertisement.yaml looks like this:

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
spec:
  ipAddressPools:
    - l2-pool
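
And as mentioned in step 20, pinning a Service to a specific address is just an annotation on the Service.  A minimal sketch (the annotation name is the one documented on the MetalLB usage page linked in step 20, and the address is an illustrative pick from the l2-pool range):

apiVersion: v1
kind: Service
metadata:
  name: numbers-web
  annotations:
    metallb.universe.tf/loadBalancerIPs: 10.10.150.80   # illustrative address inside 10.10.150.0/24
spec:
  type: LoadBalancer
  # selector and ports as usual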

The Summary

This took 20 logically distinct steps.  Don't get me wrong: K8S is awesome, MetalLB is awesome, RKE2 is awesome; however, everything takes longer with Kubernetes...  On the bright side, so far, operation and reliability have been flawless, so it's worth every minute of deployment effort.

Trivia

There are only two types of K8S admins: the ones who admit that at least once they thought metallb was spelled with only one letter "L", and the ones who are liars LOL haha.  This is right up there in comedic value with RKE2 pretending that .yml files are invisible and only processing .yaml files.

Friday, March 10, 2023

Rancher Suite K8S Adventure - Chapter 020 - Prepare Terraform for Harvester


A travelogue of converting from OpenStack to Suse's Rancher Suite for K8S including RKE2, Harvester, kubectl, helm.

I don't like manually configuring things.  I like IaC (infrastructure as code) with templates stored in a nice Git repo: deployments are faster, there are fewer human errors, and it's just all-around better than mousing and typing a virtual infrastructure into existence.  So today we prepare Terraform to work with Harvester, but first, some work with multiple-cluster kubeconfig files.

Multiple Cluster Kubectl

Start automating by configuring kubectl to talk to multiple clusters.

Reference:

https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/

The good news is it's pretty easy to configure multiple clusters as separate kubectl contexts.  The bad news is it's easy to select a different context at runtime (via "kubectl config use-context"); in fact it's so easy that there have been multiple headline news stories about devops engineers who thought they were permanently erasing their test cluster deployment, only to rapidly discover they were actually in their production context, resulting in some amazing news headlines about outages and deleted data.  So keep your wits about you and be careful.  I will set up multiple contexts some other time.

One of the cultural oddities of the K8S community is they like to call the kubectl config file by the generic phrase "your kubeconfig file".  What makes that odd is that most installs do not have a file actually named kubeconfig or .kubeconfig or kubeconfig.conf or whatever.  On my Ubuntu system, kubectl's config file, aka the "kubeconfig file", lives at ~/.kube/config

I will usually be working with Harvester, so in my ~/.kube directory I keep yaml files named rancher.yaml and harvester.yaml and I can simply copy them over the ~/.kube/config file.
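
For the curious, a kubeconfig file is itself just YAML, roughly this shape (a stripped-down sketch: the names and server address are placeholders, and a real file carries certificate or token data that I'm omitting here):

apiVersion: v1
kind: Config
current-context: harvester
clusters:
  - name: harvester
    cluster:
      server: https://harvester.example.com:6443   # placeholder API endpoint
contexts:
  - name: harvester
    context:
      cluster: harvester
      user: harvester-admin
users:
  - name: harvester-admin
    user: {}   # client certificates or a token would normally go here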

In summary, make certain that running "kubectl get nodes" displays the correct cluster... 

Terraform

https://developer.hashicorp.com/terraform

Terraform is similar in concept to CloudFormation from AWS or HEAT templates from OpenStack.  You write your infrastructure as source code, run the template, and Terraform gradually makes the cloud converge on what the template describes.  It's not a script so much as a specification.

Install Terraform

I should have installed Terraform back when I was installing support software like kubectl and helm.  Better late than never...

https://developer.hashicorp.com/terraform/downloads

https://www.hashicorp.com/official-packaging-guide

The exact version of the Ubuntu package I'm installing is 1.3.9 as seen at

https://releases.hashicorp.com/terraform/

And I'm doing an "apt hold" on it to make sure it's not accidentally upgraded.

Here is a link to the Gitlab repo directory for the Ansible terraform role:

https://gitlab.com/SpringCitySolutionsLLC/ansible/-/tree/master/roles/terraform

If you look at the Ansible task file named packages.yml, it installs some boring required packages first, deletes the repo key if it's too old, downloads a new copy of the repo key if it's not already present, runs gpg --dearmor to convert the key into apt's format, adds the local copy of the repo key to apt's list of known good keys, installs the sources.list file for the repo, does an apt-get update, takes terraform out of "hold" state, installs terraform version 1.3.9, and finally places terraform back on "hold" so it's not magically upgraded to the latest version (1.4 or 1.5 or something by now).  Glad I don't have to do that manually by hand on every machine on my LAN, LOL.  (The kubectl and helm roles from the earlier chapters below follow the same pattern; a rough sketch of the key steps follows.)
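
Here is that sketch as Ansible tasks (a sketch only, not a copy of the real packages.yml: the module names are standard Ansible builtins, but the repo line, keyring path, and version pin are illustrative placeholders):

# illustrative sketch, not the actual packages.yml from the repo above
- name: Add the HashiCorp apt repository
  ansible.builtin.apt_repository:
    repo: "deb [signed-by=/usr/share/keyrings/hashicorp.gpg] https://apt.releases.hashicorp.com {{ ansible_distribution_release }} main"
    state: present

- name: Take terraform off hold before touching the version
  ansible.builtin.dpkg_selections:
    name: terraform
    selection: install

- name: Install the pinned terraform version
  ansible.builtin.apt:
    name: terraform=1.3.9*
    update_cache: yes

- name: Put terraform back on hold so apt upgrades leave it alone
  ansible.builtin.dpkg_selections:
    name: terraform
    selection: hold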

Simply add "- terraform" to a machine's Ansible playbook, then run "ansible-playbook --tags terraform playbooks/someHostname.yml" and it works.  Ansible is super cool!

As of the time this blog was written, "terraform --version" looks like this:
vince@ubuntu:~$ terraform --version
Terraform v1.3.9
on linux_amd64
vince@ubuntu:~$ 

References

https://www.suse.com/c/rancher_blog/managing-harvester-with-terraform/

https://docs.harvesterhci.io/v1.1/terraform/

https://github.com/harvester/terraform-provider-harvester

https://registry.terraform.io/providers/harvester/harvester/latest


Tuesday, February 21, 2023

Rancher Suite K8S Adventure - Chapter 007 - Helm


A travelogue of converting from OpenStack to Suse's Rancher Suite for K8S including RKE2, Harvester, kubectl, helm.

Helm version 3.11 is installed on all members of the Rancher RKE2 cluster and on my Ubuntu experimentation box using Ansible.  Honestly, this is almost identical to the process for installing kubectl yesterday; it's just a different repo and a different package.

https://helm.sh/docs/intro/install/

The exact version of the Ubuntu package I'm installing is 3.11.1 as seen at

https://helm.baltorepo.com/stable/debian/packages/helm/releases/

And I'm doing an "apt hold" on it to make sure it's not accidentally upgraded.

Here is a link to the gitlab repo directory for the Ansible helm role:

https://gitlab.com/SpringCitySolutionsLLC/ansible/-/tree/master/roles/helm

If you look at the Ansible task file named packages.yml, it installs some boring required packages first, deletes the repo key if it's too old, downloads a new copy of the repo key if it's not already present, adds the local copy of the repo key to apt's list of known good keys, installs the sources.list file for the repo, does an apt-get update, takes helm out of "hold" state, installs the latest package for helm version 3.11, and finally places helm back on "hold" so it's not magically upgraded to the latest version (3.12 or 3.13 or something by now).  Glad I don't have to do that manually by hand on every machine.

Simply add "- helm" to a machine's Ansible playbook, then run "ansible-playbook --tags helm playbooks/someHostname.yml" and it works.

As of the time this blog was written, "helm version" looks like this:

vince@ubuntu:~$ helm version
version.BuildInfo{Version:"v3.11.1", GitCommit:"293b50c65d4d56187cd4e2f390f0ada46b4c4737", GitTreeState:"clean", GoVersion:"go1.18.10"}
vince@ubuntu:~$ 

Monday, February 20, 2023

Rancher Suite K8S Adventure - Chapter 006 - Kubectl


A travelogue of converting from OpenStack to Suse's Rancher Suite for K8S including RKE2, Harvester, kubectl, helm.

Kubectl version 1.24 is installed on all members of the Rancher RKE2 cluster and on my Ubuntu experimentation box using Ansible.

https://kubernetes.io/docs/reference/kubectl/

https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands

https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/

The exact version of the Ubuntu package I'm installing is 1.24.10-00 as seen at

https://packages.cloud.google.com/apt/dists/kubernetes-xenial/main/binary-amd64/Packages

And I'm doing an "apt hold" on it to make sure it's not accidentally upgraded.

Here is a link to the gitlab repo directory for the Ansible kubectl role:

https://gitlab.com/SpringCitySolutionsLLC/ansible/-/tree/master/roles/kubectl

If you look at the Ansible task file named packages.yml, it installs some boring required packages first, deletes the Google K8S repo key if it's too old, downloads a new copy of the Google K8S repo key if it's not already present, adds the local copy of the Google K8S repo key to apt's list of known good keys, installs the sources.list file for Google's K8S repo, does an apt-get update, takes kubectl out of "hold" state, installs the latest package for kubectl version 1.24, and finally places kubectl back on "hold" so it's not magically upgraded to the latest version (1.26 or 1.27 or something by now).  Glad I don't have to do that manually by hand on every machine, LOL!

Ansible makes life easy: all I need to do to get the right kubectl version installed on an Ubuntu system is add "- kubectl" to that system's playbook, then run "ansible-playbook --tags kubectl playbooks/someHostname.yml", and like magic it works in seconds.

As of the time this blog was written, "kubectl version --short" looks like this:

vince@ubuntu:~$ kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.24.10
Kustomize Version: v4.5.4
The connection to the server localhost:8080 was refused - did you specify the right host or port?
vince@ubuntu:~$ 

Note that as a last step you probably want to enable bash autocompletion for kubectl in .bashrc for whatever username you log in as. My .bashrc file has a line like this:

source <(kubectl completion bash)

Mine is actually wrapped by some if $HOSTNAME lines, but whatever.

After you do this and log back in, you can type "kubectl" and hit tab a couple times and autocompletion will work. Pretty cool!

Friday, February 17, 2023

Rancher Suite K8S Adventure - Chapter 005 - Ubuntu 20.04 install on a Beelink Mini S 5095


A travelogue of converting from OpenStack to Suse's Rancher Suite for K8S including RKE2, Harvester, kubectl, helm.

I do all installs using Mattermost Playbooks; this particular example is named "Bare Metal Ubuntu 20".  Life is easier with Mattermost.  When I figure out a convenient way to share Mattermost playbooks I'll add a link here.  If you've never used this software, you're missing out... I would describe it as Slack meets an outline-oriented todo app.

https://mattermost.com/

Hardware

The hardware I selected for my three-node Rancher RKE2 cluster is the Beelink Mini S 5095.  It's considerably cheaper than a Raspberry Pi, easier to get, Intel CPU based, much faster, and has much more storage; sadly, the Raspberry Pi platform has been eliminated from the market by heavy competition and supply chain problems.  The Raspberry Pi was cool tech for its day, but it's unavailable and/or too expensive now.  The Beelink is simply a mini-size PC.  This particular model seems very popular in the set-top-box media player subculture, often used as a Plex or Emby front end instead of Roku-type hardware.

https://www.bee-link.com/beelink-mini-s-n5095-mini-pc

BIOS configuration was uneventful.

Hit Del while booting to enter BIOS setup

Menu "Main" - Set hwclock to UTC time

Menu "Advanced" "MAC whatever IPv4 Network Configuration" - Configured Enabled, Enable DHCP

"Security" "Secure Boot" "Disable"

"Boot" - Setup Prompt Timeout change from 1 to 3, Quiet Boot Disabled

"Save and Exit" - "Save and Reset"

Reboot, hit del again to enter setup again (can't save and do a pxeboot in the same step, don't know why, doesn't really matter in the long run)

"Save and Exit" Boot Override "UEFI PXE"

I have a netboot.xyz installation on the LAN so I can PXE boot for OS installations.

https://netboot.xyz/

An example of how to configure the ISC DHCP server for PXE based netboot.xyz:

https://gitlab.com/SpringCitySolutionsLLC/dhcp/-/blob/master/header.dhcpd.conf.dhcp11

Likewise, if you use OpenStack and its HEAT template system, you can install netboot.xyz on Zun container service using this example:

https://gitlab.com/SpringCitySolutionsLLC/openstack-scripts/-/blob/master/projects/infrastructure/netbootxyz/netbootxyz.yml

OS

The Ubuntu 20.04 install was mostly uneventful, aside from the usual annoyances revolving around timezone settings, avoiding incorrect DHCP autoconfiguration, etc.  It's the usual Ubuntu experience.

In the Netboot.xyz menu: "Linux Network Installs (64-bit)"

"Ubuntu"

"Ubuntu 20.04 LTS Focal Fossa (Legacy)"

Don't use: "Ubuntu 20.04 LTS Focal Fossa (Subiquity)" - Install seems to hang at "hdaudio hdaudioCOD2: Unable to bind the codec"

"Install"

Reasonable defaults as usual

Full name for the new user: Ubuntu

Password for the ubuntu user is "the standard LAN password"; it doesn't matter, the username will be deleted after Ansible connects the machine to AD anyway.

Force timezone to "Central" I don't live in Chicago LOL

The only software to install is OpenSSH server

Note that upon bootup it looks like a failed boot, but Ctrl-Alt-F1 etc. will get you a working console; very annoying.

It's super annoying that the installer autoconfigures the enp2s0 ethernet as DHCP with no obvious option to change it.  You can cancel out of the DHCP step and enter manual config mode.  If that fails and it installs in DHCP mode anyway, then:

boot, log in as ubuntu, sudo vi /etc/netplan/01-netcfg.yaml and do something like this:

network:
  version: 2
  renderer: networkd
  ethernets:
    enp2s0:
      dhcp4: no
      addresses: [10.10.20.71/16]
      gateway4: 10.10.1.1
      nameservers:
        addresses: [10.10.7.3,10.10.7.4]

Then a quick "sudo netplan apply" and "ip addr" to check the address, and of course ssh in over the LAN to verify.

sudo reboot now

As noted above, there is some weird bug where Ubuntu looks like the boot failed, but as soon as you hit Ctrl-Alt-F1 you see a login; the text console glitch at bootup doesn't seem to matter.

Verify SSH works over the LAN as the ubuntu user, which Ansible will bootstrap into an AD connection.

sudo shutdown now

At this point I physically installed the new server in the data center rack.  Properly label ethernet cables on both sides using the BradyLabel model 41 (yeah, it's a bit of a brag, I really like this label maker), update the port name so the Observium installation makes pretty graphs with the correct server name, all the usual tasks.

Here is a link to the Ansible playbook for rancher1.  There's nothing special or unusual about it; it's just a very small desktop PC being configured into a server.

https://gitlab.com/SpringCitySolutionsLLC/ansible/-/blob/master/playbooks/rancher1.yml

At this point the server is completely integrated in my infrastructure, although no "K8S specific" software has been installed.  AD SSO works, NTP works, Elasticsearch logging and metrics work, Zabbix monitoring works, etc.

Saturday, October 23, 2021

Upgraded three servers from Devuan Beowulf OS version to Devuan Chimaera OS version

Upgraded three servers from Devuan Beowulf version to Devuan Chimaera version today.

The three servers were a gold OS template for deployment and two DNS servers.

The procedure to turn a gold OS image into a DNS server is handled entirely by some Ansible scripts I wrote; I used Puppet a long time ago, but I tired of restarting Puppet agents to fix misconfigured systems, and I prefer Ansible's "push" model over Puppet's "pull" technique.  I can deploy a template in VMware, change its name and IP address, reboot it, connect it to the SSH web of trust and the Active Directory web of trust, run Ansible against it, and it turns into a fully featured DNS server in a couple of minutes.  I could have downed the old servers and brought up new ones, but the upgrade process was so flawless and fast on the template that I upgraded the pair of DNS servers in place instead of rebuilding them; it only took minutes either way, and I was interested to see what would happen (given that nearly instant rollback is possible with VMware, and I'm alone on a Saturday morning, it's not like there's any risk LOL).

First, in VMware vSphere, my back-out plan in case the upgrade went poorly was to shut down the images, make duplicates, upgrade the duplicates, and keep the untouched originals around in case something went wrong.  I've seen performance problems caused by forgotten VMware snapshots left in place; "shut down the new one and start up the old one" is less of a headache and faster than a VMware snapshot rollback, and I only use FOSS software on these servers, so there are no licensing issues like Windows would impose.  I can leave the untouched images running and connect/disconnect image network interfaces in mere seconds...

These are resolving DNS servers, not authoritative DNS servers, so a simpler plan is a better plan.  If they were authoritative I'd spin up new servers and test with 'dig' that they work properly.  But I'm the only person using these resolvers on a Saturday morning, so it's pretty safe.  The simplest plan that gets the job done is the plan most likely to be successful.  I would have to allocate two more routable IP addresses to run both test and production images simultaneously; it's not really worth it to log into NetBox and justify the allocation.

After the VMware work, I upgraded the Devuan Beowulf packages to their latest (and last) versions: the usual "apt-get update", "apt-get dist-upgrade", "apt-get autoremove", and finally "apt-get clean", then ran the ansible-playbook for each server against it and tested that everything works.  Not much happened in the upgrades (I generally maintain each server every two months).

I do not store or manage major-version configuration like apt 'list' files in Ansible, because the risk of a "surprise upgrade" is not worth it.  I only had a couple of servers to upgrade, so I removed the old /etc/apt/sources.list.d/beowulf.list file and set up a new /etc/apt/sources.list.d/chimaera.list file along the lines of the Devuan suggested file.
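
For reference, the new chimaera.list was roughly along these lines (a from-memory sketch; the Devuan upgrade guide linked at the end of this post has the authoritative list):

deb http://deb.devuan.org/merged chimaera main
deb http://deb.devuan.org/merged chimaera-updates main
deb http://deb.devuan.org/merged chimaera-security main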

The "apt-get upgrade" and "apt-get dist-upgrade" as per the Devuan suggested upgrade path was completely uneventful.  I have apt-listchanges configured to send changelogs and news to me via email.  I will save those emails for later reference in case any problem develops, but usually those upgrade logs end up deleted after a couple months.

According to those upgrade emails, Exim, the mail transport agent, has recently undergone a substantial major upgrade possibly requiring configuration changes, and GnuPG no longer uses the ~/.gnupg/options file, in favor of ~/.gnupg/gpg.conf.  For me, everything is fine; others may find those changes more relevant.

I ran "apt-get autoremove" and "apt-get clean" to clean up the upgrade.  Interesting to see Devuan no longer uses Python version 2 (although it is installable) so I had to update my Ansible configuration system inventory to specify Devuan based operating systems have a python path of "ansible_python_interpreter=/usr/bin/python3" instead of "ansible_python_interpreter=/usr/bin/python" for legacy python2.  I keep my Ansible scripts in a Git repository so I committed documented and uploaded my small change.

I ran my Ansible configuration script on the DNS server.  Aside from the previously mentioned upgrade from python2 to python3, it was uneventful.

I did a server reboot (technically unnecessary) to verify everything starts up correctly after a reboot, which it did.

I verified everything was working on the DNS servers (note: I have a cluster and did one server at a time).  They both do forward and reverse resolution for IPv4 and IPv6, and also forward a subdomain to a Samba-based Active Directory domain controller cluster I also maintain; it all works quite well.

I cleared any alerts in Zabbix, a LAN server monitoring system.  I run Zabbix using Docker images; it works well and alerts me to any server failures (such as reboots).  I could set a maintenance interval in Zabbix to silence alerting, but I believe it counterproductive; if the software upgrade fails and DNS queries no longer resolve, I want to know immediately rather than at the end of a scheduled maintenance interval...  Zabbix caught the server reboot, and also automatically opened a problem ticket "Operating system description has changed".  I acknowledged and closed that automatically opened problem ticket.

After the servers ran for an hour I checked the Zabbix performance graphs, and there was no substantial change in performance.  The much less granular VMware monitoring more or less matched what I saw in Zabbix.  It's always worrisome if CPU use or disk space goes wildly higher OR lower after an upgrade.  Everything seems to be working normally.

Finally, I updated the three runbooks I maintain in Todoist, an online web and mobile app for to-do lists.  I set the next date to check up on the servers for two months from now (as usual for these servers), documented the upgrade in the server log, let the users know I'm done and how to reach me if necessary, etc.

In the future I will clean up and remove the old stuff in VMware, assuming the new DNS servers work fine and there's no reason to roll back.  It's nice to know I can roll back almost instantly, although typically there's no need.

Hilariously, the only problem I had with the entire major version upgrade was that the spelling of Chimera has apparently changed since my Dungeons and Dragons days and is now Chimaera.

My primary reference for the project was:

https://www.devuan.org/os/documentation/install-guides/chimaera/upgrade-to-chimaera

And that, in summary, is how to spend about two hours painlessly upgrading three Devuan servers to the latest version.