Proxmox VE Cluster - Chapter 019 - Proxmox Operations
A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.
Proxmox Operations is a broad and complicated topic.
Day-to-day operations are performed ALMOST entirely in the web GUI, with very few visits to the CLI. I have years of experience with VMware and OpenStack, and weeks, maybe even months, of experience with Proxmox, so let's compare the experience:
- VMware: vSphere is installed on the cluster as an image, and as an incredibly expensive piece of licensed software, you get one (maybe two, depending on HA success) installation of vSphere and you get to hope it works. Backup, restore, upgrades, and installation work about as well as you expect for "enterprise" grade software.
- OpenStack: Horizon is installed on the controller and the controller is NOT part of the cluster. It's free, so feel free to install multiple controllers, although I never operated that way. It's expensive in terms of hardware, as the core assumptions of the design are that you're throwing together a rather large cloud, not a couple hosts in a rack. Upgrades are a terrifying, moderately painful, and long process. The kolla-ansible solution of running it all in containers is interesting, although it replaces the un-troubleshoot-able complication of bare metal installation with an equal level of un-troubleshoot-able complication of Docker containers.
- Proxmox VE: Every VE node has a web front end to do CRUD operations against the shared cluster configuration database. The VE system magically synchronizes the hardware to match the configuration database. Very cool design and 100% reliable so far. Scalability is excellent; whereas OpenStack assumes you're rolling in with a minimum of a dozen or so nodes, Proxmox works from as low as one isolated node.
An interesting operational note is the UI on Proxmox is more "polished" and "professional" and "complete" than either alternative. Usually FOSS has a reputation for inadequate UI but Proxmox has the best UI of the three.
Upgrades
Let's consider one operational task: upgrades. Proxmox is essentially a Debian Linux installation with a bunch of Proxmox-specific packages installed on top of it, not all that different from installing Docker or Elasticsearch from upstream. I try to upgrade every node in the cluster at least monthly; the less stuff that changes per upgrade, the less "exciting" the upgrade. With Debian-based operating systems in general, the level of excitement and drama and stress scales exponentially with the number of upgraded software packages.
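To put a number on how "exciting" an upgrade is likely to be, here is a minimal Python sketch that counts the packages waiting on a node. It assumes the usual `apt list --upgradable` output format (one `name/suite version arch [upgradable from: old]` line per package); the function names are my own invention for illustration, and apt itself warns that its CLI output is not a stable scripting interface, so treat this as a rough gauge rather than a tool.

```python
import subprocess


def parse_upgradable(apt_output: str) -> list[str]:
    """Return package names found in `apt list --upgradable` output."""
    names = []
    for line in apt_output.splitlines():
        # Package lines look like "name/suite version arch [upgradable from: old]".
        # The "Listing..." header line has no slash and is skipped.
        if "/" in line and "[upgradable from:" in line:
            names.append(line.split("/", 1)[0])
    return names


def pending_upgrades() -> list[str]:
    """Run apt on the local node and return the names of upgradable packages."""
    result = subprocess.run(
        ["apt", "list", "--upgradable"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_upgradable(result.stdout)
```

Calling `len(pending_upgrades())` on a node after an `apt update` gives the count; a small number every month beats a huge number once a year.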
The official Proxmox process for upgrades is simple: just hit it, maybe reboot, all good.
As you'd expect, there are complications, IRL.
First I make a plan, upgrading all the hosts in one sitting because I don't want cross-version compatibility issues in the cluster, and I start with the least sensitive cluster host. Note that if you log into proxmox001 and upgrade/reboot proxmox002, you stay logged into the cluster. However, if you log into proxmox001 and upgrade and reboot proxmox001, you lose web access to the rest of the cluster during the reboot (as a workaround, simply log into the proxmox002 web UI while rebooting proxmox001).
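The planning step can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `sensitivity` scores are hypothetical numbers you assign by hand (lower means "hurts less if it breaks"), and `plan_upgrades` is a made-up helper, not anything Proxmox provides. It orders the nodes least sensitive first and pairs each one with a different node to use for the web UI while it reboots.

```python
def plan_upgrades(sensitivity: dict[str, int]) -> list[tuple[str, str]]:
    """Return (node_to_upgrade, node_for_web_ui) pairs, least sensitive node first.

    `sensitivity` maps node name to a hand-assigned score (hypothetical);
    lower scores are upgraded earlier.
    """
    if len(sensitivity) < 2:
        raise ValueError("need at least two nodes to keep web access during a reboot")
    # Sort by score, then by name so ties come out in a stable order.
    order = sorted(sensitivity, key=lambda node: (sensitivity[node], node))
    plan = []
    for node in order:
        # Any other node will do for the web UI; take the first one in the order.
        ui_host = next(other for other in order if other != node)
        plan.append((node, ui_host))
    return plan
```

For example, `plan_upgrades({"proxmox001": 3, "proxmox002": 1})` upgrades proxmox002 first while you watch from proxmox001, then swaps.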
Next I verify the backups of the VMs on a node, and generally poke thru the logs. If I'm getting hardware errors or something I want to know before I start changing software. Yes this blog post series is non-linear and I haven't mentioned backups or the Proxmox Backup Server product but those posts are coming soon.
I generally shut down clustered VMs and unimportant VMs and migrate "important" VMs to other hosts.
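I do this by hand in the web UI, but the decision rule is simple enough to write down. The Python sketch below only builds the `qm shutdown` and `qm migrate` command lines as strings, it does not run anything. The `VM` record and its `important` and `clustered` flags are hypothetical labels for illustration, and the `--online` migration assumes the VM disks live on shared storage.

```python
from dataclasses import dataclass


@dataclass
class VM:
    vmid: int
    name: str
    important: bool  # hypothetical flag: must stay running during the upgrade
    clustered: bool  # hypothetical flag: the app tolerates losing this member


def drain_commands(vms: list[VM], target_node: str) -> list[str]:
    """Build the qm command for each VM on the node about to be upgraded."""
    commands = []
    for vm in vms:
        if vm.clustered or not vm.important:
            # Clustered and unimportant VMs are simply shut down.
            commands.append(f"qm shutdown {vm.vmid}")
        else:
            # Important standalone VMs are live-migrated to another host.
            commands.append(f"qm migrate {vm.vmid} {target_node} --online")
    return commands
```

Printing the list first and reading it over before running anything is the whole point; the commands themselves are run on the node that currently hosts the VMs.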
There are special notes about Beelink DKMS process for the custom ethernet driver using non-free firmware. Basically Proxmox 8.0 shipped with a Linux kernel that could be modified to use the DKMS driver for the broken Realtek ethernet driver, however, the DKMS driver does NOT seem compatible with the kernel shipped with Proxmox 8.1, so after some completely fruitless hours of effort, I simply removed my three Beelink microservers from the cluster. "Life's too short to use Realtek". You'd think Linux compatibility would be better in 2023 than 1993 when I got started, but really there isn't much difference between 2023 and 1993 and plenty of stuff just doesn't work. So, here's a URL to remove nodes from a cluster, which is a bit more involved than adding nodes LOL:
Other than fully completing and verifying operation of exactly one node at a time, I have no serious advice. Upgrades on Proxmox generally just work, somehow even less drama than VMware upgrades. Lightyears less stress than an OpenStack upgrade. Don't forget to update the Runbook docs and due date in Redmine after each node upgrade.
Note that upgrading the Proxmox VE software is only half the job; once that's done across the entire cluster, it's time to look at CEPH. Again, I mention these blog posts are being written long after the action, and I haven't mentioned CEPH in a blog post yet. Those posts are on the way.
Shortly after I rough-drafted these blog posts, Proxmox 8.1 dropped along with an upgrade from CEPH Quincy to CEPH Reef. AFAIK any CEPH upgrade, even a minor version bump, follows basically the same steps as a major upgrade, just much less exciting and stressful. I do everything for a minor upgrade in the same order and process, more or less, as a major CEPH version upgrade, and that may even be correct. It does work, at least so far.
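One sanity check that applies to any CEPH upgrade, big or small, is confirming that no daemon type is left running a mix of versions when you think you're done. The Python sketch below parses the JSON printed by `ceph versions` (daemon type mapped to version string mapped to count, plus an `overall` summary). The helper name `mixed_versions` is made up for illustration, and this is not my actual upgrade procedure, just a minimal check under those assumptions.

```python
import json


def mixed_versions(ceph_versions_json: str) -> dict[str, list[str]]:
    """Return the daemon types that report more than one CEPH version."""
    report = json.loads(ceph_versions_json)
    mixed = {}
    for daemon_type, versions in report.items():
        if daemon_type == "overall":
            # "overall" is a cluster-wide summary, not a daemon type.
            continue
        if len(versions) > 1:
            mixed[daemon_type] = sorted(versions)
    return mixed
```

Feed it the output of `ceph versions`; an empty result means every mon, mgr, osd, and mds agrees with its peers, and anything else tells you which daemons still need a restart.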
Next post, a summary and evaluation of "Architecture Level 1.0" where we've been and where we're going.