Monday, October 30, 2023

Proxmox VE Cluster - Chapter 004 - Infrastructure Prep



A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.


Before changing anything, I wanted to prepare as much infrastructure as possible.  I did not want to interrupt a conversion step by realizing mid-task that I had forgotten to allocate IP addresses or to configure the DNS server.


For day-to-day documentation, such as lists of IP addresses or URL links to services, I use DokuWiki, which is the most basic and simple wiki software I can find.  I created a new wiki page and added the entire IP allocation for the new cluster:

  • proxmox001 (old os1 node in OS1 cluster) SuperMicro SYS-E200-8D 10.10.8.1
  • proxmox002 (old os2 node in OS1 cluster) SuperMicro SYS-E200-8D 10.10.8.2
  • proxmox003 (old os3 node in OS1 cluster) SuperMicro SYS-E200-8D 10.10.8.3
  • proxmox004 (old os4 node in OS2 cluster) SuperMicro SYS-E200-8D 10.10.8.4
  • proxmox005 (old os5 node in OS2 cluster) SuperMicro SYS-E200-8D 10.10.8.5
  • proxmox006 (old os6 node in OS2 cluster) SuperMicro SYS-E200-8D 10.10.8.6
  • proxmox011 (old harvester-small-1) Intel NUC6i3SYH 10.10.8.11
  • proxmox012 (old harvester-small-2) Intel NUC6i3SYH 10.10.8.12
  • proxmox013 (old harvester-small-3) Intel NUC6i3SYH 10.10.8.13
  • proxmox021 (old rancher1) Beelink N5095 10.10.8.21
  • proxmox022 (old rancher2) Beelink N5095 10.10.8.22
  • proxmox023 (old rancher3) Beelink N5095 10.10.8.23
  • proxmox031 (old docker) Intel NUC6i3SYH 10.10.8.31
  • proxmoxbackup (old server bare metal hardware) 10.10.8.254
  • freenas.cedar.mulhollon.com 10.10.20.4 (use IP addresses for NFS so as to not rely on DNS)
As systems came online, I added hyperlinks to their web GUIs and have generally kept the wiki page up to date with the current configuration.  If I'm working on the cluster, I probably have this wiki page open.
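
Since forgetting the DNS server is exactly the kind of interruption I'm trying to avoid, a quick sanity check after updating it is worthwhile.  A minimal sketch, assuming the resolver's search domain is set so the bare hostnames resolve (otherwise use fully qualified names):

  # verify forward DNS for every planned node name before touching anything else
  for n in 001 002 003 004 005 006 011 012 013 021 022 023 031; do
      host "proxmox$n"
  done
  host proxmoxbackup
  host freenas.cedar.mulhollon.com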


Set up Redmine project for Proxmox:

I use Redmine for long-term, large-scale, project-level documentation.  I created a project in Redmine to be, basically, a more detailed version of the wiki page.


Set up simple insecure NFS on TrueNAS for non-production short term testing:

https://pve.proxmox.com/wiki/Storage:_NFS

https://pve.proxmox.com/pve-docs/chapter-pvesm.html#storage_nfs

A list of six content types to create shares for in TrueNAS (as documented in the Wiki and in Redmine, of course):

  1. proxmox-containers
  2. proxmox-containertemplates
  3. proxmox-diskimages
  4. proxmox-isoimages
  5. proxmox-snippets
  6. proxmox-vzdumps
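
Looking ahead, each of those datasets maps to one Proxmox storage content type.  A rough sketch of how I expect to attach them later with pvesm, once the cluster exists and the shares are exported (storage IDs and paths follow the list above, the server address is the freenas IP from the wiki page, and the content-type mapping is my reading of the Proxmox storage docs):

  # one storage entry per dataset; the content type tells Proxmox what it may store there
  pvesm add nfs proxmox-diskimages         --server 10.10.20.4 --export /mnt/freenas-pool/proxmox-diskimages         --content images
  pvesm add nfs proxmox-isoimages          --server 10.10.20.4 --export /mnt/freenas-pool/proxmox-isoimages          --content iso
  pvesm add nfs proxmox-containers         --server 10.10.20.4 --export /mnt/freenas-pool/proxmox-containers         --content rootdir
  pvesm add nfs proxmox-containertemplates --server 10.10.20.4 --export /mnt/freenas-pool/proxmox-containertemplates --content vztmpl
  pvesm add nfs proxmox-snippets           --server 10.10.20.4 --export /mnt/freenas-pool/proxmox-snippets           --content snippets
  pvesm add nfs proxmox-vzdumps            --server 10.10.20.4 --export /mnt/freenas-pool/proxmox-vzdumps            --content backup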

Creating the six NFS exports in TrueNAS for Proxmox:

  1. First create the datasets, for later NFS export.  
  2. "Storage", "Pools", in freenas-pool, three dots "Add Dataset".
  3. Name: the id from the list
  4. Comments: "something that makes sense"
  5. Compression Level: off
  6. "Submit"
  7. Then export the six datasets.
  8. "Sharing", "Unix Shares (NFS)",  "Add", "Advanced Options"
  9. Path: /mnt/freenas-pool/proxmox-diskimages (or similar)
  10. Maproot User blank
  11. Maproot Group blank
  12. Mapall User: root
  13. Mapall Group: wheel
  14. Optionally add authorized networks and hosts, later.  No need to access outside 10.0.0.0/8, obviously.
  15. Add symlinks in my homedir to the automounter locations "ln -s /net/freenas/mnt/freenas-pool/proxmox-diskimages ~"
  16. Test by creating and deleting some files.
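
Beyond the automounter symlinks, a quick way to confirm the exports look right from any Linux client, sketched below (the mount step is just a throwaway manual test):

  # list everything TrueNAS is exporting
  showmount -e 10.10.20.4

  # throwaway manual mount of one export, write and delete a test file, then clean up
  mkdir -p /tmp/nfstest
  mount -t nfs 10.10.20.4:/mnt/freenas-pool/proxmox-diskimages /tmp/nfstest
  touch /tmp/nfstest/testfile && rm /tmp/nfstest/testfile
  umount /tmp/nfstest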


Add sysadmin issue tasks for all cluster nodes in the Redmine "Systems Administration" project.


Document all the changes and allocations in Netbox

Here is the link to the Port / Services list for Proxmox VE nodes:

https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_requirements
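
From memory of that requirements page (double-check the link above), the ports worth recording in Netbox are the web GUI on 8006/tcp, SSH on 22/tcp, and corosync on 5405-5412/udp.  A small sketch of checking that a freshly installed node is actually listening where expected:

  # TCP listeners: ssh (22) and the Proxmox web GUI (8006)
  ss -tlnp | grep -E ':(22|8006)\s'
  # UDP listeners: corosync, only present once the node has joined a cluster
  ss -ulnp | grep -E ':54(0[5-9]|1[0-2])\s'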


Set up a USB flash boot drive for servers that can't PXE boot

Download Proxmox VE 8.0-2 and put it on a properly labeled USB flash drive.

https://www.proxmox.com/en/downloads/proxmox-virtual-environment/iso

https://pve.proxmox.com/pve-docs/chapter-pve-installation.html#installation_prepare_media
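
A sketch of writing the ISO to the flash drive from a Linux machine (the exact ISO filename may differ and /dev/sdX is a placeholder; dd to the wrong device is destructive, so check lsblk first):

  # identify the USB stick first, then write the installer image raw to the whole device
  lsblk
  dd if=proxmox-ve_8.0-2.iso of=/dev/sdX bs=1M status=progress conv=fsync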


At this point I think I've prepared everything possible.  In the next post, I start the conversion work.

Friday, October 27, 2023

Proxmox VE Cluster - Chapter 003 - Order of Operations for Level 1.0



A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.


The order of operations is complicated because I intend to keep most of the workload fully operational during the conversion.  It's kind of like rebuilding a car engine while driving the car down the road.


  1. Convert the old Rancher "RKE2" K8S cluster into a very small Proxmox VE cluster.  This step will be successful if I have a working three node Proxmox cluster.
  2. Move everything off the small Rancher Harvester test cluster, which is currently slowly running an old version of Harvester, and add those nodes into the small Proxmox VE cluster.  The measure of success will be having six clustered Proxmox nodes.
  3. Get some practice with VMs on the small test cluster.  Success will be defined as some working scratch (test load only) Ubuntu servers with a couple days "burn-in" and operational experimentation.
  4. Move a minimal set of very small production services onto the small Proxmox VE cluster.  Maybe start with the wiki server and one of the multiple Active Directory domain controllers, just enough to prove out the operation of the cluster.  I consider this step a success if everything "important" on the old OS1 cluster is minimally but reliably running on the new Proxmox cluster for a couple of days.
  5. Migrate all production workload off the OS1 OpenStack cluster then add the former OS1 nodes into the now medium-sized Proxmox VE cluster.  Success at this step looks like a Proxmox cluster of nine nodes running a minimal production workload for a couple uninterrupted days.
  6. Roll ALL production workload off the OS2 OpenStack cluster into the now medium-sized Proxmox VE cluster, likely a tight fit.  Success looks like OS2 carrying zero load and Proxmox carrying the entire production load, although in theory, if Proxmox crashed, I would still have the OS2 cluster as a hot backup.
  7. Convert the remaining OS2 OpenStack cluster into even more Proxmox cluster capacity.  This step is a success if the Proxmox cluster has twelve operating nodes holding the entire production load, and I'm no longer running RKE2 or Harvester or OpenStack or any other cluster system on bare metal.
  8. Verify operation and load balancing, and run it for a while before working on Architecture level 2.0.  Success looks like no crashing, no bugs, no issues, optimized CPU/memory settings, and an optimized workload spread across the dozen cluster nodes.

The next post will be about infrastructure preparation efforts: getting as much ready as possible before starting the big conversion project.

Wednesday, October 25, 2023

Proxmox VE Cluster - Chapter 002 - Plan for Architecture Level 1.0



A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.


Here's my detailed plan for Architecture Level 1.0:


In summary, Level 1.0 means dropping everything from multiple old clusters into a single large, simple-as-possible Proxmox VE cluster via several conversion phases, while making the absolute minimum of changes to the design and workload.


A list of Goals for Architecture Level 1.0: 

  1. All production storage on the giant TrueNAS over NFS.  Cluster-wide filesystems implemented later.
  2. VMs manually provisioned, like in the old VMware era.  I like Terraform and Ansible as tools to provision IaaS, but I will implement that later.
  3. Generally document and make minimal changes in Ansible with respect to the workload VMs and containers.


Some of the eternal ongoing projects, such as re-IP addressing, will continue as part of Arch 1.0, which is ambitious.  In a way, it makes the conversion from OpenStack and Harvester to Proxmox VE simpler if each new VM gets a new IP address.  As usual, I will polish and refine my Netbox information and clean up runbooks stored in Redmine, but the theme will be making the minimum possible changes rather than implementing ambitious new ideas at the same time as the cluster conversion.


The question of Docker Containers...


https://pve.proxmox.com/wiki/Linux_Container

"If you want to run application containers, for example, Docker images, it is recommended that you run them inside a Proxmox QEMU VM. This will give you all the advantages of application containerization, while also providing the benefits that VMs offer, such as strong isolation from the host and the ability to live-migrate, which otherwise isn’t possible with containers."


OpenStack Zun containers were cool, and K8S obviously runs container workloads very well, but the Proxmox guidance above implies I will have to go back to the VMware-era practice of setting up "container containers" to hold my Docker containers.  No big deal, but it will take some time to roll everything over.  Generally I build a "container container" by installing a simple Ubuntu server, then running Docker off an NFS mount so there is no local state stored on the Ubuntu server (making it trivial to rebuild, and also making backups very simple, since all state is just files on the NFS server).
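
As a sketch of what one of those "container containers" ends up looking like (the dataset name, mount point, and image are placeholders, not my actual config): the Ubuntu VM mounts an NFS share at boot, and every container keeps its persistent state on that mount, so the VM itself stays disposable.

  # /etc/fstab on the Ubuntu VM -- all container state lands on the NAS, not the local disk
  10.10.20.4:/mnt/freenas-pool/dockerstate  /srv/docker  nfs  rw,hard,vers=4  0  0

  # docker-compose.yml fragment: bind-mount container state from the NFS path
  services:
    web:
      image: nginx:alpine                      # stand-in for the real workload image
      volumes:
        - /srv/docker/web:/usr/share/nginx/html
      restart: unless-stopped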


In the next post, I will discuss the complicated order of operations due to various dependencies and operations requirements.

Monday, October 23, 2023

Proxmox VE Cluster - Chapter 001 - Why switch to Proxmox VE?



A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.


Why switch from OpenStack / Harvester / RKE2 K8S, VMware, and other tech to a standard data center baseline of Proxmox VE?

  • The hardware load and requirements are high for Harvester.  Harvester is awesome, but some of my smaller nodes spend most of their CPU cycles and memory running Harvester itself rather than running my workloads.  A full Rancher cluster to control Harvester is very cool technology, but the hardware load is expensive.
  • Harvester upgrades fail because the hardware load is too high.  Related to the above, I'm having trouble upgrading the more heavily loaded nodes because they can barely run Harvester at all, much less afford the extra system load to upgrade K8S.
  • I can't really go backward to VMware.  Hardware compatibility lists, etc.  Technically I could spend the money but I don't think it would be worth it.
  • I've learned everything I can learn from OpenStack and looking at trends it's time to 'jump ship' from OpenStack.
  • Upgrades for OpenStack kolla-ansible are non-trivial and a little more interactive than I would prefer.  I'm running two OS clusters and will push the workload over to one cluster temporarily while upgrading the other cluster.  It takes a lot of time and sysadmin effort.
  • Proxmox has some cool new features to experiment with, like the natively integrated Ceph distributed cluster filesystem and the very cool-looking Proxmox Backup Server system.
  • I want a "single pane of glass" to manage my cluster hardware with respect to monitoring, control, backup, etc.  I will put everything on Proxmox and control everything via a single Proxmox cluster.  I don't want a RKE2 K8S cluster AND a Harvester cluster AND two OpenStack clusters to manage, just one big Proxmox cluster, ideally.


Very high level plan for the overall conversion project:

  1. Architecture level 1.0 will be a phased conversion of all workload into a large Proxmox VE cluster.  This will be an OpenStack / Harvester style design and workload wedged into Proxmox VE.
  2. Architecture level 2.0 will be system integration to make this a Proxmox-styled cluster, integrating with monitoring and automation and generally adapting the workload to make everything feel integrated rather than a different system's workload being temporarily run on Proxmox.
  3. Architecture level 3.0 will be more R+D focused, advancing into interesting extra features that are Proxmox-specific, such as a cluster wide filesystem, the Proxmox Backup Server system, some interesting advanced networking ideas, new stuff in general.


How did I get here?

Can't figure out how to get where you're going, unless you know how you got where you are right now.

  • Around the turn of the century, had bare metal Linux servers running LXC and also the FreeBSD equivalent.
  • Around the 2010s, had some sysadmin-level experience with VMware at work, and I signed up for the "ESXi Evaluation Experience", which gives you a limited non-commercial license to pretty much the entire collection of VMware software.  This was pretty awesome for several years, although the continual drift of the hardware compatibility list, hardware requirements increasing dramatically over time, and general fatigue with paying for even a discounted VMware license meant I moved away from VMware.
  • Around the late 2010s / 2020 timeframe, replaced the VMware cluster with OpenStack.  OpenStack is FOSS, but the labor required to keep it up is expensive.
  • In the early 2020s I started experimenting with RKE2 K8S on bare metal, and Rancher's Harvester bare metal virtualization solution.  Nice tech and works well, but it's designed for "larger" individual nodes than I can afford.

The above leads me to being interested in Proxmox VE to underlie my entire mini-datacenter.  I will eventually put everything on Proxmox except for my NAS and a standalone Proxmox backup server.

Something to note about this series is that the blog posts appear some time after "the action".  So as you read a new blog post, this all happened some weeks or months ago.  This gives me time to document things I've missed, circle back around, etc.  On the bad side, I might miss some details of something I did months ago.  On the good side, I've circled back to document bugs, workarounds, and any other areas of friction, which should save you, as the reader, some time if you implement a Proxmox cluster at your site.

Next post in the series will be a more detailed description of my Architecture level 1.0.