Monday, November 27, 2023

Proxmox VE Cluster - Chapter 016 - Hardware Prep Work on the OS2 Cluster


A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.


This will be similar to Chapter 012, although with different hardware.

These microservers are three old SuperMicro SYS-E200-8D units that were used for homelab workloads.  They will become Proxmox cluster nodes proxmox004, proxmox005, and proxmox006.

This server hardware was stereotypical for a late-2010s "VMware ESXi Eval Experience"-licensed cluster, and later worked very well under OpenStack.  Each node has a 1.90 GHz Xeon D-1528 with six cores and 96 GB of RAM, a 1 TB SATA SSD for boot and local storage, and a new 1 TB M.2 NVMe SSD for eventual Ceph cluster storage.

Hardware reliability history

Proxmox004 had its AC power brick replaced on 2022-07-10.

Proxmox005 had an NVMe failure on 2020-05-24.  I took advantage of that outage to also upgrade its SSD to a new 1 TB drive on 2020-05-27 (replaced on suspicion; the old one was working fine, although its wearout measurement was getting to a high percentage per SMART reports).
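
For anyone curious how that wearout figure gets checked, here is a rough sketch with smartctl; the device names are placeholders and the exact attribute name varies by SSD vendor:

    # SATA SSD: look for the vendor wear attribute (e.g. Wear_Leveling_Count or Media_Wearout_Indicator)
    smartctl -A /dev/sda
    # NVMe SSD: look for the "Percentage Used" line in the SMART/Health log
    smartctl -a /dev/nvme0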

Proxmox006 had an NVMe failure on 2021-02-20, and had its AC power brick replaced on 2022-06-10.

Previously in Chapter 012 I claimed that 5/6 of the power supplies had failed on my E800 microservers, but I made a mistake: it seems "only" TWO THIRDS of the power supplies have failed as of late 2023.  Currently proxmox001 and proxmox005 are still running on their original mid-2010s power supplies.  I will keep a close eye on the output voltages (monitorable via IPMI using Observium, probably Zabbix, and maybe somehow via Elasticsearch).
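
When I want to spot-check a supply outside of the monitoring systems, something along these lines works against the BMC; the address and credentials here are placeholders, not my real ones:

    # read all voltage sensors from the BMC over the network
    ipmitool -I lanplus -H 10.0.0.104 -U ADMIN -P changeme sdr type Voltage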

FIVE Ethernet ports

Even the official manufacturer's operating manual fails to explain the layout of the five ethernet ports on this server.  Looking at the back of the server, the lone port on the left side is the IPMI, then:

eno1 1G ethernet bottom left corner, 9000 byte MTU

eno2 1G ethernet top left corner, 9000 byte MTU

eno3 10G ethernet bottom right corner, 9000 byte MTU

eno4 10G ethernet top right corner, 9000 byte MTU

eno1 and eno2 are combined into bond12, which uses balance-xor mode to provide 2 Gbps of aggregate bandwidth.

eno3 and eno4 are combined into bond34, which uses balance-xor mode to provide 20 Gbps of aggregate bandwidth.  20 Gbps ethernet is pretty fast!

I run the VLANs as subinterfaces of the bond interfaces.  So the "Production" VLAN 10 has an interface name of "bond34.10".
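
For reference, here is a minimal sketch of what that looks like in /etc/network/interfaces on the Proxmox (Debian) side.  The interface names match the ones above, but the bridge name vmbr10 and the address are example assumptions, and whether you bridge on the VLAN subinterface (as here) or use a VLAN-aware bridge is a matter of taste; bond12 would be configured analogously:

    auto bond34
    iface bond34 inet manual
        bond-slaves eno3 eno4
        bond-mode balance-xor
        bond-miimon 100
        mtu 9000

    # "Production" VLAN 10 as a subinterface of the bond
    auto bond34.10
    iface bond34.10 inet manual
        mtu 9000

    # bridge for VM traffic on VLAN 10 (name and address are examples only)
    auto vmbr10
    iface vmbr10 inet static
        address 10.0.10.14/24
        bridge-ports bond34.10
        bridge-stp off
        bridge-fd 0
        mtu 9000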

Hardware Preparation task list

  1. Clean and wipe old servers, both installed software and physical dusting.
  2. Relabel ethernet cables and servers.
  3. Update port names in the managed Netgear ethernet switch.  VLAN and LAG configs remain the same, making installation "exciting" and "interesting".
  4. Remove monitoring of the old servers in Zabbix.
  5. Verify IPAM information in Netbox.
  6. Test and verify new server DNS entries.
  7. Install new 1 TB M.2 NVMe SSDs.
  8. Replace the old CR2032 CMOS battery, as it's probably 5 to 7 years old.  This is child's play compared to replacing the battery on a hyper-compact Intel NUC.
  9. Reconfigure the BIOS in each server.  For a variety of reasons, PXE netboot requires UEFI and BIOS initialization of the network, so I used that setup in the OpenStack era, when the cluster was installed on top of Ubuntu.  However, I could not force the UEFI BIOS to boot the SATA SSD; it insisted on booting the M.2 only, which is odd because it worked fine under the older, USB-stick-installed Ubuntu.  Another problem with the BIOS config was that "something" about pre-initializing the ethernet system for PXE boot messes up the bridge configuration on Proxmox's Debian OS, resulting in traffic not flowing.  I experimented with manually adding other interfaces to the bridge; no go.  The symptoms were no packets flowing in (brctl showmacs, essentially the bridge's MAC address table, showed nothing learned) and no packets out, although the link light was up and everything looked OK; the checks I used are sketched below.  Anyway, in summary: disable PXE boot entirely and convert entirely from UEFI to Legacy BIOS booting.  This was typical of the UEFI experience in the late 2010s; it didn't really work most of the time, but Legacy BIOS booting always works.  Things are better now.
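
The bridge-health checks mentioned in item 9, roughly as I ran them; vmbr0 is assumed to be the bridge name on your system:

    # MAC addresses the bridge has learned; an empty list means no inbound traffic
    brctl showmacs vmbr0
    # the same information via iproute2
    bridge fdb show br vmbr0
    # per-interface RX/TX packet counters for the bridge
    ip -s link show vmbr0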

Next post will be about installing Proxmox VE on the old OS2 cluster hardware.
