Proxmox VE Cluster - Chapter 012 - Hardware Prep Work on OS1 cluster
A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.
These microservers are three old SuperMicro SYS-E200-8D units that were used for homelab workloads. They will become Proxmox cluster nodes proxmox001, proxmox002, and proxmox003.
This server hardware was stereotypical for a late 2010s "VMware ESXi Eval Experience"-licensed cluster, and later worked very well under OpenStack. Each one has a 1.90 GHz Xeon D-1528 with six cores and 96 GB of RAM, a 1 TB SATA SSD for boot and local storage, and a new 1 TB M.2 NVMe SSD for eventual Ceph cluster storage.
Hardware reliability history
Proxmox001 is the only server out of six that is still running the original AC power supply. The other five required replacement. Voltage would sag lower and lower until there were random reboots under heavy load, and eventually the supplies would fail completely. Thankfully it's just a dead power supply, and the rest of the hardware has been extremely reliable. I can't recommend SuperMicro hardware enough, it's really good stuff... other than the power supplies from the late 2010s.
Proxmox002 had its AC power brick replaced on 2020-11-19 and AGAIN on 2023-07-01.
Proxmox003 had an NVMe failure on 2021-02-20 and its AC power brick replaced on 2022-05-31.
FIVE Ethernet ports
Even the official manufacturer's operating manual fails to explain the layout of the five ethernet ports on this server. Looking at the back of the server, the lone port on the left side is the IPMI, then:
eno1 1G ethernet bottom left corner, 9000 byte MTU
eno2 1G ethernet top left corner, 9000 byte MTU
eno3 10G ethernet bottom right corner, 9000 byte MTU
eno4 10G ethernet top right corner, 9000 byte MTU
eno1 and eno2 are combined into bond12, which uses balance-xor mode to provide up to 2 Gb/s of aggregate bandwidth.
eno3 and eno4 are combined into bond34, which uses balance-xor mode to provide up to 20 Gb/s of aggregate bandwidth. 20 Gb/s of ethernet is pretty fast!
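Bonded bandwidth is a total across flows, not a speedup for any single flow. Here is a minimal sketch of why, assuming the bonds use the kernel's default layer2 transmit hash policy (the actual xmit_hash_policy on these servers isn't stated here). It is a simplification of the documented formula, which XORs the last byte of the source and destination MAC addresses with the packet type and takes the result modulo the number of links. The function name and MAC addresses are made up for illustration.

```python
def layer2_slave_index(src_mac, dst_mac, ethertype, slave_count):
    """Simplified layer2 bonding hash: pick a link from the MAC pair.

    Assumption: this mirrors the documented formula only loosely; the real
    kernel code works on raw header bytes, so exact results can differ.
    """
    src_last = int(src_mac.split(":")[-1], 16)  # last byte of source MAC
    dst_last = int(dst_mac.split(":")[-1], 16)  # last byte of destination MAC
    return (src_last ^ dst_last ^ ethertype) % slave_count


# Hypothetical addresses, not taken from the real cluster.
host = "aa:bb:cc:dd:ee:01"
peers = ["aa:bb:cc:dd:ee:02", "aa:bb:cc:dd:ee:03"]

for peer in peers:
    # 0x0800 is the IPv4 EtherType; 2 is the number of links in the bond.
    link = layer2_slave_index(host, peer, 0x0800, 2)
    print(f"{host} -> {peer} uses link {link}")
```

The same pair of MAC addresses always lands on the same link, so a single host-to-host transfer tops out at one link's speed. The full 2 Gb/s or 20 Gb/s only shows up when traffic to several peers spreads across both links.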
I run the VLANs as subinterfaces of the bond interfaces. So the "Production" VLAN 10 has an interface name of "bond34.10".
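To make the naming scheme concrete, here is a small sketch that renders the bonds and one VLAN subinterface as /etc/network/interfaces-style stanzas, the format Proxmox's Debian base uses. This is not the real config from these servers: the helper names are invented, the bond-miimon value of 100 is an assumption, and the stanzas for the physical ports and any bridges are left out.

```python
def render_bond(name, slaves, mtu=9000, mode="balance-xor", miimon=100):
    """Render an interfaces-style stanza for one bond (miimon is an assumed value)."""
    return "\n".join([
        f"auto {name}",
        f"iface {name} inet manual",
        f"    bond-slaves {' '.join(slaves)}",
        f"    bond-mode {mode}",
        f"    bond-miimon {miimon}",
        f"    mtu {mtu}",
    ])


def render_vlan(parent, vlan_id, mtu=9000):
    """Render a VLAN subinterface stanza named <parent>.<vlan_id>."""
    name = f"{parent}.{vlan_id}"
    return "\n".join([
        f"auto {name}",
        f"iface {name} inet manual",
        f"    mtu {mtu}",
    ])


# Bond and port names follow the layout described above.
config = "\n\n".join([
    render_bond("bond12", ["eno1", "eno2"]),
    render_bond("bond34", ["eno3", "eno4"]),
    render_vlan("bond34", 10),  # "Production" VLAN 10
])
print(config)
```

Generating the text from a couple of small functions keeps the three nodes consistent, since only the names and VLAN IDs vary.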
Hardware Preparation task list
- Clean and wipe old servers, both installed software and physical dusting.
- Relabel ethernet cables and servers.
- Update port names in the managed Netgear ethernet switch. VLAN and LAG configs remain the same, making installation "exciting" and "interesting".
- Remove monitoring of old server in Zabbix.
- Verify IPAM information in Netbox.
- Test and verify new server DNS entries.
- Install new 1 TB M.2 NVMe SSDs.
- Replace old CMOS CR2032 battery as it's probably 5 to 7 years old. This is child's play compared to replacing the battery on a hyper-compact Intel NUC.
- Reconfigure the BIOS in each server. For a variety of reasons, PXE netboot required UEFI and BIOS initialization of the network, so I used that configuration in the OpenStack era, when OpenStack was installed on top of Ubuntu. However, I could not force the UEFI BIOS to boot from the SATA SSD; it insisted on booting from the M.2 only, which is odd because it worked fine under the older Ubuntu installed from a USB stick. Another problem with the BIOS config was that "something" about pre-initializing the ethernet system for PXE boot messes up the bridge configuration on Proxmox's Debian OS, resulting in traffic not flowing. I experimented with manually adding other interfaces to the bridge, with no success. The symptoms were no packets flowing in (brctl showmacs lists the MAC addresses the bridge has learned, essentially its forwarding table) and no packets flowing out, although the link lights were up and everything looked OK. In summary: disable PXE boot entirely and convert entirely from UEFI to Legacy BIOS booting. This was typical of the UEFI experience in the late 2010s: it didn't really work most of the time, but Legacy BIOS booting always worked. Things are better now.
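The "test and verify new server DNS entries" item from the list above is easy to script. This is a rough sketch rather than the exact check I ran: the function name is made up, and the example.com hostnames in the usage note are placeholders for the real domain. The resolver is passed in as a parameter so the logic can be exercised without touching the network.

```python
import socket


def check_dns(hostnames, resolve=socket.gethostbyname):
    """Map each hostname to its resolved IPv4 address, or None if lookup fails."""
    results = {}
    for host in hostnames:
        try:
            results[host] = resolve(host)
        except OSError:
            # socket.gaierror is a subclass of OSError, so failed lookups land here.
            results[host] = None
    return results
```

Calling check_dns(["proxmox001.example.com", "proxmox002.example.com", "proxmox003.example.com"]) and looking for None values flags any node whose record is missing before the install starts.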
In the next post, we install Proxmox VE on the old OS1 cluster hardware. It'll be interesting with all those VLANs and LAGs.