Wednesday, July 6, 2022

Adventures of a Small Time OpenStack Sysadmin Chapter 008 - Hardware decommission the old VMware ESXi hosts

Adventures of a Small Time OpenStack Sysadmin relates the experience of converting a small VMware cluster into two small OpenStack clusters, and the adventures I had and friends I made along the way.

Removing hardware from a working cluster is, in some ways, harder and more nerve-wracking than installing hardware.  I only accidentally pulled one unrelated connection loose.  There are only two types of sysadmins out there: the ones who will tell you they accidentally disconnect the wrong cable once in a while, and the ones who lie when they claim they've never done that.  Well, that's what "maintenance windows" are for, along with automated monitoring systems like Zabbix, documentation, and proper cable labeling from my good ole Brady BMP-41, which is older than my kids but still works great.  You can't control random human accidents, but you CAN control how professionally and quickly you respond to and resolve them... and it's not that hard to do a better than average job.  Whatever, it's fixed now, anyway.
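For what it's worth, the check I lean on at the end of a maintenance window can be as dumb as a reachability sweep.  Here's a minimal Python sketch of that idea; the host names are placeholders rather than my actual inventory, and it's no substitute for a real monitor like Zabbix:

    #!/usr/bin/env python3
    # Quick reachability sweep for the end of a maintenance window:
    # ping every critical host once and report anything that doesn't answer.
    # Hostnames here are placeholders; substitute your own inventory.
    import subprocess

    HOSTS = ["gateway", "switch-10g", "nas", "hypervisor1", "hypervisor2"]

    def is_up(host: str) -> bool:
        # One ICMP echo, two second timeout; returncode 0 means a reply came back.
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0

    if __name__ == "__main__":
        down = [h for h in HOSTS if not is_up(h)]
        if down:
            print("NOT RESPONDING:", ", ".join(down))
        else:
            print("All hosts responding.")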

I scanned the innards of the hosts with my thermal camera (nothing unexpected seen), powered them down, and took the hardware out of the rack.  A great opportunity to blow all the dust out, wiggle and tap down all the connectors, and generally inspect for any problems; none found.

I did run into a problem some years ago with the SuperMicro SYS-E200-8D ten gigabit ports specifically (oddly, not the 1G ports right next to them) and some really nice looking CAT-6/CAT-7/CAT-8-whatever fancy ethernet cables I bought specifically for my fancy 10 gig switch, because it's cool, or at least expensive, so I have to use only the coolest cables.  Turns out some combination of mechanical tolerances on the server jacks and the ethernet plugs conspired such that with minimal force on the plugs in certain directions, even just the weight of the cable itself hanging, they would electrically disconnect although they were physically latched, dropping the ethernet link and causing me much headache until I tracked down the problem.  Luckily I don't use token ring networks or there would be tokens lying all over my wanna-be-data-center floor (that's supposed to be a joke...).

Anyway, I made a point of buying mechanically better, although cheaper, cables (LOL) that work perfectly under any condition, and whenever I find one of those old faulty cables I chop it in half like a venomous snake and toss the halves out.  Somehow, years later, today I found one.  An old-style unreliable cable, not a snake, I mean.  So I was happy to dispose of that unreliable cable.  Obviously it worked for many years, but I don't trust them; at least a quarter of them randomly disconnected in the past and I don't need that headache.  Does anyone periodically inspect their ethernet cables, at least on unclassified LANs?  I don't, but maybe I should?  Who has time for that is a better question...
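If you suspect you have one of these cables, the cheapest way I know to catch it in the act is to log link flaps.  A minimal sketch, assuming Linux and a hypothetical interface name eth0; a real monitoring system would do this properly, but this is enough to timestamp the drops:

    #!/usr/bin/env python3
    # Minimal link-flap logger: polls the kernel's carrier flag for one
    # interface and timestamps every up/down transition.  A cable that
    # electrically disconnects while still physically latched shows up
    # here as a burst of flaps.  The interface name is a placeholder.
    import time
    from datetime import datetime

    IFACE = "eth0"  # substitute your 10G interface name
    CARRIER = f"/sys/class/net/{IFACE}/carrier"

    def carrier_up() -> bool:
        try:
            with open(CARRIER) as f:
                return f.read().strip() == "1"
        except OSError:
            # Reading carrier fails while the interface is administratively down.
            return False

    if __name__ == "__main__":
        last = carrier_up()
        print(f"{datetime.now().isoformat()} {IFACE} starting, link={'up' if last else 'down'}")
        while True:
            time.sleep(0.5)
            now = carrier_up()
            if now != last:
                print(f"{datetime.now().isoformat()} {IFACE} link {'up' if now else 'DOWN'}")
                last = now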

Anyway, the old $5 USB boot flash drives were starting to fail after half a decade of continuous use, so I pulled them; I'll be booting off the internal SSD.  Interesting anecdote: this model of SuperMicro motherboard and BIOS can NOT boot off the M.2 NVMe device, but it can boot off the SSD.  I never ran into that situation before.
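If you want to catch dying boot media before it catches you, SMART is the obvious first stop.  A rough sketch using smartmontools' smartctl via Python; the device paths are examples, and note that cheap USB sticks like mine usually don't expose SMART at all:

    #!/usr/bin/env python3
    # Rough boot-media health check: asks smartctl (from smartmontools) for
    # the overall SMART verdict on each device.  Device paths are examples;
    # adjust for your hosts.  This only catches what SMART itself catches.
    import subprocess

    DEVICES = ["/dev/sda", "/dev/nvme0n1"]  # placeholder device paths

    for dev in DEVICES:
        result = subprocess.run(
            ["smartctl", "-H", dev],
            capture_output=True,
            text=True,
        )
        verdict = "UNKNOWN"
        for line in result.stdout.splitlines():
            # ATA/NVMe report "overall-health"; SCSI reports "SMART Health Status".
            if "overall-health" in line or "SMART Health Status" in line:
                verdict = line.split(":", 1)[1].strip()
        print(f"{dev}: {verdict}")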

The hosts were taken apart, cleaned, re-labeled, dusted, and generally serviced and reconditioned.  I even scanned them with my thermal camera before pulling the plug to see if anything was overheating to a worrisome extent.  Now that it's all reconditioned I treat it as if it's new hardware, so it's time for some burn-in testing.
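For the burn-in itself there are proper tools (stress-ng, memtest, and friends), but the shape of it is simple: load every core and watch the temperatures.  A toy sketch, assuming Linux and the psutil package; sensor names and readings vary by platform, so treat the readout as best-effort:

    #!/usr/bin/env python3
    # Toy burn-in: pin every core with a busy loop for a while and log the
    # hottest temperature sensor once a minute, watching for thermal trouble
    # on the reconditioned hosts.  Needs the psutil package.
    import multiprocessing
    import time

    import psutil

    DURATION_S = 30 * 60  # half an hour; a real burn-in runs much longer

    def spin(stop_time: float) -> None:
        # Busy loop to keep one core at 100% until the deadline.
        x = 0
        while time.time() < stop_time:
            x = (x * 31 + 7) % 1_000_003

    def hottest() -> float:
        temps = psutil.sensors_temperatures()
        readings = [e.current for entries in temps.values() for e in entries]
        return max(readings) if readings else float("nan")

    if __name__ == "__main__":
        deadline = time.time() + DURATION_S
        workers = [multiprocessing.Process(target=spin, args=(deadline,))
                   for _ in range(multiprocessing.cpu_count())]
        for w in workers:
            w.start()
        while time.time() < deadline:
            print(f"{int(deadline - time.time())}s remaining, hottest sensor: {hottest():.1f} C")
            time.sleep(60)
        for w in workers:
            w.join()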

Stay tuned for the next chapter!
