Saturday, July 2, 2022

Adventures of a Small Time OpenStack Sysadmin Chapter 005 - The Big Cleanup

Adventures of a Small Time OpenStack Sysadmin relates the experience of converting a small VMware cluster into two small OpenStack clusters, and the adventures and friends I made along the way.

The main story of the big cleanup of the old VMware cluster is that it was circular and forgetful.

If I had to do a big cleanup again, I'd apply more effort ahead of time to more fully organize two checklists, then wait a couple of days to think it all over in the background, and finally implement it all at once in a sequential, organized manner, ideally with no interruptions.

The first checklist would be all the virtual machines to be eliminated, consolidated, or worked around.  Over time it's very easy, with "free" cluster resources, to stand up an entire machine that, essentially, does no more than a symlink or cronjob would on another, more logically located and designed machine.  There is also the eternal dilemma of an experiment that isn't working completely: should I wipe it and start fresh later, or save its current config and wipe it until later, or try to patch it up into working, or throw some effort into doing it correctly now?  Finally, we all know that in reality there will be virtual machines that cannot be completely replicated by existing Ansible scripts, even if that is the standard or at least the goal.  So: put in the time to document and automate it now?  Or hope for the best and fix it later?  I mean, I put it together once, and I'm always more skilled over time, so I'm sure I can do it again, probably faster and easier.

This checklist also needs a dependency tree.  If you plan to wipe Apache Guacamole much later, and it relies on a dedicated OpenLDAP server to provide its config, then you can't wipe the OpenLDAP server today; it has to go after the Guac server goes.  (Back in the olden days this was the easiest way to configure Guac; the new web config is great and very usable.)
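A rough sketch of that dependency tree idea, in Python, in case it helps.  It uses the standard library's graphlib (Python 3.9 or newer) to work out a wipe order in which nothing is removed while something else still relies on it.  The inventory dictionary and the teardown_order name are made up for illustration; only the Guacamole and OpenLDAP relationship comes from my actual cluster.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def teardown_order(depends_on):
    """Return VM names in an order that is safe to wipe them.

    depends_on maps each VM to the set of VMs it needs in order to run.
    A VM only appears after everything that relies on it.
    """
    # static_order() yields dependencies first (a provisioning order),
    # so the teardown order is that same sequence reversed.
    build_order = list(TopologicalSorter(depends_on).static_order())
    return list(reversed(build_order))


# Hypothetical inventory; only the Guacamole -> OpenLDAP link is from the text.
depends_on = {
    "guacamole": {"openldap"},
    "openldap": set(),
}

if __name__ == "__main__":
    for step, vm in enumerate(teardown_order(depends_on), start=1):
        print(f"{step}. wipe {vm}")
```

A handy side effect: if two machines somehow depend on each other, TopologicalSorter raises a CycleError instead of quietly handing back a bad order, which is exactly the kind of thing I'd want to find out before deleting anything.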

The second checklist would be an inverse of the provisioning procedure.  People are usually pretty good about documenting their provisioning procedure, usually not so good at documenting their DEprovisioning procedure.  It's very easy when working on this as a secondary task to forget or skip over steps due to an interruption.  Remove (or reassign?) the IP allocation in NetBox.  Remove the VM in NetBox.  Remove the computer from Active Directory.  Remove the DNS records, forward and reverse or maybe none, from Active Directory DNS.  Remove it from the Todoist runbooks and scheduled maintenance planner.  Remove the backup scripts, and maybe immediately trash the backups or put them away for safekeeping.  Remove the Ansible playbook, maybe remove a no-longer-used Ansible role, maybe other stuff.  If there was a link to it on the wiki, remove the link or place a note that it's down until later.  Zabbix monitoring will go crazy when the host disappears, so remove it there.  If it hosted Docker containers, the Portainer monitoring system will get agitated when the host disappears.  If you plan to ever reimplement the service/system, it needs to be ADDED to the Todoist project planner, or even worse and more confusing, maybe it needs to be added to Redmine or maybe Asana, because the only thing better than having one project planning software tool is spreading tasks across three depending on higher-level project needs.
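Since interruptions were the real enemy here, a per-VM checklist that remembers its place might have helped.  Here is a minimal sketch in Python, assuming the steps above are the whole list; the function names and the JSON state file are my own invention for illustration, and nothing in it talks to NetBox, Zabbix, or any of the other tools.

```python
import json
from pathlib import Path

# Steps taken from the paragraph above, in the order they were listed.
DEPROVISION_STEPS = [
    "Remove or reassign the IP allocation in NetBox",
    "Remove the VM in NetBox",
    "Remove the computer from Active Directory",
    "Remove forward and reverse DNS records from Active Directory DNS",
    "Remove from Todoist runbooks and the scheduled maintenance planner",
    "Remove backup scripts; trash or archive the backups",
    "Remove the Ansible playbook and any no-longer-used roles",
    "Remove or annotate links on the wiki",
    "Remove the host from Zabbix",
    "Remove the host from Portainer if it ran Docker containers",
    "Add a task to the project planner if the service will be rebuilt",
]


def new_checklist(vm):
    """Start a fresh checklist for one VM with every step still open."""
    return {"vm": vm, "done": [False] * len(DEPROVISION_STEPS)}


def mark_done(checklist, index):
    """Tick off a single step by its position in DEPROVISION_STEPS."""
    checklist["done"][index] = True


def remaining(checklist):
    """Return the steps that still need doing, in their original order."""
    return [step for step, done in zip(DEPROVISION_STEPS, checklist["done"])
            if not done]


def save(checklist, path):
    """Write progress to a JSON file so an interruption loses nothing."""
    Path(path).write_text(json.dumps(checklist, indent=2))


def load(path):
    """Read progress back from the JSON file written by save()."""
    return json.loads(Path(path).read_text())
```

The idea would be to call save() after every mark_done(), so that coming back after an interruption means calling load() and remaining() rather than trying to remember where I left off.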

Anyway, it took a couple of days, part time, to properly and completely "spring clean" everything that could be deleted, even if only temporarily.

This should pay dividends later on; I'm not going to put time and effort into moving something that doesn't need to be moved because it's not there anymore.

Stay tuned for the next chapter!
