Sunday, August 21, 2022

Adventures of a Small Time OpenStack Sysadmin Chapter 053 - Plan 3.0

Adventures of a Small Time OpenStack Sysadmin relates the experience of converting a small VMware cluster into two small OpenStack clusters, and the adventures and friends I made along the way.

Here is a list of required tasks and optimistic goals for the Plan 3.0 era, which involves redeploying the Plan 1.0 Cluster 1 hardware into a new Kolla-Ansible deployed Cluster 1.  At the conclusion of Plan 3.0 there will be two mostly-identical Kolla-Ansible clusters.  I'm not sure there will be a Plan 4.0.  I will be able to upgrade the clusters one at a time, and via the power of load balancing and redundancy it's no big deal if I have upgrade "issues" on one small cluster at a time.  Kolla-Ansible seems pretty reliable and relatively low stress WRT upgrades-in-place... so far.

Main Task

The main task of Plan 3.0 is reinstalling Cluster 1 using Kolla-Ansible instead of hand-rolling it like the old Plan 1.0 installation, with the following minor exceptions from how Cluster 2 was set up during the Plan 2.0 era:

Bare Metal

Root partition on the controller should be more like 200G than 100G based on experience.

At the end of the Plan 2.0 era, my 100G root partition utilization looked like this:

Compute node os4: 23.9 gigs

Compute node os5: 13.1 gigs

Controller aka host os6: 82.8 gigs (on a 100G root partition, eeeeeeek!)

So I would feel more confident with a 200G root partition on the controller.  The compute nodes seem stable at low percent utilization, and may as well save their disk space for Cinder to use.
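
For Plan 3.0 I'd rather keep an eye on that headroom than rediscover it at 83%.  A minimal sketch of the per-node check I have in mind; the 80% comfort threshold is my own assumption, and running it from the LAN Ansible is just one option:

```python
#!/usr/bin/env python3
"""Rough headroom check for the root partition on a node.

A sketch only: the 80% warning threshold is an assumption, not anything
Kolla-Ansible cares about.
"""
import shutil

WARN_PERCENT = 80  # assumed comfort threshold


def root_usage_percent(path: str = "/") -> float:
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    pct = root_usage_percent()
    status = "WARN" if pct >= WARN_PERCENT else "ok"
    print(f"root partition {pct:.1f}% used [{status}]")
```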

Bare Metal

Configure Zabbix BEFORE deploying Kolla-Ansible, then disable Zabbix's automatic creation of drive and interface graphing, because running the cluster will spam Zabbix full of unusable drives and interfaces, and the extra polling slows it down.
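
For the record, a hedged sketch of flipping those low-level discovery rules off through the Zabbix JSON-RPC API.  The URL, credentials, and rule-name keywords are placeholders, and older Zabbix releases want "user" instead of "username" at login:

```python
#!/usr/bin/env python3
"""Disable Zabbix low-level discovery of filesystems and NICs.

A sketch against the Zabbix JSON-RPC API; adjust URL, credentials, and
rule-name keywords before trusting it.
"""
import requests

ZABBIX_URL = "http://zabbix.example.com/api_jsonrpc.php"  # placeholder


def api(method, params, auth=None):
    payload = {"jsonrpc": "2.0", "method": method, "params": params, "id": 1}
    if auth:
        payload["auth"] = auth
    resp = requests.post(ZABBIX_URL, json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()["result"]


token = api("user.login", {"username": "Admin", "password": "changeme"})

# Find discovery rules whose names mention filesystems or network
# interfaces, then set them to disabled (status=1).
for keyword in ("filesystem", "network interface"):
    rules = api("discoveryrule.get",
                {"output": ["itemid", "name"], "search": {"name": keyword}},
                auth=token)
    for rule in rules:
        api("discoveryrule.update",
            {"itemid": rule["itemid"], "status": 1}, auth=token)
        print(f"disabled discovery rule: {rule['name']}")
```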

Ansible on Bare Metal

During the era of Plan 2.0, I improved my LAN-Ansible configuration for bare metal on OS4, OS5, and OS6.  Using my system-wide LAN Ansible to set my default "vim" editor options and similar tasks has not interfered with whatever Kolla-Ansible is doing to set up Lib-Kuryr and Cinder and whatever else Kolla-Ansible does to spawn an OpenStack cluster.  I have to be "reasonably" careful going forward, but this idea of two Ansibles is now a production meme rather than an experimental meme.

Keystone

I can federate the two clusters' Keystones together, or so they imply...

Cinder

At some point I would like to get NFS working with Cinder.  I have three NFS servers: one "real" backed-up NAS, and two smaller test/experimental NFS servers.
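
As I read the Kolla-Ansible docs, the NFS backend wants enable_cinder_backend_nfs set in globals.yml plus a shares file under /etc/kolla/config; a sketch of staging that file, with placeholder hostnames standing in for my three NFS servers (verify against the current docs before trusting it):

```python
#!/usr/bin/env python3
"""Stage the NFS shares file that, per my reading of the Kolla-Ansible
docs, cinder-volume uses when enable_cinder_backend_nfs is set.

Hostnames and export paths are placeholders.
"""
from pathlib import Path

# Placeholder exports: one "real" NAS and two experimental NFS servers.
NFS_SHARES = [
    "nas.example.com:/volume1/cinder",
    "nfs-test1.example.com:/srv/cinder",
    "nfs-test2.example.com:/srv/cinder",
]

shares_file = Path("/etc/kolla/config/nfs_shares")
shares_file.parent.mkdir(parents=True, exist_ok=True)
shares_file.write_text("\n".join(NFS_SHARES) + "\n")
print(f"wrote {len(NFS_SHARES)} shares to {shares_file}")

# Then set enable_cinder_backend_nfs: "yes" in /etc/kolla/globals.yml and
# reconfigure -- again, per my reading of the docs.
```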

Cinder

Push Cinder traffic off the management VLAN and onto the pre-existing Storage VLAN.

Swift

Push Swift traffic off the management VLAN and onto the pre-existing Storage VLAN.  I would imagine this will require new rings to be deployed, etc., so do this before trying to do anything important with Swift.

Upgrade Kolla-Ansible

I've upgraded Kolla-Ansible in the Yoga series (not across releases, yet) and I'd like to document the process.  It's mostly painless?
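
A sketch of the within-release procedure as I understand it from the Kolla-Ansible docs; the inventory path and the version pin are my own assumptions, but pull and upgrade are the documented subcommands:

```python
#!/usr/bin/env python3
"""Within-release Kolla-Ansible upgrade steps, wrapped for the record.

Inventory path and version pin are assumptions; check release notes
before every run.
"""
import subprocess

INVENTORY = "/etc/kolla/multinode"  # assumed inventory location


def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


# 1. Update the kolla-ansible package itself.  The <15 pin is meant to
#    keep me in the Yoga series, if I have the version mapping right.
run("pip", "install", "--upgrade", "kolla-ansible<15")
# 2. Pre-pull the new container images so the actual upgrade window is short.
run("kolla-ansible", "-i", INVENTORY, "pull")
# 3. Roll the containers over to the new images.
run("kolla-ansible", "-i", INVENTORY, "upgrade")
```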

Neutron

Push Overlay traffic off the management VLAN and onto the pre-existing Overlay VLAN previously used for VMware and NSX and so on.
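
The Cinder, Swift, and Overlay traffic moves above mostly come down to which host interfaces globals.yml points Kolla-Ansible at.  A sketch, assuming storage_interface and tunnel_interface are the right knobs (Swift, I believe, also has a swift_storage_interface that defaults to storage_interface, and still needs its rings rebuilt as noted above) and with made-up VLAN interface names; note this naive rewrite drops any comments in globals.yml:

```python
#!/usr/bin/env python3
"""Point Kolla-Ansible at the dedicated Storage and Overlay VLAN interfaces.

A sketch: storage_interface and tunnel_interface are the globals.yml keys
I believe control storage and overlay traffic; interface names below are
placeholders.  This rewrite discards comments in globals.yml.
"""
import yaml  # PyYAML

GLOBALS = "/etc/kolla/globals.yml"

with open(GLOBALS) as f:
    conf = yaml.safe_load(f) or {}

conf["storage_interface"] = "vlan20"  # placeholder: Storage VLAN interface
conf["tunnel_interface"] = "vlan30"   # placeholder: Overlay VLAN interface

with open(GLOBALS, "w") as f:
    yaml.safe_dump(conf, f, default_flow_style=False)

print("updated", GLOBALS, "- run a kolla-ansible reconfigure afterwards")
```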

Designate

Modify DNS such that each cluster's Designate domains have a secondary DNS relationship on the other cluster.  They will back each other up rather than ignore each other.
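
Designate can host SECONDARY zones, so my working assumption is something like the following on each cluster for the other cluster's zones; the cloud name, zone name, and master IP are all placeholders, sketched with openstacksdk:

```python
#!/usr/bin/env python3
"""Create a SECONDARY zone on one cluster's Designate for a zone that is
primary on the other cluster, so they back each other up.

All names and addresses are placeholders; my understanding is that a
SECONDARY zone stays in sync via zone transfers from the listed masters.
"""
import openstack

conn2 = openstack.connect(cloud="cluster2")  # clouds.yaml entry, assumed

zone = conn2.dns.create_zone(
    name="cluster1.example.com.",    # placeholder zone owned by cluster 1
    type="SECONDARY",
    masters=["203.0.113.10"],        # placeholder: cluster 1's DNS server
    description="secondary copy of cluster 1's zone",
)
print("created secondary zone", zone.name, zone.id)
```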

Mistral

Mistral is so cool and works so well, I want to go out of my way to find a use case for it.  I intend to completely automate provisioning down to one push of a button.  In the Plan 1.0 era, provisioning was completely manual, followed by Ansible playbook-based configuration.  In the Plan 2.0 era, provisioning was handled by Heat Orchestration Templates and, after that completed, Ansible playbook configuration was run, so it's kind of one button, pause, second button.  My hope for Plan 3.0 is to use Mistral to automate the complete provisioning process down to one button press, possibly even including updating status in my Netbox IPAM system.
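
The eventual one-button trigger would be a single Mistral execution; the workflow named below does not exist yet (that is the Plan 3.0 goal), so treat this as a hypothetical sketch of the POST /v2/executions call that would kick it off:

```python
#!/usr/bin/env python3
"""One-button provisioning: start a Mistral workflow execution that (in the
imagined Plan 3.0 setup) creates the Heat stack, runs the Ansible playbooks,
and updates Netbox.

The endpoint, token, workflow name, and inputs are all hypothetical.
"""
import json
import requests

MISTRAL_URL = "http://cluster1.example.com:8989"  # placeholder endpoint
TOKEN = "gAAAA..."                                # Keystone token, placeholder

resp = requests.post(
    f"{MISTRAL_URL}/v2/executions",
    headers={"X-Auth-Token": TOKEN, "Content-Type": "application/json"},
    data=json.dumps({
        "workflow_name": "provision_instance",   # hypothetical workflow
        "input": json.dumps({"hostname": "newvm01", "flavor": "m1.small"}),
    }),
    timeout=60,
)
resp.raise_for_status()
print("execution id:", resp.json()["id"])
```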

Prometheus

Use it or lose it.  By the end of the Plan 2.0 era I have Zabbix monitoring everything at the bare metal OS level and below, which includes IPMI hardware sensors, and Centralized Logging more or less successfully and efficiently funneling all the Docker container logs into a working Kibana.  I also have Prometheus installed but am not using it in any fashion, so find a use for it or wipe it to save the disk space / CPU cycles.
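
Step one of "use it" is probably just asking Prometheus what it already knows.  A sketch against the standard /api/v1/query endpoint; the hostname is a placeholder and the port is an assumption (I believe Kolla puts Prometheus on 9091 rather than the default 9090, but verify):

```python
#!/usr/bin/env python3
"""Poke the Kolla-deployed Prometheus to see if it is earning its keep.

Endpoint and port are assumptions; the PromQL query is just an example.
"""
import requests

PROMETHEUS_URL = "http://cluster2.example.com:9091"  # placeholder / assumed port

resp = requests.get(
    f"{PROMETHEUS_URL}/api/v1/query",
    params={"query": "up"},  # which exporters are actually reporting in
    timeout=30,
)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    print(result["metric"].get("instance", "?"), "=>", result["value"][1])
```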

Backups

Backups in general should be a focus of the Plan 3.0 cycle.

ELK for endusers

At some point in this conversion process from VMware to OpenStack, I have to admit I need to replace VMware's LogInsight, which combined cluster and enduser logs, with something; currently I intend to use a simple dockerized ELK stack for enduser logs only.  This will be just another Heat Orchestration Template in the server project, plus some Ansible changes to point syslog outputs to the ELK stack.  The only reason this was not done in the Plan 2.0 era was simple hardware capacity, which I do have in the Plan 3.0 era.
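
The instance-side change is just pointing syslog at the new stack.  The real change will be rsyslog configs pushed by Ansible, but the same idea as a tiny stdlib Python sketch, with a placeholder ELK hostname and port:

```python
#!/usr/bin/env python3
"""Send application logs to the planned enduser ELK stack over syslog.

Hostname and port are placeholders for whatever the ELK Heat stack
ends up exposing.
"""
import logging
import logging.handlers

# Placeholder ELK syslog input (UDP by default with SysLogHandler).
handler = logging.handlers.SysLogHandler(address=("elk.example.com", 514))
handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))

log = logging.getLogger("enduser-app")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.info("hello from an enduser instance")
```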

Neutron

Plan 1.0 had each instance dynamically assigned an IP address from the pool.  Plan 2.0 mostly orchestrates the same address regardless of which cluster an instance is running on today.  In Plan 3.0 I still intend to keep pools on each cluster, but only for testing and experimenting, trying out new OS images or similar; "production" will continue the Plan 2.0 tradition of using "real" IP address assignments rather than pool assignments.
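
In practice that lives in the Heat templates, but the underlying pattern, sketched here with openstacksdk and placeholder names and addresses, is a port with an explicit fixed IP that the server then attaches to:

```python
#!/usr/bin/env python3
"""Boot a "production" instance with a real, pre-assigned IP instead of a
pool address.  Every name and address below is a placeholder.
"""
import openstack

conn = openstack.connect(cloud="cluster1")        # clouds.yaml entry, assumed

network = conn.network.find_network("prod-net")   # placeholder network name
subnet = conn.network.find_subnet("prod-subnet")  # placeholder subnet name

# Create a port pinned to the "real" IP address for this host.
port = conn.network.create_port(
    network_id=network.id,
    fixed_ips=[{"subnet_id": subnet.id, "ip_address": "203.0.113.50"}],
    name="webserver01-port",
)

# Attach the server to that pre-built port rather than letting the pool pick.
server = conn.compute.create_server(
    name="webserver01",
    image_id=conn.compute.find_image("debian-11").id,   # placeholder image
    flavor_id=conn.compute.find_flavor("m1.small").id,  # placeholder flavor
    networks=[{"port": port.id}],
)
print("building", server.name, "on", port.fixed_ips[0]["ip_address"])
```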

*aaS

I'm basically getting rid of almost everything *aaS provided by OpenStack as per previous discussion.

Bare metal Docker

I have three Docker containers that need to talk to bare metal USB interfaces connected to obscure hardware; the explanation and details are a long story...  I was running Docker on OS2 and OS3 for these containers in the Plan 1.0 era, but Kolla-Ansible wants to run its own Docker, and something about it does not cooperate with my Portainer monitoring, and something about Kolla's installation of Zun and Kuryr makes manually installed Docker containers not work.  It's all so tedious sometimes.  "In the old days" on VMware, I completely successfully relied on VMware's USB passthrough service, but OpenStack Nova does not do USB passthrough AFAIK.  So as part of Plan 3.0 I set up a tiny Intel NUC bare metal hardware server with all the USB ports stuffed full of interesting hardware devices, and I run my bare metal Docker containers on that physical server.  Sometimes to go cloudy, or cloudier, you have to un-cloud some unusual workloads; weird but true.
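
For completeness, running one of those containers on the NUC with plain Docker and the Python SDK looks roughly like this; the image name and device path are placeholders for the actual obscure hardware, and the devices= passthrough is what stands in for the old VMware USB passthrough:

```python
#!/usr/bin/env python3
"""Run one of the USB-attached containers on the NUC with plain Docker.

Image name and device path are placeholders.
"""
import docker

client = docker.from_env()

container = client.containers.run(
    "example/usb-gadget-daemon:latest",          # placeholder image
    detach=True,
    restart_policy={"Name": "unless-stopped"},   # survive NUC reboots
    devices=["/dev/ttyUSB0:/dev/ttyUSB0:rwm"],   # hand the USB serial device into the container
    name="usb-gadget-daemon",
)
print("started", container.name, container.short_id)
```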

Tomorrow, reinstall bare metal OS on hosts 1, 2, 3.

Stay tuned for the next chapter!
