Wednesday, July 27, 2022

Adventures of a Small Time OpenStack Sysadmin Chapter 029 - The Plan, v 2.0

Adventures of a Small Time OpenStack Sysadmin relates the experience of converting a small VMware cluster into two small OpenStack clusters, and the adventures and friends I made along the way.


In the "good old days," back when Fred Brooks of "The Mythical Man-Month" fame and I were kids, there was the concept of the dreaded "Second-System Effect". I am very familiar with this hazardous illness that afflicts software projects and aim to avoid it at all costs while designing my new Plan 2.0 for the second cluster.

https://en.wikipedia.org/wiki/Second-system_effect

Many aspects of Plan 2.0 are reactions to experiences implementing Plan 1.0.

One exciting event in the middle of the Plan 1.0 era: my UniFi hardware firewall died completely after a reboot, while I was moving its controller from a Docker container host on VMware to a new Docker container host on OpenStack.  Nothing to do with OpenStack directly; it was just that piece of hardware's time to go.  So it went, at a most inopportune time.

My old VMware cluster had access to all VLANs on my network, including the layer 2 link VLAN between my cable modem and the dearly departed hardware firewall.  With respect to networking, this is not my first rodeo: in the old days I used to build Linux-based internet firewalls using ipchains, later iptables, and a variety of software router appliance solutions.  Normally I'd spin up a Linux image with two interfaces, one connected to the cable modem via DHCP and the other with a static IP address on the main LAN, then use about three lines of iptables rules to NAT between those interfaces, and my entire LAN would be back online.  I could not do that with Plan 1.0 of OpenStack because I had set up only a single flat provider network on my LAN, so I set this up on the old VMware cluster instead, and it worked fine until I acquired and installed a new internet firewall a couple of days later.

The summary of this long story is that Plan 2.0 will include access to ALL of my VLANs, just like VMware had, instead of a single flat provider interface.  Then I could implement this workaround on OpenStack just like I did on VMware.  Flexibility is the key to a happy infrastructure solution.  VLANs on OpenStack ended up requiring a remarkable amount of extra Neutron Service work, details in a later post...
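For the curious, the "about three lines of iptables" emergency NAT box looks roughly like the sketch below. The interface names and LAN subnet are assumptions for illustration (eth0 facing the cable modem via DHCP, eth1 with the static LAN address); this is a bare-minimum masquerading setup, not a hardened firewall.

```shell
# Let the kernel route between the two interfaces
sysctl -w net.ipv4.ip_forward=1

# Masquerade everything leaving toward the cable modem (eth0 is assumed)
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

# Allow replies to established connections back in from the internet
iptables -A FORWARD -i eth0 -o eth1 -m state --state RELATED,ESTABLISHED -j ACCEPT

# Allow the LAN side (eth1 is assumed) to initiate outbound connections
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
```

These rules need root and a live kernel, so treat them as a config fragment to adapt rather than something to paste blindly.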

Everything in Plan 1.0 was about hand-configuring instances and then automating with Ansible.  That got tiring pretty fast, and I planned to automate with the CLI, which during the Plan 2.0 era changed into orchestrating with OpenStack Heat templates, which work AMAZINGLY, nearly flawlessly, well.
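As a flavor of what a Heat template looks like, here is a minimal single-instance sketch in the HOT format. The image and flavor names are placeholders for whatever exists in your Glance and Nova; only the resource types and attribute names come from Heat itself.

```yaml
heat_template_version: 2018-08-31
description: Minimal single-instance sketch (image/flavor names are placeholders)

parameters:
  net_id:
    type: string
    description: Existing network to attach the instance to

resources:
  demo_server:
    type: OS::Nova::Server
    properties:
      name: demo
      image: ubuntu-22.04        # assumed image name in Glance
      flavor: m1.small           # assumed flavor name
      networks:
        - network: { get_param: net_id }

outputs:
  demo_ip:
    description: First IP address assigned to the instance
    value: { get_attr: [demo_server, first_address] }
```

Launched with something like `openstack stack create -t demo.yaml --parameter net_id=<network-uuid> demo`, and torn down just as easily with `openstack stack delete demo`, which is most of the appeal.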

One interesting problem with OpenStack networking is that the default port security is semi-smart and "knows" what IP addresses an instance should be using.  That is great if you properly configure the IP address at initial port configuration time.  But if you slam ports around between self-service and provider networks while experimenting and learning, or convert on the fly from DHCP to static addresses, or make similar modifications, built-in port security will oft get confused, block all that "fake" traffic, and leave your instances unreachable.  Alternatively, you can completely shut off port security, which is no worse than using bare VMware (without NSX).  As a reaction to that experience, part of Plan 2.0 is configuring and using security groups for each instance.  For some applications this is pretty trivial: OK, allow TCP port 80 for HTTP-based services; fine, whatever, no great effort there.  For other applications like Samba or old versions of NFS, it's a bit more work, LOL.  Details in a later post.
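The per-instance security group approach sketched below uses standard openstack CLI commands; the group, server, and address names are placeholders. The middle stanza is the gentler alternative to shutting port security off: telling the port about the address you moved the instance to.

```shell
# Create a group for web instances and allow inbound HTTP (names are placeholders)
openstack security group create web-sg --description "HTTP only"
openstack security group rule create web-sg \
    --protocol tcp --dst-port 80 --remote-ip 0.0.0.0/0
openstack server add security group my-web-instance web-sg

# If an instance's address changed after port creation, teach port security
# about the new address instead of disabling it outright
openstack port set <port-uuid> --allowed-address ip-address=192.0.2.10

# The nuclear option: strip the groups, then disable port security entirely
openstack port set <port-uuid> --no-security-group --disable-port-security
```

Note the ordering in the last line: a port generally cannot have port security disabled while security groups are still attached, hence removing them first.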

In Plan 1.0, hand-rolling my own homemade installation "worked", but poorly and at great effort.  Looks like Plan 2.0 will use Kolla-Ansible; everyone implies it "just works".  It turns out there's a lot about it that is undocumented, or poorly documented, anyway.  Taking a VERY complicated system and then layering another VERY complicated automation system on top of it doesn't make anything simpler or easier to learn, although eventually I was successful.  Details later on.
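For readers who haven't met it, most of Kolla-Ansible's personality lives in one file, /etc/kolla/globals.yml. A fragment might look like the sketch below; the variable names are real Kolla-Ansible settings, but the interface names and VIP address are placeholders for my LAN.

```yaml
# Fragment of /etc/kolla/globals.yml (interface and address values are placeholders)
kolla_base_distro: "ubuntu"
network_interface: "eno1"            # management/API traffic
neutron_external_interface: "eno2"   # unaddressed NIC handed to Neutron
kolla_internal_vip_address: "192.168.1.250"
enable_designate: "yes"
```

From there it is roughly `kolla-ansible -i ./multinode bootstrap-servers`, then `prechecks`, then `deploy`, and every service arrives as a Docker container.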

I figured it would be exciting to watch the Ansible on my network and Kolla-Ansible "fight it out" WRT OpenStack server configuration.  Despite writing this series as a retrospective, even months later I'm still a little apprehensive about the two systems fighting each other.  So far, mostly so good?  Nothing terrible has happened ... yet.  I still worry that some completely well-intentioned change in my LAN Ansible config, maybe for NTP or iSCSI or something, will totally blow up my Kolla-Ansible deployment, and it will be a bear to fix something I did not write and that is extremely large and complicated.

My DNS design will be completely revamped based on experiences with the Designate service in Plan 1.0.  Basically, instead of trying to run authoritative and resolving DNS on all six bare metal hosts, I will go back to having four resolver instances for my four AD domain controllers.  Authoritative DNS for Designate is handled by a Docker container set up by Kolla-Ansible, serving a new domain name, and my resolvers forward to those auth servers for that one specific domain.  Which sounds complicated, but it is, in some ways, simpler, and it works quite well.  Details in a later post.
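If the resolvers happen to speak BIND, the "forward one specific domain to Designate's auth servers" piece is a short conditional forward zone; the sketch below is illustrative only, with a made-up zone name and addresses standing in for the Kolla-Ansible Designate containers.

```
// named.conf fragment (zone name and forwarder addresses are placeholders)
zone "cloud.example.lan" {
    type forward;
    forward only;
    forwarders { 192.168.1.60; 192.168.1.61; };  // Designate's authoritative servers
};
```

Every other query resolves normally; only the Designate-managed domain takes the detour.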

There were some other changes required while implementing Plan 2.0, but to "keep things fair" I will stop here, with the plan as it was optimistically designed at the start of the Plan 2.0 project, and explain what had to be "emergency changed" later on, as it happened in the narrative.  Turns out Kolla-Ansible has some interesting opinions, some in direct opposition to the documentation for hand-rolled installations.  I thought Kolla-Ansible would simply be "automated manual installation instructions", but it's definitely its own separate flavor of OpenStack, which led to some interesting conflicts later on.

Tomorrow we shut down ESXi and vCenter and decommission the hardware on the old VMware cluster.

Stay tuned for the next chapter!
