Friday, July 8, 2022

Adventures of a Small Time OpenStack Sysadmin Chapter 010 - Bare Metal OS install on OpenStack hosts 1, 2, 3

Adventures of a Small Time OpenStack Sysadmin relates the experience of converting a small VMware cluster into two small OpenStack clusters, and the adventures and friends I made along the way.

The actual OS installation is uneventful.  It's just Ubuntu 20.04 LTS, aka the "Subiquity" network install, booted via Netboot.xyz.  As part of the install process you get to pick a default user and password; I uncreatively entered the username "test".

However, there are tasks to perform between installing the OS and installing OpenStack software.

After installation, I like to log in and do a standard upgrade process. 

sudo apt-get update

sudo apt-get dist-upgrade

sudo apt-get autoremove

sudo apt-get clean 

For some weird reason, I configured a swap partition in LVM, but Ubuntu adds its own swapfile in the root filesystem anyway.  I get it; if you're running RAID you want your swap protected against drive failure, and you want to allow hot swapping.  But a swapfile is a little slower, and I had already set swap up in an LVM LV anyway.  So: "swapoff /swap.img", "rm /swap.img", and remove the swap entry from /etc/fstab.  This shrank my root filesystem usage from 34 or so gigs down to about 2 gigs.
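The cleanup amounts to something like this, assuming Ubuntu's default /swap.img path (the sed line is just one way of dropping the fstab entry; editing by hand works too):

sudo swapoff /swap.img

sudo rm /swap.img

sudo sed -i '/\/swap\.img/d' /etc/fstab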

Some OpenStack docs imply AppArmor is incompatible with OpenStack, some internet posts claim it doesn't matter, and Kolla-Ansible takes care of it automatically as part of the "bootstrap-servers" stage of installation.  For better or worse, I shut AppArmor down:

sudo systemctl stop apparmor

sudo systemctl disable apparmor

sudo systemctl mask apparmor

There's kind of a standard process to configure networking on new Ubuntu servers.  I set the hostname to the FQDN (important for the Active Directory join later on).  I set the host up for Ansible; life is too short to configure NTP, DNS resolvers, standard VIM options, SSHD options, and stuff like that by hand.  After Ansible is done with basic server configuration, I usually re-generate my SSH host keys (because I have some specific requirements).  Anyway, the point of this paragraph is that I now have a generic Ubuntu 20.04 host that's fully integrated into my network environment, but it doesn't actually "do" anything, yet.
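The hostname piece is one line; the FQDN below is obviously a hypothetical stand-in for the real one:

sudo hostnamectl set-hostname os-host-1.example.com

hostnamectl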

Pre-OpenStack, I had some rando hosts acting as an NTP cluster, plus a nice raspberry-pi based GPS clock, which worked OK.  My idea was to set up NTP on every host in both OpenStack clusters and have everything on the network connect to that new NTP cluster.  This works really well!  Because almost everything on my network is configured by Ansible, I was able to push the NTP config changes out to all devices with minimal effort.  So I set up, tested, and modified my Ansible configs to make the OpenStack cluster hosts serve NTP at the "bare metal" level (as opposed to running a virtual machine instance on each host or something).
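For reference, the per-host NTP server config boils down to a few lines.  This is a sketch assuming chrony as the daemon; the GPS clock hostname and the subnet are hypothetical:

# /etc/chrony/chrony.conf (excerpt)
server gps-clock.example.com iburst prefer    # the raspberry-pi GPS clock
pool ntp.ubuntu.com iburst                    # public fallback
allow 192.168.0.0/16                          # serve time to the local network
local stratum 10                              # keep answering if upstream is lost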

I used to do DNS by having all user devices point at the Samba AD DCs, which forwarded to two virtual machines I ran on VMware.  I hoped having six hosts doing DNS on bare metal would work even better.  With an eye toward OpenStack Designate, I set up bind9 on each OpenStack host, fully recursive PLUS authoritative for the Designate domains I had not, at that time, set up yet.  This eventually ended up not working very well at all, because the internal DNS resolver in Samba on my domain controllers is not particularly smart and will NOT cooperate with NS records delegating subdomains of the existing domain to Designate.  But I'm getting WAY ahead of the story here.  It seemed a good design at the time.
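The bind9 side of that design looks roughly like this.  A sketch only: the subnet is hypothetical, and allow-new-zones is there because Designate's bind9 backend adds and removes its zones at runtime via rndc:

// /etc/bind/named.conf.options (excerpt)
options {
        directory "/var/cache/bind";
        recursion yes;
        allow-recursion { 192.168.0.0/16; localhost; };
        allow-new-zones yes;    // let Designate manage its zones via rndc
        minimal-responses yes;  // recommended when backing Designate
};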

Previously, on VMware, I ran both my DHCP servers as FreeBSD images with high availability and all that.  I experimented with VMware Fault Tolerance, but it's really kind of overkill for the benefit.  Initially I intended to run DHCP on bare metal on my six hosts; of course, ISC dhcpd failover only supports two-host pairs, so I intended to set up DHCP on all six hosts twice, once as a primary and once as a secondary, and then I could manually log in and run a script or something to flip a host between primary and secondary DHCP as needed.  This all sounds like a lot of work to do by hand, and it is, but via the magic of Ansible scripting it was really almost no work at all!  However, for various reasons, I later changed this architecture quite a bit and now have four DHCP servers, which is a long story I'll get to later on.
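For anyone unfamiliar with ISC dhcpd failover, the pairing is declared in dhcpd.conf roughly like this.  A sketch: the addresses and peer name are hypothetical, and the secondary's config mirrors it with "secondary" declared and the addresses swapped:

# /etc/dhcp/dhcpd.conf (excerpt, primary side)
failover peer "os-dhcp" {
        primary;
        address 192.168.0.11;         # this host
        peer address 192.168.0.12;    # the partner
        port 647;
        peer port 647;
        max-response-delay 60;
        max-unacked-updates 10;
        mclt 3600;
        split 128;                    # primary-only: share the load 50/50
        load balance max seconds 3;
}

subnet 192.168.0.0 netmask 255.255.255.0 {
        pool {
                failover peer "os-dhcp";
                range 192.168.0.100 192.168.0.200;
        }
}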

I had the interesting idea of replacing USB pass-thru on VMware with Docker running on bare metal on the OpenStack cluster.  There's no need for openwebrx or homeassistant to have anything to do with the innards of OpenStack; they just run on the bare metal.  I had always stored my volumes on NFS, so the docker containers don't care where they're running as long as the host has NFS access, which my entire network does.  This works GREAT and is very fast.  There is a tiny problem in that Kolla-Ansible does not seem to be 100% compatible with this solution, but we will get to that part of the story later on.  In the short term, this was an efficient solution that served me well!
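As an illustration, the bare-metal replacement for USB pass-thru amounts to handing the device straight to the container.  A sketch, with a hypothetical NFS mount point (the image name and port are per the openwebrx docs):

# openwebrx with direct access to the USB SDR dongle and NFS-backed config
sudo docker run -d --name openwebrx \
        --device /dev/bus/usb \
        -v /mnt/nfs/openwebrx:/var/lib/openwebrx \
        -p 8073:8073 \
        --restart unless-stopped \
        jketterl/openwebrx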

There was a minor problem discovered later on: I manage Docker hosts using Portainer, whose agent defaults to port 9001, which, on an OpenStack cluster, is Designate's port number.  So, on the OpenStack hosts only, I reconfigured the Portainer agent to listen on port 19001 instead of 9001, and that worked.  The good news about Kolla-Ansible is that it puts the APIs on the virtual IP address, so port 9001 should be open, I think.  The bad news is that Kolla-Ansible is not entirely compatible with bare metal Docker, especially after installing Zun and friends.
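Moving the agent is just a matter of remapping the published port.  A sketch, assuming the standard Portainer agent deployment:

# Portainer agent published on 19001 to stay out of Designate's way
sudo docker run -d --name portainer_agent \
        -p 19001:9001 \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v /var/lib/docker/volumes:/var/lib/docker/volumes \
        --restart always \
        portainer/agent

Then point Portainer at the host's address with port 19001 when adding the endpoint.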

Stay tuned for the next chapter!
