Adventures of a Small Time OpenStack Sysadmin relate the experience of converting a small VMware cluster into two small OpenStack clusters, and the adventures and friends I made along the way.
Adventures of a Small Time OpenStack Sysadmin Chapter 032 - Bare Metal Install of OpenStack Hosts 4, 5, 6
By "bare metal install" I mean a bootable Ubuntu 20.04 server installation on bare metal hardware (not inside the OpenStack that isn't even installed yet LOL). Kolla-Ansible is installed on top of a cluster of working Ubuntu servers. So this chapter is where I install, configure, and test those bare metal hardware servers, tomorrow to become controllers and compute nodes and network nodes and stuff like that.
This chapter partly repeats Chapter 010, where I installed Ubuntu 20.04 on hosts 1, 2 and 3.
https://springcitysolutions.blogspot.com/2022/07/adventures-of-small-time-openstack_01240235463.html
First, here are the Kolla-Ansible reference documents I used for the "Yoga" release. By the time you read this retrospective, maybe "Zebra" will be the newest release; I don't know.
https://docs.openstack.org/kolla-ansible/yoga/user/quickstart.html
https://docs.openstack.org/kolla-ansible/yoga/reference/index.html
In retrospect, I should have left the M.2 SSD alone during the OS install; I had to reconfigure it later for Swift storage based upon:
https://docs.openstack.org/kolla-ansible/yoga/reference/storage/swift-guide.html
There's a "kind of bug" in that doc, although admittedly it is written for a different OS: the partition name passed to mkfs for an NVMe device like this M.2 SSD ends in p1 (for example /dev/nvme0n1p1), not just 1. I think that would be obvious to an Ubuntu sysadmin, but pay close attention to it anyway.
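The rule behind the p1 is the Linux kernel's partition naming convention: when a disk's name ends in a digit (nvme0n1, mmcblk0, loop0), a "p" separates it from the partition number; otherwise (sdc, vdb) the number is appended directly. Here is a minimal Python sketch of that rule, handy if you script the disk prep; the helper name partition_device is my own invention, not something from the Kolla docs.

```python
def partition_device(disk: str, number: int) -> str:
    """Return the /dev path of partition `number` on whole-disk `disk`.

    Follows the Linux kernel naming rule: if the disk name ends in a digit,
    insert "p" before the partition number; otherwise append it directly.
    (partition_device is a hypothetical helper, not part of any tool.)
    """
    name = disk[len("/dev/"):] if disk.startswith("/dev/") else disk
    sep = "p" if name[-1].isdigit() else ""
    return f"/dev/{name}{sep}{number}"


print(partition_device("nvme0n1", 1))   # /dev/nvme0n1p1
print(partition_device("/dev/sdc", 1))  # /dev/sdc1
```

A shell loop that just tacks a 1 onto the disk name works for sdc but produces a nonexistent nvme0n11 on an NVMe drive, which is exactly the trap described above.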
In Plan 2.0 I'm continuing my Plan 1.0 strategy of using swap partitions instead of the swap file the installer sets up by default. Looking back as I write this retrospective, I still do this in Plan 3.0, and it works great!
I use Ansible (not Kolla-Ansible, just an install on my LAN) to apply common configuration to hosts: think of things like Active Directory SSO, NTP, a sensible options file for the Vim editor, or the Zabbix monitoring agent. This was all uneventful. I still worry that running my LAN Ansible against my Kolla-Ansible configured hosts will mess something up, but so far, so good! I wish Ansible as a tool had some way to compare two playbooks and report any conflicts between them.
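As far as I know Ansible has no built-in command for that, but the core comparison is simple once each side has been reduced to a map of managed file path to the content it would write. That reduction is the hard part and is not shown here; the function name and the example data below are hypothetical, just a sketch of the idea.

```python
from typing import Dict, List


def find_conflicts(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Return paths that both tools manage but want different content in.

    Each argument is a hypothetical map of file path -> desired file content.
    """
    return sorted(path for path in a.keys() & b.keys() if a[path] != b[path])


# Made-up example data, not taken from real playbooks.
lan_ansible = {
    "/etc/vim/vimrc.local": "set background=dark\n",
    "/etc/hosts": "127.0.0.1 localhost\n",
}
kolla_ansible = {
    "/etc/hosts": "127.0.0.1 localhost\n10.10.20.56 os6\n",
}
print(find_conflicts(lan_ansible, kolla_ansible))  # ['/etc/hosts']
```

Files touched by only one side, like the Vim options file here, are never reported; only overlapping paths with disagreeing content count as conflicts.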
Originally, in Plan 2.0, I intended to connect my existing Portainer Docker monitoring system to my own Portainer Agent containers running on the Kolla-Ansible Docker hosts, since I thought it would be interesting to monitor the Kolla-Ansible Docker containers via the magic of Portainer. Eventually, while setting up Kuryr networking and Zun container management, "something" broke Portainer Agent connectivity. Maybe in the future it would work if I ran the Portainer Agent as a Zun container connected directly to a provider network; I simply haven't explored that alternative in depth, not yet anyway. If I could make it work, it would be VERY helpful as a front end for Kolla troubleshooting.
This Kuryr/Zun effect also seems to stop me from running "bare metal" containers outside the Zun ecosystem. I have some hardware devices plugged into USB ports and connected to containers, which was quite easy under VMware because it supports USB passthrough. So I will need a new solution for USB hardware when I roll out Plan 3.0. I'll discuss that later; it involves an Intel NUC and the unusual direction of taking software loads OFF the cluster and putting them onto bare metal.
The physical hardware has five Ethernet ports: two 1G, two 10G, and an IPMI port. For physical networking in Plan 2.0, I bond the two 1G Ethernets and use them for provider network VLAN trunking, bond the two 10G Ethernets together as a simple single-VLAN 802.1Q access port for management, and then let Kolla-Ansible run "everything" over the resulting 20G management interface, which has worked out VERY well.
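The "20G" is the aggregate of two 10G links. The netplan config later in this post bonds them with mode: balance-xor and sets no transmit hash policy, so the Linux default (layer2) applies. Below is a simplified Python model of that policy as the kernel bonding documentation describes it (source MAC last byte XOR destination MAC last byte XOR packet type ID, modulo the number of member links). It is a sketch, not the kernel code, and the MAC addresses are made up.

```python
def xor_slave_index(src_mac: str, dst_mac: str, ethertype: int, slave_count: int) -> int:
    """Simplified model of the Linux bonding layer2 transmit hash.

    hash = src MAC[5] XOR dst MAC[5] XOR packet type ID
    member link index = hash modulo number of member links
    """
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last ^ ethertype) % slave_count


IPV4 = 0x0800  # EtherType for IPv4
host = "02:00:00:00:00:56"  # hypothetical MAC for this host
print(xor_slave_index(host, "02:00:00:00:00:01", IPV4, 2))  # 1
print(xor_slave_index(host, "02:00:00:00:00:02", IPV4, 2))  # 0
```

The chosen link depends only on the MAC pair, so all IPv4 traffic between two hosts (including everything routed via the same gateway MAC) rides one 10G member; the full 20G only shows up when many peers spread across both links.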
This results in a few installation issues.
First, there is the interface dance. PXE boot does not speak 802.1Q VLAN tagging by default, so you have to configure the switch port for eth0 as a simple access port on the management LAN, do a full install over that single access port, then configure the bonded 10G ports and move all management traffic to them, and finally reconfigure the 1G ports from a single access port into the bonded provider LAN ports. It sounds more complicated than it is; it just takes some time.
The second installation issue is the stereotypical NetGear managed Ethernet switch headache with bonded interfaces. Logically, one would think you admin up the individual ports, configure them as members of a LAG, then admin up the LAG, and you're up and running. It will look great in the monitoring, and it will simply not pass any traffic at all, which is very frustrating. For some reason that likely seemed important at the time, configuring a LAG on this old version of NetGear firmware admins down the LAG-as-a-port. So the complete NetGear checklist is: admin up each port as a port, admin up the LAG as a (virtual?) port, configure the LAG and admin it up as a LAG, then double check that everything is STILL admined up. Then it'll likely work and pass traffic, assuming you set your VLAN correctly and the other side (the E200 server hardware) is also configured correctly. Don't forget other NetGear hilarity, like the web user interface letting you select an MTU of 9198 bytes or whatever it is, while the hardware only passes a little over 9000 bytes. NetGear certainly puts the excitement back into networking! Although not necessarily exciting in a good way LOL.
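Part of the MTU confusion is that switch vendors differ on whether their MTU or maximum frame size setting counts the Ethernet header, the VLAN tag and the FCS. I don't know exactly what the NetGear field measures, so here is just the arithmetic, as a small Python sketch: an IP MTU of 9000 (what the hosts use) needs the switch to carry frames somewhat larger than 9000 bytes.

```python
ETH_HEADER = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
VLAN_TAG = 4     # 802.1Q tag inserted after the source MAC
FCS = 4          # frame check sequence trailer


def frame_size(mtu: int, vlan_tagged: bool = True, include_fcs: bool = True) -> int:
    """On-the-wire Ethernet frame size needed to carry an IP packet of `mtu` bytes."""
    return mtu + ETH_HEADER + (VLAN_TAG if vlan_tagged else 0) + (FCS if include_fcs else 0)


print(frame_size(9000))                     # 9022
print(frame_size(9000, vlan_tagged=False))  # 9018
```

So "a little over 9000 bytes" of hardware capability is enough for mtu: 9000 on tagged VLANs, as long as the switch really forwards 9022-byte frames.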
The third installation issue is that there is not much documentation out there on configuring multiple bonded Ethernets with VLANs on Ubuntu. You'd think that, Ubuntu being a very popular OS, someone would have posted a solution for every possible network config; however, that is not so.
Here's what /etc/netplan/os6.yaml looks like on my sixth OpenStack host as of the conclusion of Plan 2.0, before rolling out Plan 3.0. As you can see, it's still using the Plan 1.0 AD domain controllers for DNS instead of the freshly installed Plan 2.0 AD domain controllers. Or it would be, if I hadn't fixed that in my resolv.conf and then run "chattr +i /etc/resolv.conf", like most sysadmins end up having to do, LOL:
network:
  version: 2
  renderer: networkd
  ethernets:
    eno1:
      mtu: 9000
    eno2:
      mtu: 9000
    eno3:
      mtu: 9000
    eno4:
      mtu: 9000
  bonds:
    bond12:
      mtu: 9000
      dhcp4: false
      dhcp6: false
      interfaces: [ eno1, eno2 ]
      parameters:
        mode: balance-xor
        mii-monitor-interval: 100
    bond34:
      mtu: 9000
      dhcp4: false
      dhcp6: false
      interfaces: [ eno3, eno4 ]
      parameters:
        mode: balance-xor
        mii-monitor-interval: 100
  vlans:
    bond34.10:
      id: 10
      link: bond34
      mtu: 9000
      addresses: [ 10.10.20.56/16 ]
      gateway4: 10.10.1.1
      critical: true
      nameservers:
        addresses:
          - 10.10.250.168
          - 10.10.249.196
        search:
          - cedar.mulhollon.com
          - mulhollon.com
    bond34.30:
      id: 30
      link: bond34
      mtu: 9000
      addresses: [ 10.30.20.56/16 ]
    bond34.60:
      id: 60
      link: bond34
      mtu: 9000
      addresses: [ 10.60.20.56/16 ]
#
And don't forget that a netplan file is a YAML file, so every space is critical: one off and nothing works. This config works for me; best of luck to you!
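The authoritative check is netplan itself (netplan generate, or netplan try, which rolls back if you don't confirm), but a quick pre-flight lint catches the classic whitespace mistakes before you get that far. This Python sketch flags tabs in indentation, which YAML forbids outright, and indentation that isn't a multiple of two spaces, which is legal YAML but almost always a typo in a hand-edited netplan file (that second rule is my own house-style heuristic, not a YAML requirement).

```python
def check_indentation(text: str, step: int = 2) -> list:
    """Return (line number, problem) pairs for suspicious indentation."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments don't affect structure
        indent = line[: len(line) - len(stripped)]
        if "\t" in indent:
            problems.append((lineno, "tab in indentation"))  # YAML forbids this
        elif len(indent) % step:
            problems.append((lineno, f"indent of {len(indent)} is not a multiple of {step}"))
    return problems


sample = "network:\n  version: 2\n   renderer: networkd\n\tethernets:\n"
print(check_indentation(sample))
```

Run against the sample, it reports line 3 (three spaces) and line 4 (a tab), the two kinds of "one off and nothing works" errors.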
A final note: in subinterfaces like bond34.30, the VLAN is selected by the id: parameter (in this case, 30), not by the name of the subinterface, as is commonly and incorrectly believed. I would advise never deliberately letting them differ, but if they do not match, the id wins over the subinterface name.
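Because a mismatch is legal, nothing will warn you about it. A minimal Python sketch of a sanity check, assuming you have already loaded the vlans: section into a dict (with a YAML library, not shown); the function name and the bond34.99 entry are hypothetical, and that entry is not in my real file.

```python
def mismatched_vlan_names(vlans: dict) -> list:
    """Return (name, id) pairs where a name like 'bond34.30' disagrees with its id.

    The id: key is what actually selects the VLAN tag; the name is only a label.
    """
    mismatches = []
    for name, cfg in vlans.items():
        _, dot, suffix = name.rpartition(".")
        if dot and suffix.isdigit() and int(suffix) != cfg["id"]:
            mismatches.append((name, cfg["id"]))
    return mismatches


vlans = {
    "bond34.10": {"id": 10, "link": "bond34"},
    "bond34.30": {"id": 30, "link": "bond34"},
    "bond34.60": {"id": 60, "link": "bond34"},
    "bond34.99": {"id": 60, "link": "bond34"},  # hypothetical typo for illustration
}
print(mismatched_vlan_names(vlans))  # [('bond34.99', 60)]
```

Interfaces whose names have no numeric suffix are skipped, since there is nothing to compare the id against.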
Tomorrow, the Cluster 2.0 hardware meets Kolla-Ansible, and it will be a LONG post.