Tuesday, July 19, 2022

Adventures of a Small Time OpenStack Sysadmin Chapter 021 - OpenStack Nova Compute Service

Adventures of a Small Time OpenStack Sysadmin relates the experience of converting a small VMware cluster into two small OpenStack clusters, and the adventures and friends I made along the way.

My reference for installing Nova:

https://docs.openstack.org/nova/yoga/install/

Nova-conductor is required on a controller and cannot be installed on a compute node. That seems to be what prevents the controller node os3 from hosting compute services. Until I discovered this, I planned on making my controller a compute node, at least in a limited sense. The docs alluded to controllers not being compute nodes, but I thought that was merely a best practice or some kind of system-load limitation.
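For reference, the Ubuntu pages of the Yoga install guide split the packages along exactly that line; a sketch (package names below are the Ubuntu ones from the guide, other distros differ):

    # controller node (os3 here): API, scheduler, conductor, console proxy
    apt install nova-api nova-conductor nova-novncproxy nova-scheduler

    # compute nodes: just the compute agent and its hypervisor dependencies
    apt install nova-compute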

For some reason the default template provides region_name = regionOne, and I found out the hard way that region names are case-sensitive; the install guides register the region in Keystone as RegionOne. So that was weird, moderately annoying, and definitely time-consuming. I did not file a bug report, as I'm not entirely clear whether it's reproducible or what caused the weird template region_name.
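In case anyone else trips over it, the mismatch looked roughly like this ([placement] shown as an example, the same option appears in several sections; check your actual region name with "openstack region list"):

    # nova.conf -- what the template shipped with:
    [placement]
    region_name = regionOne

    # ...but Keystone's region is RegionOne, and the comparison is
    # case-sensitive, so it has to match exactly:
    region_name = RegionOne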

There is a LOT of configuration in Nova that relates to Cinder and Neutron, and also some configuration in Cinder and Neutron that relates to Nova. So the configuration process has been a little circular over the last couple of days. Most services in OpenStack do not have these circular configuration dependencies; it does get better.

As a practical example of the above paragraph, if you mismatch your metadata_proxy_shared_secret values, you will have a difficult time troubleshooting the resulting cloud-init problem. However, there is surprisingly little configuration needed for a minimal system, so I strongly recommend administering OpenStack on a multi-monitor desktop; it's much easier when you can display the configurations that need to match simultaneously and notice when they don't.
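Concretely, that secret has to match across two files owned by two different services; a sketch of the pair, assuming a stock metadata-agent setup and a controller host named "controller":

    # /etc/nova/nova.conf (Nova side)
    [neutron]
    service_metadata_proxy = true
    metadata_proxy_shared_secret = METADATA_SECRET

    # /etc/neutron/metadata_agent.ini (Neutron side)
    [DEFAULT]
    nova_metadata_host = controller
    metadata_proxy_shared_secret = METADATA_SECRET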

I ran into an interesting bug related to OpenStack Nova and FreeBSD:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255231

There's an incompatibility between FreeBSD 13.0 and OpenStack with respect to UEFI: OpenStack can successfully signal the operating system to shut down, and the OS does shut down, but afterward the instance never powers off. I was able to verify that yes indeed, this does not work at all on FreeBSD 13.0, necessitating manually shutting down the instance in OpenStack. However, I was also able to verify it's fixed in FreeBSD 13.1. FreeBSD 13.1 was released on 16 May 2022, right around the time I was doing this OpenStack project, so that's convenient. After I verified it was fixed, someone else double-checked and closed the FreeBSD bug report. Perhaps I'm biased, but to me, it seems FreeBSD people are the nicest people; no sarcasm, it's just how it is with everyone I've interacted with over the years. FreeBSD is the chillest software project.
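To spell out the 13.0 workaround: once the guest OS has halted itself, finish powering the instance off from the OpenStack side (instance name hypothetical):

    # the guest has halted, but the instance still shows ACTIVE; force it off
    openstack server stop freebsd13-test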

If you want OpenStack compute migrations to work, you need to manually enable SSH between the hosts as detailed in:

https://docs.openstack.org/nova/yoga/admin/ssh-configuration.html
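The gist of that page, as a sketch (assuming your packages created a "nova" system user; the linked doc has the authoritative steps):

    # on each compute node, give the nova user a login shell and a key
    usermod -s /bin/bash nova
    su - nova
    ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
    # then copy every node's public key into ~nova/.ssh/authorized_keys on
    # every other node, and pre-populate ~nova/.ssh/known_hosts, so Nova can
    # ssh/scp between hosts without a password or host-key prompt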

This is not mentioned in the simplified install instructions linked above. Even more excitingly, if you attempt to migrate an instance and the migration fails, OpenStack kills the instance permanently. Hope you have backups, or extensive orchestration, or it was just a test image! The unconfigured-SSH situation above will permanently kill any instance you try to migrate. However, I also permanently killed a couple of migrating instances due to a temporary Neutron problem that went away (probably some weird startup-order dependency, but that's just a slightly educated guess).

"ERROR oslo_messaging.rpc.server nova.exception.InternalError: Failure running os_vif plugin plug method: No VIF plugin was found with the name linux_bridge"

Seriously, are you kidding? It's there, right there, look, and it seems to work all the rest of the time when it's not migrating something. After multiple days of frustration (sadly not exaggerating), I moved on to other struggles temporarily, which is a legit troubleshooting technique, and when I returned to this problem, after not knowingly changing anything related to it, migrations were working perfectly. Sometimes having things work perfectly can be Mildly Frustrating. I did not file a bug because I could not even remotely reproduce the problem. Never afterward did I have even the smallest problem with migrations; fun and trouble-free beyond this point. OpenStack migrations, especially live ones, are not nearly as smooth as VMware vMotion, but they certainly do work.

I have a gut-level guess that some tangential, distant software component, maybe in Glance or Placement, got out of sync with the configs or live data in Neutron, and they fought it out and this was the result, but I have no proof. I had focused my troubleshooting HARD on the big three, Neutron, Nova, and Cinder, and found absolutely nothing wrong, so I'm just assuming the problem was somewhere else. Who knows, maybe Designate locked some data while I was experimenting with setting DNS names, causing a sync issue in Neutron. I really don't think the problem was in the big three.
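For what it's worth, the one sanity check worth running before a migration attempt is confirming the Linux bridge agents are actually alive, something like:

    # every agent should report Alive = :-) and State = UP before migrating
    openstack network agent list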

So, yeah.  Migrations.  Either they work flawlessly, or they zorch your image permanently and I hope you keep good backups because you're gonna need them.  Not exactly the VMware vMotion flawless experience you might hope for.
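Given that failure mode, rehearse on a disposable instance before touching anything you care about; a sketch (instance name hypothetical):

    # cold migration: move the instance, check it, then confirm the move
    openstack server migrate scratch-test
    openstack server show scratch-test    # wait for status VERIFY_RESIZE
    openstack server resize confirm scratch-test

    # live migration (requires the SSH setup described above)
    openstack server migrate --live-migration scratch-test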

I did not document it, but I burned several days kind of having fun with Nova, just pushing through various operating systems and trying stuff. Nova is fun. Maybe not as fun as Heat, but it's fun.

Speaking of Heat, that's tomorrow's topic.

Stay tuned for the next chapter!
