Wednesday, August 17, 2022

Adventures of a Small Time OpenStack Sysadmin Chapter 049 - Centralized Logging Redux

Adventures of a Small Time OpenStack Sysadmin relates the experience of converting a small VMware cluster into two small OpenStack clusters, and the adventures and friends I made along the way.

Intro

Note these blog posts are written as a retrospective long after the work was accomplished. My initial post about Central Logging was written very soon after I discovered it, early in the Plan 2.0 era. This is a follow-up post about what I learned along the path of the Plan 2.0 era, ending with a description of the entire monitoring system at that time, at least as I remember it.

Narrative

Last time I wrote about the central logging service, the story was that adding Monasca to a Yoga cluster kills central logging, because Monasca refuses to connect to any version of Elasticsearch newer than the truly ancient ones, as seen at this link:

https://bugs.launchpad.net/kolla-ansible/+bug/1980554

I noticed that leaving Monasca installed and running, with the crashing log importer shut down, results in a backlog forming in Kafka and Zookeeper on the order of quite a few gigs per week, slowly and inevitably filling my controller root partition. Annoying. 40+ gigs is not a big deal in 2022 in the grand scheme of things, but it is a big deal on a 100 GB root partition, or when it seemingly uncontrollably grows 10+ gigs per week.
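
If you want to see where the space is actually going before deciding anything, the checks are simple; a quick sketch, assuming Docker's default /var/lib/docker volume layout that Kolla-Ansible uses on my controllers:

  # show per-container and per-volume disk usage
  docker system df -v
  # eyeball the named volumes directly, largest last
  sudo du -sh /var/lib/docker/volumes/* | sort -h | tail -20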

My solution to the slow log flood was to remove Monasca from Cluster 2, and along with it Grafana (I use Zabbix mostly for IPMI-level monitoring), plus the related Kafka, Zookeeper, and Vitrage. Quite a cleanup.
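
In /etc/kolla/globals.yml terms, that cleanup boils down to flipping the relevant enable flags to "no" before reconfiguring; a sketch assuming the stock Kolla-Ansible Yoga variable names, so double-check them against your own globals.yml:

  # /etc/kolla/globals.yml (excerpt) - turn off the Monasca stack and its hangers-on
  enable_monasca: "no"
  enable_grafana: "no"
  enable_kafka: "no"
  enable_zookeeper: "no"
  enable_vitrage: "no"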

Now it is important to realize that Kolla-Ansible only adds and installs following a dependency tree; it is NOT, repeat NOT, an orchestration system. Remove a service, then run kolla-ansible reconfigure, and it will reconfigure EVERYTHING ELSE still enabled around the changes necessary to no longer use or point to the removed service, but it will leave the legacy service up and running and using disk space. So I ran a reconfigure, and only afterwards did I manually stop the Docker containers for Monasca, Kafka, Zookeeper, Grafana, Vitrage, etc. Then it's safe to run a prune and delete the storage volumes and so forth.
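
The manual half of that cleanup looked roughly like the sketch below; the grep pattern is simply how the leftover containers and volumes happened to be named on my controllers, so review each list before stopping or deleting anything:

  # list the leftover containers first, and only then stop and remove them
  docker ps -a --format '{{.Names}}' | grep -E 'monasca|kafka|zookeeper|grafana|vitrage'
  docker ps -a --format '{{.Names}}' | grep -E 'monasca|kafka|zookeeper|grafana|vitrage' | xargs -r docker stop
  docker ps -a --format '{{.Names}}' | grep -E 'monasca|kafka|zookeeper|grafana|vitrage' | xargs -r docker rm
  # then reclaim the disk space: remove the matching named volumes and prune
  docker volume ls -q | grep -E 'monasca|kafka|zookeeper|grafana|vitrage' | xargs -r docker volume rm
  docker system prune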

There's unfortunately a stalled-out task to add a kolla-ansible service destroyer, which would make it a sort-of orchestrator, at the following links:

https://bugs.launchpad.net/kolla-ansible/+bug/1874044

https://review.opendev.org/c/openstack/kolla-ansible/+/504592

Interesting observation: central logging was knocked out by Monasca's logger being dead, since it can only connect to truly ancient versions of Elasticsearch. By removing Monasca and redeploying, I'm now getting centralized logging messages. Cool!

To create your default Kibana index pattern, see:

https://docs.openstack.org/kolla-ansible/yoga/reference/logging-and-monitoring/central-logging-guide.html

Of course, now that Elasticsearch is working for centralized logging, I need to configure Curator or the drive will inevitably fill. The docs seem to imply that /etc/kolla/globals.yml has options for Elasticsearch Curator, which seems not to be the case. I have a personal TODO item to submit a patch to fix the docs.

Superficially, a sysadmin would predict that the config file /etc/kolla/elasticsearch-curator/elasticsearch-curator-actions.yml would be overridden in the directory /etc/kolla/config/elasticsearch-curator; however, per the docs, that file should actually be stored in /etc/kolla/config/elasticsearch. Well, OK, whatever; Kolla-Ansible sometimes gets unpredictable about peculiar filenames with respect to overrides, as I discovered back in my early Neutron Networking days. Anyway, for a probably-working example configuration with somewhat shorter data retention than the rather long defaults, see this URL for inspiration, or if it ends up not working, for the LOLs:

https://gitlab.com/SpringCitySolutionsLLC/openstack-scripts/-/blob/master/backups/os6.cedar.mulhollon.com/configs/elasticsearch/elasticsearch-curator-actions.yml
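
For reference, here's a minimal sketch of the shape of such an actions file; the flog- index prefix, the timestring date format, and the 14-day retention are assumptions based on my deployment, so adjust them to match whatever your indices actually look like in Kibana:

  # /etc/kolla/config/elasticsearch/elasticsearch-curator-actions.yml (sketch)
  actions:
    1:
      action: delete_indices
      description: Delete central logging indices older than 14 days
      options:
        ignore_empty_list: True
        disable_action: False
      filters:
        - filtertype: pattern
          kind: prefix
          value: flog-
        - filtertype: age
          source: name
          direction: older
          timestring: '%Y.%m.%d'
          unit: days
          unit_count: 14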

Logging, Monitoring, and Telemetry around the end of the Plan 2.0 era

This is how monitoring looked on the Cluster toward the end of the Plan 2.0 era:

Zabbix monitors the OS underneath OpenStack, including everything the zabbix-agent sees, down to lower-level stuff like IPMI fan speeds and temperatures. The Skydive component of OpenStack provides a beautiful GUI for real-time-ish live network traffic flows, at some effort to decode interface names, etc. Centralized logging, aka Kibana, monitors all Docker logs. I have Prometheus installed, although I'm not actually doing anything at all with it. I am not at this time experimenting with the optional connection between Prometheus and Elasticsearch, but I will get around to it eventually.
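
For the Kolla-Ansible side of that picture, the relevant globals.yml flags looked roughly like the sketch below; the variable names are my best recollection of the Yoga-era options, and Zabbix lives entirely outside Kolla-Ansible, so it does not appear here:

  # /etc/kolla/globals.yml (excerpt) - monitoring-related flags
  enable_central_logging: "yes"   # Fluentd + Elasticsearch + Kibana
  enable_skydive: "yes"           # real-time-ish network flow GUI
  enable_prometheus: "yes"        # installed, not really used yet
  # the optional Prometheus-to-Elasticsearch hookup I have not tried yet:
  #enable_prometheus_elasticsearch_exporter: "yes"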

On VMware I found Log Insight worked really well as a merged monitoring tool holding both cluster logs AND end-user logs from server installs, etc. However, I don't think that merger works with OpenStack. I will likely go back to using a dedicated ELK stack to monitor my end-user apps. I did not set that up during Plan 2.0 for simple hardware capacity reasons. So that is why monitoring at the end of the Plan 2.0 era seems to be missing a key component, namely end-user logging; it is indeed missing and will be added later.

Tomorrow, a doubleheader of Mistral and Masakari

Stay tuned for the next chapter!
