Thursday, March 9, 2023

Rancher Suite K8S Adventure - Chapter 019 - Provision a VM in Harvester

A travelogue of converting from OpenStack to SUSE's Rancher Suite for K8S including RKE2, Harvester, kubectl, and helm.

A large proportion of blog posts and YouTube videos covering Rancher seem to halt once the UI is working.  I intend to extend from that point and document actually using the system.  Today we provision a VM.  Personally I live the hybrid life: I provision a VM in AWS, VMware, OpenStack, or now Harvester, then I use Ansible to automatically integrate the VM into my existing systems with respect to Active Directory SSO, logging, Zabbix and Metricbeat monitoring, NTP, etc.  Under that operational style, the goal or demarcation point of provisioning a VM is being able to SSH in as the Ansible user; beyond that point Ansible takes over configuration.

As a general observation, over the years the workload has steadily become containerized, such that in the long run I don't think I'll have many non-infrastructure VMs.  I will likely continue to have multiple VMs for DCs, DNS, DHCP, and maybe a few other tasks, but the workload always slowly moves toward containers.  That should work very well with an HCI solution such as Harvester.  Use case drives the requirements; for example, I will have to bridge the DHCP server interface directly onto the LAN, which was "easy" in OpenStack.

Rancher Project vs K8S Namespace

https://ranchermanager.docs.rancher.com/pages-for-subheaders/manage-projects

https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/manage-namespaces

Projects are a 'new' Rancher concept wedged between existing K8S clusters and existing K8S namespaces.  Clusters contain projects, which contain namespaces.  My plan is to use projects in Rancher similarly to how I used projects in OpenStack, so I will configure "infrastructure", "server", "iot", "enduser", and similar project names.

Note that projects are a part of a cluster; project "infrastructure" on cluster "harvester-small" is independent of project "infrastructure" on cluster "harvester-large".

I intend to create roughly one namespace per hostname or system as appropriate.  Everything about DHCP as a system would live in the DHCP namespace in the infrastructure project in my Harvester cluster.
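
For the curious, a project is not a native Kubernetes object; Rancher records project membership with an annotation (and label) on the namespace itself.  A rough sketch of what an assigned namespace looks like, with made-up cluster and project IDs:

apiVersion: v1
kind: Namespace
metadata:
  name: dhcp
  annotations:
    # Rancher stores project membership as <cluster-id>:<project-id>.
    # These IDs are placeholders; look up the real ones in the Rancher UI or API.
    field.cattle.io/projectId: c-m-abcd1234:p-xyz98
  labels:
    field.cattle.io/projectId: p-xyz98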

Create a project

Log into Rancher, select virtualization and the harvester-small cluster, hamburger menu "Projects/Namespaces", "Create Project"

I named today's test project "experiment", and added "admin" as a project owner, so there are two owners, user "vince" (me) and the "admin" user.

This is where you would set project resource quotas and limits, if you were planning to use any (which I am not, at this time).

Create a namespace

Log into Rancher, select virtualization and the harvester-small cluster, hamburger menu "Projects/Namespaces", in the "experiment" project area click "Create Namespace".

I named my namespace "vm".  This is where you can set container level resource limits.  I don't intend at this time to set any limits to this namespace.

Create a VM storage class

I don't need 3 replicas of a test volume on a 3 node cluster, 2 replicas should be fine for my testing.

https://docs.harvesterhci.io/v1.1/advanced/storageclass

Log into Rancher, select virtualization and the harvester-small cluster, click "Advanced", then "Storage Classes".

Note the default SC is "harvester-longhorn" and it keeps 3 replicas.

Click the 3-dots menu, "Clone", give it the name "experimental", change the number of replicas to 2, and click "Create".
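
If you prefer YAML over clicking, the clone should come out looking roughly like the following Longhorn StorageClass; treat this as a sketch and compare against what the UI actually generates:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: experimental
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"       # the whole point of the clone: 2 copies instead of 3
  staleReplicaTimeout: "30"
  migratable: "true"          # keeps VM volumes live-migratable between nodes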

Upload an image into the cluster

https://docs.harvesterhci.io/v1.1/upload-image

Let's use Ubuntu 20.04's focal-server-cloudimg-amd64.img file; I'm using version 20230215 from:

http://cloud-images.ubuntu.com/focal/20230215/

Note this is exactly the same image I run on OpenStack.  The similarly named optional "-disk-kvm" image has a console compatibility issue with OpenStack such that if you want a working console you can't use "-disk-kvm".  For consistency's sake I will use the same image on Harvester; its console works fine in both OpenStack and Harvester.

First we try uploading a previously downloaded cloud image.  Spoiler alert: this does not work.  Log into Rancher, select virtualization and the harvester-small cluster, click "Images" then "Create".  I'm naming my upload "Ubuntu 20.04 20230215".  Then select the img file and upload it.  It takes a while to leave "uploading" status.  Eventually it failed with "Timeout waiting for the datasource file processing begin".  Well OK, let's retry.  That fails again.  There is an old, closed Harvester bug report on GitHub about this error message, but obviously file uploads still don't work:

https://github.com/harvester/harvester/issues/1415

OK then, I will upload by providing a URL instead of the download/upload route.  "Copy Link Address" from the Ubuntu cloud-images page, and do a URL upload directly into Harvester.  The image download via URL succeeded after about three minutes.  Cool.

For now, under "Storage" I am leaving the "Storage Class" at the default harvester-longhorn, which makes triple copies of the image.  That seems excessive for an internet download; on the other hand, I think maybe VMs will instantiate faster if a copy is local to the hardware.  I will experiment later on with creating an image storage class that only keeps one replica to save disk space.  After all, I only need read access at the moment of instantiating a new VM, which is pretty rare, and if I lose the image I can re-download a new copy from the internet in less than three minutes...
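
Behind the scenes the URL download creates a VirtualMachineImage object; the rough YAML equivalent (untested sketch, names are mine) would be:

apiVersion: harvesterhci.io/v1beta1
kind: VirtualMachineImage
metadata:
  name: ubuntu-focal-20230215
  namespace: default
spec:
  displayName: Ubuntu 20.04 20230215
  sourceType: download
  url: http://cloud-images.ubuntu.com/focal/20230215/focal-server-cloudimg-amd64.img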

Upload your SSH keys

https://docs.harvesterhci.io/v1.1/vm/create-vm

I have a regular daily-use SSO Active Directory user with SSH keys AND an ansible user with SSH keys, so for experiments (like today) I provision a VM with my "vince" user SSH keys, but I would provision a real VM with the "ansible" user SSH keys.  So I have two sets of keys to upload, although today we're only using the "vince" SSH key.

Log into Rancher, select virtualization and the harvester-small cluster, click "Advanced" then "SSH Keys" then "Create".  Note that you can put SSH keys into specific namespaces although I'm using "default".  "Read from a File" worked for me.

Repeat the above for the "ansible" user, which we won't be using today but will use sooner or later.
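
Like images, uploaded keys are stored as Harvester KeyPair objects; something like this, with the public key truncated here:

apiVersion: harvesterhci.io/v1beta1
kind: KeyPair
metadata:
  name: vince
  namespace: default
spec:
  publicKey: "ssh-ed25519 AAAA...truncated... vince@workstation"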

Create a VM Network

By default, if you try to create a VM, the VM will be placed in network "management Network" which is the internal system network, and you'll get some crazy inaccessible 10.50.x.y address that only exists inside the cluster.  So you need to configure a "VM Network" that bridges over to the ethernet port, connecting the VM to the real world LAN.

Log into Rancher, select virtualization and the harvester-small cluster, click "Networks" then "VM Networks" and Create.

I named my network "untagged" and left it in the "default" namespace for general use.

It's of type "UntaggedNetwork" (hence the name "untagged") and it's on the "Cluster Network" "mgmt", which is the ethernet port of my Harvester nodes.
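
Under the hood a Harvester VM Network is a Multus NetworkAttachmentDefinition; for an untagged bridge on the "mgmt" cluster network I would expect it to look something like the sketch below, but the exact bridge name and CNI config are my guess, so check the object Harvester actually creates:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: untagged
  namespace: default
spec:
  # Bridge CNI config with no "vlan" key, i.e. untagged traffic onto the mgmt bridge.
  config: '{"cniVersion":"0.3.1","type":"bridge","bridge":"mgmt-br","promiscMode":true,"ipam":{}}'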

Cloud-init

https://docs.harvesterhci.io/v1.1/vm/create-vm#cloud-init

https://cloudinit.readthedocs.io/en/latest/

Note OpenStack does both drive-based and network metadata-service based cloud-init, but Harvester AFAIK only does drive-based cloud-init.

One cool feature of Harvester's cloud-init is that the SSH keys list seems to support injecting multiple keys into .ssh/authorized_keys.  "Back in the Old Days" using OpenStack, cloud-init only supported one key, so being able to inject multiple keys is kind of cool.

So, lose some features, gain some features.

When creating a VM, the cloud-init config can be manually modified in "Advanced Options" "Cloud Config".

In theory it should be possible to set a serial console password for the "ubuntu" user or set the VM hostname here, but in practice I could not get either to work (more on that below).
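
For reference, the part of the User Data that does work for me is plain SSH key injection; a minimal cloud-config sketch (key truncated, and the password/hostname attempts discussed below deliberately left out):

#cloud-config
ssh_authorized_keys:
  - ssh-ed25519 AAAA...truncated... vince@workstation
package_update: true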

The link for network config options in the Rancher UI is dead.  

https://cloudinit.readthedocs.io/en/latest/topics/network-config-format-v1.html

The proper link seems to be

https://cloudinit.readthedocs.io/en/latest/reference/network-config-format-v1.html

I submitted a bug report to Harvester at

https://github.com/harvester/harvester/issues/3528

Note this bug was fixed and closed out on March 1st, so if you update your Harvester after that date it's probably good now.  I write these blog posts some weeks in advance...

If you provide no Network Data, you get a DHCP address, which is acceptable for testing but not useful for production.  Let's design a Network Data config for a statically assigned IP address.

version: 1
config:
  - type: physical
    name: enp1s0
    subnets:
      - type: static
        address: 10.10.202.202/16
        gateway: 10.10.1.1
        dns_nameservers:
          - 10.10.7.3
          - 10.10.7.4
        dns_search:
          - cedar.mulhollon.com
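
For comparison, the DHCP-only equivalent (handy for the recovery trick described under Troubleshooting Lore below) would be:

version: 1
config:
  - type: physical
    name: enp1s0
    subnets:
      - type: dhcp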

cloud-init is an incredibly expensive piece of software to use, because the only feedback you'll get on any error is a boot log message: "Invalid cloud-config provided: Please run 'sudo cloud-init schema --system' to see the schema errors."  Of course it's impossible to run that, because if cloud-init doesn't work, all user access is blocked: no SSH keys are set and no password is set for the serial console.  So best of luck finding your typo or indentation error LOL.

As per the above, I was unable to use cloud-init to set the 'ubuntu' user password for the serial console, or to set the VM hostname.  Perhaps there is an incompatibility in the password hash CLI program, or the requirements have changed in an undocumented manner; the plain_text_password option also failed.  The online docs that are guaranteed to work with Ubuntu 14 (support ended in 2019) carry a disclaimer that those instructions are known not to work in newer installs.  Various attempts to set the hostname silently failed; it was impossible to get a working output from "hostname -f".  I tried using both the "helper" functions in cloud-init and "manually" running various commands via runcmd, and nothing worked.  Sometimes automation that saves seconds takes too many hours to set up, especially if it has an awful UI.  So just log in as "ubuntu" over SSH, set a password for the serial console manually, and set the hostname manually.  Unfortunate, but necessary.

Create a VM

https://docs.harvesterhci.io/v1.1/vm/create-vm

Log into Rancher, select virtualization and the harvester-small cluster, click "Virtual Machines" then "Create".

The selected namespace will be "default".  I will change the NS to "vm", which is part of the "experiment" project.  How does one know "vm" is part of the "experiment" project in the Create VM UI?  That's an excellent question.  For production deployment I intend to use a naming strategy where, if the project name is "projectname", then the NS name will be "projectname-NS" rather than just "NS".  Part of the design advantage of having projects was to permit the re-use of NS names across multiple projects, so this naming strategy negates one of the original design goals, but if it's unusable, then whatever, do what works as a pragmatic strategy.

I named the test VM "test".

In the "Basics" tab I will provide 1 CPU, 4 GB ram, and the "vince" SSH key.  As per above, "real" production would use the "ansible" key for automated provisioning but using the "vince" key makes it easy to simply log in as "ubuntu@someaddrs" from my "vince" account.

In the "Volumes" tab I select the Ubuntu image as my 10 GB disk image.  I see in the "Type" dropdown there's an option to add a cdrom, so I could install from cdrom if I don't have a ready to use cloud image.  Fun as a virtualized bare metal install would be, which I've done many a time on VMware and OpenStack, in today's experiment, we'll use Ubuntu's cloud image.

In the "Networks" tab the default is type "masquerade" on the management Network.  I will change the type to bridge as I will eventually be using this to host DHCP servers, among other things.  Also I have not experimented with filtering (if any) on a masquerade type network.  The "Network" has to be changed from the internal "management Network", to "default/untagged".

In "Node Scheduling" I do not intend to lock down to any specific node.  Its interesting to look at the rule engine for scheduling.  I could set a key to force certain workloads to certain nodes.   There does not seem to be a facility like "affinity" or "anti-affinity" rules like in VMware, which is too bad.

In "Advanced Options" I see the OS Type was autodetected as "Ubuntu", cool.  See the above cloud-init section to cut and paste in the User Data and Network Data.

Click "Create" and wait.  The UI stopped for a minute or two but stabilized rapidly...

First Five minutes with a new VM

The "test" VM is in status "running" on node "harvester-small-2".

In Rancher the operational tasks for a VM are in the "three dots" menu.  Start, stop, reboot, snapshot, migrate, etc.

The first thing I looked at was the logs.  Note these are "Harvester" logs, not VM logs.  Lots of lines to research later on.  The main thing I notice is that every five seconds I see this log message:

"{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='vm_test' uuid=feec480d-31a1-59fa-9199-a330c83aa404 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6382","subcomponent":"libvirt","thread":"30","timestamp":"2023-02-22T20:11:36.552000Z"}"

The timestamp cannot be cut and pasted from a log message, which is annoying.

There is a "console" drop down for webvnc or serial console emulation.  Both seem to work well with this Ubuntu image.  Note that the "ubuntu" does not have a password set, have to configure that after logging in via SSH.

Troubleshooting Lore

Here's some troubleshooting lore, some of which might even be true.

It's possible to wedge yourself into a situation where you can't log in at a static IP address and can't log in over serial because cloud-init isn't working.  Seems frustrating.  The solution was to boot up with DHCP, sudo passwd ubuntu, then verify the serial console is working, THEN mess around with the cloud-init network config while logging in over the serial console trying to get static IP addresses working.

I believe the network data and/or user data might only be read on first boot unless something like "sudo cloud-init clean" is run.
