Tuesday, April 28, 2026

Don't send jumbo ethernet frames to Roku devices.

The title summarizes it all.

I've had ethernet at home for 35 years (started with thinnet back in the day after getting rid of some Arcnet), and I've had jumbo frame support (8K to 9K frame length instead of the default 1518) for about a quarter century.

Never had a compatibility problem; things just work a little faster.  Until spring of 2026.

Multiple models of Roku streamers plugged into multiple models of Netgear managed prosumer ethernet switches, all experiencing the same symptom at about the same rate.  The switch logs that the port drops and comes back up a second later, causing minor performance degradation to the Roku and spam in my ethernet switch logs.  I theorize it's renegotiating the autodetected port speed when it drops "enough" 9K ethernet frames.  I can lock my ethernet switch port to 100M speed; however, I can't lock the Roku streamer port, and that's the device that, I theorize, is demanding a speed renegotiation and bouncing the port.

I set the maximum frame size back to the standard 1518 bytes (a 1500-byte MTU), and all devices on all switches are now stable.
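
For anyone wondering where 1518 comes from, here is a minimal Python sketch of the frame-size arithmetic, assuming plain untagged Ethernet II framing.  The constant and function names are my own, purely for illustration.

```python
# Standard Ethernet II framing overhead (untagged, no 802.1Q VLAN tag).
ETH_HEADER_BYTES = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
ETH_FCS_BYTES = 4      # frame check sequence (CRC) trailer


def frame_length(mtu: int) -> int:
    """Return the frame length for a given MTU (payload size in bytes)."""
    return mtu + ETH_HEADER_BYTES + ETH_FCS_BYTES


# Compare the default MTU with a typical 9000-byte jumbo MTU.
for mtu in (1500, 9000):
    print(f"MTU {mtu} -> frame length {frame_length(mtu)}")
```

So a "1518" setting and a "1500 MTU" setting usually describe the same limit, just measured with or without the header and trailer.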

Wild to think in a quarter century the only hardware/firmware/software I've EVER owned that's incompatible with jumbo ethernet frames is Roku streamer boxes.

I posted it to the blog because it was a fun and satisfying problem to diagnose and fix, and I have no idea where to file a real bug report.  I won't deal with consumer grade customer support LOL.

Monday, September 23, 2024

Time Sync on Windows Domains

Time synchronization is vital for Active Directory domain-connected PCs because the Kerberos ticket protocol required to log into the domain will not work if the two clocks are more than "five or so" minutes apart (the exact value being a GPO configurable option).
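
To make the clock rule concrete, here is a toy Python sketch of the check, assuming the common five-minute default tolerance.  The function name and constant are hypothetical, and the real limit is whatever the domain policy sets.

```python
from datetime import datetime, timedelta, timezone

# Assumption: five minutes, the usual default; domain policy may differ.
MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(client_time, server_time, max_skew=MAX_SKEW):
    """Return True if the two clocks are close enough for a ticket to be accepted."""
    return abs(client_time - server_time) <= max_skew


# Example: a desktop whose clock runs seven minutes ahead of the DC.
dc_clock = datetime(2024, 9, 23, 12, 0, tzinfo=timezone.utc)
pc_clock = dc_clock + timedelta(minutes=7)
print(within_kerberos_skew(pc_clock, dc_clock))
```

The check is symmetric: a clock that is too far behind fails just like one that is too far ahead.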

Most of the machines on my network connected to the domain run Linux, and the three domain controllers also run Linux, so the handful of Windows devices and VMs are the odd ones out, unusually enough.  Most Windows domain networks are mostly or even exclusively full of Windows servers and desktops.

Between Linux servers, time sync is unimaginably simple.  Add a "server" line or two to the /etc/ntp.conf file, and run some diagnostic commands to verify; if necessary, the text-based log files in /var/log/ntp are self-explanatory.  SystemD makes things much more difficult and complicated, but it's still relatively easy to get times synced between Linux machines.

Windows... is a little more elaborate, a little less documented, and the CLI is much harder to use.  I had to assemble perhaps 20 to 30 web page query results to generate this little "cheat sheet" for your reading entertainment.

Autoconfigure Windows time service from the AD Domain

On Unix if you want NTP protocol, you install NTP, or if you want some other protocol you install some other suite.  On Windows, there is a single multi-protocol subsystem.  Possibly you could install some 3rd party software to implement alternative time sync protocols.  So regardless of selected protocol, everything in the w32time subsystem is controlled via the w32tm command (not a typo, and yes it's aggravating)

Who knows what happened in the past, Windows is somewhat nondeterministic, so a reasonable first step is to implement a solid base configuration over whatever may be left behind from the past; there may be legacy config data stuck in the system.  Let's point the Windows desktop to autoconfigure its time sync from the domain.

Run CMD.EXE as administrator, and take a look at the output of

  • w32tm /query /configuration

To update the config you probably need to run something like:

  • w32tm /config /syncfromflags:domhier /update
  • net stop w32time
  • net start w32time
  • w32tm /resync /rediscover /force

You will find many claims online that this will work, and that the old-fashioned "net time /set /y" is obsolete and unneeded.  Unsurprisingly, that internet karma farming advice is wrong, and I needed to use the old "net time" commands initially.  NTP will phase itself into synchronization if the times are too far apart.  However, it takes forever and seemingly does not work sometimes.  It's easier to reset the clock to a known accurate time using the "net time" command first, at least to start.

(insert ominous foreshadowing music here) The default time sync protocol for a domain-connected PC is NT5DS protocol, not NTP, and it will point to a single domain controller for time sync, not all three DCs in my DC cluster.  In the long run I did not want either of those configuration options, LOL.

Verify Windows time sync subsystem operation

This is enormously more complicated than running "ntp" on linux and typing "help" a couple of times.  Sorry.

Run CMD.EXE as administrator, then:

  • w32tm /monitor

Outputs a report of the DCs' NTP status; very cool, it "walks up the tree" reporting the upstream server's information.

  • w32tm /stripchart /computer:dc03.cedar.mulhollon.com

Reports a live continuous stream of NTP polls against an individual computer, in this example, my third DC in the DC cluster.

  • w32tm /monitor /computers:10.10.8.221

Reports data on one computer (offset, refid, etc) in this case DC01 aka 10.10.8.221

  • w32tm /query /source

Lists where the clock is currently synced. Probably should NOT be "Local CMOS Clock"

  • w32tm /query /status /verbose

Lists timers and recent errors/results.  State machine should be 2 (Sync) not 0 (Unset)

  • w32tm /query /peers /verbose

Lists individual peer data, including reachability counters and state

  • w32tm /query /configuration

Dumps the current time sync subsystem config. Note the line describing "type" NT5DS (Local) instead of using the NTP protocol.  Note this is not like a modern Unix config yaml file where you can edit it and feed it back to the config, you will have a hilarious fun time trying to figure out how to change the NtpServer line or the MinPollInterval line, and good luck figuring out what "CompatibilityFlags: 2147483648 (Local)" means.  Windows is not exactly user-friendly, LOL, but it can be badgered into working eventually.

Configure NTP on a UNIX machine to supply NT5DS time sync

You need to follow the instructions on the samba wiki page, but take them with a grain of salt in 2024.

ntpsigndsocket needs to be /var/lib/samba/ntp_signd/ on a 'modern' Ubuntu server, not the  /usr/local/samba/var/lib/ntp_signd/ suggested by the Samba wiki.

In the "restrict" lines, you need to add mssntp to enable the NT5DS protocol

Note that despite my optimistic instructions above, I can't get NT5DS to work at this time; just get a meaningless error message and the logs look "OK" on the DCs and nothing ever syncs up, so it's unclear exactly what is broken.

Give up and configure the Windows machines to use world standard NTP protocol instead of NT5DS protocol

I only have three machines running Windows on the domain, so there are diminishing returns to continuing to burn hours troubleshooting the NT5DS time protocol.  Windows machines can be configured to use the NTP protocol via the following process:

    Run CMD.EXE as administrator, then:

    • w32tm /config /manualpeerlist:"10.10.8.221,0x8 10.10.8.222,0x8 10.10.8.223,0x8" /syncfromflags:MANUAL
    • net stop w32time
    • net start w32time
    • w32tm /resync

    The ',0x8' stuff in the config above is to force a client-type NTP connection and not make a bidirectional peer-to-peer NTP connection.  There's a MS knowledge base article about the MS side crashing if the NTP side fails to cooperate, that is, if it doesn't allow p2p time sync connections.  I trust my server clocks more than I trust my end-user desktop clocks anyway, LOL.  IP addresses .221 .222 and .223 are my three DC cluster machines in the example above.

    Yes, multiple NTP servers are separated by spaces, and that's why the option has to be wrapped in quotes; you don't need quotes if you only have one NTP server.  Yes, in theory you will be told you don't need to "net stop" "net start", but in practice I found I had to, to "kick" the subsystem into properly reacting to the new config.
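
The peer list string gets fiddly to type by hand.  Here is a minimal Python sketch of assembling it, assuming the same 0x8 client-mode flag on every server.  The helper names are hypothetical, and this only builds the command text; it does not run w32tm.

```python
def manual_peer_list(servers, flags=0x8):
    """Join servers into the space-separated 'host,flags' form w32tm expects."""
    return " ".join(f"{server},{flags:#x}" for server in servers)


def w32tm_config_command(servers, flags=0x8):
    """Build the full command line, quoting the peer list because it contains spaces."""
    peers = manual_peer_list(servers, flags)
    return f'w32tm /config /manualpeerlist:"{peers}" /syncfromflags:MANUAL'


# The three DC addresses from the example above.
DOMAIN_CONTROLLERS = ["10.10.8.221", "10.10.8.222", "10.10.8.223"]
print(w32tm_config_command(DOMAIN_CONTROLLERS))
```

The quotes are added unconditionally, which is harmless for a single server and required for more than one.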

    Summary and the Future

    Now, things "just work" for the three or so Windows machines on my network using NTP protocol for time sync.  It would be interesting to fix NT5DS time sync protocol someday, but I'll use the more standard NTP protocol for now.

    I think my NTP servers are OK, and it's probably a Windows thing, probably something weird about the crypto security keys, or maybe Samba is not talking to NTP in some way.

    In the long run, this should all be a centrally configured GPO but I only have three machines to hand configure, so it seems OK... for now.

    Monday, April 8, 2024

    Troubleshooting Strategies for Kubernetes, ndots: 5, and DNS blackholes

    Based solely on the title, some K8S old-timers already know the problem, and how I solved it.  Regardless, it's useful to discuss DNS troubleshooting strategies in K8S.

    Troubleshooting strategy 1: Clearly define the problem as a goal.

    I am trying to bring up a new RKE2 K8S cluster and roll some new workload onto it.  The cluster had no workload because it was brand new.  Three days ago I upgraded to K8S 1.28.8 and Rancher v2.8.3.  I deployed several Node-RED systems, the deployments, pods, services, PVCs, and PVs all look great and I can access the systems.  However, Node-RED is a node.js application and when I try to install additional packages to the palette, I get a long timeout followed by an installation failure in the GUI.  The goal is, I would like to be able to install node-red-node-aws and node-red-contrib-protobuf in Node-RED without errors.

    Troubleshooting strategy 2: Gather related data.

    The GUI is not much help; all we know is it didn't work.  Open a shell in Rancher on the K8S pod, and poke around in /data/.npm/_logs until I find the exact error message "FetchError: request to https://registry.npmjs.org/node-red-contrib-protobuf failed, reason: getaddrinfo ENOTFOUND registry.npmjs.org"  This looks bad, very bad; the pod can't resolve DNS for registry.npmjs.org.

    Troubleshooting strategy 3: Think about nearby components to isolate the problem.

    OK, DNS does not work.  However, the rest of the LAN is up, and the RKE2 system resolved hub.docker.com to initially install the container images, so at a reasonably high level, DNS appears to be working.  The problem seems isolated to the pods (not the entire LAN) and exactly one website.  Next, stretch out, and try pinging Google.

    ping www.google.com

    Fails to resolve "no IP address found".  Let's try some other tools, sometimes nslookup will report better errors.  Ping asks the OS to resolve a domain name, nslookup asks a specific DNS server to resolve a domain name for it, a subtle difference.

    nslookup www.google.com

    Wow, that worked.  Let's combine the observations so far:  the operating system DNS resolver (the local thing using /etc/resolv.conf) failed yet asking the K8S DNS server directly, worked.  Local DNS resolution (inside the cedar.mulhollon.com domain) works perfectly all the time.  Interesting.

    Troubleshooting strategy 4: Hope this is not your first rodeo.

    This is not my first rodeo for solving sysadmin and DNS problems.  Always try the FQDN and non-FQDN hostname.  Surprisingly:

    ping www.google.com.

    That worked.  A DNS hostname that does not end in a period (plus or minus some ndots options) will check the search path before hitting the internet.  That's how a DNS lookup in a web browser on the LAN to nodered1 ends up pointing to the same place as nodered1.cedar.mulhollon.com; there are not two entries with different names, there's just a cedar.mulhollon.com in the DNS search path and a nodered1 A type record.

    When working through a puzzling IT problem, if this is your first rodeo, you will be in for a rough grind, but I've seen a lot over the years, making life a little easier.  My hunch about hosts with FQDN vs non-FQDN paid off.

    Back to troubleshooting strategy 2, let's gather more data.  Inside the pod, its /etc/resolv.conf looks like this:

    search nodered.svc.cluster.local svc.cluster.local cluster.local cedar.mulhollon.com mulhollon.com
    nameserver 10.43.0.10
    options ndots:5

    This is the longest search path and highest ndots option I've ever seen, which based on some internet research is apparently normal in K8S situations, and the nameserver IP looks reasonable.
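
To see why the trailing period matters with a config like this, here is a rough Python sketch of how a resolver expands a name using the search list and ndots, following the behaviour described in the resolv.conf man page.  The function name is my own, and real resolver libraries differ in the details.

```python
# Search list and ndots value copied from the pod's /etc/resolv.conf above.
SEARCH = [
    "nodered.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "cedar.mulhollon.com",
    "mulhollon.com",
]
NDOTS = 5


def candidate_names(name, search=SEARCH, ndots=NDOTS):
    """Return the fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        # A trailing period marks the name as absolute: no search list at all.
        return [name]
    searched = [f"{name}.{domain}." for domain in search]
    as_is = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as typed first, then the search list.
        return [as_is] + searched
    # Too few dots: walk the whole search list before trying the name as typed.
    return searched + [as_is]


for fqdn in candidate_names("www.google.com"):
    print(fqdn)
```

With ndots:5, www.google.com has only two dots, so every search domain is tried before the real name, and the real name comes dead last.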

    Troubleshooting strategy 5: Enumerate your possibilities.

    1. Is 10.43.0.10 a working coredns resolver?
    2. The search path is long.  UDP packet size limit?  Some limit in the DNS resolver?  Could it be timing out trying so many possible search paths?
    3. What's up with the huge ndots option setting?  Problem?  Symptom?  Cause?  Effect? Irrelevant?
    At this stage of the troubleshooting process, I had not yet guessed the root cause of the problem, but I had some reasonable possibilities to follow up upon.

    Troubleshooting strategy 6: Search the internet after you have something specific to search for.

    When you have described your problem in enough detail, it's time to find out how "the internet" solved the problem, someone else probably already solved the problem and is happily telling everyone.  Similar to this blog post.  I found a couple leads:
    1. Some people with weird ndots settings have massive DNS leaks causing lots of extraneous DNS traffic and running the famous K8S nodelocal DNS cache helped them cut the DNS traffic overload.  Is that my problem?  Not sure, but it's "a problem".  I probably need to set up nodelocal DNS caching eventually, according to everything I read.  Make a note that because my RKE2 has IPVS installed, I need a slightly modified nodelocal DNS cache config.
    2. Some people with weird ndots settings had DNS queries time out because they made too many DNS search queries.  Note that the "overall Node-RED palette installer" fails after 75 seconds of trying, and the command line "npm install node-red-contrib-protobuf" times out and fails in about 90 seconds, but my DNS test queries at the shell fail instantly, so their problem is likely not my problem.  Also, I have very low traffic on this new cluster, and they have large clusters with huge DNS traffic loads, which I would not have.  Also, there is a large amount of opposition from the general K8S community to 'fixing' high ndots option settings due to various internal traffic issues related to K8S, so we're not doing that.  I think this is a dead end to pursue.
    3. Rancher and RKE2 and K8S all have pretty awesome wiki DNS troubleshooting guides.  I plan to try a few!
    Troubleshooting strategy 7: Know when to cowboy and when not to cowboy.

    If the system is in production or you're operating during a maintenance window then you have a written documented meeting-approved change management and risk plan and a window to work, along with written prepared back-out plans.  However, if it's a pre-production experimental burn-in test system, then it's the Wild West, just make sure to write good docs about what you do to justify your learning time.  This particular example was an experimental new RKE2 cluster, perfectly safe to take some risks upon, I need to set up nodelocal DNS sooner or later anyway, and less DNS traffic might help this problem or at least can't make it worse.  So I talked myself into cowboy style installing Nodelocal DNS caching on this RKE2 cluster using IPVS, during the process of working an "issue".  I felt it was related "enough" and safe "enough" and in the end, I was correct, even though it did not solve the immediate problem.


    The DNS cache worked with no change in results (although nothing is worse; I think). Note you have to add the ipvs option if you have ipvs enabled (which I do).

    I switch my troubleshooting strategy back to "Gather related data.".  If my first and only real workload of Node-RED containers fails, let's try a different workload.  I spin up a linuxserver.io dokuwiki container.  Works perfectly except it can't resolve external DNS either.  At least it's consistent, and consistent problems get fixed quickly.  This removes the containers from the cause of the problem, it is unlikely to be a problem unique to Node-RED containers if the identical problem appears in a Dokuwiki container from another vendor...

    Back to Troubleshooting strategy 6 of when in doubt search the internet.  I methodically worked through Rancher's DNS troubleshooting guide as found at:


    kubectl -n kube-system get pods -l k8s-app=kube-dns

    This works.

    kubectl -n kube-system get svc -l k8s-app=kube-dns

    This works.

    kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup kubernetes.default

    "nslookup: can't resolve 'kubernetes.default'"
    This fails which seems to be a big problem.

    Time to circle back around to Troubleshooting Strategy 7, and try some cowboy stuff.  This is just a testing burn-in and experimentation cluster, let's restart some pods, it's plausible they have historical or out-of-date config issues.  I restart the currently running pod.  It was restarted 3 days ago during a K8S upgrade and restarting the CoreDNS pods on a test cluster should be pretty safe, and it was.  The two rke2-coredns-rke2-coredns pods are maintained by the replicaset rke2-coredns-rke2-coredns.  I restarted one pod, and nothing interesting happened.  The good news is logs look normal on the newly started pod.  The bad news is busybox DNS query to kubernetes.default still fails.  I restarted the other pod, so now I have two freshly restarted CoreDNS pods.  Logs look normal and boring on the second restarted pod.  The pod images are rancher/hardened-coredns:v1.11.1-build20240305.  The busybox query to kubernetes.default continues to fail same as before.  Nothing worse, nothing better.

    I return to troubleshooting strategy 2, "Gather more data".

    kubectl -n kube-system get pods -l k8s-app=kube-dns

    NAME                                         READY   STATUS    RESTARTS   AGE
    rke2-coredns-rke2-coredns-864fbd7785-5lmgs   1/1     Running   0          4m1s
    rke2-coredns-rke2-coredns-864fbd7785-kv5zq   1/1     Running   0          6m26s

    Looks normal, I have two pods, these are the pods I just restarted in the CoreDNS replicaset.

    kubectl -n kube-system get svc -l k8s-app=kube-dns

    NAME                        TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)         AGE
    kube-dns-upstream           ClusterIP   10.43.71.75   <none>        53/UDP,53/TCP   51m
    rke2-coredns-rke2-coredns   ClusterIP   10.43.0.10    <none>        53/UDP,53/TCP   50d

    OK yes I set up nodelocal caching as part of the troubleshooting probably 51 minutes ago, very reasonable output.

    kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup kubernetes.default

    Server:    10.43.0.10
    Address 1: 10.43.0.10 rke2-coredns-rke2-coredns.kube-system.svc.cluster.local
    nslookup: can't resolve 'kubernetes.default'
    pod "busybox" deleted
    pod default/busybox terminated (Error)

    Time to analyze the new data.  After upgrading and restarting "Everything" it's still not working, so it's probably not cached data or old configurations or similar; there's something organically wrong with the DNS system itself that only presents in pods running on RKE2, everything else is OK.  It's almost like there's nothing wrong with RKE2... which eventually turned out to be correct...

    Time for some more of Troubleshooting strategy 6, trust the internet.  Methodically going through the "CoreDNS specific" DNS troubleshooting steps:

    kubectl -n kube-system logs -l k8s-app=kube-dns

    .:53
    [INFO] plugin/reload: Running configuration SHA512 = c18591e7950724fe7f26bd172b7e98b6d72581b4a8fc4e5fc4cfd08229eea
    58f4ad043c9fd3dbd1110a11499c4aa3164cdd63ca0dd5ee59651d61756c4f671b7
    CoreDNS-1.11.1
    linux/amd64, go1.20.14 X:boringcrypto, ae2bbc29
    .:53
    [INFO] plugin/reload: Running configuration SHA512 = c18591e7950724fe7f26bd172b7e98b6d72581b4a8fc4e5fc4cfd08229eea
    58f4ad043c9fd3dbd1110a11499c4aa3164cdd63ca0dd5ee59651d61756c4f671b7
    CoreDNS-1.11.1
    linux/amd64, go1.20.14 X:boringcrypto, ae2bbc29

    kubectl -n kube-system get configmap coredns -o go-template={{.data.Corefile}}

    Error from server (NotFound): configmaps "coredns" not found

    That could be a problem?  No coredns configmaps?  Of course these are some old instructions and I have a new cluster with a fresh caching node-local DNS resolver, and DNS is mostly working so a major misconfiguration like this can't be the problem, so I poke around on my own a little.

    I checked the node-local-dns configmap and that looks reasonable.

    kubectl -n kube-system get configmap node-local-dns -o go-template={{.data.Corefile}}

    (It would be a very long cut and paste, but it seems to forward to 10.43.0.10, which admittedly doesn't work, and this ended up being irrelevant to the story anyway so it's not included in this blog post)

    Ah, I see in the installed helm app for rke2-coredns the configmap name is actually named rke2-coredns-rke2-coredns, OK that makes sense now.

    kubectl -n kube-system get configmap rke2-coredns-rke2-coredns -o go-template={{.data.Corefile}}

    .:53 {
        errors 
        health  {
            lameduck 5s
        }
        ready 
        kubernetes   cluster.local  cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus   0.0.0.0:9153
        forward   . /etc/resolv.conf
        cache   30
        loop 
        reload 
        loadbalance 
    }

    The above seems reasonable.

    Docs suggest checking the upstream nameservers

    kubectl run -i --restart=Never --rm test-${RANDOM} --image=ubuntu --overrides='{"kind":"Pod", "apiVersion":"v1", "spec": {"dnsPolicy":"Default"}}' -- sh -c 'cat /etc/resolv.conf'

    The nameserver info matches the configuration successfully used by 78 hosts configured by Ansible operating on this LAN, and superficially looks good (insert ominous music here; not to ruin the story, but the cause of the problem was the config was not good after all, it just mostly worked everywhere except K8S for reasons I'll get to later)

    Let's contemplate Troubleshooting strategy 7: Know when to cowboy and when not to cowboy.  On one hand, I feel like enabling query logging and watching one specific query process at a time.  On the other hand, I feel pretty confident in my ability to enable logging, less confident in the infrastructure's ability to survive an accidental unintended log flood, and very unconfident about my ability to shut OFF query logging instantly if some crazy flood occurs.  I feel overall, that going cowboy by enabling query logging is a net negative risk/reward ratio at this time.  Someday I will experiment safely with query logging but today is not that day.

    Well, I seem stuck.  Not out of ideas but the well is starting to run dry.  I haven't tried "Troubleshooting Strategy 1: Clearly define the problem as a goal" in a while, so I will try that again.  I re-documented the problem.  It looks like the story I placed at the beginning of this blog post, no real change.  It was still a good idea to focus on the goal and what I've done so far.  Maybe unconsciously this review time helped me solve the problem.

    Let's try some more Troubleshooting strategy 2, and gather more data.  As previously discussed, I did not feel like going all cowboy by enabling cluster-wide DNS query logging.  As per Troubleshooting Strategy 4, hope this is not your first rodeo, I am quite skilled at analyzing individual DNS queries, so let's try what I'm good at:  We will pretend to be a K8S pod on a VM, and try all the search paths just to see what they look like.

    From a VM unrelated to all this K8S stuff we've been doing, let's try the google.com.cedar.mulhollon.com search path.  That is my Active Directory domain controller and it returns a NXDOMAIN, this is normal and expected.

    Following troubleshooting strategy 3, think about nearby components to isolate the problem, let's try the last DNS search path.  This will be google.com.mulhollon.com.  That domain is hosted by Google and it returns a valid NOERROR but no answer.  

    Wait what, is that even legal according to the DNS RFCs?

    Following troubleshooting strategy 5, enumerate your possibilities, I think it's quite plausible this weird "NOERROR header but empty data" DNS response from Google could be the problem.  This isn't my first rodeo troubleshooting DNS, and I know the search protocol for DNS takes the first answer it gets, so when internal resolution fails its last search path for host "whatever" will be whatever.mulhollon.com and Google will blackhole all incoming queries so it'll never try external resolution.  This certainly seems to fit the symptoms.  As a cowboy experiment on the test cluster, I could remove that domain from the DNS search path in /etc/resolv.conf and try again.  In summary, I can now repeatedly and reliably replicate a problem directly related to the issue in a VM, and I have a reasonable experiment plan to try.
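
This hypothesis can be modeled with a toy Python sketch.  It assumes a resolver that keeps walking the candidate names on NXDOMAIN but treats an empty NOERROR reply as final.  The function names, the fake zone data, and the 192.0.2.10 address (a documentation-only address) are all made up for illustration, and not every resolver library behaves this way.

```python
NXDOMAIN = "NXDOMAIN"
NODATA = "NODATA"  # NOERROR header with an empty answer section

# Made-up zone data standing in for the real DNS servers.
FAKE_DNS = {
    "www.google.com.mulhollon.com.": NODATA,  # the suspected blackhole
    "www.google.com.": "192.0.2.10",          # placeholder address
}


def fake_lookup(fqdn):
    """Pretend DNS query: anything not listed in FAKE_DNS is NXDOMAIN."""
    return FAKE_DNS.get(fqdn, NXDOMAIN)


def resolve_with_search(candidates, lookup=fake_lookup):
    """Try each candidate in order; stop at the first reply that is not NXDOMAIN."""
    for fqdn in candidates:
        result = lookup(fqdn)
        if result == NXDOMAIN:
            continue  # name does not exist, move on to the next candidate
        return fqdn, result  # an address OR an empty NOERROR ends the search
    return None, NXDOMAIN


WITH_BLACKHOLE = [
    "www.google.com.cluster.local.",
    "www.google.com.cedar.mulhollon.com.",
    "www.google.com.mulhollon.com.",
    "www.google.com.",
]
WITHOUT_BLACKHOLE = [n for n in WITH_BLACKHOLE if n != "www.google.com.mulhollon.com."]

print(resolve_with_search(WITH_BLACKHOLE))
print(resolve_with_search(WITHOUT_BLACKHOLE))
```

In this model the search ends at the mulhollon.com candidate with no address, and only reaches the real www.google.com once that domain is removed from the list.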

    Before I change anything, gather some more data under Troubleshooting Strategy 2.  I can now replicate the solution in a K8S pod.  Not having root, and therefore not being able to edit the /etc/resolv.conf file in my NodeRED containers, is mildly annoying; it's just how the docker containers are designed.

    I found a container that I can successfully log into as root and modify the /etc config files.  With mulhollon.com (hosted at Google) if I try to ping www.google.com I get "bad address" because Google domain hosting blackholes missing A records, so weird but so true.
    If I edit /etc/resolv.conf in this container, and remove mulhollon.com from the search path, SUCCESS AT LAST! I can now resolve and ping www.google.com immediately with no problems.  I can also ping registry.npmjs.org so that implies I can probably use it (although this test container isn't a NodeJS container or a Node-RED container)

    Well, my small cowboy experiment worked, let's try a larger-scale experiment next.  But first, some explanation of why the system has this design.  In the old days, I had everything in the domain mulhollon.com, then I gradually rolled everything internal into active directory hosted cedar.mulhollon.com and now I have nothing but external internet services on mulhollon.com.  In the interim, while I was setting up AD I needed both domains in my DNS search path for internal hosts, but I don't think I need that any longer and it hasn't been needed for many years.

    Time for some more troubleshooting strategy 7, cowboy changes on a test cluster.  Some quality Ansible time resulted in the entire LAN having its DNS search path "adjusted".  I had Ansible apply the /etc/resolv.conf changes to the entire RKE2 cluster in a couple minutes.  Verified the changes at the RKE2 host level, changes look good, DNS continues to work at the RKE2 and host OS level so nothing has been made worse.

    I ran a "kubectl rollout restart deployment -n nodered" which wiped and recreated the NodeRED container farm (without deleting the PVs or PVCs, K8S is cool).  Connect to the shell of a random container, the container's /etc/resolv.conf inherited "live" from the host /etc/resolv.conf without any RKE2 reboot or other system software level restart required or anything weird, looks like at container startup time it simply copies in the current host resolv.conf file, simple and effective.  "ping www.google.com" works now that the DNS blackhole is no longer in the search path.  And I can install NodeJS nodes into Node-RED from the CLI and the web GUI and containers in general in the new RKE2 cluster have full outgoing internet access, which was the goal of the issue.

    Troubleshooting strategy 8: If you didn't document it, it didn't happen.

    I saved a large amount of time by keeping detailed notes in the Redmine issue, using it as a constantly up-to-date project plan for fixing the problem and reaching the goal.  Ironically I spent twice as much time writing this beautiful blog post as I spent initially solving the problem.

    I will list my troubleshooting strategies below.  These overall strategies will get you through some dark times.  Don't panic, keep grinding, switch to a new strategy when progress on the old strategy slows, and eventually, things will start working.
    • Clearly define the problem as a goal.
    • Gather related data.
    • Think about nearby components to isolate the problem.
    • Hope this is not your first rodeo.
    • Enumerate your possibilities.
    • Search the internet after you have something specific to search for.
    • Know when to cowboy and when not to cowboy.
    • If you didn't document it, it didn't happen.
    Good Luck out there!

    Thursday, March 7, 2024

    Why Kubernetes Takes a Long Time

    The Problem

    Let's test something simple in Kubernetes on a fresh new bare-metal (running under Proxmox) RKE2 cluster, and deploy the classic intro app "numbers" from the book "Kubernetes in a Month of Lunches".  Other simple test apps will behave identically for the purposes of this blog post, such as the "google-samples/hello-app" application.

    If you look at the YAML files, you'll see a "kind: Service" that has a "spec type LoadBalancer" and some port info.  After an apparently successful application deployment, if you run "kubectl get svc numbers-web" you will see a TYPE LoadBalancer with an EXTERNAL-IP listed as "<pending>" that will never exit the pending state, and the service will be inaccessible from the outside world.

    NodePorts do work out of the box with no extra software and no extra configuration, but you don't have to be limited to NodePorts forever.

    The Solution

    Kubernetes is a container orchestrator and it is willing to cooperate with external load-balancing systems, but it does not implement a load balancer.

    That's OK.

    If K8S can virtualize anything, why not virtualize its external load balancer?  This is not as bad of an idea as a VMware cluster getting its DHCP addresses set by a DHCP server running inside the cluster; if the cluster is impaired or down enough that the LB isn't working, the app probably isn't working either, so no loss.

    We can install MetalLB in Kubernetes, which implements a virtual external load balancer system. https://metallb.universe.tf/

    The Implementation

    1. Let's read about how to install MetalLB.  https://metallb.universe.tf/installation/  I see we are strongly encouraged to use IPVS instead of iptables.
    2. Research why we're encouraged to use IPVS instead of iptables.  I found https://www.tigera.io/blog/comparing-kube-proxy-modes-iptables-or-ipvs/ which explains that IPVS scales roughly O(1), constant with the number of services, where the iptables version scales roughly O(n).  OK, we have to use IPVS, which is an in-kernel load balancer that runs in front of or with kube-proxy and MetalLB.  Additionally, the K8S docs discussing kube-proxy are at https://kubernetes.io/docs/reference/networking/virtual-ips/
    3. Next, research IPVS.  Aggravating that every Google search for IPVS is autocorrected to IPv6, EVERY TIME.  Found http://www.linuxvirtualserver.org/software/ipvs.html
    4. Will this work with RKE2?  It's reported that both iptables and IPVS work fine with Calico.  RKE2 runs Canal by default which is Flannel between nodes and Calico for network policies, so I guess it's OK?  https://docs.rke2.io/install/network_options
    5. Time to set up IPVS on all RKE2 nodes.  The usual song and dance with automation, set up the first node completely manually, then set up in Ansible, test on the second node, then roll out slowly and carefully.  First IPVS setup step, install ipvsadm so I can examine the operation of the overall IPVS system, "apt install ipvsadm".  Not much to test in this step, success would be running "ipvsadm" and nothing weird seen.
    6. IPVS needs a kernel module, so without rebooting, modprobe the kernel ip_vs module, then try "ipvsadm" again, then if it works, create a /etc/modules-load.d/ip_vs.conf file to automatically load the ip_vs module during node reboots.
    7. Finally, add the IPVS config for kube-proxy to the end of the RKE2 config.yaml, merely tell kube-proxy-arg to use ipvs mode, and ipvs needs strict-arp.
    8. After a node reboot, RKE2 should have kube-proxy running in an IPVS compatible mode.  Success looks like running "ipvsadm" outputs sane-appearing mappings and "ps aux | grep kube-proxy" should show the options --proxy-mode=ipvs and --ipvs-strict-arp=true.  None of this manual work was straightforward, and it took some time to nail down.
    9. Set up automation in Ansible to roll out to the second node.  This was pretty uneventful and the branch merge on Gitlab can be seen here: https://gitlab.com/SpringCitySolutionsLLC/ansible/-/commit/65445fd473e5421461c4e20ae5d6b0fe1fe28dc4
    10. Finally, complete the IPVS conversion by rolling out and testing each node in the RKE2 cluster.  The first node done manually with a lot of experimentation took about half a day, the second took an hour, and the remaining nodes took a couple minutes each.  Cool, I have an RKE2 cluster running kube-proxy in IPVS mode, exactly what I wanted.
    11. Do I run MetalLB in BGP or L2 mode?  https://metallb.universe.tf/concepts/  I don't have my BGP router set up so it has to be L2 for now.  In the long run, I plan to set up BGP but I can spare a /24 for L2 right now.  Note that dual-stack IPv4 and IPv6, which I plan to eventually use, requires FRR-mode BGP connections, which is a problem for future-me, not today.
    12. Allocate some IP space in my IPAM.  I use Netbox as an IPAM.  Reserve an unused VLAN and allocate L2 and future BGP prefixes.  I decided to use IPv4 and 10.10.150.0/24 in my RFC1918 address space; I will add IPv6 "later".  I do almost all of my Netbox configuration automatically via Ansible, which has a great plugin for Netbox.  Ansible's Netbox integration can be seen at https://netbox-ansible-collection.readthedocs.io/en/latest/ The Ansible branch merge to allocate IP space looks like this: https://gitlab.com/SpringCitySolutionsLLC/ansible/-/commit/1d9a1e6298ce6f041ab4e98ad374850faf4a1412
    13. It is time to actually install MetalLB.  I use Rancher to wrangle my K8S clusters, it's a nice web UI, although I could do all the helm work with a couple lines of CLI work.  Log into Rancher, RKE cluster, "Apps", "Charts", search for metallb and click on it, "Install", "Install into Project" "System", "Next", "Install", and watch the logs. It'll sit in Pending-Install for a while.
    14. Verify the operation of MetalLB.  "kubectl get all --namespace metallb-system" should display a reasonable output.  Using rancher, "RKE" cluster, "Apps", "Installed Apps", namespace metallb-system should contain a metallb with reasonable status results.
    15. Configure an IPAddressPool for MetalLB as per the IPAM allocation in Netbox.  Here is a link to the docs for IPAddressPools: https://metallb.universe.tf/apis/#ipaddresspool Currently, I only have a "l2-pool" but I will eventually have to add a "bgp-pool".
    16. Configure an L2Advertisement for MetalLB to use the IPAddressPool above.  Here is a link to the docs for L2Advertisements: https://metallb.universe.tf/apis/#l2advertisement  Currently, I'm feeding "default" to "l2-pool" which will probably default to "bgp-pool" after I get BGP working.
    17. Try provisioning an application using a Service type LoadBalancer.  I used numbers-web as per the intro.  In the CLI, "kubectl get svc numbers-web" should show a TYPE "LoadBalancer" and an "EXTERNAL-IP" in your L2 IPAM allocation, and even list the PORT(S) mapping.
    18. Check operation in Rancher.  "RKE", "Service Discovery", "Services", click thru on numbers-web, the top of the page should contain a "Load Balancer" IP address, the tab "Recent Events", should see nodeAssigned and IPAllocated events, and the tab "Ports" should tell you the ports in use.
    19. Test in a web browser from the desktop.  Remember that the numbers-web app runs on port 8080 not the default 80.
    20. You can specify statically assigned IP addresses using a custom annotation described at: https://metallb.universe.tf/usage/#requesting-specific-ips  This is useful because I can add DNS entries in Active Directory using Ansible pointing to addresses of my choice.
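
The kube-proxy check from step 8 gets tedious once it has to be repeated on every node, so it is a natural thing to script.  Here is a minimal sketch in Python, assuming only the two flags named in step 8; the function name and the sample command line are hypothetical and not copied from a real node.

```python
import shlex

# Flags that step 8 expects to see on the running kube-proxy process.
REQUIRED_FLAGS = {
    "--proxy-mode": "ipvs",
    "--ipvs-strict-arp": "true",
}


def missing_ipvs_flags(cmdline: str) -> list[str]:
    """Return the required flags that are absent or set to another value."""
    seen = {}
    for token in shlex.split(cmdline):
        if token.startswith("--") and "=" in token:
            name, value = token.split("=", 1)
            seen[name] = value
    return [
        f"{name}={value}"
        for name, value in REQUIRED_FLAGS.items()
        if seen.get(name) != value
    ]


# Hypothetical command line, shaped like the "ps aux" output from step 8.
sample = "kube-proxy --proxy-mode=ipvs --ipvs-strict-arp=true --hostname-override=node1"
print(missing_ipvs_flags(sample))
```

An empty list means both flags are present with the expected values; anything else lists what still needs fixing in config.yaml.
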
    For reference, a bare-bones ipaddresspool.yaml looks like this:

    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: l2-pool
      namespace: metallb-system
    spec:
      addresses:
        - 10.10.150.0/24

    And an equally bare-bones l2advertisement.yaml looks like this:

    apiVersion: metallb.io/v1beta1
    kind: L2Advertisement
    metadata:
      name: default
      namespace: metallb-system
    spec:
      ipAddressPools:
        - l2-pool
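
Before pinning a static address on a Service as per step 20, it is worth confirming the address actually falls inside the pool, otherwise MetalLB has nothing to hand out.  This is a minimal sketch using the 10.10.150.0/24 pool from ipaddresspool.yaml above; the function name and the example addresses are hypothetical.

```python
import ipaddress

# Pool copied from ipaddresspool.yaml above.
L2_POOL = ipaddress.ip_network("10.10.150.0/24")


def in_l2_pool(address: str) -> bool:
    """True if the address lies inside the l2-pool prefix."""
    return ipaddress.ip_address(address) in L2_POOL


# Hypothetical candidate addresses for a static assignment.
print(in_l2_pool("10.10.150.20"))
print(in_l2_pool("10.10.151.20"))
```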

    The Summary

    This took 20 logically distinct steps.  Don't get me wrong: K8S is awesome, MetalLB is awesome, RKE2 is awesome; however, everything takes longer with Kubernetes...  On the bright side, so far, operation and reliability have been flawless, so it's worth every minute of deployment effort.

    Trivia

    There are only two types of K8S admins, the ones who admit that at least one time they thought metallb was spelled with only one letter "L", and the ones who are liars LOL haha.  This is right up there in comedic value with RKE2 pretending that .yml files are invisible and only processing .yaml files.

    Friday, December 8, 2023

    Proxmox VE Cluster - Chapter 020 - Architecture 1.0 Review and Future Directions



    A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.


    What worked

    So far, everything.  And it all works better than I expected, and was generally less of a headache than I anticipated.  Performance is vastly better than OpenStack for various design and overall architectural reasons.

    How long it took

    I am writing these blog posts in a non-linear fashion so the final editing of this post is being done on a CEPH cluster with HA and the new software defined networking system and quite a few other interesting items, most of which already have rough draft blog posts.

    However, if you believe Clockify, over the course of half a year of hobby-scale effort, I have logged 125 hours, 57 minutes, and 52 seconds getting to this point.  So, around "three weeks full time labor" to convert a small OpenStack cluster to a medium size half-way-configured Proxmox VE cluster.  I believe this is about half the time it took to convert from VMware to OpenStack a couple years back.
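
The "three weeks" figure is easy to sanity check.  A quick sketch, assuming a 40-hour full-time week (the 40 hours is my assumption here, not something Clockify reports):

```python
from datetime import timedelta

# Logged time as reported by Clockify in the paragraph above.
logged = timedelta(hours=125, minutes=57, seconds=52)

# Assumption: one "full time" week is 40 hours.
FULL_TIME_WEEK = timedelta(hours=40)

# Dividing one timedelta by another yields a plain float.
weeks = logged / FULL_TIME_WEEK
print(f"{weeks:.2f} full-time weeks")
```

That prints 3.15 full-time weeks, so "around three weeks" holds up.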

    Future Adventures

    This is just a list of topics you can expect to see in blog posts and Spring City Solutions Youtube videos probably after the holidays or in late winter / early spring:  Setting up and upgrading CEPH.  Optimizing memory, CPU, storage.  Adding the new SDN feature.  Open vSwitch and Netgear hardware QoS.  Connecting Ansible and probably Terraform to Proxmox.  Monitoring using Observium, Zabbix, and Elasticsearch.  Setting up Rancher and RKE2 production clusters on top of Proxmox.  Backups using the Proxmox Backup Server product.  Cloud-init, will I ever get it working the way I want it to work?  HA High Availability, unfortunately I can verify this software feature works excellently during hardware failures.  USB pass thru.

    Most of the stuff listed above is done or in process, and already partially documented in rough draft blog posts.  CEPH integration, for example, has been unimaginably cool.


    Anyway, thanks for reading and have a great day!

    Wednesday, December 6, 2023

    Proxmox VE Cluster - Chapter 019 - Proxmox Operations



    A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.


    Proxmox Operations is a broad and complicated topic.


    Day to day operations are performed ALMOST entirely in the web GUI, with very few visits to the CLI.  I have years of experience with VMware and OpenStack, and weeks, maybe even months of experience with Proxmox, so let's compare the experience:

    • VMware:  vSphere is installed on the cluster as an image, and as an incredibly expensive piece of licensed software, you get one (maybe two, depending on HA success) installation of vSphere and you get to hope it works.  Backup, restore, upgrades, and installation work about as well as you expect for "enterprise" grade software.
    • OpenStack: Horizon is installed on the controller and the controller is NOT part of the cluster.  It's free, so feel free to install multiple controllers, although I never operated that way.  It's expensive in terms of hardware, as the core assumptions of the design are that you're building a rather large cloud, not a couple hosts in a rack.  Upgrades are a terrifying, moderately painful, and long process.  The kolla-ansible solution of running it all in containers is interesting, although it replaces the un-troubleshoot-able complication of bare metal installation with an equal level of un-troubleshoot-able complication of Docker containers.
    • Proxmox VE: Every VE node has a web front end to do CRUD operations against the shared cluster configuration database.  The VE system magically synchronizes the hardware to match the configuration database.  Very cool design and 100% reliable so far.  Scalability is excellent; whereas OpenStack assumes you're rolling in with a minimum of a dozen or so nodes, Proxmox works from as low as one isolated node.
    An interesting operational note is the UI on Proxmox is more "polished" and "professional" and "complete" than either alternative.  Usually FOSS has a reputation for inadequate UI but Proxmox has the best UI of the three.

    Upgrades

    Let's consider one operational task.  Upgrades.  Proxmox is essentially a Debian Linux installation with a bunch of Proxmox specific packages installed on top of it.  Not all that different from installing Docker or ElasticSearch from upstream.  I try to upgrade every node in the cluster at least monthly; the less stuff that changes per upgrade, the less "exciting" the upgrade.  The level of excitement and drama and stress scales exponentially with the number of upgraded software packages with Debian-based operating systems in general.

    The official Proxmox process for upgrades is just hit it, maybe have to reboot, all good.

    As you'd expect, there are complications, IRL.

    First I make a plan, upgrading all the hosts in one sitting because I don't want cross-version compatibility cluster issues, and I start with the least sensitive cluster host.  Note that if you log into proxmox001 and upgrade/reboot proxmox002, you stay logged into the cluster.  However, if you log into proxmox001 and upgrade and reboot proxmox001, you lose web access to the rest of the cluster during the reboot (as a workaround, simply log into the proxmox002 webui while rebooting proxmox001).

    Next I verify the backups of the VMs on a node, and generally poke thru the logs.  If I'm getting hardware errors or something I want to know before I start changing software.  Yes this blog post series is non-linear and I haven't mentioned backups or the Proxmox Backup Server product but those posts are coming soon.

    I generally shut down clustered VMs and unimportant VMs and migrate "important" VMs to other hosts.

    There are special notes about Beelink DKMS process for the custom ethernet driver using non-free firmware.  Basically Proxmox 8.0 shipped with a Linux kernel that could be modified to use the DKMS driver for the broken Realtek ethernet driver, however, the DKMS driver does NOT seem compatible with the kernel shipped with Proxmox 8.1, so after some completely fruitless hours of effort, I simply removed my three Beelink microservers from the cluster.  "Life's too short to use Realtek".  You'd think Linux compatibility would be better in 2023 than 1993 when I got started, but really there isn't much difference between 2023 and 1993 and plenty of stuff just doesn't work.  So, here's a URL to remove nodes from a cluster, which is a bit more involved than adding nodes LOL:

    Other than fully completing and verifying operation of exactly one node at a time, I have no serious advice.  Upgrades on Proxmox generally just work, somehow even less drama than VMware upgrades.  Lightyears less stress than an OpenStack upgrade.  Don't forget to update the Runbook docs and due date in Redmine after each node upgrade.

    Note that upgrading the Proxmox VE software is only half the job; once that's done entirely across the cluster it's time to look at CEPH.  Again I mention these blog posts are being written long after the action, and I haven't mentioned CEPH in a blog post.  Those posts are on the way.


    Shortly after I rough drafted these blog posts, Proxmox 8.1 dropped along with an upgrade from CEPH Quincy to CEPH Reef.  AFAIK any CEPH upgrade even a minor version number is basically the same as a major upgrade, just much less exciting and stressful.  I do everything for a minor upgrade in the same order and process, more or less, as a major CEPH version upgrade, and that may even be correct.  It does work, at least so far.

    Next post, a summary and evaluation of "Architecture Level 1.0" where we've been and where we're going.

    Monday, December 4, 2023

    Proxmox VE Cluster - Chapter 018 - Moving the remainder of workload to the full size cluster



    A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.


    Some notes on moving the remainder of the old OpenStack workload to the full size Proxmox cluster.  These VMs were "paused" for a couple days and recreated on Proxmox.


    Elasticsearch cluster members es04, es05, es06

    This is the other half of the six host Elasticsearch cluster.  Rather than storing the disk images over CEPH (foreshadowing of future posts...) or enabling HA high availability (more foreshadowing of adventures to come...) I use local 100 GB LVM disks because the Proxmox VE system only uses a couple gigs of my 1 TB SSD OS install drives.

    Adding more cluster members to an existing Elasticsearch cluster is no big deal.  Create a temporary cluster enrollment token on any existing cluster member, install a blank unused Elasticsearch binary on the VM, run the cluster-mode reconfiguration script with the previously mentioned token, wait until it's done.  The main effort is adjusting the config for kibana, filebeat, and metricbeat on Ansible so I can push out config changes to all hosts to use the additional three cluster members.  It 'just works'.  Currently, I have index lifecycle management to store only a couple days of logs and metrics because it seems 600 gigs of logs fills up faster than it did back in the 'old days'.

    jupyter, mattermost, navidrome, pocketmine, tasmoadmin, ttrss, others..

    These are just docker hosts that run docker containers.  The scripts to set up the docker containers, and the docker volumes, are stored on the main NFS server, so re-deployment amounts to installing an Ubuntu server, letting Ansible set it up to join the AD domain, install Docker for me, set up autofs, etc, then simply running my NFS mounted scripts to run Docker containers accessing NFS mounted Docker volumes.

    booksonic, others...

    Another Docker host like the above paragraph.  I had set up Active Directory authentication for a couple applications running in Docker containers and I had some "fun" reconfiguring them to use the new domain controller IP addresses.  No big deal; however, AD auth reconfiguration was an unexpected additional step.  If "everything" is configured automatically in Ansible, but it's not REALLY "everything", then it's easy to forget some application-level configuration remains necessary.  Every system that's big enough has a couple loose ends somewhere.

    kapua, kura, hawkbit, mqttrouter (containing Eclipse Mosquitto)

    This is my local install of the Eclipse project Java IoT suite that I use for microcontroller experimentation and applications.

    Kapua is a web based server for IoT that does everything except firmware updates.  The software is run via a complicated shell script running version 1 docker-compose that works fine with version 2 docker compose, after exporting some shell environment variables to force the correct Kapua version and editing the start up script to run v2 "docker compose" instead of v1 "docker-compose".  Kapua overall is a bit too complicated to explain in this blog post.

    Kura is an example Java IoT device framework running locally in Docker instead of on real hardware, for testing Kapua and generally messing around.

    Hawkbit is a firmware updater and it works great, anything with wifi/ethernet and MCUboot can upgrade itself very reliably, or recover from being bricked.  Works great with STM32 boards.

    Finally, as for mqttrouter, simply start the NFS config and Eclipse Mosquitto works.

    The Eclipse project Java-based IoT suite is REALLY cool, and once upon a time I planned a multi-video Youtube series using it and Zephyr, but I ran out of RAM on my STM32 boards before implementing more than 50% of the Kapua/Kura protocol.  Now-a-days I'd just install Kura on a Raspberry Pi, if not Node-RED, or on the smaller end install one of the microcontroller Python implementations and call it good.  Maybe some day I'll get back into Eclipse Java IoT.

    win11

    This was a gigantic struggle.  The Proxmox side works perfectly, with emulated TPM, and the install went perfectly smoothly.  The problem was I have a valid Windows license on microsoft.com for this VM but the image refused to 'activate'.  I paid list price for this license that I can't even use; I can see why people have a bad attitude about Microsoft...  Nonetheless, via various technical means I now have a remotely accessible domain-joined Windows 11 image that I can access via Apache Guacamole's rdesktop feature from any modern web browser (including my Chromebook) to run Windows "stuff" remotely.  Works pretty well, aside from the previously mentioned license activation problem.  Everything 'Microsoft' is a struggle all the time.

    ibm7090, pdp8, rdos, rsx11, tops10, mvs, a couple others

    Runs the latest OpenSIMH retrocomputing emulator in a tmux window.  The MVS host has the "famous" MVS/370 Turnkey 5 installed along with a console 3270 emulator.  The disk images are normally stored over NFS along with all configs.  All data is stored in projects on Redmine.  I have login entries on Apache Guacamole so I have full access to my retrocomputing environment via any web browser.


    Next blog post:  Various operations issues.  Upgrading Proxmox VE software, daily stuff like that.

    Wednesday, November 29, 2023

    Proxmox VE Cluster - Chapter 017 - Install Proxmox VE on the old OS2 cluster hardware



    A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.


    Some notes on installing Proxmox VE on the old OS2 cluster hardware.  The main difference between installation on OS1 and OS2 is adding NTP serving to OS2, more or less as per Chapter 014 NTP notes.  The main reference document:

    https://pve.proxmox.com/pve-docs/chapter-pve-installation.html

    The plan to work around the networking challenges is to get everything working on a single plain temporary 1G ethernet connection, then use that as a management web interface to get the dual 10G LAG with VLANs up and running, then use the new "20G" ethernet connected web management interface to connect and reconfigure the dual 1G LAG / VLAN ethernet ports, at which point everything will be working.

    First Steps

    Install using IPMI KVM and USB key, so find the usb key and plug in the IPMI ethernet.


    On boot, DEL for setup, alter the boot options to include the USB drive, reboot, hit F11 for boot menu, then boot off the USB.

    Proxmox "OS" install process

    • I have a habit of using the console install environment.
    • Installer wants to default to install on the M2 drive although I am using the SATA.
    • Country: United States
    • Timezone: The "timezone" field will not let me enter a timezone, only city names, none of which are nearby.  Super annoying I can't just enter a timezone like a real operating system.  I ended up selecting a city a thousand miles away.  This sucks.  It's a "timezone" setting, not "name a far away city that coincidentally is in the same timezone".  I expect better from Proxmox.
    • Keyboard Layout: U.S. English
    • Password: (mind your caps-lock)
    • Administrator email: vince.mulhollon@springcitysolutions.com
    • Management Interface: the first 1G ethernet (eno1, aka the "bottom left corner")
    • Hostname FQDN: as appropriate, as per the sticker on the device
    • IP address (CIDR): as appropriate, as per the sticker on the device / 016
    • Gateway address: 10.10.1.1
    • DNS server address: 10.10.8.221
    • Note you can't set up VLANs in the installer, AFAIK.
    • Hit enter to reboot, yank the USB flash install drive, yank the USB keyboard, watch the monitor... seems to boot properly...
    • Web interface is on port 8006.  Log in as root.  Note I installed 8.0-2 and on the first boot, the web gui reports version 8.0.3, it must have auto-updated as part of the install process?

    Upgrade the new Proxmox VE node

    1. Double check there's no production workload on the server; it's a new install so there shouldn't be anything, but it's a good habit.
    2. Select the "Server View" then node name, then on the right side, "Updates", "Repositories", disable both enterprise license repos.  Add the community repos as explained at https://pve.proxmox.com/wiki/Package_Repositories
    3. Or in summary, click "add", select "No-subscription", "add", then repeat for the "Ceph Quincy No-Subscription" repo.
    4. In right pane, select "Updates" then "Refresh" and watch the update.  Click "Upgrade" and watch the upgrade.
    5. Optimistically get a nice message on the console of "Your system is up-to-date" and a request to reboot.
    6. Reboot and verify operation.

    Install hardware in permanent location with temporary ethernet cables

    1. Perform some basic operation testing
    2. In the web UI "Shutdown" then wait for power down.
    3. Reinstall in permanent location.
    4. Connect eno1 to any untagged "Prod" VLAN 10 access-only ethernet port, temporarily, for remote management via the web interface.
    5. Connect the 10G ethernets eno3 and eno4 to the LAG'd and VLAN'd 10G ethernet switch ports.

    Move the Linux Bridge from single 1 gig eno1 to dual 10 gig LAG on eno3 and eno4

    You are going to need this:
    1. Modify eno3 and eno4, checkmark "Advanced", change MTU to 9000.
    2. Create a Linux Bond named bond1, Checkmark "Advanced", change MTU to 9000, Mode "balance-xor", slaves "eno3 eno4" (note space in between, not comma etc).  Note bond0 will eventually be the 1G LAG, and the old OpenStack used "balance-xor" so I will start with that on the Proxmox.
    3. Create Linux VLAN named bond1.10 with MTU 9000, can create the other VLANs now if you want.
    4. Edit vmbr0 Linux bridge to have a MTU of 9000 and Bridge Ports of bond1.10
    5. Double check everything then "Apply Configuration", and after about twelve to thirteen heart stopping seconds it should be up and working.
    At some later date I will try some LAG bond modes more interesting than "balance-xor".

    Note the network interfaces do not have "VLAN aware" checked.  Everything works.  I will research this later in a dedicated advanced networking post.
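
After "Apply Configuration" it is worth confirming that every layer of the stack really ended up at MTU 9000, since one layer left at 1500 quietly undoes the jumbo frame setup.  Below is a rough sketch that reads the values Linux exposes under /sys/class/net; the function names are hypothetical and the interface list is simply the one from the steps above.

```python
from pathlib import Path


def read_mtu(iface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Read an interface's current MTU from sysfs."""
    return int((Path(sysfs_root) / iface / "mtu").read_text().strip())


def wrong_mtu(ifaces: list[str], expected: int = 9000,
              sysfs_root: str = "/sys/class/net") -> dict[str, int]:
    """Return {interface: actual_mtu} for every interface not at the expected MTU."""
    found = {name: read_mtu(name, sysfs_root) for name in ifaces}
    return {name: mtu for name, mtu in found.items() if mtu != expected}


# Interface names from the steps above, bottom of the stack first.
CHAIN = ["eno3", "eno4", "bond1", "bond1.10", "vmbr0"]

# On the Proxmox node itself: print(wrong_mtu(CHAIN))
# An empty dict means every layer is at 9000.
```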

    Convert the single 1 gig eno1 to dual 1 gig LAG on eno1 and eno2

    1. Edit eno1 and eno2 and set MTU to 9000
    2. Create a Linux Bond named bond0, Checkmark "Advanced", change MTU to 9000, Mode "balance-xor", slaves "eno1 eno2" (space in between).
    3. Create VLAN interfaces now on bond0, or create them later.

    Configure NTP

    1. Create (or copy) the files for sources into "/etc/chrony/sources.d" I put exactly one clock in each file.  Files in sources.d can be re-read without restarting the entire service by running "chronyc reload sources".  If successful you should see the other clocks are now accessible when running "chronyc sources".
    2. Remove the default clocks shipped by Proxmox and enable NTP serving.  Edit /etc/chrony/chrony.conf and comment out the "pool" directive and add a line underneath, "allow 10.0.0.0/8".  This will require a service restart, not a mere reload, so "service chrony restart" and verify Chrony operation after a few minutes using "chronyc sources".
    3. Edit DNS for ntp4 (or as appropriate) to point to the new proxmox node IP address.
    4. Edit NTP on ALL THE OTHER NODES to reflect the presence of this new NTP server.
    5. Test NTP from various nodes, VMs, and hardware to verify NTP is working.
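
The "exactly one clock in each file" layout from step 1 is simple to generate rather than copy by hand.  This is a minimal sketch, assuming chrony's sourcedir convention of files ending in ".sources"; the function name and the clock hostnames are hypothetical placeholders.

```python
from pathlib import Path


def write_sources(directory: str, servers: dict[str, str]) -> list[Path]:
    """Write one '<name>.sources' file per clock, each holding one server line."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, host in sorted(servers.items()):
        path = out_dir / f"{name}.sources"
        path.write_text(f"server {host} iburst\n")
        written.append(path)
    return written


# Hypothetical clocks; on a real node the directory would be
# /etc/chrony/sources.d, followed by "chronyc reload sources".
EXAMPLE_CLOCKS = {
    "ntp1": "ntp1.example.com",
    "ntp2": "ntp2.example.com",
}
```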

    Final Installation Tasks

    1. Join the new node(s) to the existing cluster.  In "Datacenter" on any cluster member, "Cluster" "Join Information" "Copy Information" cut and paste into "Datacenter" on the new node, "Join Cluster", enter the peer's root password, "Join 'Proxmox'".  Will have to log back into the web UI after the SSL certs update...
    2. Verify information in Netbox to include MAC, serial number, ethernet cabling, platform should be Proxmox VE, remove old Netbox device information.
    3. Add new hosts to Zabbix.
    The next post will be about adding the remaining "paused" workload to the now "full sized" Proxmox VE cluster.

    Monday, November 27, 2023

    Proxmox VE Cluster - Chapter 016 - Hardware Prep Work on the OS2 Cluster



    A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.


    This will be similar to Chapter 012 although different hardware.

    These microservers are three old SuperMicro SYS-E200-8D that were used for Homelab workloads.  They will become Proxmox cluster nodes proxmox004, proxmox005, and proxmox006.  

    This server hardware was stereotypical for a late 2010's "VMware ESXi Eval Experience"-licensed cluster, and later worked very well under OpenStack.  1.90 GHz Xeon D-1528 with six cores and 96 GB of ram, 1 TB SATA SSD for boot and local storage, new 1 TB M2 NVME SSD for eventual CEPH cluster storage.

    Hardware reliability history

    Proxmox004 had its AC power brick replaced 2022-07-10

    Proxmox005 had a NVME failure 2020-05-24, took advantage of that outage to also upgrade its SSD to a new 1TB (on suspicion, the old one was working fine although wearout measurement was getting to a high percentage per SMART reports) on 2020-05-27.

    Proxmox006 had a NVME failure 2021-02-20, and had its AC power brick replaced 2022-06-10

    Previously in Chapter 012 I claimed that 5/6 of the power supplies had failed on my E800 microservers, but I made a mistake, and it seems "only" TWO THIRDS of the power supplies have failed as of late 2023; currently proxmox001 and proxmox005 are still running on the original mid 2010s power supplies.  I will keep a close eye on the output voltages (monitorable via IPMI using Observium and probably Zabbix and maybe somehow via Elasticsearch).

    FIVE Ethernet ports

    Even the official manufacturer's operating manual fails to explain the layout of the five ethernet ports on this server.  Looking at the back of the server, the lone port on the left side is the IPMI, then:

    eno1 1G ethernet bottom left corner, 9000 byte MTU

    eno2 1G ethernet top left corner, 9000 byte MTU

    eno3 10G ethernet bottom right corner, 9000 byte MTU

    eno4 10G ethernet top right corner, 9000 byte MTU

    eno1 and eno2 are combined into bond12, which uses balance-xor mode to provide 2 Gb/s of aggregate bandwidth.

    eno3 and eno4 are combined into bond34, which uses balance-xor mode to provide 20 Gb/s of aggregate bandwidth.  20 Gb/s ethernet is pretty fast!

    I run the VLANs as subinterfaces of the bond interfaces.  So, "Production" VLAN 10, has an interface name of "bond34.10"
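
One caveat on the aggregate numbers: balance-xor picks a bond member per packet by hashing addresses, so any single pair of hosts always lands on the same physical link.  The sketch below is a simplified illustration of that idea, assuming the Linux bonding driver's default layer2 transmit hash policy (this post does not say which policy is in use) and leaving out the packet type term the kernel documentation also folds into the hash.  The MAC addresses are made up.

```python
def layer2_slave_index(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """Pick a bond member the way a simplified layer2 XOR hash would."""
    # Only the last octet of each MAC takes part in this simplified hash.
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % slave_count


# Hypothetical MAC addresses; two members, as in bond34 (eno3 + eno4).
print(layer2_slave_index("aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02", 2))
print(layer2_slave_index("aa:bb:cc:00:00:01", "aa:bb:cc:00:00:03", 2))
```

The same source and destination always map to the same index, which is why the 20 Gb/s shows up across many peers rather than on one big transfer.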

    Hardware Preparation task list

    1. Clean and wipe old servers, both installed software and physical dusting.
    2. Relabel ethernet cables and servers.
    3. Update port names in the managed Netgear ethernet switch.  VLAN and LAG configs remain the same, making installation "exciting" and "interesting".
    4. Remove monitoring of old server in Zabbix.
    5. Verify IPAM information in Netbox.
    6. Test and verify new server DNS entries.
    7. Install new 1TB M.2/NVME SSDs.
    8. Replace old CMOS CR2032 battery as it's probably 5 to 7 years old.  This is child's-play compared to replacing the battery on a hyper-compact Intel-NUC.
    9. Reconfigure the BIOS in each server.  For a variety of reasons, PXE netboot requires UEFI and BIOS initialization of the network, so I used that in the OpenStack era, which was installed on top of Ubuntu.  However, I could not force the UEFI BIOS to boot the SATA SSD; it insisted on booting the M.2 only, which is odd because it worked fine under older, USB-stick installed Ubuntu.  Another problem with the BIOS config was that "something" about pre-initializing the ethernet system for PXEBoot messes up the bridge configuration on Proxmox's Debian OS, resulting in traffic not flowing.  I experimented with manually adding other interfaces to the bridge; no go.  Symptoms were no packets flowing in (brctl showmacs is essentially the bridge's ARP table) and no packets out, although the link lights were up and everything looked OK.  Anyway, in summary, disable PXEboot entirely and convert entirely from UEFI to Legacy BIOS booting.  This was typical of the UEFI experience in the late 2010s: it didn't really work most of the time, but Legacy BIOS booting always worked.  Things are better now.

    Next post will be about installing Proxmox VE on the old OS2 cluster hardware.

    Friday, November 24, 2023

    Proxmox VE Cluster - Chapter 015 - Migrate workload of the old OS2 cluster



    A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.


    I plan to reuse the OpenStack cluster OS2 hardware to increase Proxmox cluster capacity.  Before I can reuse the hardware, I need to move all remaining workload off the OpenStack hardware.  The OS2 workload is either immediately migrated to Proxmox if it's important enough, or "paused" for a few days and restored on Proxmox after the cluster work is done.  Here is a list of the OS2 cluster workload, and a description of what I did to each application:

    storage

    An Ubuntu Samba and NFS fileserver.

    Converted from FreeBSD to Ubuntu.  While doing that, I added NFS file serving for Ubuntu on Ansible, so that is scripted and automated now.

    netbox

    https://netbox.dev/

    A complete FOSS IPAM system.

    This one was tricky; it's a docker-compose with local volumes, so I need to back up the DB on the old system then restore onto the new system, more or less.  I need to completely redesign this so all data is stored over NFS instead of on local host volumes.  It's the only container I have that stores data in local volumes, which makes management a hassle.

    I recommend against directly following the database restore instructions on:

    https://github.com/netbox-community/netbox-docker/wiki/Troubleshooting#database-operations

    I unfortunately have extensive experience with restoring an older schema on top of a freshly installed empty new schema, resulting in considerable data loss.  At least I keep good backups LOL, so I was able to recover from that.  A better strategy is NOT to start everything then shut off the client processes, then (try to) restore an old schema backup over an empty new schema as per existing online docs, but instead to start ONLY the postgres container, then while it's empty, restore the old schema backup into postgres, then and only then start up the client containers (everything else) and let the automatic upgrade process upgrade the schema of the freshly restored old schema database.  That strategy worked perfectly with no data loss.
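    The ordering that worked for me can be sketched as a dry-run plan.  This is a minimal sketch, not the netbox-docker project's documented procedure: the service name "postgres", the database name and user "netbox", the dump filename "netbox-backup.sql", and the use of the newer "docker compose" spelling are all assumptions for illustration, so check them against your own compose file.  The function only builds the command strings; it does not run anything.

```python
"""Dry-run sketch of the restore order: database first, clients last."""


def restore_plan(dump_file="netbox-backup.sql", db_service="postgres",
                 db_user="netbox", db_name="netbox"):
    """Return the shell commands, in order, without running anything.

    Every default name here is an assumption; adjust to your compose file.
    """
    return [
        # 1. Start ONLY the database container, so the schema is still empty.
        f"docker compose up -d {db_service}",
        # 2. Load the old-schema backup into the still-empty database.
        #    (Wait until postgres accepts connections before running this.)
        f"docker compose exec -T {db_service} "
        f"psql -U {db_user} {db_name} < {dump_file}",
        # 3. Only now start everything else; the client containers then
        #    upgrade the restored old schema on their own.
        "docker compose up -d",
    ]


if __name__ == "__main__":
    for step in restore_plan():
        print(step)
```

    The point of the ordering is that nothing ever creates a fresh new-version schema before the old backup is loaded.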

    I filed a docs improvement bug regarding the above adventure at GitHub:

    https://github.com/netbox-community/netbox-docker/issues/1113

    unifi

    The "unifi" controller software in a docker container for a cloudy-ish WIFI network based on Ubiquiti hardware.

    The Unifi Controller is just another Docker container.  However, a new controller IP address means re-homing all Unifi devices off the old server and old IP address and onto the new server with its new IP address.  I copied the NFS mounted volume for unifi controller over to unificontroller-old, because then I could start controllers on both the new and old servers.  Obviously, every Ubiquiti hardware device on the LAN reconnected to the unifi-old server, although the new unifi server looked fine (other than acting like all devices were suddenly disconnected from it, which is accurate).  Then on the old controller, in "Settings" "System" "Advanced" there's an "inform host" setting which had the old server's IP address, so I put in the new address and hit apply.  There's also a way to manually SSH into each device individually, which can be a bit of a pain, so I used the web UI "Inform" method.

    The above resulted in a minor problem: the "inform host" on the new controller was pointing to the old controller IP address because the new controller was a clone of the old controller, and the "inform host" on the old controller was pointing to the new host, so the devices ping-ponged back and forth between the old and new controllers for a while.  I fixed the "inform host" setting on the new controller and Wi-Fi devices started coming online.  Cool.  The ethernet switches were mad at me for at least several minutes, I think I crashed some firmware or something, although they did eventually ALL come online.  Mildly interesting that the Wi-Fi devices connect much faster than the ethernet switches.  In summary, that was exciting for a while, but in the end it all worked pretty well.

    redmine

    https://www.redmine.org/

    A FOSS project management suite

    Shut down the docker-compose on the old server, start it on the new server, painless.  All the data volumes reside on the same NAS NFS server, this was just moving the docker "compute host" from the old OpenStack cluster number two, to the new Proxmox docker "compute host".

    es02, es03, kibana

    https://www.elastic.co/

    Elasticsearch infrastructure for syslog storage and analysis

    This turned into a larger adventure than initially planned.  Ended up being an upgrade of Elasticsearch from 8.6 to 8.10 and a new cluster being formed on ES01, ES02, and ES03.  Later I will add ES04, ES05 and ES06.  This seems like a lot of work to store syslog messages, but Elasticsearch as a database technology is fun to play with and Kibana can make really cool graphical dashboards, so it's worth the effort.

    portainer

    https://www.portainer.io/

    A FOSS centralized web based Docker management tool.

    Shut down the docker container on the old server, start it on the new server, painless.

    guacamole

    The Apache Guacamole project provides a website that turns any web browser into an SSH client or RDesktop client.

    Shut down the old docker container, start on the new server, painless.

    dc21, dc22 to dc02, dc03

    https://www.samba.org/

    Samba Active Directory Domain Controller Cluster.

    Need to remove dc21 and dc22 as the second to last VMs removed from OS2, because some VMs on OS2 still point to dc21 and dc22 for DNS resolution.

    Probably the only "interesting" thing to remember to do was move the FSMO roles to dc01 off of dc21.

    I took this opportunity to clean up the old DNS entries.  I use the RSAT tools on a Windows 11 desktop, which works pretty well to control Samba Active Directory.

    dns21, dns22 to dns02, dns03

    Ubuntu servers doing DNS resolving requests that are forwarded from the Domain Controller cluster.

    dc01 forwards DNS resolution to dns01 and dns02, dc02 forwards to dns02 and dns03, and so forth, so everything has multiple backups.  This works pretty well.
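    The staggered forwarding pattern can be written down as a tiny function.  This is only a sketch of the naming pattern described above; the wrap-around for the last domain controller (dc03 forwarding to dns03 and dns01) is my assumption about what "and so forth" means with three resolvers.

```python
def forwarders(dc_index, resolver_count=3, per_dc=2):
    """Return the resolver names used by domain controller number dc_index.

    dc_index is 1-based (dc01 -> 1).  Wrapping around at the end of the
    resolver list is an assumption, not something stated in the post.
    """
    names = []
    for offset in range(per_dc):
        n = (dc_index - 1 + offset) % resolver_count + 1
        names.append(f"dns{n:02d}")
    return names


if __name__ == "__main__":
    for dc in (1, 2, 3):
        print(f"dc{dc:02d} ->", ", ".join(forwarders(dc)))
```

    With this layout, losing any single resolver still leaves every domain controller with one working forwarder.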

    Need to remove dns21 dns22 after removing dc21 dc22.  These will be the last VMs removed from OS2.

    emby

    https://emby.media/

    A media server and DVR for Roku and other household set top boxes.

    Just an Ubuntu server with emby package installed.

    Need to convert when the recordings list is empty

    Note there is, of course, a new IP address.

    Emby has a very elaborate and detailed manual provisioning process, documented in its Redmine runbook "issue".

    ubuntu

    General end user use.

    Move old ubuntu to ubuntu-old in DNS

    Set up a new Ubuntu for enduser use.

    Make sure Ansible runs on the new ubuntu before shutting down the old ubuntu.

    backup

    An Ubuntu NFS and Samba fileserver holding backup data.

    This was a test fileserver before converting "storage2" to "storage".  Only problem I ran into was minor, FreeBSD prefers the use of the group "wheel" whereas Ubuntu prefers the use of the group "root".


    Next blog post, there's nothing left on OpenStack cluster 2, shut it down and prepare the OS2 cluster hardware for reuse as additional Proxmox capacity.

    Wednesday, November 22, 2023

    Proxmox VE Cluster - Chapter 014 - Configure NTP on the Proxmox Cluster


    A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.


    The NTP architecture here uses the E800 nodes as a central NTP source for the entire network.  Those six nodes are the only NTP servers that get time from the internet or from the local GPS refclock, and the rest of the network syncs time off those six.  Helpfully, a long time ago, I set up DNS aliases ntp1 thru ntp6 for these clocks so I don't need to change any configurations after I alter the DNS ... probably.

    Basically, today I am converting from classic NTP on the OpenStack servers to Chrony on the Proxmox servers.


    References

    https://pve.proxmox.com/wiki/Time_Synchronization

    https://ubuntu.com/server/docs/how-to-serve-the-network-time-protocol-with-chrony


    The Big Picture Plan

    1. Configure proxmox001-003 to get time from the local GPS clock, from the other proxmox servers, and one internet time pool source.
    2. Configure proxmox001-003 to serve time.
    3. Modify DNS such that ntp1-ntp3 will now point to proxmox001-003.  Note some devices will require manual configuration such as the Ethernet switches, maybe the Ubiquiti wifi, maybe the TrueNAS, who knows?
    4. After proxmox004-006 are set up, the DNS hosts ntp4-ntp6 will need to be updated.

    I will set this up manually because it's simple and I have not integrated Proxmox with Ansible yet.  But eventually Proxmox will be configured via Ansible.


    Manually configuring chrony on Proxmox VE

    1. Create (or copy) the files for sources into "/etc/chrony/sources.d".  I put exactly one clock in each file.  Files in sources.d can be re-read without restarting the entire service by running "chronyc reload sources".  If successful you should see the other clocks are now accessible when running "chronyc sources".
    2. Remove the default clocks shipped by Proxmox and enable NTP serving.  Edit /etc/chrony/chrony.conf, comment out the "pool" directive, and add a line underneath it: "allow 10.0.0.0/8".  This requires a service restart, not a mere reload, so run "service chrony restart" and verify Chrony operation after a few minutes using "chronyc sources".
    3. Edit DNS for ntp1 (or as appropriate) to point to the new proxmox node IP address.
    4. Test NTP from various VMs and hardware to verify NTP is working.


    List of clocks in /etc/chrony/sources.d:

    • gpsclock.sources = the local, on LAN, "stratum 1-ish" GPS clock
    • proxmox001.sources = should be five files pointing to the other five E800 nodes
    • pool.sources = "server 0.pool.ntp.org" as an external reference.
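    The file layout and the chrony.conf change above can be sketched in a few lines of Python.  This is a rough illustration only: the hostname "gpsclock" for the GPS clock and the use of bare node names like "proxmox002" as resolvable hostnames are assumptions, and the functions just build text; writing the files and restarting chrony is left to you.

```python
def chrony_sources(this_node, nodes, gps_host="gpsclock",
                   pool_host="0.pool.ntp.org"):
    """Map filename -> content for /etc/chrony/sources.d on one node.

    One clock per file.  gps_host is a hypothetical hostname.
    """
    files = {
        "gpsclock.sources": f"server {gps_host}\n",
        "pool.sources": f"server {pool_host}\n",
    }
    for peer in nodes:
        if peer != this_node:  # a node does not list itself as a source
            files[f"{peer}.sources"] = f"server {peer}\n"
    return files


def patch_chrony_conf(text, allow="10.0.0.0/8"):
    """Comment out 'pool' lines and add one 'allow' line under the first."""
    out = []
    allow_added = False
    for line in text.splitlines():
        if line.strip().startswith("pool "):
            out.append("#" + line)
            if not allow_added:
                out.append(f"allow {allow}")
                allow_added = True
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    cluster = [f"proxmox{i:03d}" for i in range(1, 7)]
    for name, body in sorted(chrony_sources("proxmox001", cluster).items()):
        print(f"/etc/chrony/sources.d/{name}: {body.strip()}")
```

    After writing the sources files, "chronyc reload sources" picks them up; the chrony.conf change still needs the full service restart described in step 2.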


    Cool, the new Proxmox nodes are now providing NTP time service to the network.  Next blog post will be about moving all the workload off the old OpenStack OS2 cluster so as to repurpose that hardware as yet more Proxmox capacity.

    Monday, November 20, 2023

    Proxmox VE Cluster - Chapter 013 - Install Proxmox VE on the old OS1 cluster hardware


    A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.


    Some notes on installing Proxmox VE on the old OS1 cluster hardware.  The main reference document:

    https://pve.proxmox.com/pve-docs/chapter-pve-installation.html

    The plan to work around the networking challenges is to get everything working on a single plain temporary 1G ethernet connection, then use that as a management web interface to get the dual 10G LAG with VLANs up and running, then use the new "20G" ethernet connected web management interface to connect and reconfigure the dual 1G LAG / VLAN ethernet ports, at which point everything will be working.

    First Steps

    Install using IPMI KVM and USB key, so find the usb key and plug in the IPMI ethernet.


    On boot, DEL for setup, alter the boot options to include the USB drive, reboot, hit F11 for boot menu, then boot off the USB.

    Proxmox "OS" install process

    • I have a habit of using the console install environment.
    • Installer wants to default to install on the M2 drive although I am using the SATA.
    • Country: United States
    • Timezone: The "timezone" field will not let me enter a timezone, only city names, none of which are nearby.  Super annoying I can't just enter a timezone like a real operating system.  I ended up selecting a city a thousand miles away.  This sucks.  It's a "timezone" setting, not "name a far away city that coincidentally is in the same timezone".  I expect better from Proxmox.
    • Keyboard Layout: U.S. English
    • Password: (mind your caps-lock)
    • Administrator email: vince.mulhollon@springcitysolutions.com
    • Management Interface: the first 1G ethernet (eno1, aka the "bottom left corner")
    • Hostname FQDN: as appropriate, as per the sticker on the device
    • IP address (CIDR): as appropriate, as per the sticker on the device / 016
    • Gateway address: 10.10.1.1
    • DNS server address: 10.10.8.221
    • Note you can't set up VLANs in the installer, AFAIK.
    • Hit enter to reboot, yank the USB flash install drive, yank the USB keyboard, watch the monitor... seems to boot properly...
    • Web interface is on port 8006.  Log in as root.  Note I installed 8.0-2 and on the first boot, the web gui reports version 8.0.3, it must have auto-updated as part of the install process?

    Upgrade the new Proxmox VE node

    1. Double check there's no production workload on the server; it's a new install so there shouldn't be anything, but it's a good habit.
    2. Select the "Server View" then node name, then on the right side, "Updates", "Repositories", disable both enterprise license repos.  Add the community repos as explained at https://pve.proxmox.com/wiki/Package_Repositories
    3. Or in summary, click "add", select "No-subscription", "add", then repeat for the "Ceph Quincy No-Subscription" repo.
    4. In right pane, select "Updates" then "Refresh" and watch the update.  Click "Upgrade" and watch the upgrade.
    5. Optimistically get a nice message on the console of "Your system is up-to-date" and a request to reboot.
    6. Reboot and verify operation.

    Install hardware in permanent location with temporary ethernet cables

    1. Perform some basic operation testing
    2. In the web UI "Shutdown" then wait for power down.
    3. Reinstall in permanent location.
    4. Connect eno1 to any untagged "Prod" VLAN 10 access-only ethernet port, temporarily, for remote management via the web interface.
    5. Connect the 10G ethernets eno3 and eno4 to the LAG'd and VLAN'd 10G ethernet switch ports.

    Move the Linux Bridge from single 1 gig eno1 to dual 10 gig LAG on eno3 and eno4

    You are going to need this:
    1. Modify eno3 and eno4, checkmark "Advanced", change MTU to 9000.
    2. Create a Linux Bond named bond1, Checkmark "Advanced", change MTU to 9000, Mode "balance-xor", slaves "eno3 eno4" (note space in between, not comma etc).  Note bond0 will eventually be the 1G LAG, and the old OpenStack used "balance-xor" so I will start with that on the Proxmox.
    3. Create Linux VLAN named bond1.10 with MTU 9000, can create the other VLANs now if you want.
    4. Edit vmbr0 Linux bridge to have a MTU of 9000 and Bridge Ports of bond1.10
    5. Double check everything then "Apply Configuration", and after about twelve to thirteen heart stopping seconds it should be up and working.
    At some later date I will try some LAG bond modes more interesting than "balance-xor".

    Note the network interfaces do not have "VLAN aware" checked.  Everything works.  I will research this later in a dedicated advanced networking post.
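    For reference, the web UI steps above end up as stanzas in /etc/network/interfaces.  The sketch below generates roughly what I would expect that file to contain; it is my reconstruction, not a copy of the file from a real node, and the address "10.10.0.10/16" in the usage example is a made-up placeholder.  The "bond-miimon 100", "bridge-stp off" and "bridge-fd 0" lines are the usual Proxmox defaults, included here as an assumption.

```python
def interfaces_stanza(address_cidr, gateway, bond="bond1",
                      slaves=("eno3", "eno4"), vlan=10, mtu=9000,
                      mode="balance-xor"):
    """Build ifupdown2-style text for NICs -> bond -> VLAN -> bridge."""
    lines = []
    # Physical NICs: no address, jumbo MTU.
    for nic in slaves:
        lines += [f"iface {nic} inet manual", f"    mtu {mtu}", ""]
    lines += [
        # The LAG itself; slave names are space separated, not comma.
        f"auto {bond}",
        f"iface {bond} inet manual",
        f"    bond-slaves {' '.join(slaves)}",
        "    bond-miimon 100",
        f"    bond-mode {mode}",
        f"    mtu {mtu}",
        "",
        # VLAN subinterface on top of the bond, e.g. bond1.10.
        f"auto {bond}.{vlan}",
        f"iface {bond}.{vlan} inet manual",
        f"    mtu {mtu}",
        "",
        # The bridge carries the management IP and rides on the VLAN.
        "auto vmbr0",
        "iface vmbr0 inet static",
        f"    address {address_cidr}",
        f"    gateway {gateway}",
        f"    bridge-ports {bond}.{vlan}",
        "    bridge-stp off",
        "    bridge-fd 0",
        f"    mtu {mtu}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 10.10.0.10/16 is a hypothetical address; 10.10.1.1 is my gateway.
    print(interfaces_stanza("10.10.0.10/16", "10.10.1.1"))
```

    Every layer (NICs, bond, VLAN, bridge) needs the same 9000 byte MTU, which is why the MTU shows up five times in the output.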

    Convert the single 1 gig eno1 to dual 1 gig LAG on eno1 and eno2

    1. Edit eno1 and eno2 and set MTU to 9000
    2. Create a Linux Bond named bond0, Checkmark "Advanced", change MTU to 9000, Mode "balance-xor", slaves "eno1 eno2" (space in between).
    3. Create VLAN interfaces now on bond0, or create them later.

    Final Installation Tasks

    1. Join the new node(s) to the existing cluster.
    2. Verify information in Netbox to include MAC, serial number, ethernet cabling, platform should be Proxmox VE, remove old Netbox device information.
    3. Add new hosts to Zabbix.
    The next post will be about setting up NTP on the new Proxmox VE cluster.

    Friday, November 17, 2023

    Proxmox VE Cluster - Chapter 012 - Hardware Prep Work on OS1 cluster


    A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.


    These microservers are three old SuperMicro SYS-E200-8D that were used for Homelab workloads.  They will become Proxmox cluster nodes proxmox001, proxmox002, and proxmox003.  

    This server hardware was stereotypical for a late 2010's "VMware ESXi Eval Experience"-licensed cluster, and later worked very well under OpenStack.  1.90 GHz Xeon D-1528 with six cores and 96 GB of RAM, 1 TB SATA SSD for boot and local storage, new 1 TB M.2 NVMe SSD for eventual Ceph cluster storage.

    Hardware reliability history

    Proxmox001 is the only server out of six that is still running the original AC power supply.  The other five required replacement.  Voltage would sag lower and lower until there were random reboots under heavy load, and eventually the supplies would fail completely.  Thankfully it's just a dead power supply and the rest of the hardware has been extremely reliable.  I can't recommend SuperMicro hardware enough, it's really good stuff... other than the power supplies from the late 2010s.

    Proxmox002 had its AC power brick replaced 2020-11-19 and AGAIN on 2023-07-01

    Proxmox003 had a NVME failure 2021-02-20, AC power brick replaced 2022-05-31

    FIVE Ethernet ports

    Even the official manufacturer's operating manual fails to explain the layout of the five ethernet ports on this server.  Looking at the back of the server, the lone port on the left side is the IPMI, then:

    eno1 1G ethernet bottom left corner, 9000 byte MTU

    eno2 1G ethernet top left corner, 9000 byte MTU

    eno3 10G ethernet bottom right corner, 9000 byte MTU

    eno4 10G ethernet top right corner, 9000 byte MTU

    eno1 and eno2 are combined into bond12, which uses balance-xor mode to provide 2 Gb/s of aggregate bandwidth.

    eno3 and eno4 are combined into bond34, which uses balance-xor mode to provide 20 Gb/s of aggregate bandwidth.  20 Gb/s ethernet is pretty fast!
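    The bond totals are aggregate figures across many conversations.  As a rough illustration of why, here is a simplified sketch of how balance-xor picks a link, based on my reading of the Linux bonding documentation for the default "layer2" transmit hash policy.  I am assuming the default policy is in use; the MAC addresses in the example are made up.

```python
def layer2_slave(src_mac, dst_mac, ethertype=0x0800, slave_count=2):
    """Pick a bond slave index the way the layer2 hash policy is described.

    hash = last byte of source MAC XOR last byte of destination MAC
           XOR packet type ID; slave = hash modulo slave count.
    This is a simplification for illustration, not kernel code.
    """
    src_last = int(src_mac.split(":")[5], 16)
    dst_last = int(dst_mac.split(":")[5], 16)
    return (src_last ^ dst_last ^ ethertype) % slave_count


if __name__ == "__main__":
    me = "aa:bb:cc:00:00:01"  # hypothetical MAC addresses
    for peer in ("aa:bb:cc:00:00:02", "aa:bb:cc:00:00:03"):
        print(peer, "-> link", layer2_slave(me, peer))
```

    The same pair of hosts always hashes to the same link, so a single conversation tops out at one link's speed, while traffic to many different peers spreads across both links.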

    I run the VLANs as subinterfaces of the bond interfaces.  So, "Production" VLAN 10, has an interface name of "bond34.10"

    Hardware Preparation task list

    1. Clean and wipe old servers, both installed software and physical dusting.
    2. Relabel ethernet cables and servers.
    3. Update port names in the managed Netgear ethernet switch.  VLAN and LAG configs remain the same, making installation "exciting" and "interesting".
    4. Remove monitoring of old server in Zabbix.
    5. Verify IPAM information in Netbox.
    6. Test and verify new server DNS entries.
    7. Install new 1TB M.2/NVME SSDs.
    8. Replace old CMOS CR2032 battery as it's probably 5 to 7 years old.  This is child's-play compared to replacing the battery on a hyper-compact Intel-NUC.
    9. Reconfigure the BIOS in each server.  For a variety of reasons, PXE netboot requires UEFI and BIOS initialization of the network, so I used that in the OpenStack era, which was installed on top of Ubuntu.  However, I could not force the UEFI BIOS to boot the SATA SSD; it insisted on booting the M.2 only, which is odd because it worked fine under the older, USB-stick installed Ubuntu.  Another problem with the BIOS config was that "something" about pre-initializing the ethernet system for PXE boot messes up the bridge configuration on Proxmox's Debian OS, resulting in traffic not flowing.  I experimented with manually adding other interfaces to the bridge; no go.  Symptoms were no packets flowing in ("brctl showmacs" lists the bridge's learned MAC addresses, essentially the bridge's equivalent of an ARP table) and no packets out, although the link light was up and everything looked OK.  Anyway, in summary, disable PXE boot entirely and convert entirely from UEFI to Legacy BIOS booting.  This was typical of the UEFI experience in the late 2010s: it doesn't really work most of the time, but Legacy BIOS booting always works.  Things are better now.

    In the next post, we install Proxmox VE on the old OS1 cluster hardware.  It'll be interesting with all those VLANs and LAGs.

    Wednesday, November 15, 2023

    Proxmox VE Cluster - Chapter 011 - Move OpenStack cluster 1 workload to Proxmox

    Proxmox VE Cluster - Chapter 011 - Move OpenStack cluster 1 workload

    A voyage of adventure, moving a diverse workload running on OpenStack, Harvester, and RKE2 K8S clusters over to a Proxmox VE cluster.

    Before the hardware used for OpenStack cluster OS1 can be repurposed for the Proxmox cluster, I need to move all the virtual machines and containers off OS1.  There are several options: temporarily delete them until there is more capacity, permanently delete them if no longer needed, or move to the Proxmox cluster.

    The old "warm backup" availability strategy for OpenStack was some workload was installed on both clusters, but only operating on one cluster at a time, for example, one of the minor file servers.  It was expensive to keep two copies around of "everything" and only running one copy on the Proxmox cluster should save quite a bit of capacity, overall.

    Here is a list of workload I moved to Proxmox:

    netbootxyz

    Netboot.xyz provides network booting infrastructure.  Network booting starts with a DHCP server like ISC-DHCP (Or KEA...) pointing a booting PC to a TFTP address, the address of the netboot.xyz server.  This is where Netboot.xyz comes into the picture, it serves a really nice CLI menu of dozens of operating system install ISO files.  A very convenient way to install an OS.  There are also plenty of testing and troubleshooting images available.

    https://netboot.xyz/

    The VM is a simple Ubuntu 20.04 install that runs Docker.  I NFS mount all my Docker volumes, this has worked well for several years.  The move was uneventful.  Shut down the old VM on the OS1 cluster, start the new VM (on a new address) on the Proxmox cluster, run a script I keep in the NFS mounted docker directory to pull and start a netboot.xyz container, repoint the DHCP servers to the new netboot.xyz ip address, and it just works.

    wiki

    This is a Docker container of DokuWiki.  I use it as a "home page" or "phone book" for the LAN.  If it's a web-accessible server, it has a link to it on the wiki.

    https://www.dokuwiki.org/dokuwiki

    This is another simple Ubuntu VM holding a Docker container, much like the netbootxyz VM above.  One of many advantages of storing my Docker volumes over NFS is a move like this is so simple; shut down the Docker container on the old server, start the Docker container on the new server, done.  The move was uneventful.

    dhcp11, dhcp12, dhcp21, dhcp22 all replaced by dhcp01, dhcp02

    This is a classic dual server ISC-DHCPD cluster.

    https://www.isc.org/dhcp/

    On the OS1 cluster this was running FreeBSD, and I converted it to Ubuntu 20.04.  This conversion was uneventful.  I am aware ISC DHCP is discontinued as of 2022, and KEA is the next generation of ISC supported DHCP servers.  I will convert to KEA later; stay tuned for a Spring City Solutions Youtube channel video about that conversion process.

    dc11, dc12 replaced by dc01

    This LAN uses Samba servers as Active Directory Domain Controllers.  Really nice to have network access from any machine to my home directory, and SSO is also pretty cool.

    https://www.samba.org/

    This was also a conversion from FreeBSD Samba (which is pretty easy to use) to Ubuntu Samba (which is definitely not as easy to use).  After the initial OS install and Ansible configuration, do not do a "net ads join -U administrator" during the Ansible process.  The Ubuntu samba-tool utility has to create a fresh smb.conf file from scratch during the joining process, so just move the Ansible provided file out of the way temporarily (the Ansible file should be identical other than configuring DNS forwarder servers).  After initial configuration you will have to manually edit (or use Ansible) to fix the DNS forwarder entry in /etc/samba/smb.conf.  You can join an Ubuntu DC to a domain while running the "user" set of Samba services (SMBD, NMBD, Winbind) and there will be zero error messages aside from the "samba-tool drs showrepl" command failing to connect to port 135, and of course the DC not working in general.  It seems that on Ubuntu, you need to shut down the user class daemons SMBD, NMBD, and WINBIND using systemctl, then look up how systemd permanently shuts down samba-ad-dc in order to figure out how to "unmask" that service.  The next Ubuntu Samba related problem is that systemd-resolved is autoconfigured to start on port 53, while refusing external connections, before Samba tries to start; samba-ad-dc will successfully start and not output any error messages while failing to bind to port 53, so in summary, by default, domain controller authoritative DNS will fail to work.  Systemd is always so annoying to use and just makes everything harder.  The solution is to "systemctl stop" and "systemctl mask" the "systemd-resolved.service".  A final minor problem, or surprise, is I couldn't get replication working without a reboot.  Yes, I know it's on a fifteen minute (or so) timer, but it just wouldn't start without a reboot.  Believe it or not, on FreeBSD, none of this drama is necessary.
    Regardless of the extensive effort required to work around systemd, Samba worked eventually.
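    To summarize the systemd wrangling, here is a dry-run sketch of the order of operations as I understand it.  The unit names smbd, nmbd, winbind, samba-ad-dc and systemd-resolved are the stock Ubuntu ones; the "disable" and "enable" steps are my assumption about what "permanently" should mean here, and the function only returns the commands rather than running them.

```python
def samba_dc_service_plan():
    """Return systemctl commands, in order, for an Ubuntu Samba AD DC."""
    user_daemons = "smbd nmbd winbind"
    return [
        # File-server style daemons must not run on a domain controller.
        f"systemctl stop {user_daemons}",
        f"systemctl disable {user_daemons}",
        # Free port 53 before the DC tries to bind its DNS server to it.
        "systemctl stop systemd-resolved",
        "systemctl mask systemd-resolved",
        # The AD DC unit ships masked; unmask it, then enable and start.
        "systemctl unmask samba-ad-dc",
        "systemctl enable samba-ad-dc",
        "systemctl start samba-ad-dc",
    ]


if __name__ == "__main__":
    for command in samba_dc_service_plan():
        print(command)
```

    The ordering matters mainly for port 53: systemd-resolved has to be out of the way before samba-ad-dc starts, because samba-ad-dc will not complain if the bind fails.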

    dns11, dns12 replaced by dns01

    These are typical ISC Bind DNS servers.  Samba Active Directory Domain Controllers do not have a very advanced resolver, so I forward DNS queries for everything they're not authoritative for to a resolver cluster.  I've always preferred a DNS architecture that keeps authoritative DNS and DNS resolving separate.

    https://www.isc.org/bind/

    The VM is a simple Ubuntu server acting as a DNS resolver (not doing authoritative DNS).  The conversion was uneventful.  Active directory domain controllers DC11 and DC12 use DNS11 and DNS12 so I could not shut down and remove DNS11 and DNS12 until after removing DC11 and DC12.

    observium

    Observium is a SNMP-based monitoring system.  Mostly monitors my Netgear ethernet switches, but can monitor other devices.

    https://www.observium.org/

    The VM is another simple Ubuntu Docker server.  I had to expand the memory to 8 GB and expand the hard disk in Proxmox (then expand the LVM PV, then the LVM LV, finally expand the FS) to 32 GB and expand the CPU allocation to dual core.  Starts up slow, but works.
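    The guest-side part of the disk expansion (after growing the virtual disk in Proxmox) can be sketched as an ordered list of commands.  The device, partition, volume group and logical volume names below are the Ubuntu installer defaults as far as I know, so treat them as assumptions; the "growpart" step is also an assumption that only applies when the LVM PV sits on a partition, and "resize2fs" assumes an ext4 filesystem.  Nothing is executed; the function only builds the plan.

```python
def grow_lvm_plan(disk="/dev/sda", part=3, vg="ubuntu-vg", lv="ubuntu-lv"):
    """Return the guest-side commands to use newly added disk space.

    All default names are assumptions; check with lsblk, pvs and lvs.
    """
    lv_path = f"/dev/{vg}/{lv}"
    return [
        # Grow the partition that holds the PV (assumed step).
        f"growpart {disk} {part}",
        # Expand the LVM PV to fill the partition.
        f"pvresize {disk}{part}",
        # Expand the LVM LV into all free space in the volume group.
        f"lvextend -l +100%FREE {lv_path}",
        # Finally expand the filesystem (ext4 assumed).
        f"resize2fs {lv_path}",
    ]


if __name__ == "__main__":
    for command in grow_lvm_plan():
        print(command)
```

    Each layer has to grow from the outside in: disk, partition, PV, LV, then filesystem.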

    zabbix

    Zabbix is an agent-based detailed monitoring system.  All VMs and some physical hardware run the zabbix agent for advanced monitoring and trend analysis.

    https://www.zabbix.com/

    The Zabbix system has two VMs.  In the past I've run into weird problems with Zabbix trying to monitor itself.  One way around that is that Zabbix has a proxy/concentration server, so I install one of those, and it connects to the main Zabbix server.  Probably no longer necessary, but it is pretty cool.

    The zabbix VM is yet another simple Ubuntu Docker server.  This was a bit of a headache.  I ran into some kind of incompatibility between old and new MySQL versions.  Would have impacted me upon next container upgrade had I not moved to a new cluster.  Ended up doing a complete Zabbix reinstall, there are too many new features to make the backup useful, etc.  In the end after some labor, this works pretty well.

    The zabbixproxy VM, again, yet another simple Ubuntu Docker server, holds the Docker container for zabbix-proxy.  The move from the OS1 cluster was uneventful.

    zerotier

    ZeroTier is a complete VPN solution.  Pretty cool!

    https://www.zerotier.com/

    Uneventful reinstallation.  Ended up creating a new connection to ZeroTier and then rerouting traffic for the LAN to the new connection (because for a while I was running both in parallel for testing purposes, which also necessitated some LAN IP address and static route juggling).  Also updated DNS to point to dc01 (and later on, added dc02 and dc03, although they don't exist yet).


    Next blog post will be about hardware prep work to turn the old OpenStack OS1 cluster into more Proxmox nodes.