Wednesday, March 4, 2026

[VCF 9.0 - SDDC Manager] Doesn't boot properly after upgrade

Issue

A few days ago, I was testing a VMware Cloud Foundation (VCF) upgrade in my lab, specifically moving from version 9.0.0 to 9.0.1.0.

NOTE: Before proceeding with any upgrades, always make sure you have reliable backups of your various components. Additionally, before taking any action, it's highly recommended to take snapshots of the involved components.

During the SDDC Manager upgrade phase, a few minutes after the automatic reboot of the SDDC Manager appliance, I was greeted with this error message:

Authorization Error : Unauthorized access.

This message was present in both the SDDC Manager UI and the Lifecycle Manager.

At this point, the upgrade seemed to be completely stuck.



Solution

Disclaimer: Some of the procedures described below may not be officially supported by VMware. Use them at your own risk.

Googling around, I found the following Broadcom article: VCF Operations 'SDDC Manager' tab shows "Authorization Error: Unauthorized access".

As indicated in the KB, I went to the VM console to check if the SDDC Manager issue matched the symptoms described:

SDDC Manager is inaccessible and keeps spinning
SDDC Manager displays CPU errors similar to:
[2989.634241] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 42s! [jsvc:8372]

However, my finding was completely different from what the KB described. The SDDC Manager was actually booting into Emergency Mode, showing the following errors:

[    2.895219] integrity: Problem loading X.509 certificate -22
[FAILED] Failed to mount /boot/efi.
[DEPEND] Dependency failed for Local File Systems.
You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" or
^D to boot into default mode.
Give root password for maintenance (or press Control-D to continue):

It appeared the system was unable to correctly mount the /boot/efi file system.

I entered the root password and tried a simple reboot just to see if it would clear up, but without success.
Checking the file system, I confirmed it was failing on /boot/efi. I started investigating the mounts.

cat /etc/fstab
blkid
lsblk -f
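Comparing the UUIDs in /etc/fstab against the blkid output by eye is error-prone, so the check can also be scripted. What follows is a minimal sketch and not what I ran in the lab: the function name missing_fstab_uuids is hypothetical, and it assumes fstab entries use the plain UUID=... form without quotes.

```shell
#!/bin/sh
# Sketch: print "MOUNTPOINT UUID" for every active fstab entry whose UUID
# is not in a given list of known device UUIDs.
# Usage: missing_fstab_uuids FSTAB_FILE "UUID1 UUID2 ..."
missing_fstab_uuids() {
    fstab=$1
    known=$2
    # Keep only lines whose first field is UUID=...; commented lines are skipped.
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1, $2 }' "$fstab" |
    while read -r uuid mountpoint; do
        case " $known " in
            *" $uuid "*) ;;                   # UUID exists on a device
            *) echo "$mountpoint $uuid" ;;    # no device carries this UUID
        esac
    done
}

# On the appliance (as root) the known UUIDs could be taken from blkid:
#   missing_fstab_uuids /etc/fstab "$(blkid -s UUID -o value | tr '\n' ' ')"
```

Any line it prints points at an fstab entry that no block device currently satisfies. An empty result means the UUIDs line up and the mount failure has another cause.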

I initially tried to fix the issue by replacing the UUID in /etc/fstab with the new ID retrieved from the lsblk command, but that didn't work.

vi /etc/fstab

So, I decided to comment out the /boot/efi line entirely. The appliance then booted successfully, but threw firewall errors.
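I made the /etc/fstab change by hand in vi. For anyone who prefers a repeatable edit with a backup, here is a minimal sketch of the same change. The function name comment_out_mount and the .bak suffix are my own choices for illustration, not something provided by the appliance.

```shell
#!/bin/sh
# Sketch: comment out the fstab entry for one mount point, keeping a backup.
# Usage: comment_out_mount FSTAB_FILE MOUNT_POINT
comment_out_mount() {
    fstab=$1
    mountpoint=$2
    # Keep a copy of the original file next to it.
    cp -p "$fstab" "$fstab.bak" || return 1
    # Prefix '#' to active lines whose second field is the mount point;
    # every other line is written back unchanged.
    awk -v mp="$mountpoint" '
        $1 !~ /^#/ && $2 == mp { print "#" $0; next }
        { print }
    ' "$fstab.bak" > "$fstab"
}

# On the appliance (as root) this would be:
#   comment_out_mount /etc/fstab /boot/efi
```

Copying the .bak file back over /etc/fstab reverts the change. Keep in mind that with the entry commented out, /boot/efi is simply not mounted at boot.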

At this point, I realized the system wasn't properly loading its IP address.
I forced the manual configuration of the IP and gateway directly from the command line:

ifconfig eth0 10.1.1.5 netmask 255.255.255.0
route add default gw 10.1.1.1
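The ifconfig and route commands come from the older net-tools package. The same temporary setup can be expressed with the ip command from iproute2, assuming it is available on the appliance. The sketch below uses a hypothetical helper, set_temp_network, that only prints the commands unless APPLY=1 is set, so it can be reviewed before anything changes. The /24 prefix is the equivalent of the 255.255.255.0 netmask used above.

```shell
#!/bin/sh
# Sketch: temporary IP and default gateway using iproute2.
# Prints the commands by default; set APPLY=1 (as root) to run them.
# Usage: set_temp_network INTERFACE ADDRESS/PREFIX GATEWAY
set_temp_network() {
    iface=$1
    cidr=$2
    gateway=$3
    for cmd in \
        "ip link set dev $iface up" \
        "ip addr add $cidr dev $iface" \
        "ip route add default via $gateway"
    do
        if [ "${APPLY:-0}" = 1 ]; then
            $cmd || return 1      # run the command, stop on first failure
        else
            echo "$cmd"           # dry run: only show what would be executed
        fi
    done
}

# Values from my lab; adjust them to your environment.
set_temp_network eth0 10.1.1.5/24 10.1.1.1
```

Whichever tool is used, settings applied this way live only in the running kernel and are lost at the next reboot.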

Once the network was up, I restarted the SDDC Manager services using the built-in script:

/opt/vmware/vcf/operationsmanager/scripts/cli/sddcmanager_restart_services.sh

After the services restarted, I opened the SDDC Manager web interface to force or continue the upgrade process, and then I checked the logs.

Surprisingly, after a while, the logs showed that the upgrade had actually completed successfully!

However, a quick check in VCF Operations showed that it still appeared as disconnected.

But when I verified the target version in the SDDC Manager UI, it displayed the correct upgraded version.

To clean things up, I went ahead and rebooted the VCF Operations appliance.

After waiting for it to reboot, I logged back in to verify the status, and everything was finally green and fully connected.

With this roadblock cleared, I was able to safely continue upgrading the rest of the lab environment.

That's it.
