Wednesday, March 4, 2026

[VCF 9.0 - SDDC Manager] Doesn't boot properly after upgrade

Issue

A few days ago, I was testing a VMware Cloud Foundation (VCF) upgrade in my lab, specifically moving from version 9.0.0 to 9.0.1.0.

NOTE: Before proceeding with any upgrade, always make sure you have reliable backups of your components. It's also highly recommended to take snapshots of the components involved before taking any action.

During the SDDC Manager upgrade phase...

After a few minutes, following the automatic reboot of the SDDC Manager appliance, I was greeted with this error message:

Authorization Error : Unauthorized access.

This message was present in both the SDDC Manager UI...

...and in the Lifecycle Manager.

At this point, the upgrade seemed to be completely stuck.



Solution

Disclaimer: Some of the procedures described below may not be officially supported by VMware. Use them at your own risk.

Googling around, I found the following Broadcom article: VCF Operations 'SDDC Manager' tab shows "Authorization Error: Unauthorized access".

As indicated in the KB, I went to the VM console to check if the SDDC Manager issue matched the symptoms described:

SDDC Manager is inaccessible and keeps spinning
SDDC Manager displays CPU errors similar to:
[2989.634241] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 42s! [jsvc:8372]

However, my finding was completely different from what the KB described. The SDDC Manager was actually booting into Emergency Mode, showing the following errors:

[    2.8952191 integrity: Problem loading X.509 certificate -22
[FAILED] Failed to mount /boot/efi.
[DEPEND] Dependency failed for Local File Systems.
You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" or
^D to boot into default mode.
Give root password for maintenance (or press Control-D to continue):

It appeared the system was unable to correctly mount the /boot/efi file system.

I entered the root password and tried a simple reboot just to see if it would clear up, but without success.
Checking the file system, I confirmed it was failing on /boot/efi. I started investigating the mounts.

cat /etc/fstab
blkid
lsblk -f

I initially tried to fix the issue by replacing the UUID in /etc/fstab with the new UUID retrieved from the lsblk command, but that didn't work.
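As a sanity check, this kind of mismatch can be spotted by comparing the UUID recorded in fstab with the one the disk actually reports. Here is a minimal sketch of that comparison; the file path, UUID values, and "pretend" blkid output are made-up examples, since on the real appliance you would read /etc/fstab and run `blkid -s UUID -o value <device>` against the EFI partition:

```shell
# Illustrative only: compare the UUID recorded in an fstab-style file with the
# UUID the disk currently reports. Both values are hard-coded samples here.
fstab=/tmp/demo-fstab
printf 'UUID=AAAA-1111 /boot/efi vfat defaults 0 2\n' > "$fstab"

current_uuid="BBBB-2222"   # pretend output of: blkid -s UUID -o value /dev/sda2
recorded_uuid=$(awk '$2=="/boot/efi" {sub(/^UUID=/,"",$1); print $1}' "$fstab")

if [ "$recorded_uuid" != "$current_uuid" ]; then
  echo "UUID mismatch: fstab=$recorded_uuid disk=$current_uuid"
fi
```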

vi /etc/fstab

So, I decided to comment out the /boot/efi line entirely. The appliance booted up successfully, but threw these firewall errors:

At this point, I realized the system wasn't properly loading its IP address.
I forced the manual configuration of the IP and gateway directly from the command line:

ifconfig eth0 10.1.1.5 netmask 255.255.255.0
route add default gw 10.1.1.1
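Keep in mind that ifconfig/route changes do not survive a reboot. On Photon OS based appliances such as SDDC Manager, the persistent network settings normally live in a systemd-networkd file; the path and values below are purely illustrative, so verify where your appliance actually keeps its network configuration before editing anything:

```
# /etc/systemd/network/10-eth0.network  (illustrative path and values)
[Match]
Name=eth0

[Network]
Address=10.1.1.5/24
Gateway=10.1.1.1
```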

Once the network was up, I restarted the SDDC Manager services using the built-in script:

/opt/vmware/vcf/operationsmanager/scripts/cli/sddcmanager_restart_services.sh

After the services restarted, I opened the SDDC Manager web interface to resume the upgrade process, then checked the logs.

Surprisingly, after a while, the logs showed that the upgrade had actually completed successfully!

However, a quick check on VCF Operations showed it still appeared as disconnected.

But when I verified the target version in the SDDC Manager UI, it displayed the correct upgraded version.

To clean things up, I went ahead and rebooted the VCF Operations appliance.

After waiting for it to reboot, I logged back in to verify the status, and everything was finally green and fully connected.

With this roadblock cleared, I was able to safely continue upgrading the rest of the lab environment.

That's it.

Sunday, March 1, 2026

[SSP 5.1.0] vDefend SSP Backup Failures: Stuck at 60%

Issue


If you manage a VMware vDefend Security Services Platform (SSP) environment, you know how critical automated backups are. Recently, I ran into a frustrating issue where scheduled backups started failing mysteriously. What initially seemed like a simple misconfiguration or a generic "network timeout" turned out to be a bizarre problem involving a lost encryption key and crashing containers.
Looking at the System > Backup and Restore dashboard in the SSP UI, I noticed a long list of failed backup jobs. Hovering over the failure status yielded a very generic message: "operation timed out, please contact administrator".
I tried manually forcing the backup, but it got stuck at 60% and then timed out after a while.
The generic timeout error on the Backup and Restore dashboard


Solution


My first instinct was to check the SFTP credentials: I re-typed the password, clicked SAVE, and everything seemed fine.
Googling around, I found this interesting link "Backup and Restore goes into Failed State" which helped me in the final resolution.

Since the UI wasn't giving me enough technical details, it was time to drop into the CLI. I logged into the SSPI appliance via SSH as sysadmin to check what was happening at the Kubernetes pod level.

I ran a quick ...
kubectl get pods -n nsxi-platform | grep backup
... and immediately spotted the problem. The backup job pods weren't just timing out; they were actively crashing, showing Error and OOMKilled statuses.
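As a generic triage helper (nothing VMware-specific), a small awk filter can list every pod that is not in the Running state. The sample below uses a canned `kubectl get pods`-style file so it is self-contained; the second pod name and the `cluster-api-0` line are invented for illustration, and in real life you would pipe `kubectl get pods -n nsxi-platform` into the awk instead:

```shell
# Illustrative: filter non-Running pods from `kubectl get pods` style output.
cat <<'EOF' > /tmp/pods.txt
NAME                      READY   STATUS      RESTARTS   AGE
job-backup-4fv4k-5gq5s    0/1     OOMKilled   3          10m
job-backup-4fv4k-abcde    0/1     Error       2          10m
cluster-api-0             1/1     Running     0          2d
EOF

# Skip the header line, print name and status of anything not Running.
awk 'NR>1 && $3!="Running" {print $1, $3}' /tmp/pods.txt
```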
To understand why the pods were in this state ("OOMKilled", "Error"), I checked the logs of both pods....
kubectl logs job-backup-4fv4k-5gq5s -n nsxi-platform
Scrolling through the container logs, I could see that the system passed a malformed argument to the OpenSSL encryption command.
The log explicitly stated:
Error executing command: [openssl enc -aes-256-cbc -salt -in ... -out ... -k ******]. 
Error: fork/exec /usr/bin/bash: argument list too long.
Earlier in the logs, I could also see the backup job actually doing its work: generating the .obj and .dump files from the Postgres database and config files.
However, the process abruptly halted right after the ...
<INFO> ... Started encrypting backup bundle
... message.
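For context, the failing step runs an openssl command of the shape shown in the log above. The round-trip sketch below mimics what a healthy encrypt step looks like when a valid, non-empty passphrase reaches `-k`; the file names and passphrase are invented, and `-pbkdf2` is added here only to use modern key derivation, which the real job's command line did not show:

```shell
# Illustrative only: mimic the AES-256-CBC encryption step the backup job runs.
echo "dummy backup payload" > /tmp/bundle.dump

# Encrypt with a passphrase via -k; a valid string here is exactly what the
# platform failed to supply to the real job.
openssl enc -aes-256-cbc -salt -pbkdf2 \
  -in /tmp/bundle.dump -out /tmp/bundle.enc -k 'demo-passphrase'

# Decrypt to verify the round trip.
openssl enc -d -aes-256-cbc -pbkdf2 \
  -in /tmp/bundle.enc -out /tmp/bundle.out -k 'demo-passphrase'

diff /tmp/bundle.dump /tmp/bundle.out && echo "round trip OK"
```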


Conclusion

How did I fix it? I re-entered the password and retyped the passphrase to ensure the openssl process received a valid string, allowing the container to properly encrypt the dump files and upload them to the SFTP server.

The mystery was solved. The "timeout" in the UI was just a side effect. The real issue was that the missing encryption key caused the openssl command to fail spectacularly (resulting in an argument list too long OS error), which crashed the container pod.

It appeared the platform had somehow lost track of the encryption key. The configuration was stuck in a weird state where trying to "Reuse prior passphrase" wasn't working.

After that, starting the backup in manual mode completed successfully.


That's it.

Tuesday, February 24, 2026

How to Converge an existing VMware vSphere environment to VCF 9.0 step-by-step.

Below we will see, step by step, how to converge a vCenter instance and ESX hosts to the vSphere Foundation platform.
In our case, the target VMware Cloud Foundation version we converge to is 9.0.2.

To successfully converge your existing vCenter instance and ESX hosts, you must ensure that you comply with the prerequisites, follow the procedure, and perform the next steps.
Verify that your existing configuration is supported and you satisfy the minimum component versions. See Supported Configurations in Converging Existing Virtual Infrastructure to a VCF or a vSphere Foundation Platform.

Last but not least, check Supported Scenarios to Converge to VCF.

Before beginning, a quick tip to keep in mind:
The first step to implement VCF 9 is to deploy the 'VCF-SDDC-Manager-Appliance-9.0.2.0.25151285.ova' file, which serves as the installer, as shown here.
If the VMware Cloud Foundation installer appliance is deployed on one of the hosts in the management domain, it will be converted into the SDDC Manager appliance during the deployment process; otherwise, a new SDDC Manager appliance will be deployed within the vCenter being converged.
In the case illustrated below we deployed the VCF installer outside the vCenter to be converged.


Let's start:
Access the "VCF Installer" (how to download, deploy and configure VCF Installer available here) UI via browser (https://<appliance-ip>), and log in.
Check that the depot (in our case the online depot) is correctly configured, and binaries are successfully downloaded.
Expand "DEPLOYMENT WIZARD" and select "VMware Cloud Foundation" (1).
Select "Deploy a new VCF fleet" (2) > CONTINUE (3)
In our case, the vCenter that will be converged will also be the Management Domain's vCenter.
Select the existing components (in our case only vCenter)(4) and click "NEXT" (5).
Fill in the "General Information" (6), selecting the version, entering the new name of the VCF instance, the management domain, etc.
Select the "Deployment model" (in our case, since it's a LAB, I selected Standard (Single-node)). Then proceed by clicking "NEXT" (7).
Select the correct size of the "Operations Appliance", fill in the FQDN (8), passwords and enter the FQDNs for the "Fleet Management Appliance" (9) and for the "Operations Collector" (10) Appliance. Press NEXT (11).
In this case, since it is a LAB, I skip the installation of "Automation"(12) appliance and continue by pressing NEXT (13).
Enter the details and credentials of the existing vCenter that you want to converge (14).
Accept the certificates (16) and confirm (17).
Enter the required information to deploy the NSX appliance (18) (19) to the management domain and continue by pressing NEXT (20).
In our case, the VMware Cloud Foundation installer appliance is not deployed on one of the hosts in the management domain, so a new SDDC Manager appliance will be deployed. Enter the required information (21).
Review, then perform the VALIDATIONS tests and ACKNOWLEDGE (23) the warnings ...
... DEPLOY (24).
Wait until the VCF components are installed...
OPEN VCF OPERATIONS UI (25)
log in ...
... and check that everything is OK.
Log in to the vCenter as well and check that everything is OK there too.

Useful links:
How to Converge a VMware vSphere Environment to VMware Cloud Foundation 9.0

That's it.

Monday, February 16, 2026

[VCF 9.0.2] VMware Cloud Foundation Installer and Depot Configuration

🚀 Kickstarting VCF 9: From Download to Depot Configuration


Below is a quick how-to guide on some critical steps to get your VCF Installer appliance and the initial bootstrap process up and running quickly and easily (at Day 0).

The Workflow:

1. DOWNLOAD (The Broadcom Portal)

Since the transition to Broadcom, locating the binaries is the first challenge.
  • Log in to the Broadcom Support Portal.

  • Navigate to "My Downloads" -> type "VMware Cloud Foundation" (1) into "Search Product Name" and hit the "Show Results" (2) button. Click "VMware Cloud Foundation" (3).
  • Click on "VMware Cloud Foundation 9" to expand and select the desired releases (in my case 9.0.2.0).
  • Agree to both "Terms and Conditions" and "Compliance Reporting Terms and Conditions" (1), then flag the check box (2). Click on "View Group" in the row corresponding to "VMware Cloud Foundation Installer".
  • Download the VCF Installer Appliance OVA "VCF-SDDC-Manager-Appliance-9.0.2.0.25151285.ova" (approx. 2.03GB) (1)

2. DEPLOY (The OVF Wizard)

Deploying the appliance is standard, but accuracy is key for the automation engine to work later (detailed steps on how to distribute an OVA are beyond the scope here).
  • Open your existing vSphere Client.
  • Right-click Cluster -> Deploy OVF Template.
  • Upload the OVA.
  • Crucial Inputs: Provide the Name of the VM, a strong password, valid DNS, NTP, a Static IP and so on.
    Result: Power on the appliance. Wait for the services to initialize (approx. 10-15 mins).

3. CONFIGURE DEPOT (Hydrating the BOM)

This is where VCF 9 shines with its decoupled architecture. Once the appliance is up, you need to populate it with the software Bill of Materials (BOM).
  • Access the Appliance UI via browser (https://<appliance-ip>).
    Log in by entering the credentials provided during the previous deployment phase
  • Click on "DEPOT SETTINGS AND BINARY MANAGEMENT".
  • Choose one of the two DEPOT configuration options (Online or Offline).
    • Option A (Online): Enter your Broadcom Support credentials (Token). The appliance will sync the manifest and download the required bundles (ESXi, NSX, SDDC Manager services) automatically.
    • Option B (Offline): If you are in an air-gapped environment, use the Bundle Transfer Utility to upload the bundles manually.
    In our case we will connect to the online depot clicking on "CONFIGURE".
  • Connecting to the online depot requires generating a Token which can be obtained from the Broadcom Support Portal. On how to generate the token, see KB 390098

  • Insert the Token and click "AUTHENTICATE"
  • If the token entered in the previous step is correct, once the desired version has been chosen (in our case, 9.0.2.0), we can download the binaries of the components we need.
  • Select the desired packages, click "DOWNLOAD" and wait for the download to complete.
We have everything we need and are ready to get started with VCF 9. We'll see how to proceed in the next posts.

That's it.