domenica 1 marzo 2026

[SSP 5.1.0] vDefend SSP Backup Failures: Stuck at 60%

Issue


If you manage a VMware vDefend Security Services Platform (SSP) environment, you know how critical automated backups are. Recently, I ran into a frustrating issue where scheduled backups started failing mysteriously. What initially seemed like a simple user and misconfigured settings issue, “network timeout” turned out to be a bizarre problem involving lost encryption keys and crashing containers.
Looking on the SSP UI at the System > Backup and Restore dashboard, I noticed a long list of failed backup jobs. Hovering over the failure status yielded a very generic message: "operation timed out, please contact administrator".
I tried manually forcing the backup, but it seems to stuck at 60% and then time out after a while.
The generic timeout error on the Backup and Restore dashboard


Solution


My first instinct was to check the SFTP credentials, re-type the password, SAVE and seemed fine.
Googling around, I found this interesting link "Backup and Restore goes into Failed State" which helped me in the final resolution.

Since the UI wasn't giving me enough technical details, it was time to drop into the CLI. I logged into the SSPI appliance via SSH as sysadmin to check what was happening at the Kubernetes pod level.

I ran a quick ...
kubectl get pods -n nsxi-platform | grep backup
... and immediately spotted the problem. The backup job pods weren't just timing out; they were actively crashing, throwing Error and OOMKilled statuses.
To understand why the pods were in this state ("OOMKilled", "Error"), I checked the logs of both pods....
k logs job-backup-4fv4k-5gq5s -n nsxi-platform
Scrolling through the container logs, I could see ... the system passed a malformed argument to the OpenSSL encryption command.
The log explicitly stated:
Error executing command: [openssl enc -aes-256-cbc -salt -in ... -out ... -k ******]. 
Error: fork/exec /usr/bin/bash: argument list too long.
Scrolling through the container logs, I could see the backup job actually doing its work: generating the .obj and .dump files from the Postgres database and config files.
However, the process abruptly halted right after the ...
<INFO> ... Started encrypting backup bundle
... message.


Conclusion

How did I fix it? I re-entered the password and retyped the passphrase to ensure the openssl process received a valid string, allowing the container to properly encrypt the dump files and upload them to the SFTP server.

The mystery was solved. The "timeout" in the UI was just a side effect. The real issue was that the missing encryption key caused the openssl command to fail spectacularly (resulting in an argument list too long OS error), which crashed the container pod.

It appeared the platform had somehow lost track of the encryption key. The configuration was stuck in a weird state where trying to "Reuse prior passphrase" wasn't working.

Now starting the backup in manual mode, it completes successfully.


That's it.

martedì 24 febbraio 2026

How to Converge an existing VMware vSphere environment to VCF 9.0 step-by-step.

We will see below (step-by-step), how to Converge a vCenter Instance and ESX Hosts to vSphere Foundation Platform.
In our case the target VMware Cloud Foundation we converge on is 9.0.2 version.

To successfully converge your existing vCenter instance and ESX hosts, you must ensure that you comply with the prerequisites, follow the procedure, and perform the next steps.
Verify that your existing configuration is supported and you satisfy the minimum component versions. See Supported Configurations in Converging Existing Virtual Infrastructure to a VCF or a vSphere Foundation Platform.

Last but not least, check Supported Scenarios to Converge to VCF.

Before to begin, a quick tip to keep in mind:
The first step to implement VCF 9 is to deploy the 'VCF-SDDC-Manager-Appliance-9.0.2.0.25151285.ova' file, which serves as the installer, as shown here.
If the VMware Cloud Foundation installer appliance is deployed on one of the hosts in the management domain. During the deployment process, the installer appliance will be converted into the SDD Manager appliance; otherwise a new SDDC Manager appliance will be deployed within the vCenter being converged.
In the case illustrated below we deployed the VCF installer outside the vCenter to be converged.


Let's start:
Access the "VCF Installer" (how to download, deploy and configure VCF Installer available here) UI via browser (https://<appliance-ip>), and log in.
Check that the depot (in our case the online depot) is correctly configured, and binaries are successfully downloaded.
Expand "DEPLOYMENT WIZARD" and select "VMware Cloud Foundation" (1).
Select "Deploy a new VCF fleet" (2) > CONTINUE (3)
In our case, the vCenter that will be converged will also be the Management Domain's vCenter.
Select the existing components (in our case only vCenter)(4) and click "NEXT" (5).
Fill in the "General Information" (6), selecting the version, entering the new name of the VCF instance, the management domain, etc.
Select the "Deployment model" (in our case, since it's a LAB, I selected Standard (Single-node)). Then proceed by clicking "NEXT" (7).
Select the correct size of the "Operations Appliance", fill in the FQDN (8), passwords and enter the FQDNs for the "Fleet Management Appliance" (9) and for the "Operations Collector" (10) Appliance. Press NEXT (11).
In this case, since it is a LAB, I skip the installation of "Automation"(12) appliance and continue by pressing NEXT (13).
Indicate and enter the credentials of the existing vCenter that you want to convert (14).
Accept the certificates (16) and confirm (17).
Enter the required information to deploy the NSX appliance (18) (19) to the management domain and continue by pressing NEXT (20).
The VMware Cloud Foundation installation appliance is not deployed on one of the hosts in the management domain. A new SDDC Manager appliance will be deployed. Enter the required information (21).
Review, than perform the VALIDATIONS tests and ACKNOWLEDGE (23) the Warnings ...
... DEPLOY (24).
Wait until the VCF components are installed...
OPEN VCF OPERATIONS UI (25)
log in ...
... and check that everything is OK.
Login on the vCenter as well and check that everything is OK too.

Useful links:
How to Converge a VMware vSphere Environment to VMware Cloud Foundation 9.0

That's it.

lunedì 16 febbraio 2026

[VCF 9.0.2] VMware Cloud Foundation Installer and Depot Configuration

🚀 Kickstarting VCF 9: From Download to Depot Configuration


Below is a quick how-to guide on some critical steps to get your VCF Installer appliance and the initial bootstrap process up and running quickly and easily (at Day 0).

The Workflow:

1. DOWNLOAD (The Broadcom Portal)

Since the transition, locating the binary is the first challenge.
  • Log in to the Broadcom Support Portal.

  • Navigate to "My Downloads" -> type "VMware Cloud Foundation"(1) into "Search Product Name" (1) and hit "Show Results" (2) botton. Click "VMware Cloud Foundation"(3)
  • Click on "VMware Cloud Foundation 9" to expand and select the desired releases (in my case 9.0.2.0).
  • Agree both "Terms and Conditions"(1) and "Compliance Reporting Terms and Conditions"(1) then flag the check box (2). Click on "View Group" in the row corresponding to "VMware Cloud Foundation Installer"
  • Download the VCF Installer Appliance OVA "VCF-SDDC-Manager-Appliance-9.0.2.0.25151285.ova" (approx. 2.03GB) (1)

2. DEPLOY (The OVF Wizard)

Deploying the appliance is standard, but accuracy is key for the automation engine to work later (detailed steps on how to distribute an OVA are beyond the scope here).
  • Open your existing vSphere Client.
  • Right-click Cluster -> Deploy OVF Template.
  • Upload the OVA.
  • Crucial Inputs: Provide the Name of the VM, a strong password, valid DNS, NTP, a Static IP and so on.
    Result: Power on the appliance. Wait for the services to initialize (approx. 10-15 mins).

3. CONFIGURE DEPOT (Hydrating the BOM)

This is where VCF 9 shines with its decoupled architecture. Once the appliance is up, you need to populate it with the software Bill of Materials (BOM).
  • Access the Appliance UI via browser (https://<appliance-ip>).
    Log in by entering the credentials provided during the previous deployment phase
  • Click on "DEPOT SETTINGS AND BINARY MANAGEMENT".
  • Choose one of the two DEPOT configuration options (Online - Offline).
    • Option A (Online): Enter your Broadcom Support credentials (Token). The appliance will sync the manifest and download the required bundles (ESXi, NSX, SDDC Manager services) automatically.
    • Option B (Offline): If you are in an air-gapped environment, use the Bundle Transfer Utility to upload the bundles manually.
    In our case we will connect to the online depot clicking on "CONFIGURE".
  • Connecting to the online depot requires generating a Token which can be obtained from the Broadcom Support Portal. On how to generate the token, see KB 390098

  • Insert the Token and click "AUTHENTICATE"
  • If the token entered in the previous step is correct, onces choosen the desired version (in our case, 9.0.2.0) we can download the binaries of the components we need.
  • Select the desired packages, click "DOWNLOAD" and wait for the download to complete.
We have everything we need and are ready to get started with VCF 9. We'll see how to proceed in the next posts.

That's it.

lunedì 9 febbraio 2026

[Holodeck] - Troubleshooting VCF 9 Online Depot Connectivity in a Holodeck Environment

Issue


If you are deploying VMware Cloud Foundation 9 (VCF 9) within a Holodeck nested lab environment, you might hit a roadblock when trying to configure the Online Depot.

The Online Depot is crucial for pulling down software bundles and compatibility data from Broadcom. However, in a nested environment where networking layers can get complicated, simple internet connectivity isn't always guaranteed.

Here is a walkthrough of a problem I encountered recently, how I diagnosed it, and the fix involving the Holorouter.

I was attempting to configure the Online Depot in the VCF Operations console (Lifecycle > VCF Management > Depot Configuration).

I entered my Broadcom Token as required. However, almost immediately after clicking "OK," I was greeted with a red banner error:

Error in setting Online depot configuration
I also attempted to simply view the certificate details to check the connection. That failed as well with a timeout error:

Connection failed - connect timed out


Solution


To understand what was happening under the hood, I SSH'd into the Fleet Management VM (OPSLCM). This is the appliance responsible for handling Lifecycle Manager operations.
I navigated to the log directory:
# cd /var/log/vrlcm/
I then tailed the main log file to watch the traffic in real-time while I retried the configuration in the UI:
# tail -f vmware_vrlcm.log
The logs painted a clear picture. The appliance was trying to reach Broadcom's download servers but was timing out.
INFO ... Fetching certificate from https://dl.broadcom.com
INFO ... Endpoint : https://dl.broadcom.com
ERROR ... IOException occurred - connect timed out
The log confirmed that the application was working fine, but the network wasn't. I tried running a simple curl command from the OPSLCM appliance to the internet, and that timed out too.
Here is the architecture of the issue:
  1. OPSLCM (Nested VM) sends a packet to the internet.
  2. The packet goes to its default gateway: the Holo-Router (10.1.1.1).
  3. The Holo-Router forwards the packet out of its WAN interface (eth0) to the physical router (192.168.1.1 in my case).
  4. The packet reaches the physical router with a source IP from the nested environment (e.g., 10.1.1.x). The physical router/firewall has no idea where 10.1.1.x is located—it has no route back to the nested environment managed by Holodeck. Consequently, the return traffic is dropped.
  5. Usually, you might solve this by adding a static route on your physical router pointing to the Holorouter. However, in many lab scenarios (including mine), we don't have access to modify the physical network infrastructure.
The solution is to enable NAT on Holorouter.
To fix this, we need to ensure that traffic leaving the Holorouter looks like it's coming from the Holorouter's WAN IP (which the physical network does know how to route). We need to enable Source NAT (Masquerading).

I logged into the Holorouter (root@holorouter) and performed the following steps.
  1. Verify IP Forwarding
    First, ensure the kernel allows forwarding (it usually does in Holodeck, as FRR is running):
    # sysctl net.ipv4.ip_forward
    Should return = 1
  2. Apply the NAT Rule
    I added an iptables rule to masquerade traffic coming from the internal VLANs (e.g., the VLAN interface eth0.10 or the specific subnet) when it exits the WAN interface (eth0).
    In my case I created a generic rule to NAT all outgoing traffic on eth0
    # iptables -t nat -I POSTROUTING 1 -o eth0 -j MASQUERADE
    With the following command verify the insertion of the NAT rule:
    # iptables -t nat -L POSTROUTING -v -n
  3. Make it Persistent
    Since the Holorouter might be rebooted or affected by Kubernetes network refreshes, I saved the configuration:
    # iptables-save > /etc/systemd/scripts/ip4save
Conclusions:
Immediately after applying the NAT rule, the "return path" for the traffic was established. The physical router now sees traffic coming from the Holorouter's valid WAN IP and returns it correctly. The Holorouter then untranslates the address and hands the packet back to the Fleet Management VM.
I went back to the VCF UI, clicked "Configure," and the Online Depot connected successfully.
Now can I proceed with the upgrade proces!!!


That's it.