Monday, 9 February 2026

[Holodeck] - Troubleshooting VCF 9 Online Depot Connectivity in a Holodeck Environment

Issue


If you are deploying VMware Cloud Foundation 9 (VCF 9) within a Holodeck nested lab environment, you might hit a roadblock when trying to configure the Online Depot.

The Online Depot is crucial for pulling down software bundles and compatibility data from Broadcom. However, in a nested environment where networking layers can get complicated, simple internet connectivity isn't always guaranteed.

Here is a walkthrough of a problem I encountered recently, how I diagnosed it, and the fix involving the Holorouter.

I was attempting to configure the Online Depot in the VCF Operations console (Lifecycle > VCF Management > Depot Configuration).

I entered my Broadcom Token as required. However, almost immediately after clicking "OK," I was greeted with a red banner error:

Error in setting Online depot configuration
I also attempted to simply view the certificate details to check the connection. That failed as well with a timeout error:

Connection failed - connect timed out


Solution


To understand what was happening under the hood, I SSH'd into the Fleet Management VM (OPSLCM). This is the appliance responsible for handling Lifecycle Manager operations.
I navigated to the log directory:
# cd /var/log/vrlcm/
I then tailed the main log file to watch the traffic in real-time while I retried the configuration in the UI:
# tail -f vmware_vrlcm.log
The logs painted a clear picture. The appliance was trying to reach Broadcom's download servers but was timing out.
INFO ... Fetching certificate from https://dl.broadcom.com
INFO ... Endpoint : https://dl.broadcom.com
ERROR ... IOException occurred - connect timed out
The log confirmed that the application was working fine, but the network wasn't. I tried running a simple curl command from the OPSLCM appliance to the internet, and that timed out too.
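The manual curl test can be wrapped in a small helper so it is easy to repeat before and after a fix. This is a minimal sketch, not part of the original troubleshooting session: `check_endpoint` is a hypothetical function name and the 10-second connect timeout is an arbitrary default. It uses only standard curl options.

```shell
# check_endpoint is a hypothetical helper (not from the original session).
# It tries to connect to a URL and reports curl's exit code on failure.
check_endpoint() {
  local url="$1"
  local timeout="${2:-10}"   # connect timeout in seconds (arbitrary default)
  local rc=0

  curl --silent --output /dev/null --connect-timeout "$timeout" "$url" || rc=$?

  if [ "$rc" -eq 0 ]; then
    echo "OK: $url is reachable"
  else
    echo "FAIL: $url is not reachable (curl exit code $rc)"
  fi
  return "$rc"
}

# Example: check_endpoint https://dl.broadcom.com
```

curl exit code 28 means the timeout was reached and 6 means the host name could not be resolved, which helps to separate a routing problem from a DNS problem. Without `--fail`, an HTTP error response still counts as reachable, which is what we want for a pure connectivity test.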
Here is the architecture of the issue:
  1. OPSLCM (Nested VM) sends a packet to the internet.
  2. The packet goes to its default gateway: the Holorouter (10.1.1.1).
  3. The Holorouter forwards the packet out of its WAN interface (eth0) to the physical router (192.168.1.1 in my case).
  4. The packet reaches the physical router with a source IP from the nested environment (e.g., 10.1.1.x). The physical router/firewall has no idea where 10.1.1.x is located—it has no route back to the nested environment managed by Holodeck. Consequently, the return traffic is dropped.
  5. Usually, you might solve this by adding a static route on your physical router pointing to the Holorouter. However, in many lab scenarios (including mine), we don't have access to modify the physical network infrastructure.
The solution is to enable NAT on the Holorouter.
Traffic leaving the Holorouter must appear to come from the Holorouter's WAN IP (which the physical network does know how to route), so we need to enable Source NAT (masquerading).

I logged into the Holorouter (root@holorouter) and performed the following steps.
  1. Verify IP Forwarding
    First, ensure the kernel allows forwarding (it usually does in Holodeck, as FRR is running):
    # sysctl net.ipv4.ip_forward
    It should return net.ipv4.ip_forward = 1
  2. Apply the NAT Rule
    I added an iptables rule to masquerade traffic coming from the internal VLANs (e.g., the VLAN interface eth0.10 or the specific subnet) when it exits the WAN interface (eth0).
    In my case, I created a generic rule that NATs all traffic leaving eth0:
    # iptables -t nat -I POSTROUTING 1 -o eth0 -j MASQUERADE
    Verify that the NAT rule was inserted with the following command:
    # iptables -t nat -L POSTROUTING -v -n
  3. Make it Persistent
    Since the Holorouter might be rebooted or affected by Kubernetes network refreshes, I saved the configuration:
    # iptables-save > /etc/systemd/scripts/ip4save
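Because the rule is inserted with `-I`, running the command a second time adds a duplicate entry. A minimal sketch of an idempotent variant, using the iptables `-C` (check) option, could look like this. `ensure_masquerade` is a hypothetical helper name, and the optional source subnet is an assumption for labs where you prefer to NAT only the nested networks (for example 10.1.1.0/24, matching the addresses above).

```shell
# ensure_masquerade is a hypothetical helper (not from the original steps).
# Usage: ensure_masquerade [wan_interface] [source_subnet]
ensure_masquerade() {
  local wan_if="${1:-eth0}"
  local src_net="${2:-}"

  # Build the rule specification once so that check and insert match exactly.
  set -- -o "$wan_if" -j MASQUERADE
  if [ -n "$src_net" ]; then
    set -- -s "$src_net" "$@"
  fi

  if iptables -t nat -C POSTROUTING "$@" 2>/dev/null; then
    echo "rule already present"
  else
    iptables -t nat -I POSTROUTING 1 "$@"
    echo "rule inserted"
  fi
}

# Example: ensure_masquerade eth0
# Example: ensure_masquerade eth0 10.1.1.0/24
```

After the helper reports "rule inserted", the configuration still has to be saved as in step 3 to survive a reboot.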
Conclusions:
Immediately after applying the NAT rule, the "return path" for the traffic was established. The physical router now sees traffic coming from the Holorouter's valid WAN IP and returns it correctly. The Holorouter then untranslates the address and hands the packet back to the Fleet Management VM.
I went back to the VCF UI, clicked "Configure," and the Online Depot connected successfully.
Now I can proceed with the upgrade process!


That's it.

Monday, 2 February 2026

[VCF 9.0 - Import ] VMWARE_COMPAT is not found

Issue


Today I was testing the import of an external workload domain into my VCF 9.0 instance (in my lab). During the import prechecks I got the following error message:

An error occurred when validating VMware Cloud Foundation compatibility: File with Compatibility Matrix Content for Compatibility controller VMWARE_COMPAT is not found for <vCenter>.

Please refer to error message above and contact support for more details.



Solution


Googling around I found the following article: Deploying a new VCF instance results in an error: “VcManager vc1.example.com: An error occurred when validating VMware Cloud Foundation compatibility: File with Compatibility Matrix Content for Compatibility controller VMWARE_COMPAT is not found.”

As shown in the KB:
  1. SSH into the SDDC Manager and elevate to the root user.
  2. Check whether the compatibility directory exists and, if it doesn't, create it

    # ls /nfs/vmware/vcf/nfs-mount/compatibility
    # mkdir /nfs/vmware/vcf/nfs-mount/compatibility
  3. Download VmwareCompatibilityData.json file using curl

    # curl --request GET --url 'https://vvs.broadcom.com/v1/products/bundles/type/vcf-lcm-v2-bundle?format=json' --header 'x-vmw-esp-clientid: vcf-lcm' > /nfs/vmware/vcf/nfs-mount/compatibility/VmwareCompatibilityData.json
  4. Change the ownership of the directory

    # chown -R vcf_lcm:vcf /nfs/vmware/vcf/nfs-mount/compatibility
  5. Re-run the validation from the installer.
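If the download fails (for example because the SDDC Manager has no internet access), the shell redirect can still leave an empty or non-JSON file in place, and the validation will keep failing. The following is a rough sanity check and not part of the KB: `verify_compat_file` is a hypothetical helper that only confirms the file is non-empty and starts like a JSON document.

```shell
# verify_compat_file is a hypothetical helper (not part of the KB).
# It checks that the file exists, is non-empty and starts with '{' or '['.
verify_compat_file() {
  local file="$1"
  local first

  if [ ! -s "$file" ]; then
    echo "missing or empty: $file"
    return 1
  fi

  # First non-whitespace character of the file.
  first=$(tr -d '[:space:]' < "$file" | cut -c 1)
  case "$first" in
    '{'|'[') echo "looks like JSON: $file" ;;
    *)       echo "does not look like JSON: $file"; return 1 ;;
  esac
}

# Example:
# verify_compat_file /nfs/vmware/vcf/nfs-mount/compatibility/VmwareCompatibilityData.json
```

This does not validate the content against what SDDC Manager expects; it only catches the common case of a failed download.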



That's it.

Friday, 24 October 2025

[Holodeck] - Issue fixed: services don't start in DHCP mode

Issue


Holodeck is a powerful toolkit designed to provide a standardized and automated method to deploy nested VMware Cloud Foundation (VCF) environments on a VMware ESX host or a vSphere cluster, for learning and testing in your homelab.

All information about Holodeck is available here.

The appliance can be deployed with a static IP or with DHCP.
What happens if I deploy the appliance using DHCP, then shut it down, and when I power it back on, it receives a different IP address?
The answer is that after reboot, previously configured services may fail to start properly, causing the Kubernetes control plane to become unresponsive.

I retrieve the new IP address and attempt to access the appliance via the web interface ...


Solution


Disclaimer: Use it at your own risk.

As a quick fix, if the old IP is still available, simply set the old IP in the network configuration as a static IP and restart the network services. The pods will be activated fairly quickly.
If the IP is unavailable, follow the steps below.

1. Stop iptables

First, I stop the iptables firewall service so that I can connect to the VM via SSH.
# systemctl stop iptables

2. Check status

Connect to the Holorouter via SSH and check the Kubernetes pods with the following command:
# kubectl get pods
What you can see from the image above is that the control plane was expecting a response from a different IP than the one we currently have.
Previous IP: 192.168.1.70
Current IP: 192.168.1.238

I check the network configuration as well ...
# cat /etc/systemd/network/50-static-en.network
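The mismatch can also be confirmed from the command line by comparing the address stored in the kubeconfig with the address currently on eth0. This is a sketch under assumptions: `kubeconfig_server_host` and `current_ipv4` are hypothetical helpers, and /etc/kubernetes/admin.conf is the kubeadm default kubeconfig location, assumed here to be the one the Holorouter uses.

```shell
# Print the host part of the first "server: https://HOST:PORT" line read from stdin.
kubeconfig_server_host() {
  sed -n 's|^[[:space:]]*server:[[:space:]]*https://\([^:/]*\).*|\1|p' | head -n 1
}

# Print the first IPv4 address of an interface (defaults to eth0).
current_ipv4() {
  ip -o -4 addr show "${1:-eth0}" | awk '{print $4}' | cut -d/ -f1 | head -n 1
}

# Example (the kubeconfig path is an assumption):
# old_ip=$(kubeconfig_server_host < /etc/kubernetes/admin.conf)
# new_ip=$(current_ipv4 eth0)
# [ "$old_ip" = "$new_ip" ] || echo "control plane expects $old_ip, host now has $new_ip"
```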

3. Re-init the kubernetes control plane

To reinitialize the Kubernetes control plane and allow the API server and pods to function properly again, I created the following script (listed below). The script also changes the network settings from DHCP to static, using the newly obtained IP address (in my case, 192.168.1.238).

I create the new file in the root path:
# vi change-control-plane-ip.sh

I paste in the script (shown in the image and listed below), save it, and run it ...
# bash change-control-plane-ip.sh
If all went well, the output should look something like the one shown in the picture.
Check the current state, pre-reboot
As you can see from the image above, the pods are in an "Unknown" state.

4. Reboot and check results

I restart the appliance and perform the post-reboot check ...
# reboot

To check if the pods have powered up, I log in to the appliance and run the following command:
# kubectl get pods
If they are not all in a Running state yet, wait a moment until they are completely up.
When the pods are up and running, try connecting via the web.
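Instead of re-running the command by hand, the wait can be scripted. The following is a minimal sketch: `count_not_ready` and `wait_for_pods` are hypothetical helper names, and the retry count and delay are arbitrary defaults. It assumes the default `kubectl get pods --no-headers` layout, where STATUS is the third column.

```shell
# Count pods whose STATUS column is neither Running nor Completed.
# Expects `kubectl get pods --no-headers` output on stdin (STATUS is column 3).
count_not_ready() {
  awk 'NF > 0 && $3 != "Running" && $3 != "Completed" { n++ } END { print n + 0 }'
}

# Poll until every pod is up, or give up after a number of attempts.
# Usage: wait_for_pods [attempts] [delay_seconds]
wait_for_pods() {
  local tries="${1:-30}"
  local delay="${2:-10}"
  local out pending

  while [ "$tries" -gt 0 ]; do
    # kubectl fails while the API server is still starting; treat that as "not ready".
    if out=$(kubectl get pods --no-headers 2>/dev/null); then
      pending=$(printf '%s\n' "$out" | count_not_ready)
      if [ "$pending" -eq 0 ]; then
        echo "all pods are running"
        return 0
      fi
      echo "$pending pod(s) not running yet"
    else
      echo "API server not reachable yet"
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done

  echo "timed out waiting for pods"
  return 1
}

# Example: wait_for_pods 30 10
```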

Boom!! It works


Below is the script used, change-control-plane-ip.sh:
# change-control-plane-ip.sh
# Stop Services
systemctl stop kubelet docker

# Backup Kubernetes and kubelet
mv -f /etc/kubernetes /etc/kubernetes-backup
mv -f /var/lib/kubelet /var/lib/kubelet-backup

# Keep the certs we need
mkdir -p /etc/kubernetes
cp -r /etc/kubernetes-backup/pki /etc/kubernetes
rm -rf /etc/kubernetes/pki/{apiserver.*,etcd/peer.*}

# Start docker
systemctl start docker

# Get IP address
IP=$(ip -o -4 addr show eth0 | awk '{print $4}' | cut -d/ -f1)

# Init cluster with new ip address
kubeadm init --control-plane-endpoint "$IP" --ignore-preflight-errors=all --v=5

# Verify result
kubectl cluster-info

# Change IP on the configuration file 
cp /etc/systemd/network/50-static-en.network /etc/systemd/network/50-static-en.network.backup 
cat > /etc/systemd/network/50-static-en.network << EOF

[Match]
Name=eth0

[Network]
Address=`ip -o -4 addr show eth0 | awk '{print $4}'`
Gateway=`ip route show 0.0.0.0/0 dev eth0 | cut -d\  -f3`
DNS=10.1.1.1 null
EOF

    



That's it.

Wednesday, 22 October 2025

HomeLAB v2


I recently upgraded my homelab with new hardware that gives me more computing power and lets me test new environments, such as VMware Cloud Foundation 9.0, in a nested setup.
Specifications are available in the "HomeLAB" area or at the following link.


Monday, 6 October 2025

[VMware Explore on Tour] Paris here we come!!


This year the format of VMware Explore has changed; there are no longer two events (America, Europe) but there are smaller events around the world.
Explore is extending across the globe as 1 to 1.5 day events that will highlight the top content and insights from Explore in Las Vegas. Each event will include a curated subset of sessions and Hands-on Labs, a meetings program, and networking opportunities.



✨ Exciting times ahead – VMware Explore On Tour is coming to Paris! 🇫🇷

I’m truly looking forward to joining this year’s VMware Explore On Tour in Paris – a unique opportunity to immerse ourselves in the latest innovations, strategies, and real-world stories that are shaping the future of IT.

Beyond the inspiring sessions and thought-provoking keynotes, events like VMware Explore always carry something even more valuable: the chance to reconnect with old friends, colleagues, and community members, while meeting new professionals who share the same passion for technology, cloud, and digital transformation.

🔹 Learning from top experts
🔹 Discovering new solutions and use cases
🔹 Expanding perspectives on modern private cloud, AI, networking, and security
🔹 Strengthening relationships and building new connections


These moments of exchange and collaboration are what make this community so special. Every conversation, whether in a breakout session or over a coffee, adds a new piece to the bigger picture of how we’re transforming the way organizations run and innovate.

I can’t wait to be there, dive deeper into the latest trends, and, most of all, enjoy the vibrant energy of our ecosystem coming together in the beautiful city of Paris.

Who else will be there?

Friday, 8 August 2025

[NSX - KB406460 ] NSX_OPSAGENT on ESXi node

Issue


KB 406460 was released today. It covers the alarm "The memory usage of agent NSX_OPSAGENT on ESXi node <UUID> has reached <kb> kilobytes which is at or above the high threshold value of 80%".


Solution


As a temporary workaround, option 1 in the KB consists of restarting OpsAgent on the affected hosts. I wrote a short PowerShell script that restarts the agent on all hosts in a vCenter cluster.
Let's see it below:

##########
# 
# Run remote commands (Linux like) on esxi hosts to restart /etc/init.d/nsx-opsagent
#
# How it works:
# 	Connect to vCenter
# 	Get the list of ESXi hosts from the cluster
# 	Enable SSH on host
# 	Restart "/etc/init.d/nsx-opsagent" service on the host
# 	Disable SSH on host
#
# Requirement: Install-Module -Name Posh-SSH
#
# LM 22.05.2025
##
Import-Module -Name Posh-SSH

#Replace the parameter below with your values
$esxiUser = "root"
$esxiPassword = "<ESXi - PASSWORD>"
$vc = "<vCenter IP or FQDN>"
$vcUser = "administrator@vsphere.local"
$vcPassword = "<vCenter Password>"
$clusterName = "<Cluster Name>"

Connect-VIServer -Server $vc -User $vcUser -Password $vcPassword

$count=0
foreach ($esxiIP in (Get-Cluster -Name $clusterName | Get-VMHost)) {
  $count = $count + 1      
  Write-Host " ----------------------- $($esxiIP) ----------------------------------"
  # Enable SSH 
  Write-Host " Enabling SSH! " -ForegroundColor Green
  Get-VMHost -Name $esxiIP| Get-VMHostService | ?{"TSM-SSH" -eq $_.Key} | Start-VMHostService

  #SSH connection and service restart 
    $session = New-SSHSession -ComputerName $esxiIP -Credential (New-Object System.Management.Automation.PSCredential($esxiUser, (ConvertTo-SecureString $esxiPassword -AsPlainText -Force)))  -Force
    if ($session.Connected) {
        $command = "/etc/init.d/nsx-opsagent restart"
        $result = Invoke-SSHCommand -SessionId $session.SessionId -Command $command 
    
        if ($result.ExitStatus -eq 0) {
            Write-Host "Service restarted! on Host ->"$esxiIP -ForegroundColor Green
        } else {
            Write-Host "Error on host $($esxiIP): $($result.Error)" -ForegroundColor Red
        }
    
        Remove-SSHSession -SessionId $session.SessionId | Out-Null
    } else {
        Write-Host "SSH Connection failed! on Host ->"$esxiIP -ForegroundColor Red
    }

  sleep 1
  # Disable SSH
  Get-VMHost -Name $esxiIP| Get-VMHostService | ?{"TSM-SSH" -eq $_.Key} | Stop-VMHostService -Confirm:$false

  Write-Host " ----------------------------------------------------------------------------"
  Write-Host
}

Disconnect-VIServer -Server $vc -Confirm:$false

    



That's it.

Monday, 17 February 2025

[NSX - Search for Objects] Quick TIP

Issue


Why is it that when I search in the NSX search bar as the admin user I get some results, but when I run the same search as another user (authenticated via vIDM, with the same roles and permissions as the admin user) I get more information? Both are "Enterprise Admin".
See pictures below, search with admin user ...
... search with another "Enterprise Admin" user.


Solution


When NSX is first installed, the "User Interface Mode Toggle" is not set by default in System > Settings > General Settings > User Interface ...
This leads to incomplete search results.
Changing to Policy Mode ...
... repeating the search with the Admin user .... now we have the complete result ...
Further information regarding the "Search for Objects" can be found at the following link.

That's it.