Issue
If you manage a VMware vDefend Security Services Platform (SSP) environment, you know how critical automated backups are. Recently, I ran into a frustrating issue where scheduled backups started failing mysteriously. What initially looked like a simple misconfiguration or "network timeout" turned out to be a bizarre problem involving a lost encryption key and crashing containers.
Looking at the System > Backup and Restore dashboard in the SSP UI, I noticed a long list of failed backup jobs. Hovering over the failure status yielded only a generic message: "operation timed out, please contact administrator".
I tried manually forcing the backup, but it got stuck at 60% and eventually timed out.
Solution
My first instinct was to check the SFTP credentials: I re-typed the password, clicked SAVE, and everything seemed fine.
Since the UI wasn't giving me enough technical details, it was time to drop into the CLI. I logged into the SSP appliance via SSH as sysadmin to check what was happening at the Kubernetes pod level.
I ran a quick ...
kubectl get pods -n nsxi-platform | grep backup
... and immediately spotted the problem. The backup job pods weren't just timing out; they were actively crashing, throwing Error and OOMKilled statuses.
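This kind of triage can also be scripted. Here is a minimal sketch that flags any pod not in a healthy state; the pod listing below is an illustrative sample (the second pod name is made up), not real cluster output:

```shell
# Sample of what `kubectl get pods -n nsxi-platform | grep backup` might
# return; real names, ready counts, and ages will differ.
listing='job-backup-4fv4k-5gq5s   0/1   OOMKilled   3   5m
job-backup-4fv4k-x7x2p   0/1   Error       2   5m
backup-scheduler-abc12   1/1   Running     0   2d'

# Column 3 is STATUS; print every pod that is neither Running nor Completed.
unhealthy=$(echo "$listing" | awk '$3 != "Running" && $3 != "Completed" {print $1": "$3}')
echo "$unhealthy"
```

In a live cluster you would pipe the real `kubectl get pods` output into the same `awk` filter instead of the sample variable.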
To understand why the pods were in this state (OOMKilled, Error), I checked their logs:
kubectl logs job-backup-4fv4k-5gq5s -n nsxi-platform
Scrolling through the container logs, I could see the backup job actually doing its work: generating the .obj and .dump files from the Postgres database and config files. However, the process abruptly halted right after the ...
<INFO> ... Started encrypting backup bundle
... message.
The log then explicitly stated that the system had passed a malformed argument to the OpenSSL encryption command:
Error executing command: [openssl enc -aes-256-cbc -salt -in ... -out ... -k ******].
Error: fork/exec /usr/bin/bash: argument list too long.
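"Argument list too long" is the kernel's E2BIG error: execve() refuses to start a process when an argument exceeds the OS limit (on Linux, roughly 128 KiB per argument), which is how a corrupted passphrase value can kill a command before it even runs. A minimal reproduction sketch, not the SSP backup code itself:

```shell
# Build a single ~2 MB argument, well past the per-argument exec limit.
huge=$(head -c 2000000 /dev/zero | tr '\0' 'x')

# The exec fails with E2BIG before /bin/true ever runs.
if /bin/true "$huge" 2>/dev/null; then
    result="ok"
else
    result="E2BIG"
fi
echo "$result"
```

This is why the error surfaces as fork/exec failing rather than as an openssl error: the process hosting the command never starts.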
Conclusion
How did I fix it? I re-entered the password and retyped the passphrase, ensuring the openssl process received a valid string. That allowed the container to properly encrypt the dump files and upload them to the SFTP server.
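If you want to sanity-check a passphrase before trusting it to a scheduled job, a quick round-trip through the same kind of openssl invocation works. A hedged sketch, with example paths and passphrase; note I add -pbkdf2 (supported in OpenSSL 1.1.1+) to use modern key derivation, which the job's logged command omits:

```shell
# Create a small stand-in for the backup bundle (paths are examples).
echo "backup-bundle-test" > /tmp/bundle.dump

# Encrypt, then decrypt, with the same passphrase.
openssl enc -aes-256-cbc -salt -pbkdf2 \
    -in /tmp/bundle.dump -out /tmp/bundle.dump.enc -k 'MyPassphrase123'
openssl enc -d -aes-256-cbc -pbkdf2 \
    -in /tmp/bundle.dump.enc -out /tmp/bundle.decrypted -k 'MyPassphrase123'

# If the round trip preserved the content, the passphrase is sane.
diff /tmp/bundle.dump /tmp/bundle.decrypted && echo "passphrase OK"
```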
The mystery was solved. The "timeout" in the UI was just a side effect. The platform had somehow lost track of the encryption key, and the configuration was stuck in a weird state where "Reuse prior passphrase" wasn't working. That missing key caused the openssl command to fail spectacularly (resulting in an "argument list too long" OS error), which crashed the container pod.
Now, starting the backup in manual mode completes successfully. That's it.