Monday, 16 December 2019

NSX Installation - XXX eam.agent.enable.label not found XXX

Problem
Today I ran into the following error while trying to prepare the ESXi hosts for VXLAN (installing the VIBs) on a production cluster. The installation failed immediately without a useful error message; Tasks and Events only displayed “Cannot complete the operation. See the event log for details.” Task: XXX eam.agent.enable.label not found XXX.



In "Installation and Upgrade", the "Host Preparation" tab looked like the image below.



I had already tried putting the affected ESXi host into maintenance mode, rebooting it, and clicking RESOLVE under ACTIONS several times, without success.
I also checked NTP, time, DNS and so on across the entire environment, and everything appeared to be properly configured.

Then, in the Networking area, I saw that the vDS on the affected ESXi hosts was "Out of sync".



I also tried "Rectify vSphere Distributed Switch on Host", which did not solve the problem, so I tried to:
- Remove the host from the cluster
- Remove the host from the vCenter server
- Remove NSX DV_Switch
- Add the host to vCenter
- Add NSX DV_Switch
- Add the host to the cluster.

But the result did not change.

After digging a little bit deeper into NSX Manager logs and vCenter logs, I was able to find the following error message in my eam.log.


2019-12-12T11:50:48.052Z | ERROR | agent-4 | AuditedJob.java | 75 | JOB FAILED: [#1094199235] EnableDisableAgentJob(AgentImpl(ID:'Agent:c05fcd62-1865-475c-a63e-ea552503b4ad:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03')), com.vmware.eam.EamException: VibInstallationFailed
2019-12-12T13:43:20.165Z |  INFO | vlsi | AgencyIssueHandler.java | 93 | Resolving AgencyImpl(ID:'Agency:adf028a8-f028-4c48-974a-0fcd60082b28:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03'): issues=int[] [
   25
], unknown=null
2019-12-12T13:43:20.166Z |  INFO | vlsi | AgentIssueHandler.java | 591 | Resolving AgentImpl(ID:'Agent:c05fcd62-1865-475c-a63e-ea552503b4ad:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03') issues: issues=[I@337b2457, unknown=null
2019-12-12T13:43:20.166Z |  INFO | vlsi | IssueHandler.java | 197 | Issue removed: AgentIssueHandler:AgentImpl(ID:'Agent:c05fcd62-1865-475c-a63e-ea552503b4ad:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03')
eam.issue.VibNotInstalled {
   description = '<error errorClass="AttributeError">
  <errorCode>99</errorCode>
  <errorDesc>'NoneType' object has no attribute 'Copy'</errorDesc>
</error>',
   time = 2019-12-12 11:50:48,024,
   key = 25,
   agency = 'Agency:adf028a8-f028-4c48-974a-0fcd60082b28:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03',
   solutionId = 'VSPHERE.LOCAL\Administrator',
   agencyName = '_NSX_146_Cluster-LAB_VMware Network Fabric',
   solutionName = 'Administrator vsphere.local',
   agentName = 'VMware Network Fabric (6)',
   agent = 'Agent:c05fcd62-1865-475c-a63e-ea552503b4ad:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03',
   host = 'HostSystem:host-10:BAAA27E9-8AB1-4C50-A69D-1B7F9EDB480D',
   hostName = 'r730n4.local.lab',
}
Location: AgentIssueHandler.java:resolve:687
2019-12-12T13:43:20.166Z |  INFO | vlsi | IssueHandler.java | 121 | Updating issues:
New issues:
 []
Removed issues: [
eam.issue.VibNotInstalled {
   description = '<error errorClass="AttributeError">
  <errorCode>99</errorCode>
  <errorDesc>'NoneType' object has no attribute 'Copy'</errorDesc>
</error>',
   time = 2019-12-12 11:50:48,024,
   key = 25,
   agency = 'Agency:adf028a8-f028-4c48-974a-0fcd60082b28:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03',
   solutionId = 'VSPHERE.LOCAL\Administrator',
   agencyName = '_NSX_146_Cluster-LAB_VMware Network Fabric',
   solutionName = 'Administrator vsphere.local',
   agentName = 'VMware Network Fabric (6)',
   agent = 'Agent:c05fcd62-1865-475c-a63e-ea552503b4ad:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03',
   host = 'HostSystem:host-10:BAAA27E9-8AB1-4C50-A69D-1B7F9EDB480D',
   hostName = 'r730n4.local.lab',
}
]
2019-12-12T13:43:20.188Z |  INFO | vlsi | IssueHandler.java | 121 | Updating issues:
New issues:
 []
Removed issues: [
eam.issue.VibNotInstalled {
   description = '<error errorClass="AttributeError">
  <errorCode>99</errorCode>
  <errorDesc>'NoneType' object has no attribute 'Copy'</errorDesc>
</error>',
   time = 2019-12-12 11:50:48,024,
   key = 25,
   agency = 'Agency:adf028a8-f028-4c48-974a-0fcd60082b28:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03',
   solutionId = 'VSPHERE.LOCAL\Administrator',
   agencyName = '_NSX_146_Cluster-LAB_VMware Network Fabric',
   solutionName = 'Administrator vsphere.local',
   agentName = 'VMware Network Fabric (6)',
   agent = 'Agent:c05fcd62-1865-475c-a63e-ea552503b4ad:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03',
   host = 'HostSystem:host-10:BAAA27E9-8AB1-4C50-A69D-1B7F9EDB480D',
   hostName = 'r730n4.local.lab',
}
]
2019-12-12T13:43:30.204Z |  INFO | compute-res-0 | ClusterVibJob.java | 192 | No hosts needs maintenance mode in compute resource: domain-c7
2019-12-12T13:43:33.762Z |  INFO | host-10-4 | VcPatchManager.java | 356 | Scan result on VcHostSystem(ID: host-10):
ScanResult {
   errorCode = 99,
   errorMessage = '<error errorClass="AttributeError">
  <errorCode>99</errorCode>
  <errorDesc>'NoneType' object has no attribute 'Copy'</errorDesc>
</error>',
   success = false,
   vibUrl = https://10.10.10.241/bin/vdn/vibs-6.4.6/6.0-14762108/vxlan.zip,
   pendingReboot = false,
   requiresReboot = false,
   requiresMaintenanceMode = false,
   requiresHostdRestart = false,
   installed = false,
   bulletins = [Vib [VibHeader [id=VMware_bootbank_esx-nsxv_6.0.0-0.0.14762108, VibId [name=esx-nsxv, version=6.0.0-0.0.14762108], vendor=VMware, summary=NSX datapath and host tools, release date=Mon Sep 30 12:14:44 UTC 2019], payload=https://vcenter.local.lab:443/eam/vib?id=6c4f942e-7e0c-41df-803f-4869809d4320]],
}
2019-12-12T13:43:33.762Z | ERROR | host-10-4 | VcPatchManager.java | 358 | PatchManager operation failed:
<esxupdate-response>
<version>1.50</version>
<vib-scan-data>
  <id>VMware_locker_tools-light_6.0.0-2.43.4192238</id>
  <pkgstate>installed</pkgstate>
  <name>tools-light</name>
  <version>6.0.0-2.43.4192238</version>
  <vendor>VMware</vendor>
  <installdate>2017-01-12 10:37:55.871855+00:00</installdate>
  <meetsSystemReq>True</meetsSystemReq>
  <pkgDepsMetByHost>False</pkgDepsMetByHost>
  <requires>
    <requirement>esx-version</requirement>
    <providedByHost>False</providedByHost>
  </requires>
  <conflicts>
    <conflictsWithHost>False</conflictsWithHost>
  </conflicts>
  <obsoletes>
    <obsoletesHost>False</obsoletesHost>
    <obsoletedByHost>False</obsoletedByHost>
  </obsoletes>
  <systemReqs>
    <swPlatform productLineID="embeddedEsx" version=""/>
    <maintenanceMode>False</maintenanceMode>
    <maintenanceModeUninstall>False</maintenanceModeUninstall>
  </systemReqs>
  <postInstall>
    <rebootRequired>False</rebootRequired>
    <hostdRestart>False</hostdRestart>
  </postInstall>
  <postUninstall>
    <rebootRequired>False</rebootRequired>
  </postUninstall>
</vib-scan-data>
<vib-scan-data>
  <id>VMware_bootbank_esx-nsxv_6.0.0-0.0.14762108</id>
  <pkgstate>uninstalled</pkgstate>
  <name>esx-nsxv</name>
  <version>6.0.0-0.0.14762108</version>
  <vendor>VMware</vendor>
  <installdate></installdate>
  <meetsSystemReq>True</meetsSystemReq>
  <pkgDepsMetByHost>False</pkgDepsMetByHost>
  <requires>
    <requirement>esx-base << 7.0</requirement>
    <providedByHost>False</providedByHost>
  </requires>
  <requires>
    <requirement>vmkapi_2_3_0_0</requirement>
    <providedByHost>False</providedByHost>
  </requires>
  <requires>
    <requirement>nsx-api <= 2.1</requirement>
    <providedByHost>False</providedByHost>
  </requires>
  <conflicts>
    <conflictsWithHost>False</conflictsWithHost>
  </conflicts>
  <obsoletes>
    <obsoletesHost>False</obsoletesHost>
    <obsoletedByHost>False</obsoletedByHost>
  </obsoletes>
  <systemReqs>
    <swPlatform productLineID="embeddedEsx" version=""/>
    <maintenanceMode>False</maintenanceMode>
    <maintenanceModeUninstall>True</maintenanceModeUninstall>
  </systemReqs>
  <postInstall>
    <rebootRequired>False</rebootRequired>
    <hostdRestart>False</hostdRestart>
  </postInstall>
  <postUninstall>
    <rebootRequired>False</rebootRequired>
  </postUninstall>
</vib-scan-data>
<host-scan-data>
  <imagePending>False</imagePending>
  <vibStaged>False</vibStaged>
</host-scan-data>
<error errorClass="AttributeError">
  <errorCode>99</errorCode>
  <errorDesc>'NoneType' object has no attribute 'Copy'</errorDesc>
</error>
</esxupdate-response>

2019-12-12T13:43:33.763Z | ERROR | host-10-4 | VibJob.java | 730 | Unhandled response code: 99
2019-12-12T13:43:33.763Z | ERROR | host-10-4 | VibJob.java | 736 | PatchManager operation failed with error code: 99
With VibUrl: https://10.10.10.241/bin/vdn/vibs-6.4.6/6.0-14762108/vxlan.zip
2019-12-12T13:43:33.763Z |  INFO | host-10-4 | IssueHandler.java | 121 | Updating issues:
New issues:
 [
eam.issue.VibNotInstalled {
   description = 'XXX uninitialized',
   time = 2019-12-12 13:43:33,763,
   key = 27,
   agency = 'Agency:adf028a8-f028-4c48-974a-0fcd60082b28:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03',
   solutionId = 'VSPHERE.LOCAL\Administrator',
   agencyName = '_NSX_146_Cluster-LAB_VMware Network Fabric',
   solutionName = 'Administrator vsphere.local',
   agentName = 'VMware Network Fabric (6)',
   agent = 'Agent:c05fcd62-1865-475c-a63e-ea552503b4ad:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03',
   host = 'HostSystem:host-10:BAAA27E9-8AB1-4C50-A69D-1B7F9EDB480D',
   hostName = 'r730n4.local.lab',
}
]
Removed issues: []
2019-12-12T13:43:33.763Z |  INFO | host-10-4 | IssueHandler.java | 121 | Updating issues:
New issues:
 [
eam.issue.VibNotInstalled {
   description = 'XXX uninitialized',
   time = 2019-12-12 13:43:33,763,
   key = 27,
   agency = 'Agency:adf028a8-f028-4c48-974a-0fcd60082b28:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03',
   solutionId = 'VSPHERE.LOCAL\Administrator',
   agencyName = '_NSX_146_Cluster-LAB_VMware Network Fabric',
   solutionName = 'Administrator vsphere.local',
   agentName = 'VMware Network Fabric (6)',
   agent = 'Agent:c05fcd62-1865-475c-a63e-ea552503b4ad:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03',
   host = 'HostSystem:host-10:BAAA27E9-8AB1-4C50-A69D-1B7F9EDB480D',
   hostName = 'r730n4.local.lab',
}
]
Removed issues: []
2019-12-12T13:43:33.767Z | ERROR | host-10-4 | AgentImpl.java | 2413 | AgentImpl(ID:'Agent:c05fcd62-1865-475c-a63e-ea552503b4ad:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03') VIB installation failed: https://10.10.10.241/bin/vdn/vibs-6.4.6/6.0-14762108/vxlan.zip
2019-12-12T13:43:33.767Z |  INFO | host-10-4 | VcComputeResource.java | 505 | VcClusterComputeResource(ID: domain-c7) setting required agent count in VC to 0
2019-12-12T13:43:33.786Z | ERROR | agent-0 | AuditedJob.java | 75 | JOB FAILED: [#1885632038] EnableDisableAgentJob(AgentImpl(ID:'Agent:c05fcd62-1865-475c-a63e-ea552503b4ad:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03')), com.vmware.eam.EamException: VibInstallationFailed
2019-12-12T13:43:33.792Z |  INFO | host-10-4 | VibJob.java | 241 | VIBs installed at AgentImpl(ID:'Agent:c05fcd62-1865-475c-a63e-ea552503b4ad:c4aa57e6-93f2-4bbd-af53-8a296ebe1e03') (VcHostSystem(ID: host-10))
2019-12-12T13:43:33.792Z |  INFO | host-10-4 | VibJob.java | 245 | All agents checked for pending VIB actions.
2019-12-12T13:43:33.792Z |  INFO | compute-res-0 | ClusterVibJob.java | 433 | [] are to be rebooted.
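
In a long eam.log like the one above, the ERROR entries are easy to miss among the INFO lines. A minimal sketch of a filter follows, assuming the pipe-separated layout seen in this excerpt. The function name eam_errors is hypothetical, and the log path in the example is an assumption for a vCenter Server Appliance. Multi-line payloads, such as the esxupdate XML, are not printed, only the line that carries the level.

```shell
# Print only the ERROR-level entries from an EAM log file.
# Assumes the layout seen above:
#   timestamp | LEVEL | thread | source file | line | message
eam_errors() {
    # Field 2 is the level, padded with spaces on both sides.
    awk -F'|' '$2 ~ /^ *ERROR *$/' "$1"
}

# Example (path is an assumption, adjust to your environment):
# eam_errors /var/log/vmware/eam/eam.log
```
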


Looking at the ESXi host itself, I found that no image profile was present:

[InstallationError] No image profile is found on the host or image profile is empty. An image profile is required to install or remove VIBs. To install an image profile, use the esxcli image profile install command. Please refer to the log file for more details.


[root@r7XXXX:/vmfs/volumes/5ca5de41-90ef28e5-5a93-149ecf45a06f/VIBs] esxcli software vib list
Name         Version             Vendor  Acceptance Level  Install Date
-----------  ------------------  ------  ----------------  ------------
tools-light  6.0.0-2.43.4192238  VMware  VMwareCertified   2017-01-12
[root@r7XXXX:/vmfs/volumes/5ca5de41-90ef28e5-5a93-149ecf45a06f/VIBs]
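
A host whose VIB list contains a single entry, as above, is a strong hint that the image database is damaged. As a rough sketch, the listing can be counted with a small helper. The name count_vibs is hypothetical, and the code assumes the two-line header shown in the output above and that awk is available in the shell where it runs. The image profile itself can be inspected with esxcli software profile get.

```shell
# Count the VIBs in `esxcli software vib list` output read from stdin.
# The first two lines (column header and dashed separator) are skipped.
count_vibs() {
    awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}

# On the host:
# esxcli software vib list | count_vibs
```
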

Then I followed the procedure shown on this site, described step by step below.

Solution
This can be fixed by replacing the corrupt image file with a known good one from another host. Here is how to do it.



  1. On the working ESXi host, copy the following image file: imgdb.tgz
    cp /bootbank/imgdb.tgz /vmfs/volumes/

  2. On the corrupt host, copy the file imgdb.tgz from the working host to /tmp:
    cp /vmfs/volumes//imgdb.tgz /tmp

  3. Change directory to /tmp
    cd /tmp

  4. Extract the file you just copied
    tar -xzf imgdb.tgz

  5. Copy the working profile files to the profile directory
    cp /tmp/var/db/esximg/profiles/* /var/db/esximg/profiles/

  6. Copy the working VIBs to the VIB repository
    cp /tmp/var/db/esximg/vibs/* /var/db/esximg/vibs/

  7. Remove the corrupt imgdb.tgz from the bootbank
    rm /bootbank/imgdb.tgz

  8. Move the working copy of imgdb.tgz into the bootbank
    cp /tmp/imgdb.tgz /bootbank/

  9. Make a config backup
    /sbin/auto-backup.sh

  10. Reboot the host
    reboot
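
Before overwriting anything in steps 5 to 8, it may be worth checking that the archive copied from the working host really contains profiles and VIB descriptors. The following is only a sketch of such a check and not part of the original procedure. The function name imgdb_looks_sane is hypothetical, and it assumes the archive uses the var/db/esximg/ layout implied by steps 5 and 6.

```shell
# Succeed only if the archive lists at least one image profile and
# at least one VIB descriptor under var/db/esximg/.
imgdb_looks_sane() {
    # Fail early if the file is missing or not a readable gzip tarball.
    listing=$(tar -tzf "$1" 2>/dev/null) || return 1
    # ".+" requires a file name after the directory, so empty dirs fail.
    printf '%s\n' "$listing" | grep -E '^(\./)?var/db/esximg/profiles/.+' >/dev/null || return 1
    printf '%s\n' "$listing" | grep -E '^(\./)?var/db/esximg/vibs/.+' >/dev/null
}

# Example, on the corrupt host after step 2:
# imgdb_looks_sane /tmp/imgdb.tgz && echo "archive looks usable"
```
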

When the ESXi host is up and running again, the system installs the missing eam.agent and VIBs.



Click Resolve in the ACTIONS menu, and the system automatically detects the VIBs.


That's it.

Friday, 6 December 2019

NSX Manager Password Recovery

Problem
Following the update of vShield Manager to NSX Manager 6.2.4 (with the update bundle), the previous passwords for admin, enable mode and the web interface no longer seemed to work. In short, it looked as if we had lost the passwords.

Let's cover the process in detail below, step by step.

Disclaimer: The procedures described below are the result of numerous attempts and are not officially supported by VMware. Use them at your own risk. The best approach is to open a Service Request with support.


Solution
First of all, let's start by analyzing a working NSX Manager appliance whose password we know. For my purpose I used VMware's HOL.

We access the NSX Manager console in "Tech Support Mode" via an SSH client, as described in my previous post (in Italian), and then check the disk partitions with df -h


Then with mount:


As we can see from the images above, in addition to the root partition mounted on /, there is the extended partition /dev/sda6 mounted on the /common folder.

Looking around in the /common folder, I found an interesting file called passwd in /common/configs/cli/etc/passwd


which looks like a shadow file, as you can see below.


Then, knowing that the current password of the HOL lab is "VMware1!VMware1!", I tried to verify whether I could generate the password hash with perl, using the command:

perl -e 'print crypt("Password","\$6\$saltsalt\$") . "\n"'


In my case: perl -e 'print crypt("VMware1!VMware1!","\$6\$u5rPILiF\$") . "\n"' and it matches.


After the analysis, let's start with the steps. The fastest way I found to access the NSX Manager appliance and reset the password is to boot the VM from a Linux live CD.

First of all, we turn off the VM and take a cold snapshot (for backup purposes).


After that, we attach the Linux live CD to the appliance (in my case I used lubuntu)


and boot the appliance, selecting the CD-ROM Drive option


Start Linux in live mode (without installing anything) and, once it has started, open a Terminal:

lubuntu@lubuntu:~$ sudo su
root@lubuntu:~# fdisk -l | more
root@lubuntu:~# mkdir /nsx
root@lubuntu:~# cd /

The partitions that interest us are /dev/sda3 and /dev/sda6.


root@lubuntu:/# mount /dev/sda3 /nsx
root@lubuntu:/# cat /nsx/etc/shadow


So now we have to take note of the hash algorithm ID ($6$, which is SHA-512, in our case) and the salt that follows it, so that we can generate the hash for our password "changeme" with the following command:

root@lubuntu:/# perl -e 'print crypt("changeme","\$6\$YuXXXXXq\$") . "\n"'
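
The algorithm ID and salt can also be read out of the file with a small helper instead of by eye. This is only a sketch. The function name shadow_salt_prefix is hypothetical, and it assumes a plain "$id$salt$hash" field with no rounds= parameter.

```shell
# Print the "$id$salt$" prefix of USER's hash in a shadow-style FILE,
# ready to be used as the second argument of crypt().
# Assumes the hash field has the form $id$salt$hash (no "rounds=").
shadow_salt_prefix() {
    awk -F: -v user="$1" '
        $1 == user { split($2, p, "[$]"); printf "$%s$%s$\n", p[2], p[3] }
    ' "$2"
}

# Example:
# shadow_salt_prefix admin /nsx/etc/shadow
```
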


We copy the newly generated password hash and use it to replace the old one in the file /nsx/etc/shadow

root@lubuntu:/# vi /nsx/etc/shadow

Replace the hash with the new one and save the file by pressing "ESC" and then typing ":wq!"
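
As an alternative to the manual edit in vi, the hash field can be swapped with a short script, which avoids copy and paste mistakes in a long hash. This is a sketch, not the method used in this post. The function name set_shadow_hash is hypothetical, and it assumes the file uses colon-separated fields with the hash in the second one.

```shell
# Replace the hash (second colon-separated field) of USER in FILE.
# The new hash is passed through the environment so that awk does not
# reinterpret any character in it.
set_shadow_hash() {
    file=$1 user=$2
    tmp=$(mktemp) || return 1
    NEW_HASH=$3 awk -F: -v OFS=: -v user="$user" '
        $1 == user { $2 = ENVIRON["NEW_HASH"] }
        { print }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (new_hash holds the output of the perl command above):
# set_shadow_hash /nsx/etc/shadow admin "$new_hash"
```
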


root@lubuntu:/# umount /nsx
root@lubuntu:/# mount /dev/sda6 /nsx
root@lubuntu:/# cat /nsx/configs/cli/etc/passwd

The password of the enable mode user should be stored in "/nsx/configs/cli/etc/passwd". Follow the same hash generation procedure and replace the hash in the file /nsx/configs/cli/etc/passwd as described above.

root@lubuntu:/# perl -e 'print crypt("changeme","\$6\$nXXXXXXP\$") . "\n"'

We copy the newly generated password hash and use it to replace the old one in the file /nsx/configs/cli/etc/passwd

root@lubuntu:/# vi /nsx/configs/cli/etc/passwd

Replace the hash with the new one and save the file by pressing "ESC" and then typing ":wq!"



root@lubuntu:/# umount /nsx
root@lubuntu:/# reboot

When the appliance is up and running, try to log in with the username "admin" and the password "changeme".


It Works!!!!!




Change the passwords of the admin and enable users following KB2078825, Securing VMware NSX for vSphere 6.x CLI User Accounts and Privileged mode.

If everything went as expected, remove the snapshot.

That's it.