Monday, 30 August 2021

NSX-T Data Center EDGE does not start correctly

Issue


After some failover attempts performed on the EDGEs during laboratory tests, I found that one EDGE was no longer able to boot properly because of file system problems.

Failed to start File System Check on /dev/mapper/nsx-var+dump.
See 'systemctl status "systemd-fsck@dev\\x2dvar\\x2bdump.service"' for details.



Solution


Disclaimer: The procedures described below may not be officially supported by VMware. Use them at your own risk. Before performing any of the actions described, make sure that you have a valid backup. The safest approach is to open a Service Request with VMware GSS.

We ran the command below, as suggested by the error message, then ...

systemctl status "systemd-fsck@dev\\x2dvar\\x2bdump.service"
... we checked the file system, letting fsck repair it automatically (-y), and rebooted ...

fsck -y /dev/mapper/nsx-var+dump
reboot
After the reboot, the Edge started normally.
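
Since fsck -y repairs without asking, it can be useful to look at its exit status before rebooting. The sketch below decodes that status using the bit values listed in the fsck(8) man page. The function name describe_fsck_status is a hypothetical helper of mine, not something shipped with NSX-T.

```shell
#!/bin/bash
# Hypothetical helper: decode the exit status of fsck.
# The status is a bit mask (values taken from the fsck(8) man page).
describe_fsck_status() {
  local status=$1
  if [ "$status" -eq 0 ]; then
    echo "no errors"
    return
  fi
  local messages=()
  (( status & 1 ))   && messages+=("errors corrected")
  (( status & 2 ))   && messages+=("reboot required")
  (( status & 4 ))   && messages+=("errors left uncorrected")
  (( status & 8 ))   && messages+=("operational error")
  (( status & 16 ))  && messages+=("usage or syntax error")
  (( status & 32 ))  && messages+=("canceled by user request")
  (( status & 128 )) && messages+=("shared-library error")
  # Join the collected messages with commas
  local IFS=','
  echo "${messages[*]}"
}

# Possible usage on the Edge, right after the check:
#   fsck -y /dev/mapper/nsx-var+dump
#   describe_fsck_status $?
```

If the status includes "errors left uncorrected", rebooting is unlikely to help and a Service Request is probably the better next step.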

That's it.

Thursday, 12 August 2021

If a service is unavailable ... put the EDGE into maintenance mode

Issue


I was recently asked to create a script that monitors a specific service/IP by ping and, after three consecutive failures, takes action on NSX-T.
In my case, the action to take in NSX-T was to put a specific EDGE into maintenance mode.

Solution


First of all, what we want to build is a bash script that runs on a linux machine, but we also need to find out how to retrieve the NSX-T information we need via the REST API.
Let's start by finding out how to retrieve the information we need from the NSX-T Data Center REST API web site.
Since I have a linux environment available, my REST API calls will be executed with the curl command. Most API calls require authentication. The NSX-T Data Center API supports several different authentication schemes, which are documented at the link above. Multiple authentication schemes may not be used concurrently.

For our purposes, Basic authentication is enough. To use it, we change the following call:
curl -k -u 'admin:VMware1!VMware1!' https://<nsx-mgr>/api/v1/logical-ports
into

curl -k -H "Authorization: Basic YWRtaW46Vk13YXJlMSFWTXdhcmUxIQ==" https://<nsx-mgr>/api/v1/logical-ports
To encode the string 'admin:VMware1!VMware1!', it is enough to run the following command on a linux machine:

echo -n 'admin:VMware1!VMware1!' | base64
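
The same encoding step can be wrapped in a small function, so that the header does not have to be pasted by hand. This is only a sketch: the name basic_auth_header is hypothetical, and the credentials are the lab ones used in this post.

```shell
#!/bin/bash
# Hypothetical helper: build a Basic Authorization header from a user name and a password.
basic_auth_header() {
  local user=$1 password=$2
  # printf '%s:%s' emits no trailing newline (like echo -n);
  # tr removes the line wraps that base64 adds to long input.
  printf 'Authorization: Basic %s\n' \
    "$(printf '%s:%s' "$user" "$password" | base64 | tr -d '\n')"
}

# Possible usage:
#   curl -k -H "$(basic_auth_header admin 'VMware1!VMware1!')" https://<nsx-mgr>/api/v1/logical-ports
```

Keep in mind that base64 is an encoding, not encryption: anyone who can read the script can recover the password.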
Now we need to retrieve the information about the EDGE we are interested in (in my case "edge01a") by executing the following command:

curl -k -H "Authorization: Basic YWRtaW46Vk13YXJlMSFWTXdhcmUxIQ==" https://<nsx-mgr>/api/v1/transport-nodes
In the output, look for the "display_name" row containing the edge name (in my case edge01a) and take note of the identifier "id" on the line above it ("id": "32340c58-6f28-412c-9f75-c455f8d11323").
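
The manual lookup can also be scripted. The sketch below assumes, as described above, that the response is pretty-printed and that the "id" row sits on the line immediately above the matching "display_name" row. The function name find_node_id is hypothetical; a real JSON parser such as jq would be more robust if it is available.

```shell
#!/bin/bash
# Hypothetical helper: read pretty-printed JSON on stdin and print the id found
# on the line directly above the "display_name" row that matches $1.
find_node_id() {
  awk -v name="$1" '
    index($0, "\"display_name\"") && index($0, "\"" name "\"") && prev ~ /"id"/ {
      # Extract the dash-separated hexadecimal identifier from the previous line
      if (match(prev, /[0-9a-f]+(-[0-9a-f]+)+/))
        print substr(prev, RSTART, RLENGTH)
    }
    { prev = $0 }
  '
}

# Possible usage:
#   curl -k -H "Authorization: Basic <encoded-credentials>" https://<nsx-mgr>/api/v1/transport-nodes | find_node_id edge01a
```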

If we run the command again with that id appended to the URL, as below, we get detailed information about the edge.

curl -k -H "Authorization: Basic YWRtaW46Vk13YXJlMSFWTXdhcmUxIQ==" https://<nsx-mgr>/api/v1/transport-nodes/32340c58-6f28-412c-9f75-c455f8d11323


Now that we have collected all the information we need, we can create the bash script as follows:
#!/bin/bash
#
# Author: Lorenzo Moglie (ver.1.0 28.05.2021)
#
# IP = active service/IP that we want to monitor by pinging it every $sleeptime seconds.
#      After 3 consecutive unsuccessful attempts the script forces the failover (in our case) by putting the EDGE (edge01a) into maintenance mode
# sleeptime = time in seconds between one ping and the next (set below, default is 1)
# NSX = NSX-T Manager on which we want to launch the command
# WARNING : adapt the NSX-T parameters used in the Basic Authorization header to your own needs, in my case:
#           Username = admin
#           Password = VMware1!VMware1!
#           EDGE ID (must be retrieved beforehand), in my case 32340c58-6f28-412c-9f75-c455f8d11323
#

IP='<IP>'
sleeptime=1
NSX='<nsx-mgr>'

NPing=0
while true; do
 # After three consecutive failed pings, put the EDGE into maintenance mode and reset the counter
 if [ "$NPing" -eq 3 ]
 then
   NPing=0
   curl -k -X POST -H "Authorization: Basic YWRtaW46Vk13YXJlMSFWTXdhcmUxIQ==" "https://$NSX/api/v1/transport-nodes/32340c58-6f28-412c-9f75-c455f8d11323?action=enter_maintenance_mode"
 fi
 # Send a single ping and discard its output; only the exit status matters
 if ping -c1 "$IP" >/dev/null 2>&1
 then
  NPing=0
  echo "OK"
 else
  NPing=$((NPing + 1))
  echo "Failure $NPing"
 fi
 sleep "$sleeptime"
done
Let's see how the script works: as soon as the IP becomes unreachable, after three failed attempts it sends the command that puts the edge into maintenance mode.
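
The "three consecutive failures" logic can be tried out without touching NSX-T by separating it from the ping and from the API call. The following is a minimal sketch under my own assumptions: the function watch_service and all of its parameters are hypothetical, and the probe and the action are passed in as command names.

```shell
#!/bin/bash
# Hypothetical sketch: run probe_cmd repeatedly and call action_cmd after
# `threshold` consecutive failures.
#   $1 probe_cmd   command or function whose exit status says "service is up"
#   $2 action_cmd  command or function to run when the threshold is reached
#   $3 threshold   consecutive failures required (default 3)
#   $4 max_checks  stop after this many probes, 0 = run forever (default 0)
#   $5 interval    seconds between probes (default 1)
watch_service() {
  local probe_cmd=$1 action_cmd=$2 threshold=${3:-3} max_checks=${4:-0} interval=${5:-1}
  local failures=0 checks=0
  while [ "$max_checks" -eq 0 ] || [ "$checks" -lt "$max_checks" ]; do
    checks=$((checks + 1))
    if "$probe_cmd" >/dev/null 2>&1; then
      failures=0                      # any success resets the streak
    else
      failures=$((failures + 1))
    fi
    if [ "$failures" -ge "$threshold" ]; then
      "$action_cmd"
      failures=0                      # start counting again after acting
    fi
    sleep "$interval"
  done
}
```

In a real run the probe could be a function wrapping ping -c1 "$IP" and the action a function wrapping the curl POST shown above. With stub functions, the counter can be checked in a dry run first.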

That's it.

How to set a new unique UUID.bios by script

Issue


A colleague of mine asked me for help creating a PowerShell script to change the UUID.bios value in the .vmx file, because of a problem with VMs restored from backups with the same UUID. The issue arises when both VMs (source and recovered), with the same UUID.bios, are present in the execution environment at the same time.

Solution


Googling around, I found an old thread on the VMTN community answered by Luc Dekens.
There are several ways of doing it, from manual to programmatic (as can be seen in this KB article).

I chose to write a PowerCLI script. So I took Luc's code (thanks for sharing it with the community) and adapted it to my needs as described below.
The steps to follow are:
  • shut down the VM
  • get the current UUID
  • change the UUID (with one autogenerated)
  • power on the VM

The new UUID is built from a static prefix plus the current date and time in the format year, month, day, hours, minutes, seconds, each written with two digits (for example Get-Date -UFormat "%y%m%d%H%M%S").
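
To make the resulting format concrete, here is the same construction as a small bash sketch (the PowerCLI script itself uses Get-Date). The function name new_bios_uuid and its optional argument are hypothetical. The 12-digit timestamp fills the last UUID group exactly, so the result keeps the usual 8-4-4-4-12 layout.

```shell
#!/bin/bash
# Hypothetical helper: static prefix plus a 12-digit timestamp (yymmddHHMMSS).
# An explicit timestamp can be passed as $1; otherwise the current time is used.
new_bios_uuid() {
  local prefix="6d6f676c-6965-6c31-2e30"
  local stamp=${1:-$(date +%y%m%d%H%M%S)}
  echo "${prefix}-${stamp}"
}

# Possible usage:
#   new_bios_uuid                # prefix followed by the current timestamp
#   new_bios_uuid 210812153045   # fixed timestamp: 12 August 2021, 15:30:45
```

Because the timestamp has a resolution of one second, two VMs processed within the same second would receive the same value; running the script once per VM, one after another, avoids that in practice.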
############################################################################################
#
#  File  : Change-UUID.BIOS.ps1
#  Author: Lorenzo Moglie
#  Date  : 12.08.2021
#  Description : This script can be used to generate a new UUID for the target VM
#
#  Usage: .\Change-UUID.BIOS.ps1 <vm-name>
#
############################################################################################

if ($args[0].length -gt 0) {
 $vmName = $args[0]
} else {
 Write-Host -ForegroundColor red "Usage: .\Change-UUID.BIOS.ps1 <VM Name>"
 exit 40
}


Connect-VIServer -Server <VCENTER> -User <USERNAME> -Password <PASSWORD>

$vm = Get-VM -Name $vmName
#Write-Host OLD.UUID=$($vm.extensiondata.config.uuid)

if ((Get-VM -Name $vmName).PowerState -eq "PoweredOff") {
  Write-Host -foreground Green "- VM"$vmName "is already OFF"
}
else
{
    Write-Host -foreground Red "- VM"$vmName "is shutting down ..." 
    $vm | Shutdown-VMGuest  -Confirm:$false
    While ((Get-VM -Name $vmName).PowerState -ne "PoweredOff") {
        Write-Host -foreground yellow "... waiting for" $vmName "to power off"
        Start-Sleep -Seconds 5
    }
}

$newUuid = "6d6f676c-6965-6c31-2e30-" + $(Get-Date -UFormat "%y%m%d%H%M%S")

$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.uuid = $newUuid
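# ReconfigVM waits for the reconfiguration to complete, so the VM is not powered on before the new UUID is applied
$vm.ExtensionData.ReconfigVM($spec)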

Write-Host -foreground Green "- VM"$vmName "successfully updated."
Write-Host "OLD.UUID="$($vm.extensiondata.config.uuid)
Write-Host "NEW.UUID="$newUuid

Write-Host -foreground Green "- VM"$vmName": Restarting in progress ...."
Start-VM -VM $vm -RunAsync 

Disconnect-VIServer -Server * -Force -Confirm:$false

Let's see what the outcome looks like ...

A double check.

UUID.BIOS changed ... Everything looks fine.

That's it.