Wednesday, 4 September 2024

[vSAN - SRM] - Reduced availability without rebuild

Issue


About a month ago I ran into the following issue. After a disaster recovery test performed via SRM, 27 objects reported the error "Reduced availability without rebuild". We tried clicking "repair object immediately", without success.
There are no resync objects in progress.
The source and target infrastructures are two VCF 5.x environments based on vSAN, where Site Recovery Manager is used to replicate VMs.
This issue reduced the cluster Health score to 60%.
Below is the GSS analysis:

Issue Clarification:
Customer has 27 objects, all using the same policy, showing reduced availability with no rebuild.

Issue Verification:
We verified that the cluster has 27 objects that are in reduced availability with no rebuild.

Cause Identification:
We found that the customer is using an FTT=2 policy with force provisioning for these objects.

Cause Justification:
As shown in the chart at https://docs.vmware.com/en/VMware-Cloud-on-AWS/services/com.vmware.vmc-aws-operations/GUID-EDBB551B-51B0-421B-9C44-6ECB66ED660B.html
satisfying an FTT=2 policy requires 5 hosts.
The customer is using force provisioning, but this only allows the object to be provisioned when the primary level of failures to tolerate cannot be met. It does not make the object compliant until the policy is actually satisfied.
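
The 5-host figure matches the usual 2n+1 rule for RAID-1 mirroring, where n is the number of failures to tolerate. Here is a minimal sketch of that arithmetic, assuming the policy uses mirroring (the analysis above does not say which; erasure coding has different minimums, for example RAID-6 needs 6 hosts). The function name is mine, not something from vSAN:

```shell
#!/bin/bash
# Minimum number of hosts for a RAID-1 (mirroring) policy: 2 * FTT + 1.
# Assumption: mirroring only; this rule does not apply to RAID-5/6.
min_hosts_raid1() {
    local ftt=$1
    echo $(( 2 * ftt + 1 ))
}

for ftt in 1 2 3; do
    echo "FTT=${ftt} -> $(min_hosts_raid1 "$ftt") hosts"
done
```

With fewer hosts than that, force provisioning can still create the objects, but they stay non-compliant.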

Force provisioning:
If the option is set to Yes, the object is provisioned even if the Primary level of failures to tolerate, Number of disk stripes per object, and Flash read cache reservation policies specified in the storage policy cannot be satisfied by the datastore. Use this parameter in bootstrapping scenarios and during an outage when standard provisioning is no longer possible.

The default No is acceptable for most production environments. vSAN fails to provision a virtual machine when the policy requirements are not met, but it successfully creates the user-defined storage policy.

Solution Recommendation:
Change the policy to match the current host configuration, or add a host to match the policy.

Solution Justification:
Once the policy is set to match the cluster the objects will be compliant and will no longer be in the reduced availability with no rebuild status.
Unfortunately, in this case the recommendations did not solve my problem.
I also changed the policy, created a new one and re-applied the storage policy to those VMs/objects, without success.

Solution


We were able to solve the issue by performing the following steps:

Short Answer
  • Create a new Protection Group and a new Recovery Plan in SRM;
  • Check that the VMs (or objects) with the issue are correctly replicated (in sync) with the target;
  • Move the VMs (or objects) with the issue to the new Protection Group;
  • Check the configuration of the VMs via Edit Settings (no disks connected);
  • Run the Recovery Plan test; the VMs should power on correctly;
  • Check again via Edit Settings whether the disks are present; they should now be correctly attached;
  • Verify in "vSAN Object Health" that there are no more objects in "reduced availability with no rebuild";
  • Perform the test cleanup and move the VMs back to the original Protection Group;
  • Re-check the configuration of the VMs via Edit Settings (no disks connected, possibly because they are managed by SRM and connected when required);
  • Perform a double check and make sure everything is working fine.


Long Answer (with screenshots and details)
  • Create a Protection Group and a new Recovery Plan on SRM:
     - Connect to the Site Recovery Manager where the VMs with the error are present
     - Create a new PG; in my case I named it "BA-Test-vSAN-Issue"
     - Create a new RP; in my case I named it "TEST-BA-MGMT_RecoveryPlan"
     - In "Virtual Objects" we can see that the VM is in the "Reduced availability without rebuild" state

  • Check that the VMs (or objects) with the issue are correctly replicated (in sync) with the target:
     - Check on the current PG that the virtual machine is synchronized

  • Move the VMs (or objects) with the issue to the new Protection Group (BA-Test-vSAN-Issue):
     - Edit the original PG
     - Deselect the VM (to remove it from the PG)
     - Edit the new PG and add the VM

  • Check the configuration of the VMs via Edit Settings (no disks connected):
     - Open Edit Settings on the VM and check it

  • Run the Recovery Plan test; the VMs should power on correctly:
     - The virtual machine is synchronized
     - Go to the new Recovery Plan (in my case "TEST-BA-MGMT_RecoveryPlan") and activate it

  • Check again via Edit Settings whether the disks are present, and indeed they are correctly attached:
     - While the RP is running and the VM is powering on, check the presence of the disk on the VM via Edit Settings
     - Wait until the test is completed
     - Check that the machine is up and running

  • Verify in "vSAN Object Health" there are no more objects in "reduced availability with no rebuild":
     - Verify that the VM is no longer present in the object list with "reduced availability with no rebuild"

  • Perform the test clean up and migrate the VMs back to the original Protection Group:
     - As soon as the Cleanup procedure is completed ...
     - ... and the VM is in Ready state, move it back to the original Protection Group

  • Re-check the configuration of the VMs via Edit Settings (no disks connected, possibly because they are managed by SRM and connected when required):

  • Perform a double check and make sure everything is working fine:
     - Once you have performed the above steps for all the VMs with the problem, you should see the "Cluster Health score" back at 100%
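
The "vSAN Object Health" check can also be done from the command line. The sketch below is an assumption on my part, not something I ran during this case: it parses the table printed by "esxcli vsan debug object health summary get" on an ESXi host and returns the object count for one health state. The function name and the sample values are made up for illustration.

```shell
#!/bin/bash
# Reads the output of "esxcli vsan debug object health summary get" on stdin
# and prints the number of objects in the given health state (0 if not listed).
count_objects_in_state() {
    local state=$1
    awk -v s="$state" '$1 == s { n = $NF } END { print n + 0 }'
}

# Sample output with illustrative values (assumed layout: state name, then count)
sample='Health Status                                              Number Of Objects
---------------------------------------------------------  -----------------
reduced-availability-with-no-rebuild                                      27
inaccessible                                                               0
healthy                                                                  310'

echo "$sample" | count_objects_in_state reduced-availability-with-no-rebuild
```

Against a real host you would pipe the command output into the function instead of the sample, for example over SSH. A result of 0 for "reduced-availability-with-no-rebuild" corresponds to the clean state described above.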



Running the entire Recovery Plan again would probably also have solved the "Reduced availability without rebuild" issue.
However, this approach is more granular and aims to solve the problem on a single VM, without negatively impacting the performance of the entire target environment. It is not mandatory to proceed one VM at a time.
It is of course possible to move multiple VMs into the temporary Protection Group at once, power them on together via the recovery plan, and then bring them back into the original Protection Group once the problem has been resolved.

That's it.

Tuesday, 7 May 2024

Always Trust Certificate on Microsoft RDP for Mac

Issue


After updating my macOS to Sonoma 14.4.1 (23E224) I can no longer connect to Windows machines via "Microsoft Remote Desktop" version 10.9.6 (2188).

I click on the remote connection, and then "Continue" as usual....
... the connection changes to the connecting state, but it hangs ...
... and no connection is established to the remote PC.

Solution


Open the connection, then click "Show certificate" and expand "Trust".
Change the option from "Use System Defaults" ...
... to "Always Trust".
Confirm the change by entering the system password ... and click "Continue" to establish the remote connection.
If it doesn't work right away, close the "Microsoft Remote Desktop" application, reopen it and try connecting again. It should then work without even showing the certificate.

That's it.

Tuesday, 23 April 2024

[vCenter] - How to connect to vCenter via REST API

Issue


Recently I had to create a script that grabs some information from vCenter Server via REST API calls. To do so, I had to write a few lines of code to authenticate to vCenter, obtain a session and the c. Let's see below how it works ...

Solution


The script must run on an Ubuntu machine, so I decided to write a bash script and use cURL. Information about the API calls can be found at the following link: https://developer.vmware.com/apis
More specific information on, for example, how to list the VMs already present in the inventory can be found here.

First of all we have to create a session with the API. This is the equivalent of login. This operation exchanges user credentials supplied in the security context for a session token that is to be used for authenticating subsequent calls. To authenticate subsequent calls clients are expected to include the session token. For REST API calls the HTTP vmware-api-session-id header field should be used for this.

The call looks like this:
curl -sk -u username:password -X POST https://{vCenter}/api/session
The credentials can also be passed as the Base64-encoded value of username:password in a header parameter, like this:
echo -n 'username:password' | base64
or, in a single line of code:
curl -ks -H "Authorization: Basic `echo -n 'username:password' | base64`" -X POST https://{vCenter}/api/session
For all subsequent API calls we can use the returned session token in the vmware-api-session-id header.
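
One detail worth knowing: the body returned by the session call is a JSON string, so the token arrives wrapped in double quotes that must be removed before it goes into the header. Here is a minimal sketch of one way to do that with plain parameter expansion. The helper name and the sample token are hypothetical, used only for illustration:

```shell
#!/bin/bash
# Strip one leading and one trailing double quote from a JSON string value.
strip_quotes() {
    local value=$1
    value=${value#\"}   # drop a leading quote, if present
    value=${value%\"}   # drop a trailing quote, if present
    printf '%s\n' "$value"
}

# Hypothetical response body, in the shape returned by POST /api/session
response='"b00db39f948d13ea1e59b4d6fce56389"'
token=$(strip_quotes "$response")
echo "vmware-api-session-id: ${token}"
```

Unlike a fixed substring, this leaves the value untouched if the quotes are not there.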

Putting it all together, the script looks like this:
#!/bin/bash

VC=192.168.1.90
ADMIN='administrator@vsphere.local'
PASSWORD='VMware1!'

# Log in: the response body is the session token as a quoted JSON string
Session_ID=$(curl -sk -u "${ADMIN}:${PASSWORD}" -X POST "https://${VC}/api/session")

# Request sent through session ID (${Session_ID:1:-1} strips the surrounding quotes)
curl -ks -H "vmware-api-session-id: ${Session_ID:1:-1}" "https://${VC}/api/vcenter/vm"
The output is shown below.

That's it.

Thursday, 11 April 2024

[MS Windows] - How to extend trial version

Issue


Working within a LAB and/or in a nested environment often requires the deployment of Windows Server machines (for instance, for local services such as Active Directory, DNS, NTP and so on).
It may happen that the tests have not been completed, but the Windows trial period has expired.
Below are a few steps to extend the Windows trial period by another 180 days.

Solution


It is possible to "rearm" MS Windows (in my case Server 2022), extending the trial period by 180 days each time, up to 6 times. Let's see how to do that ...

Taking a look at the desktop, we can see the countdown in the bottom-right corner. In my case, the period has expired.
Let's open PowerShell with "Run as administrator" and run ...

slmgr -dlv
.. as the output shows, we still have 6 rearms left (Remaining Windows rearm count).

.. then we rearm Windows by running the command below, and verify that it completes successfully ..

slmgr -rearm
.. we must then restart the server, either by running "Restart-Computer" in PowerShell or with ..

shutdown /f /t 0 /r

Once the server is up and running again, let's check that everything worked fine by running the following commands ...
slmgr -dli
slmgr -ato
slmgr -dlv

That's it.

[Nested ESXi] - How to properly configure it

Issue


Working within a LAB and/or in a nested environment often requires the deployment of new ESXi hosts. The fastest way to have new ESXi hosts deployed quickly is to clone them from a master VM.
Below are a few steps to follow to properly create a working VM clone in a nested environment.

Solution


  • First of all, we do a normal installation of our ESXi master VM in a nested environment, giving it minimal resources, such as:
    - 2 vCPU
    - 8 GB of RAM
    - 20GB of Disk (Thin Provision)
    - 4 vNIC

  • Once it has started, we connect via SSH and make the VMkernel MAC address follow the hardware MAC by running the following command:

    esxcli system settings advanced set -o /Net/FollowHardwareMac -i 1

  • To have a unique UUID on each host, we need to delete the "/system/uuid" record stored in /etc/vmware/esx.conf. To do this we can edit the file and delete the corresponding line, or run the command below, which replaces the corresponding line with an empty line.

    sed -i 's/^\(\/system\/uuid\).*//' /etc/vmware/esx.conf
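
    If you would rather remove the line entirely instead of leaving an empty one, sed's "d" command is an alternative. The sketch below is a variation of mine, not the command I actually used, and it runs against a throwaway copy with made-up content so that it can be tried safely before touching the real file:

```shell
#!/bin/bash
# Work on a temporary file so the real /etc/vmware/esx.conf is untouched.
# The three lines below are invented sample content, not a real esx.conf.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
/system/hostname = "esxi-master"
/system/uuid = "00000000-0000-0000-0000-000000000000"
/net/vmkernelnic/child[0000]/name = "vmk0"
EOF

# Delete the /system/uuid line ('d' drops the line instead of blanking it)
sed -i '/^\/system\/uuid/d' "$tmp"

# Count how many uuid lines are left and show the resulting file
remaining=$(grep -c '^/system/uuid' "$tmp")
echo "uuid lines left: ${remaining}"
cat "$tmp"
```

    On the master VM the same sed expression would be pointed at /etc/vmware/esx.conf.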

  • It is now possible to shut down the ESXi master VM and convert it to a template, or leave it as is to be cloned later when needed.

  • When needed, we can clone the master VM to deploy our ESXi nodes. When the VM starts up, a new UUID is generated.
    Once the ESXi VM clones are powered on, we can change their network settings (IP addresses, hostnames, etc.).

  • Let's generate a new certificate by running the following command:

    /sbin/generate-certificates

  • Let's restart the following services to make the host ready for our LAB:

    /etc/init.d/hostd restart && /etc/init.d/vpxa restart

That's it.

Friday, 23 February 2024

NSX UI does not load information

Issue


NSX UI does not load for one manager node holding the VIP
NSX Version 4.1.2.1.0.22667794

Error message:
Feb 8, 2024, 3:22:39 PM : Error: Failed to fetch System details. Please contact the administrator. Error: 400 : "{<EOL> "details" : "SEARCH_FRAMEWORK_INITIALIZATION_FAILED, params: [manager]",<EOL> "httpStatus" : "BAD_REQUEST",<EOL> "error_code" : 60525,<EOL> "module_name" : "nsx-search",<EOL> "error_message" : "Search framework initialization failed, please restart the service via 'restart service manager'."<EOL>}" (Error code: 513002)

Solution


In my case the solution was quite simple. I restarted the manager service on the NSX Manager appliance holding the VIP, as per the image below ...

> restart service manager
... and it worked

That's it.