Tuesday, 26 January 2021

NSX-T 3.0 - Logical Segment stuck in “Deletion in Progress” state and grayed out in the UI while being deleted

Issue
Today a customer called me in to solve a rather unusual problem. The deletion of a Logical Segment on an NSX-T 3.0 infrastructure had been stuck in the "Deletion in Progress" state for days, and the segment was grayed out in the UI, so no action could be taken on it.


I looked for further information about the stuck Logical Segment (LS) ...

I checked whether editing it would somehow let me start or force the deletion ...

I switched to Manager mode to get a different point of view ...
I noticed a "Logical Port" that could be blocking the deletion process ...

I opened it and tried to delete it, but here too the DELETE button was grayed out ...

I then decided to investigate further from the NSX-T Manager console, as detailed below ...


Disclaimer: The procedures described below may not be officially supported by VMware. Use them at your own risk. Before performing any of the actions described, make sure that you have a valid backup. The best approach is to open a Service Request with VMware support.


Solution
First of all, I logged in to the NSX-T Manager via SSH as the admin user and then elevated my privileges to root.

I moved into the directory "/var/log/policy" ...
root@NSXM-01:~# cd /var/log/policy/ 
I checked the policy.log file for information about the Logical Segment to delete. In my case (judging by the dates), the information I was looking for was in the policy.1.log.gz file ...
root@NSXM-01:/var/log/policy# zcat policy.1.log.gz | grep -i <LS NAME> 
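
When it is not obvious which rotated file holds the relevant entries, the same search can be run across all policy logs at once. The following is a minimal sketch, assuming Python 3 is available wherever the logs are read; the function name `grep_policy_logs` and the `policy*.log*` file pattern are illustrative assumptions, not part of NSX-T.

```python
import gzip
from pathlib import Path


def grep_policy_logs(log_dir, needle):
    """Yield (file name, line) for each log line containing needle (case-insensitive)."""
    needle = needle.lower()
    # Assumed naming: policy.log for the current file, policy.N.log.gz for rotated ones.
    for path in sorted(Path(log_dir).glob("policy*.log*")):
        # Rotated logs are gzip-compressed; the current log is plain text.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if needle in line.lower():
                    yield path.name, line.rstrip("\n")


# Example call (path and segment name taken from this post):
# for name, line in grep_policy_logs("/var/log/policy", "LS-670-Bridge-DATADOMAIN"):
#     print(name, line)
```

Returning the file name with each hit makes it easy to see which rotation covers the dates of interest.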

I therefore wanted to check the realized state. When you make a configuration change, NSX Manager typically sends a request to another component to implement the change. For some entities, if you make the configuration change using the API, you can track the status of the request to see if the change is successfully implemented.

The configuration change that you initiate is called the desired state. The result of implementing the change is called the realized state. If NSX Manager implements the change successfully, the realized state will be the same as the desired state. If there is an error, the realized state will not be the same as the desired state.
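
To make the comparison concrete, here is a small sketch that scans a realized-entities response (already parsed from JSON) for entities reported in the ERROR state and collects their alarm messages. The field names (`results`, `state`, `alarms`, `message`) are the ones visible in the API output in this post; the function name is a hypothetical helper of mine.

```python
def find_realization_errors(response):
    """Return (entity id, alarm messages) for every realized entity in ERROR state."""
    problems = []
    for entity in response.get("results", []):
        # "ERROR" is the state value reported for the stuck segment in this post.
        if entity.get("state") == "ERROR":
            messages = [alarm.get("message", "") for alarm in entity.get("alarms", [])]
            problems.append((entity.get("id"), messages))
    return problems


# Example with a trimmed-down response:
# find_realization_errors({"results": [{"id": "x", "state": "ERROR",
#                                       "alarms": [{"message": "boom"}]}]})
```

An empty list means no entity in the response reported an error, which is what you would expect when the realized state matches the desired state.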

Below are the command I ran to check the stuck Logical Segment and its output ...
root@NSXM-01:/var/log/policy# curl -k -u admin -X GET "https://localhost/policy/api/v1/infra/realized-state/realized-entities?intent_path=/infra/segments/LS-670-Bridge-DATADOMAIN"
Enter host password for user 'admin':
{
  "results" : [ {
    "extended_attributes" : [ {
      "data_type" : "STRING",
      "multivalue" : false,
      "values" : [ "/infra/tier-1s/T1-Gateway" ],
      "key" : "connectivity_path"
    }, {
      "data_type" : "STRING",
      "multivalue" : true,
      "key" : "l2vpn_paths"
    } ],
    "entity_type" : "RealizedLogicalSwitch",
    "intent_paths" : [ "/infra/segments/LS-670-Bridge-DATADOMAIN" ],
    "resource_type" : "GenericPolicyRealizedResource",
    "id" : "infra-LS-670-Bridge-DATADOMAIN-ls",
    "display_name" : "infra-LS-670-Bridge-DATADOMAIN-ls",
    "path" : "/infra/realized-state/enforcement-points/default/logical-switches/infra-LS-670-Bridge-DATADOMAIN-ls",
    "relative_path" : "infra-LS-670-Bridge-DATADOMAIN-ls",
    "parent_path" : "/infra/realized-state/enforcement-points/default",
    "unique_id" : "13fd4bf5-f067-4806-9317-93013928b0d0",
    "intent_reference" : [ "/infra/segments/LS-670-Bridge-DATADOMAIN" ],
    "realization_specific_identifier" : "11104acc-5aac-4e50-8689-c31492d455f4",
    "realization_api" : "/api/v1/logical-switches/11104acc-5aac-4e50-8689-c31492d455f4",
    "state" : "ERROR",
    "alarms" : [ {
      "message" : "Unable to delete logical port with attachments of LogicalPort LogicalPort/f5dc4a5b-f49f-465b-9832-86151b3670cf.",
      "source_reference" : "/infra/realized-state/enforcement-points/default/logical-switches/infra-LS-670-Bridge-DATADOMAIN-ls",
      "error_details" : {
        "error_code" : 8402,
        "module_name" : "NsxSwitching service",
        "error_message" : "Unable to delete logical port with attachments of LogicalPort LogicalPort/f5dc4a5b-f49f-465b-9832-86151b3670cf."
      },
      "resource_type" : "PolicyAlarmResource",
      "id" : "REST_API_FAILED",
      "display_name" : "a2759a0c-0c98-4e6b-abd0-c58ea326c5be",
      "relative_path" : "a2759a0c-0c98-4e6b-abd0-c58ea326c5be",
      "unique_id" : "fe05784c-3f69-4ca8-9a01-9df9e3750e8b",
      "_system_owned" : false,
      "_create_user" : "system",
      "_create_time" : 1611589503519,
      "_last_modified_user" : "system",
      "_last_modified_time" : 1611589503521,
      "_protection" : "NOT_PROTECTED",
      "_revision" : 0
    } ],
    "runtime_status" : "UNINITIALIZED",
    "_system_owned" : false,
    "_create_user" : "system",
    "_create_time" : 1610550767404,
    "_last_modified_user" : "system",
    "_last_modified_time" : 1611229785697,
    "_protection" : "NOT_PROTECTED",
    "_revision" : 42
  } ],
  "result_count" : 1
}
root@NSXM-01:/var/log/policy#
From the output above it is evident that the Segment is in an ERROR state, because it is blocked by a logical port that still has an attachment ...
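
The alarm message in the output above embeds the ID of the blocking port. As a sketch only, the ID can be pulled out with a regular expression; this assumes the `LogicalPort/<uuid>` wording seen in this particular alarm, which may differ in other NSX-T versions, and the helper name is hypothetical.

```python
import re

# Matches "LogicalPort/<uuid>" as it appears in the alarm message shown in this post.
PORT_ID_RE = re.compile(
    r"LogicalPort/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def blocking_port_ids(alarm_message):
    """Return the unique logical port IDs mentioned in an alarm message, sorted."""
    return sorted(set(PORT_ID_RE.findall(alarm_message)))
```

Deduplicating matters because the same port ID can appear more than once across the `message` and `error_message` fields.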

"message" : "Unable to delete logical port with attachments of LogicalPort LogicalPort/f5dc4a5b-f49f-465b-9832-86151b3670cf."

The next step is to try to delete the stale Logical Port via the UI. However, as we saw earlier, the Logical Port cannot be deleted from the UI. The second option is to delete it via the API, which also fails if you use the plain Logical Port DELETE operation.
Before deleting it, I retrieved the details of the Logical Port and took an NSX-T backup, in case they were needed later ...
root@NSXM-01:/var/log/policy# curl -k -u admin -X GET "https://localhost/api/v1/logical-ports/f5dc4a5b-f49f-465b-9832-86151b3670cf"
Enter host password for user 'admin':
{
  "logical_switch_id" : "11104acc-5aac-4e50-8689-c31492d455f4",
  "attachment" : {
    "attachment_type" : "BRIDGEENDPOINT",
    "id" : "1f4bd81c-83c6-43be-b690-a9aac5a6edcd"
  },
  "admin_state" : "UP",
  "address_bindings" : [ ],
  "switching_profile_ids" : [ {
    "key" : "SwitchSecuritySwitchingProfile",
    "value" : "47ffda0e-035f-4900-83e4-0a2086813ede"
  }, {
    "key" : "SpoofGuardSwitchingProfile",
    "value" : "fad98876-d7ff-11e4-b9d6-1681e6b88ec1"
  }, {
    "key" : "IpDiscoverySwitchingProfile",
    "value" : "64814784-7896-3901-9741-badeff705639"
  }, {
    "key" : "MacManagementSwitchingProfile",
    "value" : "1e7101c8-cfef-415a-9c8c-ce3d8dd078fb"
  }, {
    "key" : "PortMirroringSwitchingProfile",
    "value" : "93b4b7e8-f116-415d-a50c-3364611b5d09"
  }, {
    "key" : "QosSwitchingProfile",
    "value" : "f313290b-eba8-4262-bd93-fab5026e9495"
  } ],
  "ignore_address_bindings" : [ ],
  "internal_id" : "f5dc4a5b-f49f-465b-9832-86151b3670cf",
  "resource_type" : "LogicalPort",
  "id" : "f5dc4a5b-f49f-465b-9832-86151b3670cf",
  "display_name" : "f5dc4a5b-f49f-465b-9832-86151b3670cf",
  "_system_owned" : false,
  "_create_user" : "admin",
  "_create_time" : 1610731219571,
  "_last_modified_user" : "admin",
  "_last_modified_time" : 1610731219571,
  "_protection" : "NOT_PROTECTED",
  "_revision" : 0
}
root@NSXM-01:/var/log/policy#
We are now ready to force the deletion of the Logical Port from the NSX-T database, as follows ...
root@NSXM-01:/var/log/policy# curl -k -u admin -H "Accept:application/json" -H "Content-Type:application/json"  -X DELETE "https://localhost/api/v1/logical-ports/f5dc4a5b-f49f-465b-9832-86151b3670cf?detach=true"
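
For anyone scripting this step, the same request can be described in Python. The sketch below only builds the request object and does not send it; the endpoint and the `detach=true` parameter are the ones used in the curl command above, while the function name and the use of HTTP basic authentication via a header are my own assumptions.

```python
import base64
import urllib.parse
import urllib.request


def build_force_delete_request(host, port_id, user, password):
    """Build (but do not send) a DELETE request for a logical port with detach=true."""
    query = urllib.parse.urlencode({"detach": "true"})
    path = "/api/v1/logical-ports/" + urllib.parse.quote(port_id, safe="")
    url = f"https://{host}{path}?{query}"
    # Basic authentication, equivalent in effect to curl's "-u user".
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}
    return urllib.request.Request(url, method="DELETE", headers=headers)
```

Keeping the construction separate from the sending makes it possible to print and double-check the URL before anything destructive happens. Actually sending it would need `urllib.request.urlopen` with a TLS context chosen to suit your certificate setup.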

Once the blocking object (in our case the Logical Port) was removed, the pending deletion of the Logical Segment completed successfully.
The deletion takes a few moments to complete. After that, under Networking > Segments, the Logical Segment is gone.
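
Instead of refreshing the UI, the wait can be scripted as a simple polling loop. This is a sketch under assumptions: `fetch_result_count` is a hypothetical caller-supplied function that returns the `result_count` of the realized-entities query for the segment, and I am assuming that count drops to 0 once the segment is gone (the API may instead return an error, which the caller would need to handle).

```python
import time


def wait_until_deleted(fetch_result_count, timeout_s=120.0, interval_s=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until fetch_result_count() returns 0; return False if the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if fetch_result_count() == 0:
            return True   # no realized entity left for the segment
        if clock() >= deadline:
            return False  # still present after the timeout
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.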

That's it.

Friday, 8 January 2021

VMware NSX for vSphere 6.4.7 no longer available for download

Issue
Today, while running a compatibility check between the various versions of ESXi and NSX to plan the upgrade of a customer's farm, I found out that NSX 6.4.7 has been removed from the download page because of a serious bug, in favor of 6.4.8 (release notes).

"What's New in NSX Data Center for vSphere 6.4.8
VMware NSX for vSphere 6.4.8 resolves a specific issue identified in VMware NSX for vSphere 6.4.7, which could affect both new NSX customers as well as customers upgrading from previous versions of NSX. See Resolved Issues for more information."


As you can see from the images below, version 6.4.7 is no longer listed on the download page ...


... and from the interoperability matrix tables as well.


Solution
It is a good idea to upgrade VMware NSX Data Center for vSphere to at least version 6.4.8.

That's it.