Issue
Recently I ran into this issue:
Recommended Action
1. If the hypervisor hosts many VMs alongside the edge, the edge VM might not get enough time to run, so packets might not be retrieved by the hypervisor. In that case, consider migrating the edge VM to a host with fewer VMs.
2. Increase the ring size by 1024 using the command `set dataplane ring-size tx
Solution
Checking for recommended actions ...
edge-bridge-cluster3-A> get dataplane | find ring
Wed Nov 20 2024 UTC 12:16:02.831
Bfd_ring_size : 512
Lacp_ring_size : 512
Learning_ring_size : 512
Livetrace_ring_size: 512
Rx_ring_size : 4096
Slowpath_ring_size : 512
Tx_ring_size : 4096

... the rx and tx ring sizes were already at 4096. Looking at the flow cache ...
edge-bridge-cluster3-A> get dataplane flow-cache config
Wed Nov 20 2024 UTC 12:16:50.970
Enabled : true
Mega_hard_timeout_ms: 4955
Mega_size : 262144
Mega_soft_timeout_ms: 4904
Micro_size : 262144

... we saw that the value can be increased up to 524288. We increased it and restarted the dataplane service:
edge-bridge-cluster3-A> set dataplane flow-cache-size 524288
edge-bridge-cluster3-A> restart service dataplane
edge-bridge-cluster3-A> get dataplane flow-cache config
Wed Nov 20 2024 UTC 12:25:38.810
Enabled : true
Mega_hard_timeout_ms: 4955
Mega_size : 524288
Mega_soft_timeout_ms: 4904
Micro_size : 524288

What does flow-cache do?
Flow Cache helps reduce CPU cycles spent on known traffic flows. NSX Edge node uses flow cache to achieve high packet throughput. This feature records actions applied on each flow when the first packet in the flow is processed so that subsequent packets can be processed using a match-and-action procedure.
When the key collision rate is high, increasing the flow cache size helps process packets more efficiently. However, increasing the cache size might increase memory consumption. Typically, the higher the hit rate, the better the performance.
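As a rough illustration of why key collisions hurt and why a larger cache can help, here is a toy model in Python. It is not how the NSX Edge dataplane is implemented: the direct-mapped table, the `flow_id % size` hashing, and the names `ToyFlowCache`, `slow_path_lookup` and `replay` are all assumptions made up for this sketch.

```python
class ToyFlowCache:
    """Toy direct-mapped flow cache (illustrative only, not NSX code)."""

    def __init__(self, size):
        self.size = size
        self.slots = {}   # slot index -> (flow_id, cached action)
        self.hits = 0
        self.misses = 0

    def process(self, flow_id):
        # Hypothetical hash: two flows with the same remainder share a slot.
        slot = flow_id % self.size
        entry = self.slots.get(slot)
        if entry is not None and entry[0] == flow_id:
            # Known flow: reuse the recorded action (match-and-action).
            self.hits += 1
            return entry[1]
        # Unknown flow or key collision: take the slow path, then cache it,
        # evicting whatever flow occupied the slot before.
        self.misses += 1
        action = slow_path_lookup(flow_id)
        self.slots[slot] = (flow_id, action)
        return action

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def slow_path_lookup(flow_id):
    """Stand-in for the full processing done on the first packet of a flow."""
    return f"forward-flow-{flow_id}"


def replay(cache, flows, rounds):
    """Send one packet per flow, round-robin, for the given number of rounds."""
    for _ in range(rounds):
        for flow_id in flows:
            cache.process(flow_id)
    return cache.hit_rate()


flows = list(range(8))  # eight concurrent flows
small_rate = replay(ToyFlowCache(4), flows, rounds=3)
large_rate = replay(ToyFlowCache(8), flows, rounds=3)
print(f"size 4: hit rate {small_rate:.2f}")
print(f"size 8: hit rate {large_rate:.2f}")
```

In this toy setup the 4-slot cache never gets a hit, because every flow is evicted by its colliding neighbour before its next packet arrives, while the 8-slot cache misses only on the first packet of each flow. Real traffic and real hashing are far less regular, so treat this only as intuition for the hit-rate argument.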
After this change I also proceeded to free up the host where the Edge was running by migrating the VMs elsewhere.
This was enough to solve my problem. The VMs on the bridged segments became reachable again.
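If you want to double-check a change like this from saved CLI output, a small script can compare the before and after values. The sketch below is only a convenience I would use offline: it assumes the output has been pasted into strings with one `Key : value` pair per line, and the helper names `parse_fields` and `changed_fields` are hypothetical, not part of any NSX tooling.

```python
import re

# One "Key : value" pair per line; the timestamp line does not match.
FIELD_RE = re.compile(r"\s*(\w+)\s*:\s*(\S+)\s*$")


def parse_fields(cli_output):
    """Collect 'Key : value' lines from pasted CLI output into a dict."""
    fields = {}
    for line in cli_output.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def changed_fields(old, new):
    """Return {key: (old_value, new_value)} for every value that differs."""
    return {k: (old.get(k), v) for k, v in new.items() if old.get(k) != v}


before = """\
Wed Nov 20 2024 UTC 12:16:50.970
Enabled : true
Mega_hard_timeout_ms: 4955
Mega_size : 262144
Mega_soft_timeout_ms: 4904
Micro_size : 262144
"""

after = """\
Wed Nov 20 2024 UTC 12:25:38.810
Enabled : true
Mega_hard_timeout_ms: 4955
Mega_size : 524288
Mega_soft_timeout_ms: 4904
Micro_size : 524288
"""

diff = changed_fields(parse_fields(before), parse_fields(after))
for key, (old_value, new_value) in diff.items():
    print(f"{key}: {old_value} -> {new_value}")
```

With the two outputs from this post, only `Mega_size` and `Micro_size` are reported as changed, which matches what the `set dataplane flow-cache-size 524288` command was expected to do.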
Looking around, we found a nice analysis by Giuliano of a similar issue at this link.
Further helpful information on the same topics can be found at the following links:
https://docs.vmware.com/en/VMware-Telco-Cloud-Platform/3.0/telco-cloud-platform-5g-edition-data-plane-performance-tuning-guide/GUID-64EEE4A0-23C1-49DB-AE4D-F235F8AB8EAB.html
https://knowledge.broadcom.com/external/article/330475/edge-nic-out-of-receive-buffer-alarm.html
https://knowledge.broadcom.com/external/article?legacyId=80233
That's it.