Massive TCP Retransmits with ESX 3.5 U5, HP ProCurve 3500yl and Equallogic SAN – UPDATED

I’m using several ESX 3.5 U5 hosts connected to 2 HP ProCurve 3500yl switches.
When I monitor my Equallogic SAN with SAN Headquarters 2.0 I see a lot of TCP retransmits, sometimes more than 3%. After some troubleshooting I discovered that the TCP retransmits are caused by the ESX NIC teaming connected to the HP ProCurve switches.
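To put a number on it yourself: SAN Headquarters reports the percentage directly, but you can also derive it from the TCP segment counters that `netstat -s` prints on any Linux-style host. A small sketch (the counter values below are invented sample numbers, not from my SAN):

```shell
# Counters as read from `netstat -s` ("segments sent out" and
# "segments retransmitted"); these are made-up illustration values.
segments_sent=250000
segments_retransmitted=8200

# Retransmit percentage; awk is used for the floating-point division.
awk -v s="$segments_sent" -v r="$segments_retransmitted" \
    'BEGIN { printf "retransmit rate: %.2f%%\n", 100 * r / s }'
# prints: retransmit rate: 3.28%
```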

Every ESX server has 2 NICs connected to the iSCSI network, used for VMFS and for iSCSI initiators inside the VMs. Both NICs are connected to a different switch for redundancy. So, for example, ESX NIC 1 is connected to port 1 of switch X and ESX NIC 2 is connected to port 1 of switch Y. Port 1 on both switches is configured as a trunk (Trunk1) not using LACP. Between the 2 switches there is a separate trunk for traffic that needs to cross over to the other switch.


I configured my ESX NIC teaming like this.

As you can see, 1 NIC is disconnected; this was done for testing. When only 1 NIC is connected, the TCP retransmits disappear.
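On the ESX 3.5 service console the same layout can be inspected and toggled with `esxcfg-vswitch` (the vSwitch and vmnic names below are assumptions, adjust them to your host):

```shell
esxcfg-vswitch -l                  # list vSwitches, port groups and uplinks
esxcfg-vswitch -U vmnic2 vSwitch1  # unlink the second iSCSI NIC for testing
esxcfg-vswitch -L vmnic2 vSwitch1  # link it again afterwards
```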

Ports 1 through 4 on both HP ProCurve switches are used to link the 2 switches together. Here are the relevant config lines from the switch.

interface 1
name "Switch Trunk"
flow-control
no power-over-ethernet
exit
interface 2
name "Switch Trunk"
flow-control
no power-over-ethernet
exit
interface 3
name "Switch Trunk"
flow-control
no power-over-ethernet
exit
interface 4
name "Switch Trunk"
flow-control
no power-over-ethernet
exit
trunk 1-4 Trk1 Trunk
vlan 1
name "DEFAULT_VLAN"
no untagged 5-36,38,40,42,44,46,48,Trk1-Trk7
no ip address
exit
vlan 19
name "iSCSI"
untagged 5-36,38,40,42,44,46,48,Trk1-Trk7
exit
jumbo ip-mtu 8982
jumbo max-frame-size 9000
spanning-tree Trk1 priority 4
primary-vlan 19
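With jumbo frames enabled on the switches it is worth verifying that large frames actually pass end to end on the iSCSI VLAN. A quick sanity check from the ESX service console (the SAN group IP below is a made-up example):

```shell
# vmkping sends from the VMkernel interface; -s sets the payload size.
# 8000 bytes fits well inside the 8982-byte ip-mtu configured above.
vmkping -s 8000 10.0.19.10   # 10.0.19.10 is a hypothetical SAN group IP
```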

Here is the config for the port used by the ESX server.

interface 47
name "VM-OZ-01"
flow-control
no power-over-ethernet
exit
trunk 47 Trk2 Trunk
spanning-tree Trk2 priority 4

Anyone have some bright ideas on how to resolve this problem?
While browsing the internet for config problems between HP ProCurve switches and ESX 3.5 I found the following websites. After reading them I came to the conclusion that my config is the same as recommended there.

http://blog.scottlowe.org/2008/09/05/vmware-esx-nic-teaming-and-vlan-trunking-with-hp-procurve/
http://blog.scottlowe.org/2008/07/16/understanding-nic-utilization-in-vmware-esx/
http://universitytechnology.blogspot.com/2008/09/vmware-esx-vm-network-aggregation.html
http://blog.scottlowe.org/2008/09/05/setting-vmware-esx-vswitch-load-balancing-policy-via-cli/

UPDATE –

As it turns out, HP does not support teaming across 2 switches (yet).
This means the ports in Trunk 1 of switch X do not share their MAC addresses with the ports in Trunk 1 of switch Y.

So there are 2 solutions left.

  1. Connect both NICs from the ESX server to 1 HP switch. In VMware, configure the vSwitch with both NICs as active. Set up a trunk on this switch with 2 ports.
    Load balancing works, but no redundancy.
  2. Connect 1 NIC to switch X and 1 NIC to switch Y. In VMware, configure the vSwitch with 1 NIC as active and 1 NIC as standby. Setting up a trunk is not necessary.
    No load balancing, but you will have redundancy.
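For solution 1 the switch side would look something like this (a sketch only; ports 47-48 are an assumed example, following the single-port config shown earlier):

```
interface 47-48
   name "VM-OZ-01"
   flow-control
   no power-over-ethernet
   exit
trunk 47-48 Trk2 Trunk
```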

For now I have settled on solution 2.
I am monitoring the load on the NICs of my ESX servers with Cacti.

Still searching for a solution that offers both load balancing and redundancy.

Posted in VMWare.

One Comment

  1. Have you tried setting up the pair of 3500s as a mesh? My understanding is that this allows the switches to share MACs and other information regarding trunks. This was the direction I was hoping to go with a pair of 5400s.
