Objective 2.4 - Manage Failover and Failure Detection
Written by Matthijs van den Berg
Sunday, 25 January 2009 19:38
KNOWLEDGE
- Describe how to map port groups to physical NICs
Before your virtual machines can access external networks via a port group in a virtual switch, that vSwitch must be connected to the physical network. To do so, map one or more physical NICs to the vSwitch to allow external traffic. Settings on the vSwitch apply to all port groups in that vSwitch, but you can override the active NIC on a per port group basis. This allows you to tune performance. For example, you can dedicate a NIC to a port group so that the port group gets the full bandwidth of that NIC without interference from other port groups. When two NICs are configured with a load balancing mechanism, the load balancing applies only to outgoing traffic. For more about “reversed load balancing” or “reversed teaming”, see the bullet “Configure reverse teaming” in this Objective.
This can be done via the GUI or the CLI. A port group with no NIC attached is most easily spotted in the CLI. In the GUI you have to open the properties of the port group, because the vSwitch diagram still shows it as connected.
GUI:
- Select an ESX server
- Go to the tab Configuration
- Select Networking
- Select “Properties” of a vSwitch
- Select the vSwitch configuration and click “Edit”. The vSwitch properties dialog opens.
- To change the NIC settings as desired, add multiple NICs to the vSwitch and make one NIC active and one NIC standby.
- Close the open dialogs
- Select “Properties” of a vSwitch
- Select a Virtual Machine Port Group and click “Edit”
- To change the NIC failover properties of this port group go to the tab “NIC Teaming”
- Click “Override vSwitch failover order”
- Change the NIC failover order as desired.
- Close all dialogs by clicking OK.
CLI:
To set the NIC failover settings at the vSwitch level, use the following commands. To set the load balancing type (this example uses source ID load balancing; the other options are loadbalance_ip and loadbalance_mac):
vimsh -n -e "hostsvc/net/vswitch_setpolicy --nicteaming-policy loadbalance_srcid <vSwitch>"
Pierre (see the comments below) found that I did not mention the option for explicit failover. I am very glad he eventually found the switch to enable this. To set the NIC teaming policy to explicit failover use:
vimsh -n -e "/hostsvc/net/portgroup_set --nicteaming-policy=failover_explicit <vSwitch>"
To set the active and passive NICs per vSwitch you can use the following command:
vimsh -n -e "hostsvc/net/vswitch_setpolicy --nicorderpolicy-active=<vmnic> --nicorderpolicy-standby=<vmnic#> <vSwitch>"
To add or delete a NIC for a port group you can use the following commands.
Add a NIC to a port group:
esxcfg-vswitch --add-pg-uplink <vmnic> -p <PortGroup> <vSwitch>
Delete a NIC from a port group:
esxcfg-vswitch --del-pg-uplink <vmnic> -p <PortGroup> <vSwitch>
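As a minimal sketch of the dedicated-NIC example above, the two esxcfg-vswitch calls can be wrapped in a small helper that only prints the command line, so it can be reviewed before running it on a host. The function name pg_uplink_cmd and the values vmnic0, vmnic1, "VM Network" and vSwitch0 are hypothetical; only the flags shown above are used.

```shell
#!/bin/sh
# Dry-run helper: prints the esxcfg-vswitch command instead of executing it.
# pg_uplink_cmd is a hypothetical name, not part of any VMware tool.
pg_uplink_cmd() {   # args: add|del <vmnic> <PortGroup> <vSwitch>
  action=$1; nic=$2; pg=$3; vswitch=$4
  case "$action" in
    add) flag="--add-pg-uplink" ;;
    del) flag="--del-pg-uplink" ;;
    *)   echo "usage: pg_uplink_cmd add|del <vmnic> <PortGroup> <vSwitch>" >&2
         return 1 ;;
  esac
  # Quote the port group name because names such as "VM Network" contain spaces.
  printf 'esxcfg-vswitch %s %s -p "%s" %s\n' "$flag" "$nic" "$pg" "$vswitch"
}

# Example values (assumed): leave only vmnic1 attached to "VM Network".
pg_uplink_cmd del vmnic0 "VM Network" vSwitch0
pg_uplink_cmd add vmnic1 "VM Network" vSwitch0
```

Printing first and pasting afterwards is a deliberate choice here: removing the wrong uplink from a port group can cut off the traffic you are using to manage the host.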
- Understand failover order for physical NICs and attached port groups
This closely resembles the mapping described above. You can determine the physical NIC failover order on a per vSwitch and per port group basis. The failover order determines which NIC is the primary active NIC and which NIC is in standby mode.
When you have multiple active NICs you can distribute the traffic over these NICs by selecting a load balancing policy. Traffic can be load balanced by source port ID, source MAC hash, or IP hash.
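To make the idea of distributing traffic concrete, here is a simplified model, not ESX's actual implementation, of how such a policy can map traffic to one of the active NICs: a number derived from the traffic is taken modulo the number of active NICs. The function names and the use of only the last IP octet are assumptions made for illustration.

```shell
#!/bin/sh
# Simplified model of uplink selection; NOT the real ESX algorithm.

# Port-ID style: every virtual port is pinned to one uplink.
pick_by_port_id() {   # args: <virtual port id> <number of active NICs>
  echo $(( $1 % $2 ))
}

# IP-hash style: source and destination together pick the uplink, so one VM
# can use several NICs when it talks to several destinations.
pick_by_ip_hash() {   # args: <src ip> <dst ip> <number of active NICs>
  src_last=${1##*.}   # last octet only, to keep the sketch short
  dst_last=${2##*.}
  echo $(( (src_last ^ dst_last) % $3 ))
}

pick_by_port_id 5 2                       # prints 1
pick_by_ip_hash 10.0.0.10 10.0.0.20 2     # prints 0
pick_by_ip_hash 10.0.0.10 10.0.0.21 2     # prints 1
```

The printed value is the index of the chosen NIC in the list of active NICs. Note that in the port-ID model a single virtual machine never exceeds the bandwidth of one NIC, while in the IP-hash model it can.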
- Explain options for detecting link failures
This can be configured via the CLI or the GUI. When using the GUI follow the above procedure (with the screenshots) to modify the failover policy.
- Detecting failover using link speed check
Detect a link failure using a link speed check. When the speed differs from the configured value, a failover occurs.
vimsh -n -e "hostsvc/net/vswitch_setpolicy --failurecriteria-check-speed=<bool> <vSwitch>"
- Detecting failover using link duplex check
Detect a link failure using a link duplex check. When the duplex setting differs from the configured value, a failover occurs.
vimsh -n -e "hostsvc/net/vswitch_setpolicy --failurecriteria-check-duplex=<bool> <vSwitch>"
- Detecting failover using link error percentage
Detect a link failure using the percentage of error packets on a NIC. When the threshold is exceeded, a failover occurs.
vimsh -n -e "hostsvc/net/vswitch_setpolicy --failurecriteria-check-error=<bool> <vSwitch>"
- Detecting failover using the beacon
Beacon probing sends out and listens for beacon probes on all NICs in the team and uses this information, in addition to link status, to determine link failure. This detects many of the failures mentioned above that are not detected by link status alone. To enable failure detection via beacon probing you can use the following command.
vimsh -n -e "hostsvc/net/vswitch_setpolicy --failurecriteria-check-beacon=<bool> <vSwitch>"
Example:
vmware-vim-cmd /hostsvc/net/vswitch_setpolicy --failurecriteria-check-beacon 1 vSwitch0
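The four checks above differ only in the flag name, so a small dry-run wrapper can keep them straight. This sketch prints the vimsh command line instead of running it. The function name failure_check_cmd is hypothetical, and using 1 or 0 as the value and the vSwitch name as the last argument follows the vSwitch0 example above.

```shell
#!/bin/sh
# Dry-run helper: prints the vimsh call for one failure-detection check.
# failure_check_cmd is a hypothetical name; the flags are the ones listed above.
failure_check_cmd() {   # args: speed|duplex|error|beacon  0|1  <vSwitch>
  case "$1" in
    speed|duplex|error|beacon) ;;
    *) echo "unknown check: $1" >&2; return 1 ;;
  esac
  printf 'vimsh -n -e "hostsvc/net/vswitch_setpolicy --failurecriteria-check-%s=%s %s"\n' \
    "$1" "$2" "$3"
}

# Example (assumed values): enable beacon probing, disable the speed check.
failure_check_cmd beacon 1 vSwitch0
failure_check_cmd speed 0 vSwitch0
```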
- Troubleshoot failover operations
NIC failover and other issues are logged in the /var/log/vmware/hostd.log logfile. You can look up the active NIC in the GUI (properties of a vSwitch or port group) or via the CLI using:
esxcfg-vswitch -l
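When troubleshooting it helps to filter that listing for port groups that have no uplink at all. The sketch below runs against a hard-coded sample instead of a live host; the sample layout (a port group table with the uplinks in the last column) is an assumption about what esxcfg-vswitch -l prints, and the names sample_output and pg_without_uplinks are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample of `esxcfg-vswitch -l` output (column layout assumed).
sample_output() {
cat <<'EOF'
Switch Name    Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0       64          5           64                1500    vmnic0,vmnic1

  PortGroup Name      VLAN ID  Used Ports  Uplinks
  VM Network          0        2           vmnic0,vmnic1
  Test Network        0        0
EOF
}

# Print the names of port groups whose last column is not a vmnic list.
pg_without_uplinks() {
  awk '
    /^  PortGroup Name/ { in_pg = 1; next }   # start of a port group table
    /^[^ ]/             { in_pg = 0 }         # unindented line: vSwitch row
    in_pg && NF > 0 && $NF !~ /^vmnic/ {
      # The name is every field except the trailing VLAN ID and Used Ports.
      name = $1
      for (i = 2; i <= NF - 2; i++) name = name " " $i
      print name
    }
  '
}

sample_output | pg_without_uplinks   # prints: Test Network
```

On a host you would pipe the real command into the filter instead of the sample, after checking that the column layout matches.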
SKILLS AND ABILITIES
- Use CLI commands to manage uplinks
See the bullet “Describe how to map port groups to physical NICs” in this Objective for more information.
- Configure failover order
Use the GUI to configure:
- Active Adapters
- Standby Adapters
- Unused Adapters
- NIC promotion
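The effect of these adapter lists can be sketched as a small selection routine: walk the active adapters in order, then the standby adapters, and use the first one whose link is up. Unused adapters are never considered. This is an illustrative model with hypothetical names (first_usable_nic and its three list arguments), not how ESX is implemented.

```shell
#!/bin/sh
# Illustrative model of a failover order with one NIC in use at a time.
# args: "<active NICs in order>" "<standby NICs in order>" "<NICs with link up>"
first_usable_nic() {
  for nic in $1 $2; do            # active adapters first, then standby
    for up in $3; do
      if [ "$nic" = "$up" ]; then
        echo "$nic"
        return 0
      fi
    done
  done
  echo "none"                     # every listed adapter is down
  return 1
}

first_usable_nic "vmnic0" "vmnic1 vmnic2" "vmnic0 vmnic1 vmnic2"   # prints vmnic0
first_usable_nic "vmnic0" "vmnic1 vmnic2" "vmnic2 vmnic1"          # prints vmnic1
```

In the second call vmnic0 has no link, so the first standby adapter is promoted, even though vmnic2 appears first in the list of links that are up.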
- Configure beacon probing
Use the GUI to select Beacon Probing as the Network Failover Detection method on the “NIC Teaming” tab.
- Configure reverse teaming (http://lycos.dropcode.net/)
Reversed teaming means that traffic coming from the physical switch is also distributed according to a load balancing policy. In a VMworld presentation VMware states the following:
- All ports need to be on the same switch (or stack or VSS) to create an aggregated link.
- When reverse teaming is configured, the switch sends each broadcast only once, to one of the uplinks
- The switch distributes the traffic over the uplinks
- When using port ID or MAC hash based teaming, do not enable link aggregation on the switch
- Only enable link aggregation on the switch when using IP hash based teaming.
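The last two rules boil down to one check on the teaming policy. A minimal sketch, using the policy names from the vimsh commands earlier in this Objective; the function name switch_needs_aggregation is hypothetical.

```shell
#!/bin/sh
# Says whether link aggregation belongs on the physical switch ports
# for a given vSwitch teaming policy (rule of thumb from the list above).
switch_needs_aggregation() {   # arg: teaming policy name
  case "$1" in
    loadbalance_ip)
      echo "yes" ;;            # IP hash: aggregate the switch ports
    loadbalance_srcid|loadbalance_mac)
      echo "no" ;;             # port ID / MAC hash: leave aggregation off
    *)
      echo "policy not covered by the rules above: $1" >&2
      return 1 ;;
  esac
}

switch_needs_aggregation loadbalance_ip      # prints yes
switch_needs_aggregation loadbalance_srcid   # prints no
```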
- Set advanced network failover options
- Failover detection
See the above text at the bullet “Explain options for detecting link failures”.
- Failback
Failback automatically makes the primary configured adapter active again when it comes back online after a failure. When switching between adapters causes downtime in your network (ARP tables have to be rewritten, etc.), this option can be dangerous because restoring the original path causes a second interruption. In that case you might want to choose the failback moment manually.
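A small model of the failback decision, assuming that only one adapter carries traffic at a time. The function name nic_after_recovery and its arguments are hypothetical and only illustrate the trade-off described here.

```shell
#!/bin/sh
# Illustrative model: which NIC carries traffic once the primary is back up?
# args: <primary NIC> <NIC currently in use> <failback: yes|no>
nic_after_recovery() {
  if [ "$3" = "yes" ]; then
    echo "$1"    # failback on: switch back to the primary (a second switchover)
  else
    echo "$2"    # failback off: stay put until an administrator moves traffic
  fi
}

nic_after_recovery vmnic0 vmnic1 yes   # prints vmnic0
nic_after_recovery vmnic0 vmnic1 no    # prints vmnic1
```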
- Link state tracking (http://blog.scottlowe.org/)
This is cool, but not really a VMware setting. When your VMware ESX server uses one or more uplink switches, these switches are connected to the core via an uplink, most likely a port channel (Cisco). When an uplink switch loses its connection to the core, its ports to the ESX server remain up, so the ESX server sees no link down and therefore does not fail over.
To solve this, Cisco switches (and maybe others as well) have a link state tracking option. When the port channel goes down, all other ports on the switch are marked down as well, so the ESX server sees a port down and fails over to another NIC. This is especially useful in blade server environments with in-chassis blade switches. The above URL links to a really good article, so if you need to know more …
TOOLS