Objective 2.4 - Manage Failover and Failure Detection
Written by Matthijs van den Berg   
Sunday, 25 January 2009 19:38

KNOWLEDGE

  • Describe how to map port groups to physical NICs
    Before virtual machines can reach external networks through a port group, the vSwitch that holds the port group must be connected to the physical network. To do so, map one or more physical NICs to the vSwitch. Settings made on the vSwitch apply to all port groups in that vSwitch, but the active NIC can be changed per port group, which lets you tune performance. For example, you can dedicate a NIC to one port group so that the port group has the full bandwidth of that NIC without interference from other port groups. When two NICs are configured with a load balancing mechanism, the load balancing applies to outgoing traffic only. For more about “reversed load balancing” or “reversed teaming”, see the bullet “Configure reverse teaming” in this Objective.

    The mapping can be done via the GUI or the CLI. A port group without an attached NIC is easiest to spot in the CLI. In the GUI you have to open the properties of the port group, because the drawing of the vSwitch still shows it as connected.

    GUI:
    • Select an ESX server
    • Go to the tab Configuration
    • Select Networking
    • Select “Properties” of a vSwitch
    • Select the vSwitch configuration and click “Edit”. The following screen appears:
      [Screenshot: 24-vswitch_properties]
    • To change the NIC settings, add multiple NICs to the vSwitch and make one NIC active and one NIC standby.
    • Close the open dialogs
    • Select “Properties” of a vSwitch
    • Select a Virtual Machine Port Group and click “Edit”
    • To change the NIC failover properties of this port group, go to the tab “NIC Teaming”
      [Screenshot: 24-portgroup_properties]
    • Click “Override vSwitch failover order”
    • Change the NIC failover order as desired.
    • Close all dialogs by clicking OK.
    CLI:
    To set the NIC failover settings at the vSwitch level, use the following commands.
    To set the load balancing type (the example uses Source ID load balancing; the other options are loadbalance_ip and loadbalance_mac):
    vimsh -n -e "hostsvc/net/vswitch_setpolicy --nicteaming-policy loadbalance_srcid <vSwitch>"
    Pierre (see the comments below) pointed out that I did not mention the option for explicit failover, and I am very glad he eventually found the switch that enables it. To set the NIC teaming policy to explicit failover use:
    vimsh -n -e "/hostsvc/net/portgroup_set --nicteaming-policy=failover_explicit <vSwitch>" 
    To set the active and standby NICs per vSwitch you can use the following command:
    vimsh -n -e "hostsvc/net/vswitch_setpolicy --nicorderpolicy-active=<vmnic> --nicorderpolicy-standby=<vmnic> <vSwitch>"
    To add or delete a NIC for a port group you can use the following commands:
    Add a NIC to a port group:
    esxcfg-vswitch --add-pg-uplink <vmnic> -p <PortGroup> <vSwitch>
    Delete a NIC from a port group:
    esxcfg-vswitch --del-pg-uplink <vmnic> -p <PortGroup> <vSwitch>
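The two esxcfg-vswitch commands above lend themselves to a small wrapper. What follows is only a minimal sketch: the function name pg_uplink and the DRY_RUN variable are made up for illustration and are not part of any VMware tool. By default it only prints the command it would run, so you can check the arguments before touching a live host.

```shell
#!/bin/sh
# Hypothetical helper: add or remove a physical NIC on a port group.
# DRY_RUN defaults to 1, so the command is only printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# usage: pg_uplink add|del <vmnic> <PortGroup> <vSwitch>
pg_uplink() (
    action=$1 nic=$2 portgroup=$3 vswitch=$4
    case $action in
        add) flag=--add-pg-uplink ;;
        del) flag=--del-pg-uplink ;;
        *)   echo "usage: pg_uplink add|del <vmnic> <PortGroup> <vSwitch>" >&2
             exit 2 ;;
    esac
    if [ "$DRY_RUN" = 1 ]; then
        echo "esxcfg-vswitch $flag $nic -p \"$portgroup\" $vswitch"
    else
        esxcfg-vswitch "$flag" "$nic" -p "$portgroup" "$vswitch"
    fi
)

# Example: attach vmnic1 to the port group "Production" on vSwitch0
pg_uplink add vmnic1 "Production" vSwitch0
```

Run it with DRY_RUN=0 on the service console to actually apply the change. The port group and vSwitch names in the example are placeholders.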
  • Understand failover order for physical NICs and attached port groups
    This closely resembles the previous bullet. You can set the physical NIC failover order per vSwitch and per port group. The failover order determines which NIC is the primary active NIC and which NIC is in standby mode.

    When you have multiple active NICs you can distribute the traffic over these NICs by selecting a load balancing policy. Traffic can be load balanced by:
    • Route based on the originating virtual port ID
      A NIC is chosen based on the virtual switch port through which the traffic enters the vSwitch.
    • Route based on IP Hash
      A NIC is chosen based on a hash of the source and destination IP addresses of the traffic.
    • Route based on source MAC hash
      A NIC is chosen based on a hash of the MAC address of the virtual machine.
    • When you prefer to determine the active NIC yourself you can choose “Use explicit failover order” and set one active and one standby NIC.
      This can be configured with the GUI procedure from the previous bullet (an exception can also be made at the port group level). You can also use the CLI:
      vimsh -n -e "hostsvc/net/vswitch_setpolicy --nicteaming-policy loadbalance_srcid <vSwitch>"
      You can change loadbalance_srcid to another load balancing type.
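To get a feel for how the three load balancing policies spread traffic, here is a simplified model in shell. It is an illustration only: the real hash functions are internal to ESX, and the modulo and XOR arithmetic below are my own assumptions, chosen to show that each policy maps traffic to an uplink deterministically. All function names are made up.

```shell
#!/bin/sh
# Simplified model of how a teaming policy maps traffic to one of N active
# uplinks (numbered from 0). Not the actual ESX algorithm.

# Originating virtual port ID: the uplink depends only on the vSwitch port.
uplink_by_port_id() {              # <port id> <number of uplinks>
    echo $(( $1 % $2 ))
}

# IP hash: the uplink depends on source AND destination address.
# Only the last octet of each address is used in this sketch.
uplink_by_ip_hash() (              # <src ip> <dst ip> <number of uplinks>
    src=${1##*.}
    dst=${2##*.}
    echo $(( ($src ^ $dst) % $3 ))
)

# Source MAC hash: the uplink depends only on the VM's MAC address
# (last byte, read as hexadecimal, in this sketch).
uplink_by_mac_hash() {             # <mac> <number of uplinks>
    echo $(( 0x${1##*:} % $2 ))
}

uplink_by_port_id 17 2                       # prints 1
uplink_by_ip_hash 10.0.0.21 10.0.0.50 2      # prints 1
uplink_by_ip_hash 10.0.0.21 10.0.0.51 2      # prints 0
uplink_by_mac_hash 00:50:56:aa:bb:0c 2       # prints 0
```

Note how the same source address ends up on different uplinks for different destinations with IP hash, while the port ID and MAC based policies pin a virtual NIC to one uplink.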
  • Explain options for detecting link failures
    This can be configured via the CLI or the GUI. When using the GUI follow the above procedure (with the screenshots) to modify the failover policy.
    • Detecting failover using link speed check
      Detects a failure by checking the link speed. When the speed differs from the configured value, a failover occurs.
      vimsh -n -e "hostsvc/net/vswitch_setpolicy --failurecriteria-check-speed=<bool>"
      • The speed for link speed check method
        To set the speed for the above-mentioned method use the following command:
        vimsh -n -e "hostsvc/net/vswitch_setpolicy --failurecriteria-speed=<int>"
    • Detecting failover using link duplex check
      Detects a failure by checking the link duplex setting. When the duplex setting differs from the configured value, a failover occurs.
      vimsh -n -e "hostsvc/net/vswitch_setpolicy --failurecriteria-check-duplex=<bool>"
      • The duplexity for link duplex check method
        To set the duplex failover criterion to full or half duplex use:
        vimsh -n -e "hostsvc/net/vswitch_setpolicy --failurecriteria-duplex=<bool>"
    • Detecting failover using link error percentage
      Detects a failure by looking at the percentage of error packets on a NIC. When the threshold is exceeded, a failover occurs.
      vimsh -n -e "hostsvc/net/vswitch_setpolicy --failurecriteria-check-error=<bool>"
      • The error percentage for link error percentage check method
        To set the percentage at which a failover occurs:
        vimsh -n -e "hostsvc/net/vswitch_setpolicy --failurecriteria-error=<int>"
    • Detecting failover using the beacon
      Beacon probing sends out and listens for beacon probes on all NICs in the team and uses this information, in addition to link status, to determine link failure. This detects many failures that are not detected by link status alone. To enable failure detection by beacon probing you can use the following command:
      vimsh -n -e "hostsvc/net/vswitch_setpolicy --failurecriteria-check-beacon=<bool>"
      Example:
      vmware-vim-cmd /hostsvc/net/vswitch_setpolicy --failurecriteria-check-beacon 1 vSwitch0
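The speed, duplex and error percentage checks described above all follow the same pattern: compare what the NIC reports with a configured value and fail over when they do not match. The toy model below shows that decision in shell. It is a sketch under my own assumptions: the function and variable names are hypothetical, the values are examples, and beacon probing is left out because it depends on probes received from the other NICs in the team.

```shell
#!/bin/sh
# Toy model of the link failure criteria. An uplink is reported as failed
# when any check does not match its configured value.
WANT_SPEED=1000      # expected link speed in Mb/s
WANT_DUPLEX=full     # expected duplex setting
MAX_ERROR_PCT=5      # error percentage threshold

# usage: uplink_state <link up|down> <speed> <duplex full|half> <error %>
uplink_state() (
    link=$1 speed=$2 duplex=$3 err_pct=$4
    if [ "$link" != up ]; then
        echo "failed: link down"
    elif [ "$speed" -ne "$WANT_SPEED" ]; then
        echo "failed: speed $speed, expected $WANT_SPEED"
    elif [ "$duplex" != "$WANT_DUPLEX" ]; then
        echo "failed: duplex $duplex, expected $WANT_DUPLEX"
    elif [ "$err_pct" -gt "$MAX_ERROR_PCT" ]; then
        echo "failed: $err_pct% errors, threshold $MAX_ERROR_PCT%"
    else
        echo "ok"
    fi
)

uplink_state up 1000 full 0      # prints: ok
uplink_state up 100 full 0       # prints: failed: speed 100, expected 1000
uplink_state up 1000 full 12     # prints: failed: 12% errors, threshold 5%
```

The second call is the typical case that link status alone misses: the link is up, but it negotiated a lower speed than expected.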
  • Troubleshoot failover operations
    NIC failover and other issues are logged in /var/log/vmware/hostd.log. You can look up the active NIC in the GUI (properties of a vSwitch or port group) or via the CLI using:
    esxcfg-vswitch -l
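When troubleshooting, it helps to narrow hostd.log down to the lines that mention the NIC you are interested in. A minimal sketch, assuming a plain case-insensitive match on the NIC name is good enough (I do not rely on any particular log message format); the function name nic_events is made up.

```shell
#!/bin/sh
# Hypothetical helper: show the most recent log lines that mention a NIC.
# usage: nic_events <vmnic> [logfile] [number of lines]
nic_events() (
    nic=$1
    logfile=${2:-/var/log/vmware/hostd.log}
    count=${3:-20}
    grep -i -- "$nic" "$logfile" | tail -n "$count"
)

# Example, on the ESX service console:
#   nic_events vmnic1
#   esxcfg-vswitch -l
```

Comparing the timestamps of these lines with the current uplink assignment from esxcfg-vswitch -l shows whether a failover actually took place.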

SKILLS AND ABILITIES

  • Use CLI commands to manage uplinks
    See the bullet “Describe how to map port groups to physical NICs” in this Objective for more information.
  • Configure failover order
    Use the GUI to configure:
    • Active Adapters
    • Standby Adapters
    • Unused Adapters
    • NIC promotion
  • Configure beacon probing
    Use the GUI to select Beacon Probing as the Network Failover Detection method on the “NIC Teaming” tab.
  • Configure reverse teaming (http://lycos.dropcode.net/)
    Reversed teaming means that traffic originating from the switch also follows a load balancing policy. In a VMworld presentation VMware states the following:
    • All ports need to be on the same switch (or stack or VSS) to create an aggregated link.
    • When reverse teaming is configured, the switch sends broadcasts only once, to one of the uplinks.
    • The switch distributes the traffic over the uplinks.
    • When using port ID or MAC hash based teaming, do not enable link aggregation on the switch
    • Only enable link aggregation on the switch when using IP hash based teaming.
  • Set advanced network failover options
    • Failover detection
      See the above text at the bullet “Explain options for detecting link failures”.
    • Failback
      Failback automatically makes the primary configured adapter active again when it comes back online after a failure. When switching between adapters causes downtime in your network (ARP tables have to be rewritten, etc.), enabling this option can be dangerous, because restoring your network then causes downtime as well. In that case you might want to choose the failback moment manually.
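The effect of the failback option is easiest to see in a small model. The sketch below is my own illustration in shell, not VMware code, and all names are hypothetical. Given the failover order, the uplinks that currently have link and the uplink carrying traffic now, it prints which uplink is active next.

```shell
#!/bin/sh
# Toy model of failback. Uplinks are listed in failover order; "up" holds the
# uplinks that currently have link.
# usage: next_active <failback yes|no> <current active> <order> <up>
next_active() (
    failback=$1 current=$2 order=$3 up=$4
    is_up() { case " $up " in *" $1 "*) return 0 ;; *) return 1 ;; esac; }

    # Without failback, stay on the current uplink as long as it has link.
    if [ "$failback" = no ] && is_up "$current"; then
        echo "$current"
        exit 0
    fi
    # Otherwise take the first uplink in failover order that has link.
    for nic in $order; do
        if is_up "$nic"; then
            echo "$nic"
            exit 0
        fi
    done
    echo none
)

# vmnic0 failed earlier, traffic moved to vmnic1, and vmnic0 is back up:
next_active yes vmnic1 "vmnic0 vmnic1" "vmnic0 vmnic1"   # prints vmnic0
next_active no  vmnic1 "vmnic0 vmnic1" "vmnic0 vmnic1"   # prints vmnic1
```

With failback enabled, traffic moves a second time as soon as vmnic0 returns; with failback disabled it stays on vmnic1 until that uplink fails or you change it yourself.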
    • Link state tracking (http://blog.scottlowe.org/)
      This is cool, but not really a VMware setting. When your VMware ESX server uses one or more uplink switches, these switches are connected to the core using an uplink, most likely via a port channel (Cisco). When the uplink switches lose their connection to the core, the ports to the ESX server remain up, so the ESX server does not see a link down and therefore does not fail over.

      To solve this, Cisco switches (and maybe others as well) have a link state tracking option. When the port channel goes down, all other ports on the switch are marked down as well, so the ESX server sees a port down and fails over to another NIC. This is especially useful in blade server environments with in-chassis blade switches. The above URL links to a really good article, so if you need to know more …

TOOLS