Objective 2.5 - Administer Advanced Service Console Networking Configurations
Written by Matthijs van den Berg   
Tuesday, 03 February 2009 15:49


KNOWLEDGE

 

  • Define configuration options for VMkernel ports (same as chapter 2.3)
    • Peer DNS
      I have never used this option, and I could not find much about it online. If anyone can explain what a peer DNS is, please use the comment system! Nevertheless, here is my attempt:
      When peer DNS is configured, it appears that the VMkernel interface takes its DNS-related settings (host name, IP address and domain) from the DHCP server instead of from the local configuration.
      You can use the following command to configure a peer DNS server:
      esxcfg-vmknic  -P|--peerdns
      VMware explanation: Set peer dns. If set the system will use the HostName, HostIPAddress Domain returned by DHCP. Valid only for DHCP
    • MTU
      The Maximum Transmission Unit (MTU) of IP packets can be raised in VMware from the default (1500) to a maximum of 9000 (Jumbo Frames). Note that the MTU must be changed throughout the network path, including the physical switches. Not all hardware supports an MTU larger than the default 1500, so check with your vendor (NICs, switches, etc.). A non-default MTU must be configured via the CLI in two steps; the following example shows a configuration for Jumbo Frames:
      • First configure the MTU on an existing vSwitch
        esxcfg-vswitch -m 9000 <vSwitchName>
        Optionally, check that the MTU for the switch is now 9000 with:
        esxcfg-vswitch -l
      • Secondly, create a VMkernel interface with Jumbo Frame support
        Use the following command to add a VMkernel NIC to a newly created port group:
        esxcfg-vmknic -a -i <IPaddress> -n <netmask> -m 9000 <port group name>
        This creates the VMkernel NIC on the port group with the supplied IP settings and MTU size (in this case 9000). You can check whether the MTU settings for the NICs are correct with:
        esxcfg-nics -l
      • Check the end-to-end configuration (including the switches and NICs) using a ping with a large payload:
        vmkping -s 9000 <IPofNASwithJumbo or IPofESXwithJumbo>
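One caveat about the payload size in the ping test above: -s sets the ICMP payload, and the IPv4 header (20 bytes without options) and the ICMP header (8 bytes) come on top of that. A 9000-byte payload therefore does not fit into a single packet on a 9000-byte MTU and would be fragmented. The small sketch below computes the largest payload that still fits in one packet. The function name max_icmp_payload is made up for this illustration.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one IPv4 packet for a given MTU.
# Assumes a 20-byte IPv4 header (no options) and an 8-byte ICMP header.
max_icmp_payload() {
    mtu=$1
    echo $((mtu - 20 - 8))
}

max_icmp_payload 9000   # prints 8972
max_icmp_payload 1500   # prints 1472
```

The result can be passed to vmkping as the value for -s. Whether your vmkping version can also forbid fragmentation is something to check in its help output; I have not verified that for ESX 3.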
    • TSO
      TCP Segmentation Offload (TSO)
      TSO is enabled on the VMkernel interface by default, but must be enabled at the virtual machine level. TSO support through the Enhanced vmxnet network adapter is available for virtual machines running the following guest operating systems:
      - Microsoft Windows 2003 Enterprise Edition with Service Pack 2 (32-bit and 64-bit)
      - Red Hat Enterprise Linux 4 (64-bit)

      To enable TSO for a VM you must add a new NIC to the VM (possibly replacing the existing one) if the current NIC is not an Enhanced vmxnet NIC. A new NIC comes with a new MAC address. Optionally, you can look up the existing MAC address and reuse it for the new NIC.

      You can disable the offload function when creating a vmknic with the -t option of the esxcfg-vmknic command:
      esxcfg-vmknic -a -i <IPaddress> -n <netmask> -t <port group name>
  • Understand VMkernel routing
    Well, since the kernel does not actually route traffic, I think they mean that you can configure the default gateway and additional IP routes here. VMware allows a default gateway for iSCSI, VMkernel and VMotion (when separated on different networks) but does not require one. You can set the VMkernel default gateway (gateway of last resort) with the command:
    esxcfg-route <DefaultGatewayIP>
    You can add additional or more specific routes with the following command:
    esxcfg-route -a <Network> <Netmask> <GatewayIP>
  • Troubleshoot VMkernel configuration issues
    You can troubleshoot the VMkernel in several ways. First, most errors will show up in the log files. All VMkernel warnings are logged in the vmkwarning log file (/var/log/vmkwarning).

    You can also look up many VMkernel settings using: (source: http://www.b2v.co.uk)
    esxcfg-advcfg

SKILLS AND ABILITIES

  • Inspect Service Console network configuration
    The service console is used for accessing the ESX server remotely via the CLI or the Virtual Center Server / Client. If the service console is not redundant and the network configuration is corrupt, you need to troubleshoot from the command line. To do so, the following commands can be used:
    • Look up the IP address of the Service Console:
      esxcfg-vswif -l
    • Look up the default gateway for the service console (note that esxcfg-route -l lists the VMkernel routes, not those of the service console):
      route -n
      Or display the network configuration file:
      cat /etc/sysconfig/network
      The file could look like:
      NETWORKING=yes
      HOSTNAME=esx001.domain.local
      GATEWAY=10.10.10.101
      GATEWAYDEV=vswif0
    • Lookup the vSwitch of the service console including MTU size, VLAN id and Uplink ports:
      esxcfg-vswitch -l
      Example output:
      Switch Name    Num Ports   Used Ports  Configured Ports  MTU     Uplinks
      vSwitch0       64          5           64                1500    vmnic1,vmnic0

      PortGroup Name      VLAN ID  Used Ports  Uplinks
      Service Console     0        1           vmnic0,vmnic1
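When you only need one value from that listing, for example to verify the MTU in a script, a filter like the following can help. This is a minimal sketch that assumes the column layout of the example output above; the function name vswitch_mtu is hypothetical.

```shell
#!/bin/sh
# Print the MTU column for one vSwitch from `esxcfg-vswitch -l` style output
# read on standard input. Assumes the column order of the example above:
# Switch Name, Num Ports, Used Ports, Configured Ports, MTU, Uplinks.
vswitch_mtu() {
    awk -v sw="$1" '$1 == sw { print $5 }'
}

# Sample line copied from the example output above. On a real host you would
# pipe the command itself: esxcfg-vswitch -l | vswitch_mtu vSwitch0
sample='vSwitch0       64          5           64                1500    vmnic1,vmnic0'
printf '%s\n' "$sample" | vswitch_mtu vSwitch0   # prints 1500
```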
  • Enable/Disable vswif interface
    The Virtual Switch Interface (vswif) is the virtual NIC, or Service Console port, on a vSwitch. It must be given an IP address when you configure a port for managing the ESX server over the network. VMware recommends adding two physical NICs to the vSwitch that holds the Service Console for redundancy.

    To enable a vswif interface you can use the esxcfg-vswif command with the -e option. Example:
    esxcfg-vswif -e vswif0
    To disable a vswif interface you can use the -s option. EXTREME CAUTION: you might disable the only interface on the system (even when it is redundantly connected to the physical network, it is still a single virtual interface) and lose connectivity to your ESX server!
    esxcfg-vswif -s vswif0
    To set the IP address, netmask and port group name for the vswif interface you can use the following command:
    esxcfg-vswif -i <IPaddress> -n <subnet> -p "Service Console" <vswifname>
  • Configure advanced service console networking
    • Redundant HA heartbeat (http://virtualgeek.typepad.com)
      VMware ESX uses two heartbeats to monitor the ESX hosts, other hosts and network connectivity:
      • The inter-node heartbeats and synchronization that occur BETWEEN ESX nodes in a cluster (by default every 5 seconds)
      • The node-to-isolation-address heartbeats that are used to determine whether the node is isolated from the rest of the cluster (by default every 15 seconds).
      The heartbeat uses the Service Console network to reach the other devices. Based on the results of the heartbeat, the ESX server decides whether or not to go into isolation mode. When a server goes into isolation mode, the VMs running on that ESX host are shut down and restarted on another host (HA functionality, which must be enabled on the cluster). The isolation response is started when:
      • A host has stopped receiving heartbeats from other cluster nodes
      • The isolation address cannot be pinged. The default isolation address is the ESX service console gateway
      The default isolation response time is 15000 milliseconds (15 seconds). To change these values you can edit the advanced properties (see further down).

      To make sure that your service console is connected to the network via multiple links, you can configure the Service Consoles and physical NICs in a couple of different scenarios:
      • 1 Service Console in 1 vSwitch with 2 NICs
        You configure 1 service console with one IP address for management. The redundancy comes from using two physical NICs, preferably connected to two different switches. When a NIC or switch fails, the remaining link takes over. You can load balance between the NICs via the Port ID policy.

        To do so, you can add the two NICs to the vSwitch via the GUI or via the CLI, once per NIC, with the command:
        esxcfg-vswitch -L <vmnicName> <vSwitchName>
      • 2 Service Consoles in 2 vSwitches and a minimum of 1 NIC per vSwitch
        You can configure 2 service consoles divided over two vSwitches. The Service Consoles need to be in separate IP subnets (both service consoles share IP and MAC addresses!)

        When you configure two service console networks for the same purpose, you can decrease the Failure Detection Time (the time before HA kicks in) and increase availability. (Read more here: http://www.yellow-bricks.com/2008/01/14/service-console-redundancy/).

        To do this you need to (did not test myself, only one ESX server :-( ):
        • Create a second Service Console in a separate subnet and on a second NIC (see picture):

          25-vswitch
        • Configure the failover options in the HA properties.
          Next you need to make sure that when the primary network fails, the second service console can ping its gateway or designated isolation address.

          The theory (which I could not test) is that when one service console fails, for example because the default gateway router goes down, the second service console takes over.

          ? - However, I do not know how Virtual Center handles this. How does Virtual Center respond when the primary IP address is not pingable?


          By default the settings are as shown in the following example.
          25-advancedha

          You can add additional advanced HA options here. The following options are ones I found on the internet:

          das.isolationaddress / das.isolationaddressX
          Sets the address to ping to determine if a host is isolated from the network. If this option is not specified, the default gateway of the console network is used. This default gateway has to be some reliable address that is available, so that the host can determine if it is isolated from the network. Multiple isolation addresses (up to 10) can be specified for the cluster: das.isolationaddressX, where X = 1-10.

          das.usedefaultisolationaddress
          By default, HA uses the default gateway of the console network as an isolation address. This attribute specifies whether that should be used (true|false).

          das.failuredetectiontime
          Changes the default failure detection time (with a default of 15000 milliseconds). This is the time period when a host has received no heartbeats from another host, that it waits before declaring the other host dead.

          das.failuredetectioninterval
          Changes the heartbeat interval among HA hosts. By default, this occurs every second (1000 milliseconds).
    • Packet tracing
      Packet tracing or sniffing can be used to trace errors in network traffic. You need a network analyzer or sniffer to capture the network traffic and analyze it. Where you place the sniffer depends on the traffic you need to capture. There are a couple of options:

      Capture the traffic in the ESX Service Console (N to 1)
      You can use the tcpdump command to dump all traffic (source, port, destination, packet sequence number, etc.) to the display or to a file. See further down in this objective for details.

      Capture traffic on the hardware switch (1 to 1)
      You can create a SPAN port on a (Cisco) switch to copy all traffic from one Ethernet port to another. This lets you send all that traffic to a sniffer port and look for network problems in it. You can also sniff Service Console traffic this way.

      Capture traffic on the vSwitch (N to 1)
      You can capture all traffic on a vSwitch by placing the vSwitch in promiscuous mode. When this option is enabled (it is disabled by default), the vSwitch forwards all packets to all ports, making it act like a hub. You can create a VM with sniffing software to capture the network traffic on the switch.

      Capture the traffic on the virtual machine
      Another option is to capture the traffic entering and leaving the virtual machine. You can achieve this by installing sniffing software in the virtual machine. Note that when you connect to the server over the network, this traffic is also captured, and depending on the traffic you are looking for, it might influence the readings.

      To sniff network traffic you can use one of the many commercial third-party tools or one of the few good freely available tools, such as Wireshark (packet inspection) (http://www.wireshark.org/) or NTOP (bandwidth measurement per layer 3/4 protocol) (http://www.ntop.org/).

    • CHAP authentication for iSCSI
      iSCSI traffic can be secured by using CHAP authentication. CHAP verifies identity using a hashed transmission. The target initiates the challenge. The secret key is known by both parties. It periodically repeats the challenge to guard against replay attacks. CHAP is a one-way protocol, but it may be implemented in two directions to provide security for both ends. Because the current version of the iSCSI specification defines the CHAP security method as the only must-support protocol, the VMware implementation uses this security option. However, bidirectional CHAP is not currently supported on ESX Server 3.

      To configure iSCSI with CHAP on ESX you can use the GUI or the CLI
      GUI
      - Select an ESX host
      - Go to the Configuration tab
      - Select the iSCSI HBA
      - Click Properties in the details pane
      - Click the “CHAP authentication” tab
      - Click configure and make the necessary adjustments.
      25-chap

      CLI
      You can lookup the CHAP authentication parameters with the following command:
      vmkiscsi-tool vmhba34 -A -m CHAP
      Example output:
      CHAP Authentication Parameters for Adapter vmhba34:
      Retries:       0
      Name:          matthijs
      Name length:   8
      Min Secret Len:0
      Max Secret Len:0
      To configure an iSCSI target with authentication you can use vimsh. Add a target:
      vimsh -n -e "hostsvc/storage/iscsi_set_name <vmhba> <iSCSIname>"
      Example:
      vimsh -n -e "hostsvc/storage/iscsi_set_name vmhba34 iqn.2009-01.local.b3rg.s3rver:storage.disk1.sys1.xyz"

      To enable CHAP with a user name and password:
      vimsh -n -e "hostsvc/storage/iscsi_enable_chap vmhba34 <chapUsername> <chapPassword>"
  • Configure hostname resolution
    • /etc/hosts
      You can configure the hosts file for name resolution. The ESX server looks in this file for forward and reverse resolution of host names. This is used for HA, among other things. Add servers by both FQDN and short name. Example:
      192.168.1.203        esx001.b3rg.local esx001
      192.168.1.204        vcs001.b3rg.local vcs001
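To check quickly which address a name maps to in a hosts file, something like the following sketch can be used. It reads hosts-format text on standard input, so it can be tried on the example lines above before pointing it at /etc/hosts. The function name hosts_lookup is made up for this illustration.

```shell
#!/bin/sh
# Print the IP address for a host name (FQDN or short name) from
# /etc/hosts style input on standard input. Lines whose first field
# starts with '#' are treated as comments and skipped.
hosts_lookup() {
    awk -v name="$1" '
        $1 ~ /^#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    '
}

# Sample entries copied from the example above. On a real host:
#   hosts_lookup vcs001 < /etc/hosts
sample='192.168.1.203        esx001.b3rg.local esx001
192.168.1.204        vcs001.b3rg.local vcs001'
printf '%s\n' "$sample" | hosts_lookup vcs001   # prints 192.168.1.204
```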
    • /etc/nsswitch.conf
      This is the name service switch configuration file. The line “hosts:        files dns”  determines the lookup order, first local file, second DNS servers. This is the advised lookup order.
    • /etc/resolv.conf
      This file holds the DNS servers for name resolution and the local domain name. Example:
      nameserver 192.168.1.1
      nameserver 195.241.77.55
      search b3rg.local
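The DNS servers can be pulled out of this file with a one-line filter, for example to ping each of them while troubleshooting. A minimal sketch follows; the function name list_nameservers is hypothetical, and the sample text is the example above.

```shell
#!/bin/sh
# Print every DNS server listed in resolv.conf style input on standard
# input, one address per line, in the order in which they are listed.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }'
}

# Sample content copied from the example above. On a real host:
#   list_nameservers < /etc/resolv.conf
sample='nameserver 192.168.1.1
nameserver 195.241.77.55
search b3rg.local'
printf '%s\n' "$sample" | list_nameservers
```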
  • Monitor traffic over a Virtual Switch
    • Bandwidth
      To monitor the traffic over a vSwitch you can use the solution described in this chapter under “Packet tracing”, “Capture traffic on the vSwitch (N to 1)”
    • Dropped packets
      You can use a vSwitch in promiscuous mode to monitor the traffic over the switch. Use a program like Wireshark to monitor the traffic, and check whether the sequence numbers of the packets match up.

      Another option is to use tcpdump to monitor traffic. When the command ends it shows how many packets were received by the filter and how many were dropped by the kernel. (See the example at the next bullet.)
  • Identify and resolve network issues using network monitoring tools
    • Tcpdump
      You can use tcpdump in the CLI to capture packet information. Example:
      tcpdump -i vswif1 host 192.168.1.203
      will provide the following output (partial):
      15:25:32.097177 esx001.b3rg.local.ssh > 192.168.1.189.50315: P 2911248:2911632(384) 
      ack 2209 win 8712 <nop,nop,timestamp 2410764 1064842821> (DF) [tos 0x10]
      15:25:32.097666 192.168.1.189.50315 > esx001.b3rg.local.ssh: . ack 2893584 win 65535
      <nop,nop,timestamp 1064842821 2410761> (DF) [tos 0x10]
      15:25:32.097676 esx001.b3rg.local.ssh > 192.168.1.189.50315: P 2911632:2912016(384)
      ack 2209 win 8712 <nop,nop,timestamp 2410764 1064842821> (DF) [tos 0x10]
      15:25:32.098015 192.168.1.189.50315 > esx001.b3rg.local.ssh: . ack 2893968 win 65535
      <nop,nop,timestamp 1064842821 2410761> (DF) [tos 0x10]
      15491 packets received by filter
      18 packets dropped by kernel
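The two summary lines at the end can be turned into a drop percentage, which is easier to compare between captures than the raw counts. The sketch below is one way to do that. The function name tcpdump_drop_pct is made up, and it assumes the summary wording shown above. Note that this only counts packets the capture itself missed, not packets lost on the network.

```shell
#!/bin/sh
# Read tcpdump's closing summary on standard input and print the packets
# dropped by the kernel as a percentage of the packets received by the filter.
tcpdump_drop_pct() {
    awk '
        /packets received by filter/ { received = $1 }
        /packets dropped by kernel/  { dropped = $1 }
        END {
            if (received > 0) printf "%.2f\n", dropped * 100 / received
        }
    '
}

# Summary lines copied from the example output above.
summary='15491 packets received by filter
18 packets dropped by kernel'
printf '%s\n' "$summary" | tcpdump_drop_pct   # prints 0.12
```

tcpdump normally writes this summary to standard error, so on a live capture you would have to redirect that stream (2>&1) before piping it into the function.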
    • Snoop
      TODO

 

TOOLS

  • CLI
    • esxcfg-vswif
    • dig
    • netstat
    • route
    • nslookup
    • hostname
    • esxcfg-vmknic
    • esxcfg-route
  • VI client