Objective 8.4 – Perform Basic Troubleshooting for Storage
Written by Matthijs van den Berg
Thursday, 10 December 2009 23:02
Knowledge
- Identify storage contention issues
The first question here is: what is storage contention? Storage contention is the battle between several consumers, in our case VMs, for storage performance. A SAN is limited in performance, mostly by write I/Os or sometimes by bandwidth. This can cause a higher than usual latency before a write to disk is committed. This latency depends on many things, like the performance of the array (duh), the network, the disks, synchronous replication to a second site, the amount used, etc. Finding the one issue that is causing delay in your SAN / NAS network might be quite a quest. Usually, I think, SAN / NAS tooling is used to find the delays. It all starts with finding the cause of the contention. For this reason VMware built a Performance section into vSphere. To look at disk latency:
- Select an ESX host
- Select the tab “performance”
- Find the chart that says “Disk (ms)”. It shows a chart like this one:
[Image: example “Disk (ms)” chart]
- Look at the milliseconds to see if your SAN has structural latency (congestion) issues.
You can also use ESXTOP on the command line to find latency information. Read more on ESXTOP here and look for DAVG / KAVG / GAVG (device, kernel and guest latency). Also read my blog on how to log to a remote share for a longer period of time. VMware recommends smaller LUNs to reduce storage contention.
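To make the three ESXTOP latency counters a bit more concrete, here is a minimal Python sketch of how they relate: GAVG (what the guest sees) is roughly DAVG (device latency) plus KAVG (time spent in the VMkernel). The warning thresholds and the function name `interpret_latency` are my own assumptions for illustration; they are rules of thumb, not limits published by VMware.

```python
# Rule-of-thumb thresholds; these are assumptions, not official limits.
DAVG_WARN_MS = 25.0  # device latency: array, disks, fabric
KAVG_WARN_MS = 2.0   # VMkernel latency: mostly queueing on the host


def interpret_latency(davg_ms, kavg_ms):
    """Return the approximate guest latency and a list of hints."""
    gavg_ms = davg_ms + kavg_ms  # GAVG is roughly DAVG + KAVG
    hints = []
    if davg_ms > DAVG_WARN_MS:
        hints.append("high DAVG: look at the array, the disks or the fabric")
    if kavg_ms > KAVG_WARN_MS:
        hints.append("high KAVG: look at queueing on the ESX host")
    if not hints:
        hints.append("latency looks normal")
    return gavg_ms, hints


if __name__ == "__main__":
    # Hypothetical sample values read from an ESXTOP disk screen.
    gavg, hints = interpret_latency(davg_ms=31.0, kavg_ms=0.4)
    print(f"GAVG is about {gavg:.1f} ms")
    for hint in hints:
        print("-", hint)
```

The point of splitting the numbers is that a high DAVG sends you to the storage side, while a high KAVG sends you to the ESX host.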
- Identify storage over-commitment issues
I could not find any official documentation on this. To look for overcommitted storage arrays, look at:
- Latency
- Number of I/Os versus your storage array’s maximum
- The queue depth used
- Etc.
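Latency, I/O rate and queue depth are tied together by Little's Law: the average number of I/Os in flight equals the I/O rate multiplied by the latency. The sketch below uses that relation to estimate how much of a queue is in use. The function names and the sample numbers are hypothetical; fill in the values you measure yourself.

```python
def outstanding_ios(iops, latency_ms):
    """Little's Law: average I/Os in flight = I/O rate x time per I/O."""
    return iops * (latency_ms / 1000.0)


def queue_usage(iops, latency_ms, queue_depth):
    """Fraction of the queue depth that is in use on average."""
    return outstanding_ios(iops, latency_ms) / queue_depth


if __name__ == "__main__":
    # Hypothetical numbers for one LUN: 2000 IOPS at 12 ms, queue depth 32.
    in_flight = outstanding_ios(2000, 12)
    print(f"average I/Os in flight: {in_flight:.1f}")
    print(f"average queue usage: {queue_usage(2000, 12, 32):.0%}")
```

When the average usage gets close to 100%, new I/Os have to wait for a free slot, and that waiting shows up as extra latency.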
- Identify storage connectivity issues
The way to troubleshoot this depends on your type of storage: IP based or Fibre Channel based. When you use an IP based storage solution, for example an iSCSI solution, you can use ping, vmkping, traceroute etc. to check the network layer. When using jumbo frames, do not forget to ping with a larger packet size to check that every component in the network between your ESX host and the storage array (including those two) supports the jumbo frame size (MTU 9000). To troubleshoot Fibre Channel you need to take a look at:
- LUN presentation at the storage array (is the LUN presented to the WWN of the correct ESX host?)
- Zoning of the Fibre Channel switches
- LUN masking on the ESX host and the configuration of the HBA. Are you using boot from SAN? Then also check the HBA BIOS, etc.
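For the jumbo frame ping on IP based storage, the packet size you pass to ping is the ICMP payload, not the MTU. The sketch below works out the largest payload that fits in one unfragmented IPv4 packet. The `vmkping` flags used here (`-s` for the payload size, `-d` for "do not fragment") are what I assume for your ESX version, so verify them on your own host; the target address is a hypothetical example.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def jumbo_ping_command(target_ip, mtu=9000):
    """Build a vmkping command line (flags assumed, check your version)."""
    return f"vmkping -d -s {max_ping_payload(mtu)} {target_ip}"


if __name__ == "__main__":
    print(max_ping_payload(9000))
    print(jumbo_ping_command("192.0.2.10"))  # hypothetical target address
```

If the ping only works with a smaller size, some component on the path is not passing the full jumbo frame.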
- Identify iSCSI software initiator configuration issues
In the previous step we checked the network layer of the iSCSI / NAS based storage. The problem might also be the (virtual) iSCSI adapter itself. Some of the things you need to check:
- IP address
- Jumbo Frames
- Subnets configured for iSCSI
- VLAN ID
- Etc.
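The checks in this list can be written down as a small comparison between the settings on the host side and on the storage side. This is only a sketch: the dictionary keys and the function names are hypothetical, and you have to collect the values by hand. A target outside the VMkernel subnet is not always wrong (routed iSCSI exists), so treat the findings as things to review.

```python
import ipaddress


def same_subnet(vmk_ip, netmask, target_ip):
    """True if the iSCSI target lies in the VMkernel port's subnet."""
    network = ipaddress.ip_network(f"{vmk_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(target_ip) in network


def check_iscsi_settings(vmk, target):
    """Compare two plain dicts of settings and return a list of findings."""
    findings = []
    if not same_subnet(vmk["ip"], vmk["netmask"], target["ip"]):
        findings.append("target is outside the VMkernel subnet")
    if vmk["mtu"] != target["mtu"]:
        findings.append("MTU differs between host and target")
    if vmk["vlan"] != target["vlan"]:
        findings.append("VLAN ID differs between host and target")
    return findings


if __name__ == "__main__":
    # Hypothetical values, collected by hand from the host and the array.
    vmk = {"ip": "10.10.20.11", "netmask": "255.255.255.0",
           "mtu": 9000, "vlan": 20}
    target = {"ip": "10.10.20.50", "mtu": 1500, "vlan": 20}
    for finding in check_iscsi_settings(vmk, target):
        print("-", finding)
```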
- Interpret Storage Reports and Storage Maps
The VMware vSphere interface produces many graphs and topology layout images for your convenience. These images can give you a good insight into the VMware performance and layout. To find those reports:
- Select an ESX host or Virtual Machine
- Select the tab “Performance”
- Look at the graphs you find interesting.
- To interpret those graphs:
- Look at the Disk (ms) chart; higher latency might indicate contention.
- Look at the Disk (KBps) chart to see how busy your SAN is and relate this to the maximum performance of your SAN.
Tools