Storage refresh key to VDI success

Background

Recently I’ve been involved in a VDI refresh for one of our customers: around 800 desktops using VMware’s vSphere and View products. As with any VDI solution, success depends on careful planning and design, as well as a thorough understanding of the environment.

This blog post came to life after the initial workshops and the subsequent discovery that a previous upgrade of the customer’s VMware View environment was no longer able to meet adoption demands. A major pain point was the performance of the infrastructure and virtual desktops for the users. A full assessment identified the storage architecture as one of the major bottlenecks. Multiple RAID5 LUN groups (5 disks each) had been provisioned, with as many as 100 virtual desktops or ‘Linked Clones’ located on each datastore. With too few spindles, limited IOPS and throughput, and RAID5’s write penalty (x4), the architecture struggled to handle desktop workloads (generally a 20% read, 80% write I/O profile), which differ greatly from server workloads.
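The impact of the RAID5 write penalty on a write-heavy VDI profile can be sketched with a quick back-of-the-envelope calculation. The per-desktop IOPS figure below is an illustrative assumption, not a measured value; the 20/80 split and x4 penalty come from the assessment above:

```python
def raid5_backend_iops(frontend_iops, read_fraction=0.20, write_penalty=4):
    """Backend disk IOPS needed to service a frontend load on RAID5.

    Each write costs write_penalty backend I/Os (read data, read parity,
    write data, write parity); reads pass through 1:1.
    """
    reads = frontend_iops * read_fraction
    writes = frontend_iops * (1 - read_fraction)
    return reads + writes * write_penalty

# 100 linked clones at an assumed 10 frontend IOPS each
frontend = 100 * 10
print(raid5_backend_iops(frontend))  # 3400.0 backend IOPS
```

With a 5-disk RAID5 group able to deliver only a few hundred spindle IOPS in total, the mismatch between demand and capability is clear.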

Objective

The VDI refresh project was granted additional funding to identify and implement a storage solution that would eliminate these performance pain points. The VDI provision was to remain with VMware, updated to Horizon View v5.2, but had to deliver measured performance within 10% of a native physical desktop. Administration overhead was to be reduced where possible by using a less complex infrastructure.

As a truly agnostic organisation we provided the necessary assistance and guidance, in conjunction with our customer, to ensure the technologies they were reviewing would be fit for purpose. As it transpired, the technology and offerings provided by Tintri addressed our customer’s requirements.

In this blog post I wanted to share the key decision-making criteria, followed by a brief overview of some of the product features that assisted us during the deployment.

Technology chosen

Following a successful proof of concept and extensive load testing, the Tintri 540 was selected. You can read more about the company and their offerings on their website, but here are the key items identified for this solution:

  • NFS solution – simple, leverages the existing Ethernet infrastructure and eliminates one of the previous problem points: VMFS locking and SCSI reservations.
  • Minimal configuration and setup required.
  • A self-optimising storage appliance without the overhead of manual tuning.
  • Comprises eight 3TB disks and eight 300GB MLC SSDs, providing the required total capacity (13TB) and a good amount of flash to serve read and write I/O.
  • Instant performance bottleneck visualisation using real-time virtual machine (VM) and vDisk (VMDK) level insight into I/O, throughput, end-to-end latency and other key metrics.
  • Support for up to 1,000 VMs, providing enough capacity for day 1 and predicted future growth.
  • Support for up to 75,000 IOPS. All read and write I/O is served from flash, providing low-latency performance for VMs.

Note: To achieve the highest possible VDI density, Tintri appliances require 10GbE connectivity between the appliance and the core switching. The Ethernet-based infrastructure in this implementation consisted of dedicated redundant switches for storage traffic, but running only at 1GbE. While this reduced the achievable density, it still permitted 80 virtual machines per ESXi host, which was well within the designed and capacity-planned consolidation ratio.
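For context, the raw line-rate difference between 1GbE and 10GbE can be computed directly. This ignores Ethernet, IP and NFS protocol overhead, so real-world payload rates are somewhat lower:

```python
def line_rate_mib_per_s(gigabits_per_s):
    """Theoretical maximum rate of an Ethernet link, ignoring protocol
    overhead: gigabits/s -> bytes/s -> MiB/s."""
    return gigabits_per_s * 1e9 / 8 / 2**20

print(round(line_rate_mib_per_s(1), 1))   # ~119.2 MiB/s per 1GbE link
print(round(line_rate_mib_per_s(10), 1))  # ~1192.1 MiB/s per 10GbE link
```

The order-of-magnitude gap explains why 1GbE storage links cap the achievable VM density well below what the appliance itself supports.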

Previous VDI

The initial deployment of the virtual desktop infrastructure was based on Microsoft Windows 7, serving approximately 700 users. The majority were ‘Linked Clones’, with fewer than 20 persistent desktops. The virtual desktops remained powered on, controlled by each Horizon View pool policy, to enable quick access and logon to each desktop. Various desktop workloads were in use; however, none were extreme use cases in terms of I/O profile. Users were typically task workers and knowledge workers, with a small number of power users.

As with all our VDI engagements, we completed an assessment of the physical and virtual desktops before proceeding. We identified use cases and mapped them to the different pools, to ensure the new environment was sized correctly and able to handle peaks and additional overhead.

The Tintri Dashboard

Here’s a quick overview of the management user interface, focusing on items used during the initial deployment to assist with performance measurement.

  • The dashboard provides real-time insight and monitoring. You can drill down into any of the metrics for deeper analysis, and pull in metrics from VMware’s vCenter for additional insight.
  • Within the Datastore performance sub-heading, the main IOPS, throughput, latency and flash hit ratio counters are presented in real time (10 second average) and over a 7 day range (10 minute average).
  • On the right-hand side, you can view which VMs are ‘changers’, in terms of performance and space, and by what degree.

Note: Other VM names have been removed from the screenshot to protect the customer’s data.

tintri1

The Diagnose > Hardware screen gives visibility into the status of hardware components such as disks, fans, memory, CPUs and controllers.

tintri2

Real world performance

IOPS can be monitored in real time or over a 4 hour, 12 hour, 1 day, 2 day or 7 day window, at a granularity that can even reveal the details of a single I/O from any VM.

Using the Datastore chart you can click on different points to view specific offenders (such as VDI-T2-48 in the screenshot below) or hover over a point to bring up the data on screen.

The chart below shows statistics from the first week of production for the 715 deployed virtual desktops (with a peak of 400 concurrent active sessions). Total IOPS generally remained under 4,000, with bursts caused by the various logon storms throughout the day. The dramatic peaks are largely due to replica VM or maintenance (recompose) operations.

tintri3
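Those figures translate into a modest per-desktop load, which is a useful sanity check when sizing. A rough sketch using the peaks quoted above:

```python
total_iops = 4000        # sustained total from the chart
desktops = 715           # deployed virtual desktops
active_sessions = 400    # peak concurrent active sessions

# Average load per deployed desktop and per active session
print(round(total_iops / desktops, 1))         # ~5.6 IOPS per desktop
print(round(total_iops / active_sessions, 1))  # ~10.0 IOPS per session
```

These averages hide the logon-storm bursts, which is exactly why the burst headroom from flash matters.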

Note: Horizon View Storage Accelerator (VSA) is enabled on each pool, which can dramatically decrease the number of read I/Os required from the backend storage system. This feature caches common blocks across the desktops and serves them from a content-based read cache (CBRC). It requires and consumes physical RAM (maximum size 2048MB) on each ESXi host.

You can read more about the VSA feature here.

IOPS versus throughput

The ability to compare two charts side by side proved to be a very useful feature during testing and go-live. The screenshot below shows a comparison between IOPS and throughput. The total IOPS peaked at 10,444 at 6:10 PM, with 8,396 read I/Os (shown in yellow) and 2,048 write I/Os (shown in blue). The replica disk shown below contributes 13% of the overall total IOPS.

tintri4
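The relationship between the two charts is simply throughput = IOPS x I/O size. A minimal sketch, assuming an 8 KiB average I/O size (the real mix varies per workload, so this figure is illustrative):

```python
def throughput_mib_per_s(iops, io_size_kib=8.0):
    """Convert an IOPS figure to MiB/s for an assumed average I/O size."""
    return iops * io_size_kib / 1024

# Peak from the chart: 10444 total IOPS (8396 read + 2048 write)
assert 8396 + 2048 == 10444
print(round(throughput_mib_per_s(10444), 1))  # ~81.6 MiB/s at 8 KiB per I/O
```

This is why the IOPS and throughput charts track each other closely unless the average I/O size shifts.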

Latency

Latency is a vital statistic to monitor because it measures how long a single I/O request takes end to end, from the VM (guest OS) to the storage disks. If latency is consistently greater than 20-30ms, the all-round performance of the storage and virtual machines will suffer greatly.

In the example screenshot below, green indicates latency occurring at the host (guest OS), rather than in the network, storage or disk. The total latency is 2.68ms, the sum of the host (2.05ms), network (0.12ms), storage (0.51ms) and disk (0ms) components. Maintaining consistent latency around this level provides excellent end-to-end performance.

tintri5
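The total is simply the sum of the per-hop components. Reproducing the figures from the screenshot:

```python
# End-to-end latency components from the screenshot, in milliseconds
latency_ms = {"host": 2.05, "network": 0.12, "storage": 0.51, "disk": 0.0}

total = sum(latency_ms.values())
print(f"total end-to-end latency: {total:.2f} ms")  # 2.68 ms
```

Breaking latency out per hop like this makes it immediately obvious which layer to investigate when the total creeps up.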

Flash utilisation

This chart excerpt shows the amount of I/O (read and write) being served from the flash disks. As can be seen, 100% is being served from flash, with only a couple of small drops to 98%, meaning the best possible I/O performance is being delivered from flash rather than mechanical, spinning disk.

tintri6
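The flash hit ratio matters because even a small fraction of I/O falling through to spinning disk drags the blended latency up sharply. A sketch with assumed per-tier latencies (the 0.5ms flash and 8ms disk figures are illustrative, not measured on this appliance):

```python
def blended_latency_ms(flash_hit_ratio, flash_ms=0.5, disk_ms=8.0):
    """Weighted-average I/O latency given the fraction served from flash."""
    return flash_hit_ratio * flash_ms + (1 - flash_hit_ratio) * disk_ms

print(blended_latency_ms(1.00))            # 0.5 ms: everything from flash
print(round(blended_latency_ms(0.98), 2))  # 0.65 ms: just a 2% miss rate
print(round(blended_latency_ms(0.90), 2))  # 1.25 ms: a 10% miss rate
```

Under these assumptions, a 2% miss rate already increases average latency by roughly 30%, which is why the sustained 98-100% figure above is significant.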

Virtual Machines

Drilling into IOPS and throughput is all very well where forensic analysis and investigation are required, but what is more interesting is how this correlates to individual virtual machines.

This screenshot is taken from a real-time graph. Virtual machines running on the same datastore can be seen, and the usual sort operations can be performed by clicking on the metric column headings. Double-clicking a VM displays a graph with historical data, with the ability to display two graphs side by side, comparing IOPS and throughput, for example.

tintri7

Contributors

Each of the graphs presented throughout the management user interface shows ‘Contributors’ down the right-hand side, giving visibility into individual virtual machines and their contribution to the overall IOPS, throughput or latency. Below we can clearly see a couple of replica VMs recording high IOPS, the result of Linked Clones reading from the parent image (replica) disk.

tintri8

7 day zoom – IOPS versus Latency

Taking advantage of the side-by-side view again, a 7 day view of IOPS and latency clearly reveals the peaks and troughs of IOPS throughout the week. In this example the majority of I/O activity on the Tintri storage is write based (shown in blue), which means the VMware View Storage Accelerator is taking the initial hit and reducing the read I/O demand on the storage.

Total end-to-end latency (host, network, storage and disk) remains consistently low (around 3ms), with the occasional spike, which is to be expected. In the example screenshot below, green indicates the latency is occurring at the ESXi host (guest OS), rather than in the network, storage or on disk.

tintri9

Conclusion

For this project the Tintri storage appliance has proven able to deliver: reduced management, with no additional performance tuning required, and the capability to handle all workloads during peak periods. Performance monitoring shows the I/O throughput is well within the device’s capability, delivered with a high flash hit percentage (that is, I/O served from SSD) and low end-to-end latency. Virtual desktop performance has been validated against the initial requirement to be within 10% of native physical performance; in certain use cases, testing revealed the virtual desktop performance actually exceeded that of the physical equivalent.

It is clear that the time investment and due diligence completed by the customer provided a solid starting point and was a contributing factor to the success of the project.

If you would like to talk to us about accelerating your VDI platform, please contact us.