VMware vSAN™ is VMware’s hyper-converged storage solution. It offers several different methods of providing resilience for the data it hosts, but each requires careful consideration when designing the cluster. In this blog I take a look at the configuration options and their design repercussions.
Storage Policy with respect to Resilience
All of these approaches are ultimately defined by VMware vSphere Storage Policy definitions. These policies are defined in VMware vCenter and are applied to objects (usually VMs, but they can also be applied to individual virtual disks or VM configuration files), subject to compatibility with the underlying vSAN cluster. However, there are limitations.
The most basic, default policy is RAID-1 configured with a Failures to Tolerate (FTT) of 1 – classic mirroring. In vSAN, this means a complete mirror of the given object, creating two data components. In addition, there’s a smaller Witness component. This means we must have a minimum of three nodes in the cluster (though four are recommended to allow for maintenance and failover). We can up the ante by increasing the FTT value, but while resilience improves there is an obvious impact on capacity.
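To make that trade-off concrete, here’s a minimal Python sketch of the commonly quoted mirroring relationships – replica count, minimum host count and raw capacity footprint for a given FTT. The function and field names are purely illustrative and not part of any VMware tooling.

```python
# Simplified view of RAID-1 mirroring in vSAN (illustrative only).
def raid1_mirroring(ftt: int) -> dict:
    """Return the commonly quoted sizing figures for a RAID-1 policy."""
    replicas = ftt + 1            # full copies of the object's data
    min_hosts = 2 * ftt + 1       # replicas plus witness components need distinct hosts
    capacity_pct = (ftt + 1) * 100  # raw capacity consumed per 100% of VM data
    return {"replicas": replicas, "min_hosts": min_hosts, "capacity_pct": capacity_pct}

for ftt in (1, 2, 3):
    print(f"FTT={ftt}: {raid1_mirroring(ftt)}")
# FTT=1: 2 copies, 3 hosts minimum, 200% footprint
# FTT=2: 3 copies, 5 hosts minimum, 300% footprint
```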
We can also look at using Erasure Coding to give us RAID-5/6 protection. This takes our object and breaks it out into components, with the data and parity striped across them for resilience. From a configuration perspective, all that separates them is that an FTT value of 1 means RAID-5, while an FTT of 2 means RAID-6. However, this does impact the number of components created. In RAID-5, for any object, four components are created (effectively three of data plus one of parity, striped across them). So rather than the 200% footprint of RAID-1, we only need around 133% and get the same resilience, though at a performance cost. RAID-6 adds two further components (four of data plus two of parity), so the footprint sits in between at 150%. Erasure-coded objects don’t need a separate witness component, but the component count means RAID-5 needs a minimum of 4 hosts (5 recommended) and RAID-6 needs 6 hosts (7 recommended).
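As a rough illustration of the capacity difference, the sketch below compares the raw footprint of a hypothetical 100 GB VMDK under the three policies. The ratios are the standard ones quoted above; the dictionary and sample size are mine.

```python
# Illustrative raw-capacity comparison for a 100 GB VMDK under each policy.
POLICIES = {
    "RAID-1 (FTT=1)": 2.00,   # full mirror: 200% footprint, min 3 hosts
    "RAID-5 (FTT=1)": 1.33,   # 3 data + 1 parity: ~133% footprint, min 4 hosts
    "RAID-6 (FTT=2)": 1.50,   # 4 data + 2 parity: 150% footprint, min 6 hosts
}

vmdk_gb = 100
for name, factor in POLICIES.items():
    print(f"{name}: ~{vmdk_gb * factor:.0f} GB raw capacity consumed")
```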
Fault Domains
Of course, our clustered hosts are usually installed in one or more physical racks. Ideally, we’d spread the nodes across racks to mitigate the loss of a rack. vSAN provides Fault Domain functionality to provide an awareness of these physical boundaries.
Consider a Fault Domain as a grouping of hosts within the cluster that represents a boundary of resilience: vSAN places the components of objects subject to a Storage Policy across the domains, so providing resilience to the loss of a whole domain (i.e. a rack outage).
However, it’s not quite as easy as slinging your hosts into a pair of racks and setting up two zones. As stated above, even RAID-1 requires three hosts because of its three components. This has a knock-on impact when defining Fault Domains too. For RAID-1 based policies, the minimum number of Fault Domains can be expressed as (2 x FTT) + 1, while for RAID-5/6 it is (2 x FTT) + 2. Therefore, RAID-1 requires a minimum of three domains, RAID-5 four, and RAID-6 at least six (6 = (2 x 2) + 2).
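The sketch below simply encodes those two formulae; the helper function itself is illustrative only.

```python
# Fault-domain minimums from the formulae above (illustrative wrapper).
def min_fault_domains(ftt: int, erasure_coding: bool) -> int:
    """RAID-1: (2 x FTT) + 1; RAID-5/6 erasure coding: (2 x FTT) + 2."""
    return 2 * ftt + (2 if erasure_coding else 1)

print(min_fault_domains(1, erasure_coding=False))  # RAID-1, FTT=1 -> 3 domains
print(min_fault_domains(1, erasure_coding=True))   # RAID-5        -> 4 domains
print(min_fault_domains(2, erasure_coding=True))   # RAID-6        -> 6 domains
```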
This obviously limits how useful the feature is when mapped to physical racks. Are you going to spread a small six-node cluster across six racks to achieve six domains? Or perhaps across two racks configured as six domains – but then a single rack failure could take out three domains at once, negating the effort of configuring Fault Domains in the first place.
We now see that the scale of the cluster and physical placement are important factors when deciding whether to use Fault Domains.
Stretched Clusters
This approach is designed to be used across different locations (latency permitting) and largely builds on the concept of Fault Domains. Here we have two defined Fault Domains (a Preferred and a Secondary). Alongside these, we also deploy a Witness appliance in a third location – this serves as a third Fault Domain. We also break Failures to Tolerate out into two values:
- Primary Failures to Tolerate (PFTT) – Refers to site-level failure and can be a value of 0 (no site resilience; all data resides on either the Preferred or Secondary site) or 1, where the data is replicated between the two domains in the manner of RAID-1.
- Secondary Failures to Tolerate (SFTT) – Analogous to FTT in a single site, this is the resilient behaviour within each site, and so can be configured as RAID-1 or RAID-5/6 with various FTT values.
The PFTT value effectively sits above the SFTT value, resulting in the following:
Logically speaking, for a given object such as a VMDK, the Primary value essentially pushes a RAID-1 mirror across the two sites, with a Witness component on the appliance. The Secondary value, in concert with the RAID setting, defines the policy within each site – here a straightforward RAID-1 mirror.
For the number of hosts, we therefore need to adhere to the needs of the local policy (be that RAID-1, 5 or 6) and double up for the second site, as the sketch below illustrates.
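As a rough sizing sketch – assuming the minimum host counts discussed earlier and ignoring any headroom for maintenance or rebuilds – the per-site count follows the local policy and is doubled across the two data sites, with the Witness appliance at the third location:

```python
# Rough stretched-cluster sizing: minimums only, real designs add headroom.
LOCAL_POLICY_MIN_HOSTS = {
    "RAID-1 (SFTT=1)": 3,
    "RAID-5 (SFTT=1)": 4,
    "RAID-6 (SFTT=2)": 6,
}

def stretched_cluster_hosts(local_policy: str) -> dict:
    per_site = LOCAL_POLICY_MIN_HOSTS[local_policy]
    return {"per_site": per_site, "data_hosts": per_site * 2, "witness_appliances": 1}

print(stretched_cluster_hosts("RAID-5 (SFTT=1)"))
# {'per_site': 4, 'data_hosts': 8, 'witness_appliances': 1}
```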
As this resides in a stretched vSphere cluster, we also need to build in some stickiness with respect to compute – designing DRS rules and HA settings so that VMs run in their preferred sites.
Closing Thoughts…
When embarking on a vSphere with vSAN design, one should consider not just the compute factors when deciding the number of hosts, but also what level of storage protection will be required for the VMs. In fact, by using the protection as a starting point (i.e. ‘We want RAID-6, so a minimum of six hosts, with seven recommended’), the RAM and CPU can be defined on top of that – whether scaling upwards (fewer, bigger hosts) or outwards (more, smaller hosts). Even the physical rack design might be a determining factor – whether you can spread across racks or not.
If you’re about to embark on a VMware vSphere based hyper-converged solution, then Xtravirt can help. We have considerable experience in this area and can help with advisory, design and implementation services to create the right solution for your organisation. Contact us and we’d be happy to use our wealth of knowledge and experience to assist you.