Configuring a UCS chassis with VMware VMotion

One of the big improvements in vSphere 5.0+ was the introduction of Multi-NIC VMotion.  According to VMware's best practices, the VMotion vmk ports should all be on the same subnet.  If you are configuring a UCS chassis, that might not seem ideal.  Why is a single subnet the best practice?  What difference does it make on a UCS chassis?  And what is Multi-NIC VMotion good for?  Keep reading to learn more...

The Multi-NIC VMotion feature in vSphere 5.0 and up allows a single VMotion to be spread across multiple vmk ports on both hosts.  It also allows multiple VMotions to run over each vmk port.  If you are placing a host into maintenance mode, this gives you a much quicker evacuation time.  And for very large VMs (64GB of RAM or more), spreading a VMotion across multiple physical network connections makes the migration quicker and more reliable.  The best part of Multi-NIC VMotion is that it does not require virtual distributed switches, link aggregation, or Enterprise Plus licensing.  Setting it up is simple and should be very straightforward, but the UCS chassis from Cisco throws a bit of a monkey wrench into VMware's best practices.
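
To make the layout concrete, here is a minimal sketch in plain Python of the usual Multi-NIC VMotion arrangement: one VMotion port group per vmk, each pinned to a different uplink.  The names (vmk1/vmk2, vmnic0/vmnic1, the port group names) are hypothetical, and on a UCS blade the two uplinks would be the vNICs presented from Fabric A and Fabric B.

```python
# Minimal sketch of a Multi-NIC VMotion layout (hypothetical names).
# Each VMotion vmk gets its own port group, pinned to a different uplink,
# so two physical NICs can carry VMotion traffic at the same time.

from dataclasses import dataclass

@dataclass
class VmotionPortGroup:
    name: str            # port group the vmk lives on
    vmk: str             # vmkernel interface, e.g. "vmk1"
    active_uplink: str   # the one uplink this vmk should use
    standby_uplink: str  # fallback if the active uplink fails

layout = [
    VmotionPortGroup("vMotion-A", "vmk1", active_uplink="vmnic0", standby_uplink="vmnic1"),
    VmotionPortGroup("vMotion-B", "vmk2", active_uplink="vmnic1", standby_uplink="vmnic0"),
]

# Sanity check: each VMotion vmk should be active on a different uplink,
# otherwise the two vmks just share one physical NIC.
active = [pg.active_uplink for pg in layout]
assert len(active) == len(set(active)), "two VMotion vmks share an active uplink"

for pg in layout:
    print(f"{pg.vmk} on {pg.name}: active={pg.active_uplink}, standby={pg.standby_uplink}")
```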

The UCS chassis has two IOM modules for all IO.  Each IOM module connects to its own dedicated Fabric Interconnect.  The two Fabrics are not connected to each other; to join them, you have to connect each one to an upstream switch.  Cisco's vision is a pair of Nexus 5K or 7K switches upstream, linked to each other.  Fabric A connects to both switches and Fabric B connects to both switches.  You now have a fully meshed network, and all is well with the world.  But what about VMotion?
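
A rough way to picture that cabling is as a graph.  The sketch below (plain Python, made-up node names, and a blade left out of the switching graph since a host doesn't forward traffic between its own vNICs) walks the links to show that the only way from Fabric A to Fabric B is up and over the upstream pair.

```python
# Rough model of UCS cabling (made-up node names).  Each IOM cables only to
# its own Fabric Interconnect; the FIs only meet at the upstream switch pair.
from collections import deque

switch_links = {
    "IOM-A": {"FI-A"},
    "IOM-B": {"FI-B"},
    "FI-A":  {"IOM-A", "N5K-1", "N5K-2"},  # both FIs uplink to the upstream pair
    "FI-B":  {"IOM-B", "N5K-1", "N5K-2"},
    "N5K-1": {"FI-A", "FI-B"},
    "N5K-2": {"FI-A", "FI-B"},
}

def shortest_path(src, dst):
    """Breadth-first search over the cabling graph."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in switch_links[path[-1]] - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

# The only path from Fabric A to Fabric B runs through an upstream switch,
# e.g. ['FI-A', 'N5K-1', 'FI-B'] -- nothing stays inside the chassis.
print(shortest_path("FI-A", "FI-B"))
```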

All traffic leaving a blade must go to the IOM module, up to the Fabric Interconnect, and on to its destination.  It doesn't matter if the destination is another blade in the same chassis or another chassis in the datacenter; it still travels that path.  Fortunately, the links on the IOM module are all 10Gb, so you can have up to 80Gb of bandwidth on each IOM.  A VMotion going up to the FI and back down is doing so at 10Gbps minimum, more likely 20Gbps or 40Gbps.  Traffic going from Fabric A to Fabric B, however, has to travel from the blade, up to the IOM, to the FI, to the upstream switch, and then all the way back down.  That adds latency, and there is no guarantee that you are getting 20 or 40Gbps.
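
Some back-of-the-envelope numbers show why that bandwidth matters.  Here is a quick sketch of the best-case time to copy the memory of a 64GB VM at each speed; it ignores dirty-page re-copies and protocol overhead, so real migrations take longer, but the scale of the difference is the point.

```python
# Best-case memory copy time for a 64 GB VM at different VMotion speeds.
# Ignores dirty-page re-copies and protocol overhead -- this is only meant
# to show the scale of the difference between 10, 20, and 40 Gbps.

vm_memory_gb = 64
vm_memory_bits = vm_memory_gb * 8 * 10**9   # treat 1 GB as 10^9 bytes for simplicity

for gbps in (10, 20, 40):
    seconds = vm_memory_bits / (gbps * 10**9)
    print(f"{gbps} Gbps: ~{seconds:.0f} s to copy {vm_memory_gb} GB once")
# 10 Gbps: ~51 s, 20 Gbps: ~26 s, 40 Gbps: ~13 s
```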

If you are planning a Multi-NIC setup, you are going to place vNICs on both Fabrics, but you are not going to get the best throughput if the traffic has to cross Fabrics and travel outside of the FIs and back in.  It is better to keep the Fabrics isolated for VMotion traffic and minimize the number of hops the traffic has to take.  The best way to do that is to create two VLANs for VMotion traffic, one for each Fabric, and assign a different subnet to each.  This is not the recommended practice from VMware, but it works without issue.
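
Here is a sketch of what that addressing plan might look like: one VLAN and one subnet per Fabric, and one VMotion vmk per Fabric on every host.  The VLAN IDs, subnets, vmk names, and host names below are made up for illustration, not taken from any particular environment.

```python
# Hypothetical per-fabric VMotion addressing plan: one VLAN and subnet per
# Fabric, one vmk per Fabric on every host.  All values are made up.
import ipaddress

fabrics = {
    "A": {"vlan": 101, "subnet": ipaddress.ip_network("192.168.101.0/24"), "vmk": "vmk1"},
    "B": {"vlan": 102, "subnet": ipaddress.ip_network("192.168.102.0/24"), "vmk": "vmk2"},
}

hosts = ["esx01", "esx02", "esx03"]

for fabric, cfg in fabrics.items():
    addresses = cfg["subnet"].hosts()          # hand out .1, .2, .3, ... per host
    for host in hosts:
        ip = next(addresses)
        print(f"{host} {cfg['vmk']} (Fabric {fabric}, VLAN {cfg['vlan']}): "
              f"{ip}/{cfg['subnet'].prefixlen}")
```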

Why does VMware want you to put all the VMotion traffic on one subnet?  I think they have a few reasons, though this is just my own best guess.  First, simplicity: the feature works fine on one VLAN and subnet, so why complicate things with more VLANs and subnets to keep track of?  The second reason is probably failover and redundancy between hosts.  Say I have hosts A, B, and C, and each host has two VMotion vmks, vmk1 and vmk2, all on the same VLAN and subnet.  Any vmk can talk to any vmk.  If vmk1 on host A fails, vmk2 can still talk to both vmks on hosts B and C, which keeps as much bandwidth as possible available to host A even with one downed vmk.  Even if host B also loses vmk2, host A can still reach host B from its vmk2 to B's vmk1.  Life is good.  Of course, two NIC failures across two hosts is kinda unlikely, but it could happen.  So why not put all the vmks on the same subnet?
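
That failover argument is easy to model.  The sketch below (same hypothetical vmk names as above) counts how many vmk-to-vmk paths host A still has to host B after losing vmk1, under the single-subnet design versus the one-subnet-per-Fabric design.

```python
# Count usable VMotion paths from host A to host B after host A loses vmk1.
# Single-subnet design: every vmk can reach every vmk.
# Split-subnet design: vmk1s only talk to vmk1s, vmk2s only to vmk2s.

def usable_paths(failed, same_subnet):
    """Pairs (A's vmk, B's vmk) that can still carry VMotion traffic."""
    a_vmks = [v for v in ("vmk1", "vmk2") if ("A", v) not in failed]
    b_vmks = [v for v in ("vmk1", "vmk2") if ("B", v) not in failed]
    pairs = []
    for a in a_vmks:
        for b in b_vmks:
            if same_subnet or a == b:   # split design: only same-fabric vmks share a subnet
                pairs.append((a, b))
    return pairs

failed = {("A", "vmk1")}
print("single subnet:", usable_paths(failed, same_subnet=True))
# [('vmk2', 'vmk1'), ('vmk2', 'vmk2')]  -- two paths left
print("one subnet per fabric:", usable_paths(failed, same_subnet=False))
# [('vmk2', 'vmk2')]  -- one path left, all on Fabric B
```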

UCS is certainly a special case.  And there are a few things you give up by choosing to split VMotion traffic across two VLANs.  First, you lose some level of redundancy, though not much.  Second, you could lose some bandwidth if you have a NIC failure.  Third, you are no longer following VMware best practices, which may catch you some flak if you have to contact support.  The gain, though, is up to 160Gbps of aggregate VMotion bandwidth across the chassis.  I think that's worth the risk, don't you?
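
Where does the 160Gbps figure come from?  Assuming 2208XP IOMs with all eight 10Gb fabric links cabled on each IOM (a smaller IOM or fewer cables scales this down proportionally), the arithmetic is simply:

```python
# Where the 160 Gbps figure comes from: assumes 2208XP IOMs with all eight
# 10 Gb fabric links cabled on each IOM.  Fewer links, or a smaller IOM,
# scales this down proportionally.

links_per_iom = 8
link_speed_gbps = 10
ioms_per_chassis = 2

per_fabric = links_per_iom * link_speed_gbps   # 80 Gbps up each IOM
total = per_fabric * ioms_per_chassis          # 160 Gbps across both Fabrics
print(f"{per_fabric} Gbps per Fabric, {total} Gbps across the chassis")
```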
