After I made a particularly snarky comment about an article that touted inter-data center VM mobility as the ultimate tool to reach the 100% availability heavens (this is why that argument is totally invalid), someone asked me why I don’t believe in workload mobility, disaster avoidance and follow-the-sun data centers. I am positive that some businesses have the need for all three of the above-mentioned functionalities, but I also know that live VM migration isn't the right tool for any of them.
Let’s focus on the most bizarre of the three ideas: using VM mobility to implement follow-the-sun data centers. The underlying business requirements are sound and simple: moving the servers closer to end users reduces latency and long-distance bandwidth requirements, and reduced latency also improves response times and throughput. However, you cannot reach this goal by moving virtual machines between data centers; you simply can’t move a running virtual machine far enough.
The maximum round-trip latency supported by vMotion in vSphere 4.0 is 5 msec. While the timing requirements have been relaxed a bit in vSphere 5.0, the maximum round-trip latency is still 10 msec, way too low to implement the follow-the-sun model. After all, the round-trip time from Central Europe to Ireland already exceeds 10 msec, let alone the round-trip time across the Atlantic.
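To put the speed of light into numbers, here is a rough Python sketch that converts a round-trip latency budget into the longest fiber path it could span, and a fiber path length into its minimum round-trip time. It assumes a refractive index of about 1.47 for optical fiber (roughly 204 km per msec) and counts propagation delay only; real fiber routes are longer than the great-circle distance and add switching and queuing delay, so actual round-trip times are higher. The path lengths in the usage example are hypothetical.

```python
# Light travels ~299,792 km/s in vacuum; in optical fiber (assumed refractive
# index ~1.47) it slows to roughly 204,000 km/s, i.e. about 204 km per msec.
SPEED_IN_FIBER_KM_PER_MS = 299_792.458 / 1.47 / 1000

# vMotion round-trip latency limits quoted in the text.
VMOTION_RTT_LIMIT_MS = {"vSphere 4.0": 5.0, "vSphere 5.0": 10.0}


def min_rtt_ms(fiber_km: float) -> float:
    """Lower bound on round-trip time: propagation delay in both directions only."""
    return 2 * fiber_km / SPEED_IN_FIBER_KM_PER_MS


def max_fiber_km(rtt_limit_ms: float) -> float:
    """Longest fiber path whose propagation delay alone fits within the RTT limit."""
    return rtt_limit_ms * SPEED_IN_FIBER_KM_PER_MS / 2


if __name__ == "__main__":
    for release, limit in VMOTION_RTT_LIMIT_MS.items():
        print(f"{release}: {limit} ms RTT -> at most {max_fiber_km(limit):.0f} km of fiber")
    # Hypothetical fiber path lengths, for illustration only.
    for km in (2_000, 6_000):
        print(f"{km} km of fiber -> at least {min_rtt_ms(km):.1f} ms RTT")
```

Even under these optimistic assumptions, a 10 msec budget covers only about a thousand kilometers of fiber, nowhere near intercontinental distances.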
Even if you were able to move a running VM between continents, you’d still face a number of other challenges. Bridging (the traditional mechanism used to support long-distance VM mobility) over such distances is out of the question; most layer-2 protocols (like ARP) would time out when faced with round-trip delays measured in hundreds of milliseconds. You might be able to support VM mobility with LISP, but even that approach has a number of drawbacks until someone implements LISP within hypervisor soft switches.
So, is it impossible to implement follow-the-sun data centers? Of course not. The Googles of the world solved the problem more than a decade ago using DNS-based load balancing (or anycast) between data centers and local load balancing within the data center. You can also use Amazon’s EC2 cloud and create elastic resources based on geographic load distribution. Both approaches do have one thing in common: they rely on properly architected scale-out applications.
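The DNS-based approach can be illustrated with a minimal Python sketch of how a geo-aware DNS server might choose which data center address to return. Everything here is hypothetical: the data center names, client regions, preference order and addresses (taken from the RFC 5737 documentation ranges) are made up, and real global load balancers determine client location and data center health in their own ways. Note that nothing moves: users in each region are simply sent to the nearest healthy site, and the local load balancer there spreads them across servers.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    vip: str  # public address of the data center's local load balancer
    healthy: bool = True


# Hypothetical data centers with documentation-range addresses.
DATA_CENTERS = {
    "eu": DataCenter("eu-west", "192.0.2.10"),
    "us": DataCenter("us-east", "198.51.100.10"),
    "ap": DataCenter("ap-southeast", "203.0.113.10"),
}

# Hypothetical preference lists: for each client region, closest site first.
PREFERENCE = {
    "europe": ["eu", "us", "ap"],
    "americas": ["us", "eu", "ap"],
    "asia": ["ap", "eu", "us"],
}
DEFAULT_ORDER = ["us", "eu", "ap"]  # used for clients in unknown regions


def resolve(client_region: str) -> str:
    """Return the address a geo-aware DNS server would hand to this client."""
    for dc_key in PREFERENCE.get(client_region, DEFAULT_ORDER):
        dc = DATA_CENTERS[dc_key]
        if dc.healthy:
            return dc.vip
    raise RuntimeError("no healthy data center available")


if __name__ == "__main__":
    print(resolve("europe"))   # nearest site for European users
    DATA_CENTERS["eu"].healthy = False
    print(resolve("europe"))   # falls back to the next site in the list
```

This only works because the application tier is built to scale out: each site can serve any user, so redirecting new sessions is enough and no running VM ever has to cross an ocean.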
In short, it would be nice if some of the high-level consultants took some time to check product data sheets and the laws of physics (like the speed of light) before selling totally impractical marketectures, but I don’t expect that to happen any time soon.
About the author: Ivan Pepelnjak, CCIE No. 1354, is a 25-year veteran of the networking industry. He has more than 10 years of experience in designing, installing, troubleshooting and operating large service provider and enterprise WAN and LAN networks and is currently chief technology advisor at NIL Data Communications, focusing on advanced IP-based networks and Web technologies. His books include MPLS and VPN Architectures and EIGRP Network Design. Check out his IOS Hints blog.