I am not a big fan of long-distance vMotion and the associated traffic trombones. I am positive that some application developers and server administrators believe that moving a running virtual machine from one data center to another is a prerequisite for the unlimited flexibility and high uptime that quality of service demands, but what they don’t consider are the risks that long-distance data center bridging imposes on the underlying network.
I wasn’t persuaded to change my opinion on long-distance data center bridging during a Packet Pushers podcast I did with virtualization veteran Scott Lowe on Layer 2 Data Center Interconnect (DCI). The only good reason Scott could give for long-distance data center bridging was disaster avoidance -- the ability to move workloads from a data center that is about to be shut down.
But if you have a mission-critical application, it’s probably running on multiple servers (for redundancy) and you (hopefully) already have a load balancer in front of those servers. In that case, it is safer to start new instances in the new data center (using new IP addresses belonging to the new data center), shut down instances in the old data center and use load balancers to slowly shift the load across the DC interconnection link. You might still get traffic trombones until you move all the interconnected servers and the primary LUNs to the secondary data center, but at least you don’t have to deal with the inter-data center bridging mess.
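The gradual shift described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `shift_schedule` and `split_requests`, the 0-to-100 weight scale and the equal step size are all hypothetical, and a real load balancer would be driven through its own management interface rather than through code like this.

```python
from typing import Iterator, Tuple


def shift_schedule(steps: int, total: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield (old_dc_weight, new_dc_weight) pairs that move load in equal steps.

    Hypothetical helper: the weight scale and step count are assumptions.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    for i in range(steps + 1):
        new_weight = total * i // steps
        yield total - new_weight, new_weight


def split_requests(requests: int, old_weight: int, new_weight: int) -> Tuple[int, int]:
    """Split a batch of requests between the two pools in proportion to the weights."""
    total = old_weight + new_weight
    if total <= 0:
        raise ValueError("at least one pool must have a positive weight")
    to_new = requests * new_weight // total
    return requests - to_new, to_new


if __name__ == "__main__":
    # Walk the load from the old data center to the new one in four steps.
    for old_w, new_w in shift_schedule(4):
        to_old, to_new = split_requests(1000, old_w, new_w)
        print(f"weights {old_w}/{new_w}: {to_old} requests old DC, {to_new} new DC")
```

The point of the sketch is that the instances in the new data center keep their own IP addresses throughout. Only the traffic split changes, so nothing has to be bridged between the sites.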
Using load balancers to avoid long-distance data center bridging
VMware-aware load balancers like F5’s BIG-IP can automatically adjust load-balancing pools based on vCenter information, preferring local machines over those in the distant data center. In this scenario, there is no long-distance bridging or fancy new untested technologies -- just old time-tested recipes and a bit of orchestration (the magic juice that makes this approach work).
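The "prefer local machines" behavior can be illustrated with a minimal Python sketch. It does not use the BIG-IP or vCenter APIs. The `Member` record, its fields and the `preferred_pool` function are hypothetical names invented for this example, and the inventory is assumed to have been collected already by whatever orchestration you run.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Member:
    """Hypothetical pool member record; the field names are assumptions."""
    name: str
    data_center: str
    healthy: bool = True


def preferred_pool(members: List[Member], local_dc: str) -> List[Member]:
    """Return healthy members in the local data center, falling back to
    healthy members anywhere only when no local member is available."""
    healthy = [m for m in members if m.healthy]
    local = [m for m in healthy if m.data_center == local_dc]
    return local if local else healthy


if __name__ == "__main__":
    inventory = [
        Member("web-1", "dc-a"),
        Member("web-2", "dc-a", healthy=False),
        Member("web-3", "dc-b"),
    ]
    # Traffic arriving in dc-a stays in dc-a while a healthy local member exists.
    print([m.name for m in preferred_pool(inventory, "dc-a")])
```

Remote members are used only as a fallback, so traffic crosses the interconnect only when the local site has nothing healthy left to serve it.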
If your answer is that you don’t have load balancers or the budget to buy them, how can you claim the applications are mission-critical and must be moved intact? Or maybe it’s one of those legacy applications designed in the last millennium that can only run on a single server. Last I heard, 80% of service failures were caused by software and human error, the other 20% by various hardware failures. Yet some people still have the guts to claim an application that cannot run on more than a single server is mission-critical. That claim is merely a justification that lets application folks move their precious applications between data centers without touching or changing the outdated environment they use.
If you ever encounter a situation like this (and I’m positive you will), make your position very clear: Long-distance bridging is potentially dangerous and can bring down not one, but two (or more) data centers. Then let the CIO decide which risk he’s willing to take: short application downtime or potential network infrastructure meltdown.
About the author: Ivan Pepelnjak, CCIE No. 1354, is a 25-year veteran of the networking industry. He has more than 10 years of experience in designing, installing, troubleshooting and operating large service provider and enterprise WAN and LAN networks and is currently chief technology advisor at NIL Data Communications, focusing on advanced IP-based networks and Web technologies. His books include MPLS and VPN Architectures and EIGRP Network Design. Check out his IOS Hints blog.
This was first published in February 2011