
Live migration networking: Enter long-distance data center bridging

Live migration networking poses all sorts of challenges, including the potential need for long-distance data center bridging that results in traffic trombones. When it comes to server virtualization networking, is it possible to avoid the pitfalls?

Unless you’re running a huge computing cluster, you will probably deploy server virtualization to reduce operating expenses. That virtualization will introduce input-output challenges in the LAN (not to mention the SAN). Among those challenges will be live migration (vMotion) networking, which could require long-distance data center bridging, potentially destabilizing data center networks.

The hypervisor software that allows multiple virtual servers to run on the same physical hardware also enables those servers to communicate with each other and with the outside world. To do this, most hypervisors implement virtual network interface cards (vNICs) that emulate physical NICs, as well as virtual switches that emulate bridging (aka Layer 2 switching).
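To make that bridging behavior concrete, here is a minimal Python sketch of the Layer 2 learn-and-flood logic a virtual switch emulates. The class, port names and frame representation are purely illustrative; no hypervisor exposes its virtual switch through an API like this.

```python
# Toy model of the Layer 2 (bridging) behavior a virtual switch emulates.
# All names are illustrative only -- this is not any hypervisor's API.

class LearningSwitch:
    def __init__(self, ports):
        self.ports = set(ports)    # e.g. vNIC ports plus a physical uplink
        self.mac_table = {}        # source MAC -> port it was last seen on

    def forward(self, in_port, src_mac, dst_mac):
        """Return the set of ports a frame is sent out of."""
        self.mac_table[src_mac] = in_port      # learn the sender's location
        if dst_mac in self.mac_table:          # known destination: unicast
            out = {self.mac_table[dst_mac]}
        else:                                  # unknown destination: flood
            out = set(self.ports)
        out.discard(in_port)                   # never send back out the ingress port
        return out
```

The first frame to an unknown MAC address is flooded out of every other port; once the switch has seen a source address, subsequent frames toward it leave through a single port only.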

Sometimes you must run virtual machines belonging to more than one security zone on the same physical hardware. In a traditional (non-virtualized) data center, you would use virtual LANs (VLANs) to implement the different security zones. A virtualized environment is no different; the VLANs are simply implemented by the virtual switch.
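As an illustration of VLAN-based zoning inside a virtual switch, the following Python sketch restricts broadcast delivery to ports in the same VLAN. The VM names and VLAN numbers are hypothetical examples, not taken from any product.

```python
# Toy illustration of VLAN-based security zones inside a virtual switch.
# Port names and VLAN IDs below are hypothetical.

class VlanSwitch:
    def __init__(self, port_vlans):
        self.port_vlans = dict(port_vlans)     # access port -> VLAN ID

    def broadcast_domain(self, in_port):
        """Ports that receive a broadcast arriving on in_port:
        only ports assigned to the same VLAN."""
        vlan = self.port_vlans[in_port]
        return {port for port, v in self.port_vlans.items()
                if v == vlan and port != in_port}

# Two security zones sharing one physical server:
sw = VlanSwitch({"web-vm-1": 10, "web-vm-2": 10, "db-vm": 20})
```

A broadcast from one web-tier VM reaches only the other web-tier VM; the database VM in VLAN 20 never sees it, even though all three run on the same hypervisor.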

Note: Until virtual switches started using VLANs, network administrators could pretend that servers running hypervisors were simple hosts, just like traditional servers. With the introduction of VLAN-based switching in virtual switches, physical servers have to be treated like any other Layer 2 networking device, blurring the demarcation line between the networking and server infrastructure.

Live migration networking

Years ago, VMware implemented vMotion -- live VM migration or the ability to move a running virtual machine between two physical servers (Microsoft Hyper-V and Xen support similar functionality). Live migration is heavily used to distribute the VM load across a cluster of physical servers, reduce the power consumption during non-peak hours and allow for planned maintenance. With live migration, VMs can be concentrated on a few physical servers while others are shut down as needed.

The beauty of live VM migration is its ability to retain user sessions, resulting in seamless operation that does not affect service availability metrics. To support this highly desired feature, virtual machines must retain their IP address and all their LAN and SAN connectivity.

Note: To improve your SLA metrics, ensure that the uptime is measured with TCP sessions, not with ping. The migration process might lose a few data packets (which will impact SLA metrics based on ping), but never enough to disrupt an already-established TCP session.
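The note above can be sketched as a toy calculation, assuming a 1-second ping probe interval and a TCP stack that keeps retransmitting for 60 seconds. Both values are illustrative assumptions; real monitoring intervals and retransmission persistence vary by product and operating system.

```python
import math

# Toy comparison of ping-based vs. TCP-based availability metrics during
# a live migration that drops packets for a brief interval. The probe
# interval and retransmission window are illustrative assumptions.

def ping_reported_downtime(outage_s, probe_interval_s=1.0):
    """Downtime a ping-based monitor reports: every probe lost during
    the outage marks a whole probe interval as 'down'."""
    lost_probes = math.ceil(outage_s / probe_interval_s)
    return lost_probes * probe_interval_s

def tcp_session_broken(outage_s, retransmit_window_s=60.0):
    """An established TCP session breaks only if the outage outlasts
    the stack's retransmission persistence (often a minute or more)."""
    return outage_s > retransmit_window_s
```

A sub-second packet loss during vMotion thus registers as a full second of downtime on a ping-based SLA report, while an established TCP session rides through it untouched.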

There are a few Layer 3 routing mechanisms that could help you reach this goal, but they either have to be implemented in the guest operating system (mobile IP or loopback interfaces) or aren’t reliable enough to guarantee sub-second convergence (Local Area Mobility, implemented by Cisco Systems more than a decade ago). Data centers implementing live VM migration are thus forced to use bridging between the source and destination physical servers.

When you implement live VM migration within a cluster of physical servers, the “bridge everywhere” requirement is not too harmful, because clusters are usually quite limited in size. For example, the latest VMware release, vSphere 4.1, supports up to 32 physical servers in a cluster. When using high-density blade servers, 32 physical servers easily fit within a single rack and connect to a set of top-of-rack switches, making the bridging requirement easy to satisfy.

However, when server administrators want to move virtual machines between clusters (usually across the whole data center), the networking infrastructure starts feeling the strain. The networking team has to implement VLANs spanning a large number of switches and serving a large number of end-hosts. Stretched VLANs reduce the overall stability of the data center and may exacerbate the impact of aggressive worm infections (the infamous SQL Slammer worm is a prime example). The disconnect between the physical architecture and the logical Layer 3 architecture also results in a large number of long-distance traffic flows (sometimes known as traffic trombones), as the migrated virtual machine stays pinned to the same Layer 3 neighbor (for example, the default router).
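A back-of-the-envelope model of the trombone effect, with made-up hop counts: after migration the VM keeps using its original default router, so every packet exchanged with the outside world detours through the router's rack and back.

```python
# Back-of-the-envelope model of a "traffic trombone": after migration,
# the VM keeps using its original default router, so traffic between the
# VM and the outside world detours through the router's rack. All hop
# counts are made-up illustration values.

def trombone_hops(vm_rack, router_rack, inter_rack_hops):
    """Extra switch hops per round trip when the VM and its default
    router sit in different racks (0 if they share a rack)."""
    if vm_rack == router_rack:
        return 0
    # The outbound leg detours to the router's rack, and the return
    # leg comes back through the same router.
    return 2 * inter_rack_hops
```

Before migration the VM shares rack A with its default router and pays no penalty; after migrating to a rack that is, say, 4 inter-rack hops away, every round trip burns 8 extra switch hops of data center bandwidth.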

The best way to minimize the impact of live migration on the networking infrastructure is to restrict the VM migration to groups of servers that are physically close together in the data center. If you can’t achieve that goal (sometimes the unpredictable shifts in traffic prompt you to take more drastic actions), consider deploying data center switches that integrate easily with your virtualization platform. As one example, the VM Tracer embedded in Arista’s Gigabit Ethernet switches extends VLANs on the fly as VMware vCenter moves virtual machines between physical servers.


About the author: Ivan Pepelnjak, CCIE No. 1354, is a 25-year veteran of the networking industry. He has more than 10 years of experience in designing, installing, troubleshooting and operating large service provider and enterprise WAN and LAN networks and is currently chief technology advisor at NIL Data Communications, focusing on advanced IP-based networks and Web technologies. His books include MPLS and VPN Architectures and EIGRP Network Design. Check out his IOS Hints blog.

This was last published in February 2011
