When disaster recovery planners at one financial institution set a 15-minute mean-time-to-recovery goal, nightly backups to a virtualized iSCSI storage area network (SAN) at a warm site seemed like the answer—provided, of course, that the block data replication process would finish by morning. But vendor marketing doesn't always match reality.
Even with a dedicated wide area network (WAN) link connecting TopLine Federal Credit Union's data center to a 15 TB Dell EqualLogic SAN at one of its branch offices, WAN replication always lagged days behind schedule, according to Colleen Jakes, director of information services for the Minneapolis/St. Paul-based credit union.
"It was supposed to happen every night and be up-to-date every night so, at the very latest, the data would be 12 hours old," Jakes said. "[But] you never knew really—did it replicate it all?—until Sunday. A weekly backup is what, logically, it worked out to be."
Because she had configured the replication job to stop at the start of each business day, the replica at the warm site was always falling behind, Jakes said. Blocks that had failed to synchronize one night could be changed again the next morning. As a result, if disaster struck in the afternoon, changes made in the previous 36 to 48 hours could not be recovered, she said.
The backlog grew every day as replication fell further and further behind, Jakes said.
"It was pushing too much information through," she said. "It [became] a perpetual cycle of never getting through the replication."
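The cycle Jakes describes can be illustrated with a simple model. Each business day adds changed data, and each night the job moves no more than the link's line rate allows before it is cut off at the start of the business day. This is a minimal sketch, not TopLine's configuration. The 12-hour window and the 10 GB of daily changes are hypothetical figures chosen for illustration, and the model treats each day's changes as new data rather than as overwrites of blocks already queued. The only fixed value is the nominal T1 line rate of 1.544 Mbit/s.

```python
"""Sketch: why a replication job that must stop at the start of the business
day can fall further behind every night.

Hypothetical inputs (not from the article): the overnight window length and
the volume of changed data per day. Only the T1 line rate is a fixed figure.
"""

T1_BITS_PER_SEC = 1_544_000  # nominal T1 (DS1) line rate


def window_capacity_gb(hours: float, bits_per_sec: int = T1_BITS_PER_SEC) -> float:
    """Upper bound, in decimal GB, on what a link can move in `hours` (overhead ignored)."""
    return bits_per_sec * hours * 3600 / 8 / 1e9


def simulate_backlog(daily_change_gb: float, window_hours: float, days: int) -> list[float]:
    """Return the unreplicated backlog (GB) left when each night's window closes."""
    capacity = window_capacity_gb(window_hours)
    backlog = 0.0
    history = []
    for _ in range(days):
        backlog += daily_change_gb              # changes written during the business day
        backlog = max(0.0, backlog - capacity)  # job is cut off when the window closes
        history.append(round(backlog, 2))
    return history


if __name__ == "__main__":
    print(f"A 12-hour T1 window moves at most {window_capacity_gb(12):.2f} GB")
    # Hypothetical: 10 GB of changed blocks per day against a 12-hour window.
    print(simulate_backlog(daily_change_gb=10.0, window_hours=12, days=5))
```

Under these assumed numbers a 12-hour window on a T1 tops out at roughly 8.3 GB, so any day that produces more changed data than that leaves a remainder that compounds night after night.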
Optimization cuts WAN replication cycle from days to hours
Prior to the WAN replication project, TopLine's branches connected to its data center with legacy point-to-point T1 connections. During a recent server virtualization initiative, Jakes replaced point-to-point branch connectivity with a shared Multiprotocol Label Switching (MPLS) network.
A single point-to-point T1 circuit remains, dedicated to backup, Jakes said, and the SAN could support only one WAN connection. Although replication was scheduled to avoid business hours, she wanted to reduce the risk of performance problems on the MPLS network by confining replication traffic to its own T1 link.
"A point-to-point T1 is going to be faster than an MPLS any day," she said. "It's direct—it doesn't have to go through any hops from here to there. There's no hopping; there's no switching. So, it would be the faster of the two communication methods, but from my perspective... [I wondered], 'Why am I not using this MPLS as well—especially after hours?' We were looking at how we were going to do it."
Adding bandwidth alone wouldn't guarantee that replication would finish within its overnight window, Jakes said. She conducted a proof of concept with Riverbed Technology's Steelhead WAN optimization appliances after getting strong endorsements for them from peers at other organizations and from the systems integrator that deployed the credit union's VMware environment.
When Jakes was designing her disaster recovery plan around WAN replication, she did not anticipate having to buy, deploy and support WAN optimization appliances costing thousands of dollars. But the alternatives carried even costlier risks, especially given the regulatory compliance requirements TopLine faces as a financial institution, Jakes said.
"We were now able to use both network [connections]—both the point-to-point and the MPLS—so now I no longer have an MPLS sitting open but with no traffic in the evenings," she said. "And after a couple days to catch up [a month's worth of data], the replications now happen overnight in about two-and-a-half to three hours. It's a huge difference."
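Jakes' fix combined two levers: a second path (the MPLS network) and the data reduction that WAN optimization appliances perform. Their effect on the length of an overnight pass can be sketched with back-of-the-envelope arithmetic. This is a simplified model under stated assumptions: the MPLS bandwidth, the nightly change volume and the 4:1 reduction ratio below are hypothetical, the links are assumed to be fully usable in parallel, and protocol overhead, latency and the TCP-level optimizations such appliances also apply are ignored.

```python
"""Back-of-the-envelope sketch: overnight replication time as a function of
changed data, links used in parallel, and a data-reduction ratio.
All parameters except the T1 line rate are hypothetical."""

T1_BPS = 1_544_000  # nominal T1 line rate, bits per second


def replication_hours(changed_gb: float, link_bps: list[int], reduction_ratio: float = 1.0) -> float:
    """Hours needed to send `changed_gb` decimal gigabytes.

    link_bps: line rates (bits per second) of the links used in parallel.
    reduction_ratio: assumed factor by which compression/deduplication
    shrinks bytes on the wire (4.0 means 4:1). Overhead and latency ignored.
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    if not link_bps or sum(link_bps) <= 0:
        raise ValueError("need at least one link with a positive rate")
    wire_bits = changed_gb * 1e9 * 8 / reduction_ratio
    return wire_bits / sum(link_bps) / 3600


if __name__ == "__main__":
    HYPOTHETICAL_MPLS_BPS = 3_000_000  # assumed after-hours MPLS capacity
    changed = 8.0  # hypothetical GB of changed blocks per night
    print(f"T1 only, no reduction:  {replication_hours(changed, [T1_BPS]):.1f} h")
    print(f"T1 + MPLS, assumed 4:1: "
          f"{replication_hours(changed, [T1_BPS, HYPOTHETICAL_MPLS_BPS], 4.0):.1f} h")
```

With these made-up inputs the pass drops from about 11.5 hours to about 1 hour. The point is the shape of the arithmetic (time falls in proportion to both aggregate bandwidth and reduction ratio), not a prediction of TopLine's reported two-and-a-half to three hours.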
Optimizing WAN replication: It's all (well, mostly) about hardware
Networking pros considering WAN optimization for DR and WAN replication should evaluate vendors' appliances from a slightly different perspective than they would for branch deployments, according to Andre Kindness, senior analyst at Forrester Research.
While compression engines and traffic-shaping capabilities remain important to optimizing WAN replication and other data center-to-data center traffic, the focus should be on finding a platform with robust hardware, Kindness said. Save the caching discussion for branch deployments, he said.
"In a data center-to-branch office [deployment], your bandwidth is a lot smaller, and you're dealing with a lot of unique traffic," Kindness said. "When it comes to data center-to-data center, your pipe is a lot larger ... and the data is flowing in one direction ... but your [appliance's] CPU and memory has to be a lot more robust and larger."
Riverbed's roots in the WAN optimization market are end-user-focused, with an emphasis on headquarters-to-branch deployments. It is not as well known for data center-to-data center scenarios, such as WAN replication and disaster recovery (DR), as niche vendor Silver Peak Systems, which focuses exclusively on that market.
But Riverbed has built up its data center and DR portfolio over the years to remain competitive, including last year's release of Whitewater for optimizing public cloud storage services. In early 2010, Riverbed released the high-end Steelhead 7050 series of appliances, which have solid-state drives (SSDs) and more capacity to improve WAN replication for disaster recovery and data backup.
Although Riverbed isn't as visible or invested in the space as Silver Peak, such recent hardware and software updates have made Riverbed a valid contender in the data center market, Kindness said.
Let us know what you think about the story; email: Jessica Scarpati, News Writer.