NEW YORK -- Preparing a company for disaster, above all, involves planning, both in terms of technological redundancy and staffing. This was the overarching message from a panel of experts at CeBit America 2003.
When looking at a diagram of a company's network infrastructure and wide area network links, one should be able to put a finger over any device on the chart -- as if it were down -- and know that the network can keep running, said Gary Gunnerson, an architect and platforms/integration specialist with the McLean, Va.-based media giant Gannett Co.
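Gunnerson's finger-over-the-device test can be approximated in software. The following is a minimal sketch, not a tool described by the panel: it assumes the network diagram is available as a simple adjacency map (the device names and the `links` layout are hypothetical) and reports every device whose failure would cut the remaining devices off from one another.

```python
from collections import deque


def reachable(links, start, removed):
    """Return the devices reachable from `start` while `removed` is down."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links.get(node, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def single_points_of_failure(links):
    """List devices whose loss disconnects the rest of the network.

    Assumes `links` is symmetric and lists every device as a key.
    """
    devices = sorted(links)
    critical = []
    for down in devices:
        survivors = [d for d in devices if d != down]
        if not survivors:
            continue
        # "Put a finger over" one device and see whether the rest still connect.
        if reachable(links, survivors[0], down) != set(survivors):
            critical.append(down)
    return critical


# Hypothetical topology: two edge routers, but only one core switch.
links = {
    "office": ["router-a", "router-b"],
    "router-a": ["office", "core"],
    "router-b": ["office", "core"],
    "core": ["router-a", "router-b", "datacenter"],
    "datacenter": ["core"],
}

print(single_points_of_failure(links))  # ['core']
```

In this made-up layout the redundant routers pass the test, but the lone core switch does not, which is the kind of gap the exercise is meant to expose.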
As companies have grown more reliant on WAN links, more of them are adding redundancy to this connectivity through multihoming, the practice of physically connecting a host to multiple data links. Globally, 10,000 companies have turned to multihoming as a way to ensure that they can continue to operate if there is a problem with the Internet.
But simply paying for two separate Internet connections does not guarantee that problems will not occur. While most large providers, such as UUNet, have their own backbones, other second-tier providers may use a large provider's backbone. That means that, even though a company may have two service providers, its data is traveling on one path, Gunnerson said.
He said that IT managers should ask questions about how their primary and backup providers are routing information, so they know that their data will have separate routes in the case of a disaster.
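Once providers have answered those questions, the comparison itself is simple. The sketch below assumes each provider has supplied the list of networks its traffic crosses; the provider and backbone names are hypothetical placeholders, not real routing data.

```python
def shared_networks(primary_path, backup_path):
    """Return the networks that appear on both providers' paths."""
    return sorted(set(primary_path) & set(backup_path))


# Hypothetical answers from two providers about how they route traffic.
primary_path = ["provider-a", "backbone-x"]
backup_path = ["provider-b", "backbone-x"]

overlap = shared_networks(primary_path, backup_path)
if overlap:
    print("Both connections depend on:", ", ".join(overlap))
else:
    print("No shared networks found in the reported paths.")
```

A non-empty result means the two connections are not as independent as the two invoices suggest.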
Another challenge with multihoming is ensuring that data avoids bottlenecks on the networks. Border Gateway Protocol (BGP), which is commonly used to direct traffic over multihomed connections, is great at recognizing when a connection is down, but it cannot recognize traffic bottlenecks on the Internet, said Brendan Hannigan, vice president of strategy at Sockeye Networks Inc., in Waltham, Mass. BGP uses criteria that do not take traffic into account; it judges a route merely by the number of hops and distance.
Hannigan recommended an approach -- not coincidentally, one from Sockeye -- called route optimization. This method judges traffic conditions on the network and helps to ensure that data takes the quickest route given the current conditions on the Internet.
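The difference between the two approaches can be shown with a toy comparison. This is a simplified sketch, not Sockeye's product and not BGP's actual decision process, which weighs several attributes in sequence; the `Route` fields, provider names and latency figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Route:
    provider: str
    hops: int          # path length, the kind of static measure Hannigan describes
    latency_ms: float  # hypothetical measurement of current conditions


def pick_by_hops(routes):
    """Choose the shortest path, ignoring current traffic."""
    return min(routes, key=lambda r: r.hops)


def pick_by_latency(routes):
    """Choose the path that is fastest under measured conditions."""
    return min(routes, key=lambda r: r.latency_ms)


# Hypothetical snapshot: the shorter path is congested.
routes = [
    Route(provider="provider-a", hops=3, latency_ms=240.0),
    Route(provider="provider-b", hops=5, latency_ms=45.0),
]

print("Shortest path:", pick_by_hops(routes).provider)
print("Fastest path right now:", pick_by_latency(routes).provider)
```

With these made-up numbers the hop-count rule keeps sending traffic down the congested link, while the measurement-based rule moves it to the longer but faster one.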
Beyond WAN connectivity, companies have a multitude of scenarios to consider when it comes to disasters. Conference attendee Robert Graham, chief technology officer with Buffalo Grove, Ill., consultancy NuVista Technology Solutions, said that one of his clients, Bankers Trust (now part of Deutsche Bank), had very rigorous criteria. He said that the company, for which he does disaster recovery consulting, wanted to ensure that its IT operations could continue even if all of its personnel were unable to access its buildings.
Graham worked with the company to set up a second operations center in another location. Gunnerson has a similar facility for Gannett, but his is unmanned and used only in emergency situations. Graham, by contrast, staffed the second center and made it part of the daily functioning of the company. That way, he said, the people who work there know how to run all of the systems, so there will not be a knowledge gap in the event of a disaster.
As important as the redundancy of the technology is the separation of personnel. "If there is a disaster, a person's priority is not the company. In a disaster, people don't always make the best decisions," Graham said.
With a separate, fully functioning facility that has its own staff, he said, the workers are removed from the disaster and from those affected by it, and they are better able to work.
Conference attendee Matthew Lagana, vice president of security and telecommunications with MBIA Inc., an Armonk, N.Y.-based financial services company, said that, for his company, the No. 1 priority in the event of a disaster is people.
He said that his company, having learned from the chaos in the days after the September 11 terrorist attacks in New York and Washington, has focused part of its disaster recovery efforts on employee tracking. The company created a Web site and an 800 number that employees can use to check in following a disaster. They can use both resources to find out news about the company and co-workers.