Data deduplication technology in enterprise wide area networks (WANs)

How does data deduplication technology work in the setting of enterprise wide area networks (WANs)? Find out in this tip and learn what a WAN manager needs to know about WAN deduplication.

Data deduplication technology, the ability to identify and eliminate redundant data segments, is being implemented everywhere, and the wide area network (WAN) is no exception. Data deduplication means different things to different people (and vendors), though, as does its role in optimizing WAN segments. Enterprise WAN managers need to understand how and when to leverage data deduplication on the WAN.

How deduplication works

Data deduplication technology is typically hosted on a server or appliance that is performing a task like storing backup data. Most deduplication devices do this by sectioning a file into sub-file segments ranging from 4 KB to 64 KB. These segments are then processed by an algorithm that generates a hash code for each segment. For practical purposes the code is unique to that data segment, so it can be thought of as a fingerprint. As new data segments are processed, they are passed through this same algorithm. If the algorithm generates a hash identical to one already on record, the device knows that it has stored that data before and simply creates a reference to the existing data instead of storing it a second time. The result can be a significant saving in space.
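
The segment-and-fingerprint process described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the fixed 4 KB segment size, the choice of SHA-256 as the hashing algorithm and the `DedupStore` class name are all assumptions made for the example.

```python
import hashlib

SEGMENT_SIZE = 4 * 1024  # assumption: fixed 4 KB segments


class DedupStore:
    """Hypothetical in-memory store that keeps each unique segment once."""

    def __init__(self, segment_size=SEGMENT_SIZE):
        self.segment_size = segment_size
        self.segments = {}  # fingerprint -> segment bytes

    def write(self, data):
        """Store data and return its 'recipe': an ordered list of fingerprints."""
        recipe = []
        for offset in range(0, len(data), self.segment_size):
            segment = data[offset:offset + self.segment_size]
            # SHA-256 stands in for the device's hashing algorithm (assumption).
            fingerprint = hashlib.sha256(segment).hexdigest()
            if fingerprint not in self.segments:
                # First time this segment is seen: store the actual bytes.
                self.segments[fingerprint] = segment
            # Seen before or not, the file only records a reference.
            recipe.append(fingerprint)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from its list of references."""
        return b"".join(self.segments[fp] for fp in recipe)

    def stored_bytes(self):
        """Bytes actually held after deduplication."""
        return sum(len(segment) for segment in self.segments.values())
```

Writing 12 KB of data whose first and third 4 KB segments are identical leaves only 8 KB in the store, and writing the same data a second time adds nothing further.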

The goal of deduplication, no matter where it is implemented, is to reduce the amount of data that needs to be handled by a given process. Many IT professionals consider deduplication a storage technology only, when, in fact, it can be implemented in multiple areas of the data center.

Data deduplication technology in wide area networks

One of the best uses of data deduplication is on the wide area network (WAN). The role of data deduplication over the WAN is to allow WAN segments to be more efficiently utilized in order to delay or even prevent the need to purchase additional and expensive WAN bandwidth. A WAN optimization device with deduplication capabilities will have a local cache on both the sending and receiving ends of the connection. If either end (through the process described above) calculates that the data has already been sent to the other site, then only the reference information -- not the entire data set -- is sent. This can dramatically reduce the amount of traffic that needs to traverse the WAN.
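
As a rough sketch of the exchange described above, the following Python models a pair of WAN optimization devices. Everything here is an assumption made for illustration: the class names, the `(kind, payload)` message format, the fixed 4 KB segments and the use of SHA-256. Real appliances must also keep the two caches synchronized and handle eviction, which this sketch ignores.

```python
import hashlib

SEGMENT_SIZE = 4 * 1024   # assumption: fixed 4 KB segments
RAW, REF = "raw", "ref"   # hypothetical message kinds


def fingerprint(segment):
    # SHA-256 stands in for the appliance's hashing algorithm (assumption).
    return hashlib.sha256(segment).digest()


class SendingAppliance:
    def __init__(self):
        # Fingerprints of segments already sent to the remote site.
        self.sent = set()

    def encode(self, data):
        """Turn data into messages: full segments or short references."""
        messages = []
        for offset in range(0, len(data), SEGMENT_SIZE):
            segment = data[offset:offset + SEGMENT_SIZE]
            fp = fingerprint(segment)
            if fp in self.sent:
                messages.append((REF, fp))        # 32-byte reference only
            else:
                self.sent.add(fp)
                messages.append((RAW, segment))   # full segment, first time
        return messages


class ReceivingAppliance:
    def __init__(self):
        self.cache = {}  # fingerprint -> segment bytes

    def decode(self, messages):
        """Rebuild the data, resolving references from the local cache."""
        parts = []
        for kind, payload in messages:
            if kind == RAW:
                self.cache[fingerprint(payload)] = payload
                parts.append(payload)
            else:
                parts.append(self.cache[payload])
        return b"".join(parts)


def wire_bytes(messages):
    """Payload bytes that would actually cross the WAN."""
    return sum(len(payload) for _, payload in messages)
```

In this model, sending a 16 KB attachment made of four distinct segments costs 16,384 payload bytes the first time and 128 bytes (four 32-byte references) the second time, while the receiving side reconstructs the full attachment in both cases.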

Typically, WAN deduplication devices are used to speed up performance at remote office locations. For example, instead of placing a file server or an email server in each location, only the WAN deduplication device is installed there. Then, if an email with a large attachment is sent more than once, the repeated attachments are pulled from the WAN deduplication cache rather than being sent across the WAN segment multiple times.

Today's storage deduplication devices are often single-purpose. For example, they may deduplicate only backup data. WAN deduplication devices, by contrast, can be leveraged across multiple purposes. The email attachment mentioned above may eventually be stored to a file share, and it may also be copied again as part of the backup process. With WAN deduplication, the data crosses the WAN only once no matter which of these applications sends it, multiplying the savings.

Continue reading part 2 of this article to learn about a WAN manager's role in storage.

About the author:
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the U.S., he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, George was chief technology officer at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.

This was last published in April 2010
