Do you know what type of traffic is flowing between data centers over your wide area network (WAN)? Do you know which storage applications and data most affect your WAN links?
Being aware of the characteristics of different storage-related applications enables smart decisions when sizing data-center-to-data-center WAN links. To size for both current and future storage needs, keep three factors in perspective: bandwidth (or throughput), response time (or latency), and activity (transactions, I/O operations or messages).
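To make the bandwidth side of that sizing concrete, the sketch below estimates the link speed needed to move a given amount of data within a time window. The overhead fraction, target peak utilization and growth rate are hypothetical placeholders, not recommendations; substitute figures from your own measurements. Latency and activity need separate analysis.

```python
def required_bandwidth_mbps(data_gb: float, window_hours: float,
                            overhead: float = 0.2,
                            max_utilization: float = 0.8,
                            annual_growth: float = 0.0,
                            years: int = 0) -> float:
    """Estimate the link speed (Mb/s) needed to move data_gb within window_hours.

    All defaults are illustrative assumptions:
      overhead        - extra fraction for protocol headers and retransmissions
      max_utilization - fraction of the link you are willing to fill at peak
      annual_growth   - expected yearly growth in data volume, applied for `years`
    """
    bits = data_gb * 8 * 1000**3            # decimal gigabytes -> bits
    seconds = window_hours * 3600
    payload_mbps = bits / seconds / 1e6      # raw rate just for the data itself
    future_factor = (1 + annual_growth) ** years
    return payload_mbps * (1 + overhead) * future_factor / max_utilization


# 500 GB of changed data to replicate in an 8-hour window, today and one year out
print(round(required_bandwidth_mbps(500, 8), 1))
print(round(required_bandwidth_mbps(500, 8, annual_growth=0.5, years=1), 1))
```

Dividing by the target utilization, rather than sizing to exactly 100%, leaves headroom for the bursts described below.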
Analyzing storage-related traffic
Get to know your storage-related traffic and applications using packet sniffers or other analysis tools, and talk with server and storage I/O admins or architects. Use the analysis tools to determine whether traffic consists of large sequential reads and writes or of smaller transfers, where latency is more of a concern. This also means determining whether the reads and writes support synchronous or asynchronous operations such as copy, rsync, replication, remote mirroring or distributed parity and dispersal data protection.
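One way to act on that analysis is to compute the average transfer size per flow from whatever your tool exports. In this minimal sketch, the `Transfer` record and the 256 KiB cutoff are hypothetical; real tools export different fields, and the right threshold depends on your environment.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    # Hypothetical per-flow record, e.g. summarized from a sniffer or flow export.
    flow: str
    bytes_moved: int
    operations: int   # number of reads/writes observed for this flow


LARGE_IO_BYTES = 256 * 1024   # assumed cutoff between "large" and "small" I/O


def classify(t: Transfer) -> str:
    """Label a flow as bandwidth-bound (large sequential) or latency-sensitive (small I/O)."""
    avg_size = t.bytes_moved / max(t.operations, 1)   # guard against zero operations
    return "large-sequential" if avg_size >= LARGE_IO_BYTES else "small-latency"


flows = [
    Transfer("nightly-backup", 10 * 1024**3, 10_000),   # ~1 MiB per operation
    Transfer("db-log-shipping", 50 * 1024**2, 20_000),  # ~2.5 KiB per operation
]
for f in flows:
    print(f.flow, classify(f))
```

Average size alone does not distinguish sequential from random access; treat the label as a starting point for a closer look.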
Are you seeing sawtooth spikes that last a few minutes (or more) every 15, 30 or 60 minutes? If so, do you know what is causing them and where they are coming from? The source could be an application, database, file system or operating system taking a snapshot, moving a log file or performing some similar sync, migration, replication or copy function that you need to be aware of.
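If your monitoring tool can export per-minute utilization samples, a short script can find where spikes start and how far apart they are, which helps match them to a scheduled job. This is a simple threshold-crossing sketch under the assumption of evenly spaced samples; the threshold is yours to choose.

```python
from statistics import median
from typing import List, Optional


def spike_starts(samples: List[float], threshold: float) -> List[int]:
    """Return sample indices (e.g. minutes) where utilization rises above threshold."""
    starts = []
    above = False
    for i, value in enumerate(samples):
        if value > threshold and not above:
            starts.append(i)          # first sample of a new spike
        above = value > threshold
    return starts


def typical_period(starts: List[int]) -> Optional[float]:
    """Median gap between spike starts; None if fewer than two spikes were seen."""
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return median(gaps) if gaps else None


# Two hours of synthetic per-minute utilization (%): 5-minute bursts every 30 minutes
samples = [90.0 if minute % 30 < 5 else 10.0 for minute in range(120)]
starts = spike_starts(samples, threshold=50.0)
print(starts, typical_period(starts))
```

Using the median gap rather than the mean keeps one irregular spike from skewing the estimated period.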
Data coherency and consistency (integrity) is another important characteristic of data storage applications. It means that 100% of the data must arrive intact, with no loss, 100% of the time. Contrast that with applications that can tolerate some level of loss (e.g., a dropped bit or byte here or there), such as streaming video playback, where loss shows up as pixelization.
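For lossless storage traffic, one common way to confirm that a copy or replica arrived intact is to compare cryptographic digests of the source and the destination. The sketch below uses SHA-256 from Python's standard `hashlib`; the helper names are hypothetical, and matching digests indicate identical content with overwhelming probability rather than absolute proof.

```python
import hashlib
from typing import Iterable


def digest_chunks(chunks: Iterable[bytes]) -> str:
    """SHA-256 hex digest over a sequence of byte chunks."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def digest_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in 1 MiB pieces so large files need not fit in memory."""
    with open(path, "rb") as f:
        return digest_chunks(iter(lambda: f.read(chunk_size), b""))


def copies_match(source_path: str, replica_path: str) -> bool:
    """True if both files have the same SHA-256 digest."""
    return digest_file(source_path) == digest_file(replica_path)
```

Because the digest covers every byte, even a single flipped bit in transit produces a mismatch.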
Know your storage-related applications, too, such as Hadoop
Common storage-related applications include big data analytics such as Hadoop, statistical analysis software and other traditional data warehouse approaches. These applications can move large files, data sets and objects between data centers, which requires a great deal of bandwidth. On the other hand, if you run them in a distributed or wide area cluster mode, the traffic may consist of smaller, bursty transfers, depending on the work being done. Archiving for data preservation and backups for business continuity or disaster recovery result in large bursts of sequential traffic. Remote file or data access and system-to-system replication (for applications like databases and email) need some bandwidth, but with a sharper focus on low latency.
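Why latency matters so much for replication: with synchronous replication, a write is not acknowledged until the remote site confirms it, so each write waits at least one network round trip. The sketch below assumes roughly 5 microseconds of one-way propagation delay per kilometer of optical fiber (light travels at about 200,000 km/s in glass). The equipment allowance is a hypothetical parameter, and real fiber routes are usually longer than the straight-line distance between sites.

```python
FIBER_US_PER_KM = 5.0   # approximate one-way propagation delay in optical fiber


def min_rtt_ms(distance_km: float, equipment_ms: float = 0.0) -> float:
    """Round-trip propagation delay plus an assumed allowance for switches, routers and stacks."""
    return 2 * distance_km * FIBER_US_PER_KM / 1000 + equipment_ms


def max_sync_writes_per_sec(distance_km: float, equipment_ms: float = 0.0,
                            outstanding: int = 1) -> float:
    """Upper bound on synchronously replicated writes/s when each write waits one RTT.

    `outstanding` is how many writes are in flight at once (1 = strictly serial).
    """
    return outstanding * 1000 / min_rtt_ms(distance_km, equipment_ms)


for km in (10, 100, 1000):
    print(km, "km:", min_rtt_ms(km), "ms RTT,", max_sync_writes_per_sec(km), "serial writes/s")
```

Adding bandwidth does not change these numbers; only shorter distance, fewer hops or more outstanding writes do.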
Other applications care more about bandwidth and stream bit rate than about response time. Know your network and the applications that use it.
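Even bandwidth-focused transfers are affected by distance. For a single TCP stream, throughput is bounded by roughly the window size divided by the round-trip time, and keeping a link full requires the bandwidth-delay product (bandwidth times RTT) worth of data in flight. This sketch ignores packet loss and congestion control, so treat its results as best-case ceilings.

```python
def bdp_bytes(link_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return link_mbps * 1e6 / 8 * rtt_ms / 1000


def max_stream_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Best-case throughput of one stream limited by its window: window / RTT."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


# A 1 Gb/s link with a 40 ms RTT between data centers
print(bdp_bytes(1000, 40))              # bytes needed in flight to fill it
print(max_stream_mbps(65_535, 40))      # one stream with a 64 KiB window
```

This is why a large backup or copy job over a long-distance link may use only a fraction of the purchased bandwidth unless it uses larger windows or multiple parallel streams.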
Another step is to identify workload simulation or test tools, as well as metrics (bandwidth use, activity, response time, errors and retransmissions), that you can use for sizing and for troubleshooting storage and WAN configurations.
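As an illustration of turning those metrics into numbers you can compare over time, the sketch below summarizes per-interval counters into average and peak utilization, retransmission rate, error count and 95th-percentile response time. The `Interval` record is hypothetical; map its fields to whatever your monitoring or test tool actually reports.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interval:
    # Hypothetical counters for one measurement interval.
    bytes_sent: int
    segments_sent: int
    retransmits: int
    errors: int
    response_ms: List[float] = field(default_factory=list)


def p95(values: List[float]) -> float:
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100   # integer ceil(0.95 * n)
    return ordered[rank - 1]


def summarize(intervals: List[Interval], link_mbps: float, interval_s: float) -> dict:
    """Combine interval counters into link-level metrics."""
    link_bits_per_interval = link_mbps * 1e6 * interval_s
    total_bytes = sum(i.bytes_sent for i in intervals)
    total_segments = sum(i.segments_sent for i in intervals)
    all_rt = [r for i in intervals for r in i.response_ms]
    return {
        "avg_utilization_pct": 100 * total_bytes * 8 / (link_bits_per_interval * len(intervals)),
        "peak_utilization_pct": 100 * max(i.bytes_sent for i in intervals) * 8 / link_bits_per_interval,
        "retransmit_pct": 100 * sum(i.retransmits for i in intervals) / max(total_segments, 1),
        "errors": sum(i.errors for i in intervals),
        "p95_response_ms": p95(all_rt) if all_rt else None,
    }


# Two one-minute intervals on a 100 Mb/s link
stats = summarize(
    [Interval(375_000_000, 1000, 10, 0, [float(x) for x in range(1, 11)]),
     Interval(750_000_000, 1000, 30, 1, [float(x) for x in range(11, 21)])],
    link_mbps=100, interval_s=60)
print(stats)
```

Reporting peak alongside average matters here: the average of 75% hides an interval that saturated the link.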
Playing well with others
Leverage your existing data center WAN tools and work with your fellow storage and system admins; the learning goes both ways. Meet with storage admins and share your WAN reports so they can tell you what is happening from a storage perspective. In turn, sharing what you see on the WAN helps them understand how their storage activity causes the effects you measure.
All in all, the result should be a more effective wide area storage networking environment.