"Knowledge is power," so the saying goes, but nobody knows better than a network administrator that there's an enormous difference between information and knowledge.
And so it goes with big data.
In networking, big data -- the often enormous pool of traffic, trunk and device information that is available from standard management systems and interfaces -- is collected from probes deployed at various points, as well as from network layer software in client and server devices. When it's presented in a standard management-system interface framework, some of this information may map to current fault, configuration, accounting, performance and security (FCAPS) management practices, but most companies can't correlate data from client/server devices or probes with current operations activities. That's where big data and big data analysis can help.
The single most critical element in making network big data work for you is ensuring accurate event timing for all the data elements. Networking is about momentary conditions and juxtapositions of events. Loss of time synchronization means complete loss of context in analyzing information. If all data collection is clocked from a common source, time synchronization should be adequate. If not, it could be helpful to introduce synchronizing events into big data collection points to align the timing of all the recordings at regular points.
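As a rough illustration of how synchronizing events could be used, the sketch below estimates one collector's clock offset from markers that both it and a reference clock recorded, then shifts that collector's records onto the reference timeline. The function names, the marker ids and the choice of the median are assumptions made for this example, not part of any particular management system.

```python
from statistics import median


def estimate_offset(reference_marks, local_marks):
    """Estimate a collector's clock offset from shared synchronizing events.

    Both arguments map a marker id to the timestamp (in seconds) at which
    that marker was recorded. reference_marks comes from the clock that is
    treated as authoritative.
    """
    shared = reference_marks.keys() & local_marks.keys()
    if not shared:
        raise ValueError("no synchronizing events in common")
    # The median tolerates a single delayed or mis-recorded marker.
    return median(local_marks[m] - reference_marks[m] for m in shared)


def align(records, offset):
    """Shift (timestamp, payload) records onto the reference timeline."""
    return [(ts - offset, payload) for ts, payload in records]


# Hypothetical data: the probe's clock runs about 2.5 seconds ahead.
reference = {"sync-1": 100.0, "sync-2": 200.0, "sync-3": 300.0}
probe_a = {"sync-1": 102.5, "sync-2": 202.5, "sync-3": 302.4}

offset = estimate_offset(reference, probe_a)
aligned = align([(150.5, "link-down"), (250.5, "link-up")], offset)
print(offset, aligned)
```

A single constant offset only corrects a fixed skew. If clocks drift between markers, the offset would need to be re-estimated at each synchronizing event.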
Establish mapping to pinpoint network problems
Once it's certain that the timing of events can be accurately correlated, the next step is to establish a mapping between this common timeline and network problems. Information about the sources of the problem can come from current FCAPS processes, from user complaints or from client/server telemetry. Telemetry may also be able to recover quality-of-experience information such as response time, as well as network performance data measuring the rate of packet loss and delay (from TCP window sizes, for example). This mapping permits big data analytics to explore the correlation between these problem points and the metrics of the period before the problem first appears.
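One way to picture this mapping is a lookup that, for each problem time on the common timeline, pulls out the metric samples from the period just before it. The sketch below assumes samples are already time-aligned and sorted; the names `window_before`, `map_problems` and the `lookback` parameter are hypothetical.

```python
from bisect import bisect_left


def window_before(samples, problem_time, lookback):
    """Return samples with problem_time - lookback <= ts < problem_time.

    samples must be a list of (timestamp, value) tuples sorted by timestamp.
    """
    times = [ts for ts, _ in samples]
    lo = bisect_left(times, problem_time - lookback)
    hi = bisect_left(times, problem_time)
    return samples[lo:hi]


def map_problems(samples, problem_times, lookback):
    """Map each problem time to the samples that immediately preceded it."""
    return {t: window_before(samples, t, lookback) for t in problem_times}


# Hypothetical packet-loss samples: (seconds, percent lost).
loss_rate = [(0, 0.1), (10, 0.1), (20, 0.4), (30, 0.9), (40, 0.2), (50, 0.1)]

# A user complaint was logged at t=35; look at the 20 seconds before it.
windows = map_problems(loss_rate, problem_times=[35], lookback=20)
print(windows)
```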
This kind of big-data assessment can be invaluable for root cause analysis of network problems, something that often is nearly impossible to do properly by other means. Because network conditions change so quickly, administrators often chase problems from place to place but never manage to be looking at the right thing when something happens. Big data analytics can correlate thousands (or millions) of data elements against known problem points to spot correlations, which then can be traced through data analysis to their causes.
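To make the correlation step concrete, the following minimal sketch scores each metric against a 0/1 series marking which time slots were known problem points, then ranks the metrics by the strength of that relationship. It uses a plain Pearson coefficient as a stand-in for whatever an analytics platform would actually apply, and the metric names and values are invented. A high score only identifies a candidate worth tracing; it does not by itself establish a cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_metrics(metrics, problem_flags):
    """Rank metrics by absolute correlation with the problem indicator."""
    scores = {name: pearson(vals, problem_flags) for name, vals in metrics.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical per-interval metrics on the common timeline.
metrics = {
    "queue_depth": [3, 4, 3, 9, 10, 4],
    "cpu_percent": [40, 42, 41, 40, 43, 41],
    "fan_speed": [5, 5, 5, 5, 5, 5],
}
problem_flags = [0, 0, 0, 1, 1, 0]  # 1 = interval overlaps a known problem

ranking = rank_metrics(metrics, problem_flags)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```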
Finding normal operating conditions
Another strategy for applying big data to network problems is to use it to baseline normal network conditions. If the previous step -- mapping problem points to a big-data common timeline -- is completed properly, it also will reveal when there aren't any problems. Analysis of network data collected during these "good" periods will allow an administrator to baseline normal network behavior and to quantify what "normal" means in terms of the masses of data collected.
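As a simple sketch of baselining, the code below discards samples that fall inside known problem intervals and summarizes what remains with a mean and standard deviation. Real baselines would likely be kept per metric and per time of day; the function names and the latency figures here are assumptions for illustration.

```python
from statistics import mean, stdev


def in_any(ts, intervals):
    """True if ts falls inside any (start, end) interval, inclusive."""
    return any(start <= ts <= end for start, end in intervals)


def baseline(samples, problem_intervals):
    """Summarize a metric using only samples from problem-free periods."""
    good = [v for ts, v in samples if not in_any(ts, problem_intervals)]
    if len(good) < 2:
        raise ValueError("need at least two problem-free samples")
    return {"mean": mean(good), "stdev": stdev(good), "count": len(good)}


# Hypothetical response-time samples: (seconds, milliseconds).
latency_ms = [(0, 20), (10, 22), (20, 21), (30, 95), (40, 90), (50, 19), (60, 23)]

# The timeline mapping marked t=25..45 as a problem period.
normal = baseline(latency_ms, problem_intervals=[(25, 45)])
print(normal)
```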
The baselined normal behavior then can be used to analyze periods of network operation that are not considered a problem but also are not convincingly correlated with normal operating behavior. Most seasoned network administrators know that networks sometimes enter an unstable state but don't actually fail or generate complaints. There are also conditions in a network, in overall demand or in a server resource state, which can affect network operations. Baseline data can help find any and all of these.
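One hedged way to look for such periods is to flag samples that sit far from the baseline even though no problem was reported at the time. The sketch below uses a z-score with an arbitrary threshold of 3 standard deviations; both the threshold and the names are assumptions, and the baseline mean and standard deviation are passed in rather than computed.

```python
def unstable_periods(samples, base_mean, base_stdev, problem_intervals,
                     threshold=3.0):
    """Flag samples far from baseline that fall outside known problem periods.

    Returns (timestamp, value, z_score) tuples for each flagged sample.
    """
    if base_stdev <= 0:
        raise ValueError("baseline standard deviation must be positive")
    flagged = []
    for ts, value in samples:
        if any(start <= ts <= end for start, end in problem_intervals):
            continue  # already explained by a reported problem
        z = (value - base_mean) / base_stdev
        if abs(z) >= threshold:
            flagged.append((ts, value, round(z, 2)))
    return flagged


# Hypothetical jitter samples: (seconds, milliseconds).
jitter_ms = [(0, 2.0), (10, 2.2), (20, 6.5), (30, 12.0), (40, 2.1)]

suspects = unstable_periods(
    jitter_ms, base_mean=2.0, base_stdev=0.5, problem_intervals=[(25, 35)]
)
print(suspects)
```

Here the sample at t=30 is skipped because it belongs to a reported problem, while the sample at t=20 is flagged as abnormal but unreported.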
Big data analysis can help find ways to remedy network conditions
One behavior to look for is where analytics indicate a network condition that fails to generate a problem report even when it closely mimics a problem period. Here, the goal is to use analytics to explore what might have mitigated the expected problem; this could refine your root cause analysis or suggest other ways to remedy conditions.
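A minimal sketch of this search: describe a known problem period as a "signature" of metric values, then look for other windows that sit close to that signature but never produced a problem report. The example assumes each metric has already been normalized (for instance, expressed as deviations from baseline) so that distances are comparable; the names, window ids and distance cutoff are all hypothetical.

```python
from math import sqrt


def distance(a, b):
    """Euclidean distance over the metrics two windows have in common."""
    keys = a.keys() & b.keys()
    if not keys:
        raise ValueError("windows share no metrics")
    return sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def near_misses(windows, signature, reported, max_distance=1.0):
    """Return ids of unreported windows that closely resemble the signature."""
    return [
        wid
        for wid, metrics in windows.items()
        if wid not in reported and distance(metrics, signature) <= max_distance
    ]


# Hypothetical normalized metrics for a known problem period.
signature = {"loss": 4.0, "delay": 3.5}

windows = {
    "mon-09": {"loss": 3.8, "delay": 3.6},  # looks like the problem
    "mon-14": {"loss": 0.2, "delay": 0.1},  # looks normal
    "tue-09": {"loss": 4.1, "delay": 3.4},  # looks like the problem
}
reported = {"tue-09"}  # only this window produced a problem report

near = near_misses(windows, signature, reported)
print(near)
```

Windows returned by `near_misses` are the ones worth examining for whatever mitigated the expected problem.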
Another component to examine is how resources are affected by a network event, an application or server event, or a change in user traffic load. When a significant change occurs in any of these areas, the network should respond in a predictable way. For example, a significant increase in application traffic generally would cause a noticeable increase in response time, as well as an uptick in the rate of dropped packets, among other problems.
Such behaviors occurring without an accompanying major change in traffic would suggest that resources are overloaded. Conversely, a significant increase in traffic that is not accompanied by a corresponding increase in response time or overall packet loss might indicate that the network is overbuilt. In this case, some capacity reductions could be tolerated, paving the way to a lower operating budget.
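The reasoning above can be sketched as a small decision rule. Each input is a fractional change relative to baseline (0.5 means 50% higher), and the 25% cutoff for "significant" is an arbitrary assumption chosen for the example, as are the function name and labels.

```python
def classify(traffic_change, response_change, loss_change, significant=0.25):
    """Classify a period from fractional changes relative to baseline."""
    traffic_up = traffic_change >= significant
    degraded = response_change >= significant or loss_change >= significant
    if traffic_up and degraded:
        return "expected response to load"
    if degraded:
        return "possibly overloaded"  # degradation without a traffic increase
    if traffic_up:
        return "possibly overbuilt"  # traffic increase absorbed with no impact
    return "no significant change"


# Hypothetical periods: (traffic, response time, packet loss) changes.
periods = {
    "backup-window": (0.60, 0.40, 0.30),
    "tuesday-pm": (0.05, 0.50, 0.10),
    "month-end": (0.70, 0.02, 0.00),
    "sunday-am": (0.01, 0.03, 0.00),
}

results = {name: classify(*changes) for name, changes in periods.items()}
for name, verdict in results.items():
    print(f"{name}: {verdict}")
```

A single period labeled "possibly overbuilt" is weak evidence on its own. The label becomes useful only when it recurs across many traffic peaks.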
Focus on actionable conditions only
One last piece of advice: Avoid data-shopping for problems. Some administrators will dig through big data reports looking for unusual patterns even when there is no indication that these can be associated with any of the processes or tasks outlined above. You may just discover that people use their applications and the network more at a given time of day. (This usually is easily determined: Just look around the office!)
The key to successful use of big data analysis in network applications is to focus on issues, not on data points. Information about network conditions -- whether they're consistent and normal, or aberrant and problematic -- is only relevant when it can be classified. Conditions that can't be classified are difficult to turn into action items, and thus it's difficult to justify spending analytic time detecting and managing them. Running a network is a challenging job, and to be a viable tool, big data has to make it easier, not more difficult. Keep focused on actionable intelligence, and you can't go wrong.
This was first published in May 2013