It's important to understand the distinction between machine learning and other forms of automation analytics. Older-generation network management systems were rule-based. These systems are relatively easy to build, because they rely on simple rules network experts can use to diagnose network problems. Rule-based systems define an action to take when a rule is matched. Actions range from generating an alert to launching more complex remediation tasks.
The problem with rule-based systems is that they require maintenance and frequent updating as new rules are needed. It is often too cumbersome to create rules when small changes in the conditions call for very different results. In addition, these systems are not very flexible. A rule set may miss a problem if it doesn't exactly match the problem's symptoms.
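To make the rule-and-action idea concrete, here is a minimal sketch of a rule-based evaluator. The event fields, rule names and action strings are hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, object]


@dataclass
class Rule:
    name: str
    matches: Callable[[Event], bool]  # condition written by a network expert
    action: str                       # what to do when the condition holds


# Hypothetical rules for illustration only.
RULES = [
    Rule("link-down", lambda e: e.get("type") == "linkDown",
         "alert: interface down"),
    Rule("high-cpu", lambda e: e.get("type") == "cpu" and e.get("value", 0) > 90,
         "alert: CPU above 90%"),
]


def evaluate(event: Event, rules: List[Rule] = RULES) -> List[str]:
    """Return the action of every rule whose condition matches the event."""
    return [rule.action for rule in rules if rule.matches(event)]


matched = evaluate({"type": "linkDown", "interface": "Gi0/1"})
missed = evaluate({"type": "linkFlap", "interface": "Gi0/1"})
```

The first event matches a rule and produces an alert. The second describes a closely related symptom, but no rule matches it exactly, so the list of actions is empty and the problem goes unnoticed until someone writes a new rule.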
It's much better to build a system that can learn about problems from the network experts who use it -- much like training a person who is new to the field of networking. Then, as new problems and solutions are found, the system would learn the symptoms and the resulting actions to take.
Most of the industry agrees that integrating AI is among the benefits of machine learning. For our purposes, think of machine learning and deep learning as examples of neural network technology. A neural network is trained by feeding it a lot of data from the domain in question -- along with the appropriate answer or response for each example. Once trained, the neural network produces the appropriate response when presented with new data. In a sense, these networks are not so much learning as being trained to respond to general classes of input data.
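The training idea can be sketched with a single artificial neuron, or perceptron, the simplest possible stand-in for a neural network. This is an illustration under stated assumptions, not what any product discussed here uses: the two features (normalized packet loss and latency) and the labels (1 for "raise an alert," 0 for "ignore") are invented for the example.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, expected answer)


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> int:
    """Respond 1 if the weighted sum of the inputs is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(samples: List[Sample], max_epochs: int = 100) -> Tuple[List[float], float]:
    """Perceptron rule: adjust the weights every time the answer is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in samples:
            error = label - predict(weights, bias, features)
            if error:
                mistakes += 1
                weights = [w + error * x for w, x in zip(weights, features)]
                bias += error
        if mistakes == 0:  # every training example is answered correctly
            break
    return weights, bias


# Hypothetical training data: (normalized loss, normalized latency) -> answer.
TRAINING: List[Sample] = [
    ((0.1, 0.1), 0), ((0.2, 0.1), 0), ((0.1, 0.3), 0),
    ((0.8, 0.9), 1), ((0.9, 0.7), 1), ((0.7, 0.8), 1),
]

weights, bias = train(TRAINING)
```

After training, calling predict(weights, bias, (0.85, 0.8)) on a measurement the neuron has never seen places it in the "alert" class, because it resembles the alert examples. That is the sense in which the network responds to general classes of input rather than to exact matches.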
Benefits of machine learning and network management
The past few years have seen machine learning being applied to network management. This is an exciting development, because it has the potential to significantly improve the management software's ability to detect and accurately handle problems. It is even possible to include remediation in the actions that result from accurately determining the cause of a problem.
For example, Moogsoft, a developer of network management software built upon AI, has used machine learning technology to significantly reduce the number of network and system events that IT staff must handle. Those events come from Simple Network Management Protocol traps, syslog and server log messages.
It takes a few weeks for the software to learn how the IT staff handles events before it can begin to correlate them. Input from the IT team is key. That is what helps Moogsoft's software understand what is important in the event stream. The machine learning engine, in turn, is simply automating what it learned from the IT staff.
Machine learning algorithms are not capable of learning business impact, nor can they learn how to remediate any problems. But they can significantly reduce the number of events IT staff must handle on a daily basis. The reduction in event volume is significant -- from thousands or even millions of events each day to just a few critical events.
For example, the loss of connectivity to a remote site could trigger a large number of reachability failure events. The Moogsoft software is able to comb through and eliminate the many secondary events that result from the loss of connectivity, so IT staffers can focus on why the connection was lost.
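The effect of that kind of event reduction can be sketched in a few lines. This is a hand-written stand-in for the learned correlation, not Moogsoft's algorithm: the event fields, the event type names and the 60-second window are all assumptions made for the example.

```python
from typing import Dict, List


def correlate(events: List[Dict], window_s: int = 60) -> List[Dict]:
    """Drop reachability failures that a site's loss of connectivity explains."""
    # Record the first connectivity-loss event seen for each site.
    outages: Dict[str, int] = {}
    for event in events:
        if event["type"] == "wan_link_down":
            outages.setdefault(event["site"], event["time"])

    kept = []
    for event in events:
        start = outages.get(event["site"])
        secondary = (
            event["type"] == "unreachable"
            and start is not None
            and 0 <= event["time"] - start <= window_s
        )
        if not secondary:
            kept.append(event)
    return kept


# Hypothetical event stream; times are seconds from the start of the capture.
EVENTS = [
    {"time": 0, "site": "branch-7", "type": "wan_link_down"},
    {"time": 2, "site": "branch-7", "type": "unreachable", "device": "sw1"},
    {"time": 3, "site": "branch-7", "type": "unreachable", "device": "ap4"},
    {"time": 5, "site": "hq", "type": "unreachable", "device": "db2"},
]

critical = correlate(EVENTS)
```

Of the four events, two survive: the link failure at the branch site and the unrelated reachability failure at headquarters. The two branch devices that became unreachable only because the link went down are suppressed.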
Tetration and how it works
Cisco's Tetration analytics system, meantime, uses an entirely different approach to perform big data analytics on packet header data. The first 160 bytes of each packet are sent to Tetration for collection and analysis. Yes, that's a large volume -- almost 2% of the traffic that's being generated. To pore through the data it collects, Tetration relies on a lot of horsepower -- a 42U rack full of servers and storage systems; there is a smaller version and a cloud version, too. It uses Hadoop processing to identify application performance problems, detect security threats and identify network problems. It also learns how applications work, creating baselines of application performance.
When an application's performance changes, Tetration attempts to determine why, or at least generate an alert about the change. Among the many performance characteristics Tetration monitors are higher latency, packets that indicate a cyberattack and retransmissions that indicate packet loss. This type of approach helps the IT organization determine whether a slow application is due to the application internals -- a slow database, for example -- or to a network problem. A fall 2017 Network Field Day presentation featured more information about the analytics engine.
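The notion of a performance baseline can be illustrated with a simple statistical check. This is a generic sketch, not a description of how Tetration works: the latency samples, the function names and the three-standard-deviation threshold are assumptions chosen for the example.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def build_baseline(samples: Sequence[float]) -> Tuple[float, float]:
    """Summarize normal behavior as a mean and a sample standard deviation."""
    return mean(samples), stdev(samples)


def is_slow(value: float, baseline: Tuple[float, float], k: float = 3.0) -> bool:
    """Flag a measurement that sits more than k deviations above the mean."""
    mu, sigma = baseline
    return value - mu > k * sigma


# Hypothetical application latency in milliseconds during normal operation.
NORMAL_LATENCY_MS = [20, 22, 19, 21, 20, 23, 18, 21]

baseline = build_baseline(NORMAL_LATENCY_MS)
alert = is_slow(45, baseline)      # far above the learned baseline
no_alert = is_slow(22, baseline)   # within normal variation
```

The check is one-sided because only higher latency matters here. A real system would keep separate baselines per application and would update them as traffic patterns change.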
Benefits of machine learning in reducing resources
Having a network management system that actually helps reduce the effort involved in monitoring and managing a network will bring many benefits to IT professionals. Most network management tools are delivered as frameworks, which implies that a lot of effort is required to turn the framework into a functioning system. Some products require a full-time administrator to refine their operation.
Wouldn't it be great if the system could learn what actions to take from the actions of the network administrators? Even if we don't enable remedial action functionality, machine learning highlights new problems and provides alerts when they happen. Imagine a system that can learn the error and utilization thresholds on different link types.
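One simple way to picture learned thresholds is to derive them from each link type's own history instead of configuring them by hand. The sketch below uses a nearest-rank percentile. The link types, the utilization samples and the choice of the 90th percentile are assumptions for illustration, and a production system would likely use something more sophisticated.

```python
from typing import Dict, List


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value covering pct% of samples."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct*n/100
    return ordered[max(rank, 1) - 1]


def learn_thresholds(history: Dict[str, List[float]], pct: int = 90) -> Dict[str, float]:
    """Derive one utilization threshold per link type from observed history."""
    return {
        link_type: percentile(samples, pct)
        for link_type, samples in history.items()
        if samples  # skip link types with no data yet
    }


# Hypothetical utilization history, in percent, grouped by link type.
HISTORY = {
    "wan": [30, 35, 40, 45, 50, 55, 60, 65, 70, 90],
    "lan": [5, 8, 10, 12, 15],
}

thresholds = learn_thresholds(HISTORY)
```

With this data, the WAN links get a threshold of 70% and the LAN links get 15%, so an alert fires only when a link is busier than it has been for 90% of its own history.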