IP routing is one of network management's most complex domains. And no wonder: not only are routing protocols highly automated and distributed in nature, but routing errors can have a large-scale, sometimes calamitous, impact on quality and availability across the entire network.
While best practices define controlled processes for planning and executing routing maintenance operations, they aren't enough to overcome the fact that existing tools for performing routine routing management tasks leave network engineers at a significant disadvantage today. Tools such as ping, traceroute and CLI show commands provide neither the global visibility to understand distributed routing dynamics nor the automation to quickly gather and interpret dynamically changing routing state. This lack of accurate information can lead to costly and time-consuming errors.
An emerging technology known as route analytics can augment best practices policies, making a huge difference in the accuracy of these routine operations. Route analytics taps into the information in the routing protocols themselves to understand how the network's dynamic topology is operating – logically – at any given moment. Route analytics solutions record all routing protocol updates and create an interactive, true-to-life model of the routing topology that is as accurate as the actual network – because it is the actual network. Using this "live model," route analytics provides a broad array of analysis tools, including interactive network-wide routing topology views, detailed and filterable tabular data on all aspects of routing state, and animations of streams of routing changes over selected timeframes. And because the as-running state of the network is continuously recorded, the historical state of the network can be "replayed" from any point in the past. These capabilities provide the global visibility and complete granularity on routing state needed to plan and execute highly accurate routing maintenance operations.
Best practices for performing routing change operations during a maintenance outage typically dictate following a standard plan:
- Pre-maintenance work
- Installation plan
- Back-out plan
- Approval process
- Perform maintenance
- Test changes
- If tests are successful, then end maintenance.
- If tests fail, either correct issues or back out of maintenance.
- Confirm that only the changes required by the maintenance occurred.
An example of how route analytics can enhance routing operations best practices is the case of a routine maintenance outage to add Interior Gateway Protocol (IGP) routes, such as with OSPF, IS-IS or EIGRP.
During the planning stage, engineers can take advantage of route analytics' full knowledge of all routed prefixes, metrics and netmasks through a network-wide prefix list that shows all active routes in the network. This prefix list provides 100 per cent assurance that the prefix and mask to be added do not already exist in the current infrastructure.
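The same kind of check can be sketched in a few lines of Python, assuming the network-wide prefix list has been exported as plain CIDR strings. The function name `find_conflicts` and the sample prefixes are illustrative assumptions, not part of any route analytics product.

```python
import ipaddress

def find_conflicts(candidate, active_prefixes):
    """Classify every active prefix that overlaps the candidate prefix.

    Returns (prefix, kind) pairs, where kind is:
      "exact"   - same network and mask as the candidate
      "covers"  - an existing, less specific prefix containing the candidate
      "covered" - an existing, more specific prefix inside the candidate
    """
    new_net = ipaddress.ip_network(candidate)
    conflicts = []
    for prefix in active_prefixes:
        net = ipaddress.ip_network(prefix)
        if net.version != new_net.version:
            continue  # never compare IPv4 against IPv6
        if net == new_net:
            conflicts.append((prefix, "exact"))
        elif new_net.subnet_of(net):
            conflicts.append((prefix, "covers"))
        elif net.subnet_of(new_net):
            conflicts.append((prefix, "covered"))
    return conflicts

# Hypothetical export of the active prefix list.
active = ["10.0.0.0/8", "10.1.2.0/24", "192.168.10.0/24"]
print(find_conflicts("10.1.3.0/24", active))
```

An "exact" result means the route already exists. A "covers" result is usually just the summary block that will hold the new prefix.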
Using the same prefix view, network engineers can also confirm the summary block that holds this prefix and the source it is advertised from. This ensures that the summary will be advertised correctly.
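As a rough illustration of the summary check, the sketch below looks up the most specific summary block that contains a planned prefix, along with the router advertising it. The `(summary, source)` pairs and the router names are hypothetical stand-ins for whatever the prefix view reports.

```python
import ipaddress

def covering_summary(prefix, summaries):
    """Return (summary, source) for the most specific summary containing prefix.

    `summaries` is an iterable of (cidr_string, advertising_router) pairs.
    Returns None when no summary covers the prefix.
    """
    net = ipaddress.ip_network(prefix)
    best = None
    for summary, source in summaries:
        block = ipaddress.ip_network(summary)
        if block.version != net.version:
            continue  # skip summaries from the other address family
        if net.subnet_of(block):
            # Prefer the longest (most specific) covering summary.
            if best is None or block.prefixlen > best[0].prefixlen:
                best = (block, source)
    if best is None:
        return None
    return str(best[0]), best[1]

# Hypothetical summaries and the routers advertising them.
summaries = [("10.0.0.0/8", "core-1"), ("10.1.0.0/16", "abr-2")]
print(covering_summary("10.1.3.0/24", summaries))
```

A result of `None` would suggest that the new prefix falls outside every configured summary and would be advertised on its own.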
Before executing the maintenance, network engineers can view the routing event streams of the router and region to be changed, thereby confirming that they are not experiencing significant churn. If there is significant churn, the network engineer can work with the responsible person on call to see if these events can be remedied to return the router and region to a quiescent state. This is important because starting maintenance while other problems are occurring in a region can hamper the diagnostic efforts of another engineer who might be working in the region. In addition, confirming a quiescent state first prevents the change operation from becoming the unwitting scapegoat for problems that were already occurring before the maintenance.
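A pre-maintenance churn check of this kind might look like the following sketch. It assumes the routing event stream is available as `(timestamp_seconds, router_id)` pairs. The 15-minute window and the threshold of three events are arbitrary illustrative values, not recommendations.

```python
def churn_count(events, routers, now, window_s=900):
    """Count routing events from the given routers within the last window_s seconds."""
    cutoff = now - window_s
    return sum(
        1 for ts, router in events
        if router in routers and cutoff <= ts <= now
    )

def is_quiescent(events, routers, now, window_s=900, max_events=3):
    """True when recent churn for the routers is at or below the assumed threshold."""
    return churn_count(events, routers, now, window_s) <= max_events

# Hypothetical event stream: (timestamp in seconds, originating router).
events = [(100, "r1"), (950, "r1"), (980, "r2"), (990, "r1"), (995, "r1")]
region = {"r1", "r2"}
print(churn_count(events, region, now=1000))
print(is_quiescent(events, region, now=1000))
```

In practice the threshold would be tuned to what counts as normal background activity for the region in question.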
Once the route is added to the protocol on the router, network engineers can also easily verify that the router has propagated the change by examining prefix lists, from both inside and outside the area or level, to see if the summary is working correctly. They can also verify from which router the route originated, where it is summarized, what netmask and metric are associated with it, and how traffic from other parts of the network will use the route to get data across the network.
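The inside-versus-outside comparison can be expressed as a small check. This sketch assumes two prefix lists, one as seen from inside the area or level and one as seen from outside it, and assumes the intended behavior is that only the summary is visible outside. All names and sample data are illustrative.

```python
import ipaddress

def check_propagation(prefix, summary, inside_view, outside_view):
    """Compare prefix lists seen from inside and outside the area or level."""
    specific = ipaddress.ip_network(prefix)
    block = ipaddress.ip_network(summary)
    inside = {ipaddress.ip_network(p) for p in inside_view}
    outside = {ipaddress.ip_network(p) for p in outside_view}
    return {
        # The new route should be visible where it was added.
        "specific_seen_inside": specific in inside,
        # Outside the area, the summary should stand in for it.
        "summary_seen_outside": block in outside,
        # If the specific also appears outside, summarization is not suppressing it.
        "specific_leaked_outside": specific in outside,
    }

# Hypothetical prefix lists from the two vantage points.
inside_view = ["10.1.2.0/24", "10.1.3.0/24"]
outside_view = ["10.1.0.0/16", "192.168.10.0/24"]
print(check_propagation("10.1.3.0/24", "10.1.0.0/16", inside_view, outside_view))
```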
If the route is not behaving as expected in the network, route analytics can be used to model metric changes that would route traffic correctly. The network engineer highlights a routed path across the network in the topology map, then clicks on the relevant link, adjusts the metric, and finally observes how the path changes.
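The underlying what-if calculation can be approximated with a shortest-path computation over link metrics, as link-state protocols such as OSPF and IS-IS do. The sketch below uses Dijkstra's algorithm on a small hypothetical topology. It ignores equal-cost multipath and areas, and is not a description of how any particular route analytics product models changes.

```python
import copy
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over directed link metrics; returns (cost, path) or None."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, metric in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + metric, neighbor, path + [neighbor]))
    return None

def with_metric(graph, a, b, metric):
    """Return a copy of the topology with the a->b metric changed."""
    trial = copy.deepcopy(graph)
    trial[a][b] = metric
    return trial

# Hypothetical topology: graph[router][neighbor] = outbound link metric.
graph = {"A": {"B": 10, "C": 10}, "B": {"D": 10}, "C": {"D": 20}, "D": {}}

print(shortest_path(graph, "A", "D"))                             # (20, ['A', 'B', 'D'])
print(shortest_path(with_metric(graph, "B", "D", 30), "A", "D"))  # (30, ['A', 'C', 'D'])
```

Because `with_metric` works on a copy, the trial change leaves the recorded topology untouched, mirroring the idea of experimenting on a model rather than on the live network.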
Alex Henthorn-Iwane is Senior Director of Marketing for Packet Design, Inc. in Palo Alto, California.