Any new technology brings new risks, or perhaps familiar risks packaged in an unfamiliar way. Software-defined WAN (SD-WAN) is no different. Everyone talks about the benefits of SD-WAN, but it's time to look at some of its risks and what you can do to reduce them.
Apply good networking principles
New technology does not eliminate the need for good networking principles, design and operational practices. Every technology has a risk/benefit continuum, and there is an optimum way to design and operate it. It pays to study a new technology with an eye to identifying both its strengths and its weaknesses. Vendors help us a lot with learning the benefits, but we often have to dig deeper to learn a technology's weaknesses and failure modes. Understanding those weaknesses and failure modes lets us avoid deploying technology in ways that would have a significant negative impact on our organizations.
Understanding failure modes
Controller failures are a concern where an SD-WAN system uses an external controller. However, many of the so-called SD-WAN devices available today are more like legacy routers, switches and WAN optimizers with integrated controllers. If the controller fails, the whole box gets replaced.
SD-WAN systems (and SDN systems) with separate controllers, however, create a different problem -- how does the system detect a controller failure and how does it function when a failure is detected? This is an area where each system is potentially unique.
Work closely with prospective vendors to understand controller failure modes. Investigate how the failure is detected and what level of performance you can expect during the failure. Don't rely on a single notification of a failure. Configure your network monitoring system to periodically check all important components, just in case the initial event was missed.
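The periodic check described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's monitoring API: the component names and the check functions are hypothetical, and a real deployment would call into whatever network monitoring system you already run. The point it shows is that every cycle reports everything that is still down, so a missed first notification does not hide the failure.

```python
from typing import Callable, Dict, List


def poll_components(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every health check once and return the names of failing components."""
    failing = []
    for name, check in checks.items():
        try:
            healthy = check()
        except Exception:
            # A check that cannot run is treated as a failure, not ignored.
            healthy = False
        if not healthy:
            failing.append(name)
    return failing


def run_cycles(checks: Dict[str, Callable[[], bool]], cycles: int) -> List[List[str]]:
    """Poll repeatedly; each cycle re-reports anything that is still down.

    A real poller would sleep between cycles; that is omitted here.
    """
    return [poll_components(checks) for _ in range(cycles)]
```

Because the result is the current state rather than a one-time event, an operator who missed the first alert still sees the failed controller on the next cycle.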
Another common failure mode is called split brain, in which redundant controllers lose the ability to communicate with each other. Some failures (e.g., an incorrect access control list) can cause controllers to lose contact with each other even though each can still communicate with all the data plane switches. Again, the important points are to detect and report the failure while providing some level of useful functionality.
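To make the split-brain condition concrete, here is a rough sketch of how a monitoring script might classify what one controller can see. The function name, its inputs and the state labels are all assumptions made for illustration; actual products expose this information in their own ways, if at all.

```python
def classify_controller_state(peer_reachable: bool,
                              switches_reachable: int,
                              switches_total: int) -> str:
    """Classify one controller's view of its peer and the data plane."""
    if peer_reachable and switches_reachable == switches_total:
        return "healthy"
    if not peer_reachable and switches_reachable > 0:
        # The peer is unreachable but the data plane still answers:
        # both controllers may believe they are in charge.
        return "possible-split-brain"
    if not peer_reachable and switches_reachable == 0:
        # This controller sees nothing; it is probably the one cut off.
        return "isolated"
    return "degraded"
```

A result of "possible-split-brain" is the case worth alerting on loudly, because nothing in the data plane looks broken from either controller's point of view.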
Link failures are likely to be the most common problem. Is the remaining link capacity sufficient to carry critical business traffic when a failure occurs? Yes, having unused backup capacity costs money, but so does a service-impacting outage.
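The capacity question is simple arithmetic, and it is worth running for every link at a site. The sketch below assumes you know each link's usable bandwidth and the peak rate of your critical traffic; the link names and figures in the usage note are made up.

```python
from typing import Dict


def survives_single_link_failure(link_mbps: Dict[str, float],
                                 critical_mbps: float) -> Dict[str, bool]:
    """For each link, report whether the remaining links could still carry
    the critical traffic if that one link failed."""
    total = sum(link_mbps.values())
    return {
        name: (total - capacity) >= critical_mbps
        for name, capacity in link_mbps.items()
    }
```

For a hypothetical site with a 100 Mbps MPLS link, a 50 Mbps broadband link and 60 Mbps of critical traffic, losing the broadband link is survivable but losing the MPLS link is not, since only 50 Mbps would remain.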
When a link failure happens, the SD-WAN controller should provide priority treatment to important traffic. How easy is it to set policies that prioritize your critical traffic? Is there a mechanism to dynamically provide feedback from the SD-WAN system to the unified communications and collaboration admission control system?
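One way to think about priority treatment is strict-priority allocation of whatever capacity is left after a failure. The following is only a sketch of that idea under stated assumptions: the traffic classes, priority numbers and rates are hypothetical, and real SD-WAN products implement their own policy engines and queuing schemes.

```python
from typing import Dict, List, Tuple


def allocate_by_priority(capacity_mbps: float,
                         demands: List[Tuple[str, int, float]]) -> Dict[str, float]:
    """Grant bandwidth in strict priority order.

    Each demand is (traffic class, priority, requested Mbps); a lower
    priority number means more important traffic.
    """
    remaining = capacity_mbps
    allocation = {}
    for name, _priority, requested in sorted(demands, key=lambda d: d[1]):
        granted = min(requested, remaining)
        allocation[name] = granted
        remaining -= granted
    return allocation
```

With 50 Mbps left and demands of 10 Mbps for voice (priority 1), 30 Mbps for ERP (priority 2) and 40 Mbps for backups (priority 3), voice and ERP are served in full and backups get the remaining 10 Mbps. Working through a table like this with your own traffic classes is a quick test of whether a vendor's policy model can express what you need.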
Traffic flow modification
Investigate how traffic is distributed over multiple links. Many networking protocols don't work well with out-of-order packet delivery and that is something you want to know before making an expensive equipment purchase. Packet capture analysis may be required during vendor evaluations to validate that the introduction of the SD-WAN controller isn't causing other problems.
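During a packet capture analysis, a first-pass reordering check can be as simple as walking the sequence numbers of a single flow in arrival order. The sketch below assumes you have already extracted those numbers from the capture with your tool of choice; it ignores sequence-number wraparound and does not distinguish retransmissions from reordering, so treat its count as a rough indicator only.

```python
from typing import Iterable


def count_out_of_order(sequence_numbers: Iterable[int]) -> int:
    """Count packets that arrive with a sequence number lower than the
    highest one already seen in the same flow."""
    highest = None
    late_packets = 0
    for seq in sequence_numbers:
        if highest is not None and seq < highest:
            late_packets += 1
        else:
            highest = seq
    return late_packets
```

Comparing this count for the same flow with and without the SD-WAN device in the path gives a quick indication of whether multi-link distribution is introducing reordering.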
Finally, don't forget basic networking principles for monitoring and troubleshooting. Make sure the monitoring systems can detect and report on all failure modes. Don't rely on a single alert. The monitoring system should perform periodic checks of the infrastructure and generate alerts whenever a problem is detected.
Remember, good principles dictate that SD-WAN is part of a larger network and should integrate with it. Understand the technology's new modes of operation and its failure modes, and apply sound networking principles.