Software-defined WAN is embracing traditional security functions -- and even wireless LAN capabilities, in some cases -- as it morphs into a software-defined branch. But this increase in scope can cause an increase in complexity.
Browse any vendor's SD-WAN or SD-branch offering and you'll find a bewildering list of features. Underneath all these features, however, the core of SD-WAN remains the same. You need three pillars of quality of service (QoS) to keep the structure from collapsing and to keep applications running smoothly. These QoS pillars are traffic shaping, path control and forward error correction -- although they sometimes go by other names.
With the massive expansion of SD-WAN and talk of advanced functions like intent-based networking, it is easy to forget that all those features still rely on traffic that moves across an often unreliable WAN. Unsurprisingly, vendors like to keep prospective customers focused on the big picture and shy away from the details. In fact, it's often difficult for enterprises to find detailed information on the technical underpinnings of SD-WAN options.
Be ready to ask your prospective vendors specific, detailed questions about SD-WAN QoS. Get back to the basics. If a vendor can't or won't explain the core technologies of its product, you're flying blind and can't develop a suitable test plan for a proof-of-concept trial.
Here are some questions for each of the three QoS areas to get your SD-WAN research started.
Traffic shaping
Ultimately, SD-WAN QoS is about apportioning a limited resource. No matter how much you upgrade your bandwidth, you can count on applications coming along to consume it. Traffic shaping starts with identifying and classifying traffic according to your business priorities. It then queues that traffic so the most important traffic receives sufficient WAN resources to meet your service-level requirements.
Step one. It all begins with identifying traffic. Ask your vendor to detail its identification process. How many applications can the vendor identify dynamically just by monitoring your traffic? Request a specific list and check how granular it is. For example, does the vendor distinguish YouTube traffic from Facebook Live traffic, or is it all classified as a generic video stream?
Step two. Once traffic is identified, how does classification take place? Ultimately, this is the step that determines who gets what in terms of precious bandwidth. What classification criteria are available? While VoIP is likely more important overall than file backup, for example, not all VoIP conversations will be equally important. Business executives using VoIP will need and demand better service than lower-level branch-to-branch communications.
Step three. Finally, ask the vendor how it makes all this information visible to you. Networks are rarely static, so ask about analytics. As the suite of applications running across your network evolves, you want both real-time and historical analytics so you can stay one step ahead of your network.
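To make the identify, classify and queue sequence concrete, here is a minimal Python sketch. It is an illustration only, not any vendor's implementation: the class names, the priority values and the `user_role` field are all assumptions, and the identification step is assumed to have already labeled each packet with an application name.

```python
import heapq
from dataclasses import dataclass
from itertools import count

# Hypothetical traffic classes; a lower number is served first.
PRIORITY = {"voip-executive": 0, "voip": 1, "video": 2, "backup": 3, "default": 4}

@dataclass(frozen=True)
class Packet:
    app: str         # application name from the identification step (assumed)
    user_role: str   # e.g. "executive" or "staff" (assumed to be known)
    size_bytes: int

def classify(pkt: Packet) -> str:
    """Map an identified packet to a traffic class using business rules."""
    if pkt.app == "voip":
        return "voip-executive" if pkt.user_role == "executive" else "voip"
    return pkt.app if pkt.app in PRIORITY else "default"

class PriorityScheduler:
    """Strict-priority queue: highest class first, FIFO within a class."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves arrival order

    def enqueue(self, pkt: Packet) -> None:
        cls = classify(pkt)
        heapq.heappush(self._heap, (PRIORITY[cls], next(self._seq), cls, pkt))

    def dequeue(self):
        _, _, cls, pkt = heapq.heappop(self._heap)
        return cls, pkt

def drain_order(packets):
    """Enqueue a list of packets and return the classes in transmit order."""
    sched = PriorityScheduler()
    for p in packets:
        sched.enqueue(p)
    return [sched.dequeue()[0] for _ in packets]
```

Strict priority is the simplest queueing discipline to show, but it can starve low-priority classes. Asking whether a product uses weighted or rate-limited queues instead is a reasonable follow-up question.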
Path control
Single-link networks are simply too risky for companies that have business-class service-level agreements. Thus, today's SD-WANs likely consist of two or more links that are used to load share and provide backup. Many networks will even have cellular network links available as a backstop in case traditional wired links fail. Of course, simply having links available means very little. What's important is the software intelligence that decides how to use the additional link or links.
We typically think of extra links as backup resources -- and they are. But switching sessions over to the backup link in case the main link fails -- or blacks out, in industry parlance -- is really the easy scenario. The link goes down and you migrate the app.
Brownout scenarios are more challenging. These are situations where the link is still up, but it can't deliver important applications at their required service levels. Perhaps there is high packet loss, latency or jitter. Perhaps the link itself is going in and out of service because of a faulty connector. Whatever the cause, your vendor needs to tell you what specific conditions its software can detect and how it handles transitioning apps to the secondary link.
Step one. Have your prospective vendor tell you what conditions you can set to trigger a transfer of an application session. Can you specify a maximum tolerable delay? Can you specify a lowest tolerable throughput? Can you specify thresholds for packet loss and jitter? Can you configure a link transfer to occur only if the degradation persists for a specified period of time? In short, how much control do you have over this migration?
Step two. Next, what happens when the conditions that triggered the migration subside or are mitigated? Perhaps the secondary link is a slower link than the primary one. You want your applications back on the best possible link.
What is the vendor's default action when it comes to returning applications to the original link, and does it give you controls over that behavior? What if the main link is having sporadic problems? You don't want important applications flip-flopping back and forth between links.
Step three. Here, too, ask about analytics. What kind of visibility does the vendor provide into the link paths the applications use? You will want and need to know if your primary links frequently have issues that force your application onto backup links.
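The questions above about thresholds, persistence and flip-flopping can be pictured with a small Python sketch. Everything here is hypothetical: the threshold values, the sample counts and the names `Thresholds` and `PathController` are assumptions chosen for illustration, and real products will measure and decide differently.

```python
from dataclasses import dataclass

@dataclass
class Thresholds:
    # Assumed per-application limits; tune to your own service levels.
    max_latency_ms: float = 150.0
    max_loss_pct: float = 1.0
    max_jitter_ms: float = 30.0

@dataclass
class PathController:
    limits: Thresholds
    degrade_hold: int = 3   # consecutive bad samples before failover
    recover_hold: int = 5   # consecutive good samples before failback
    active: str = "primary"
    _bad: int = 0
    _good: int = 0

    def _violates(self, latency_ms, loss_pct, jitter_ms) -> bool:
        lim = self.limits
        return (latency_ms > lim.max_latency_ms
                or loss_pct > lim.max_loss_pct
                or jitter_ms > lim.max_jitter_ms)

    def observe(self, latency_ms, loss_pct, jitter_ms) -> str:
        """Feed one measurement of the primary link; return the link to use."""
        if self._violates(latency_ms, loss_pct, jitter_ms):
            self._bad += 1
            self._good = 0
        else:
            self._good += 1
            self._bad = 0
        if self.active == "primary" and self._bad >= self.degrade_hold:
            self.active = "secondary"   # brownout persisted: fail over
        elif self.active == "secondary" and self._good >= self.recover_hold:
            self.active = "primary"     # primary stayed healthy: fail back
        return self.active
```

Requiring more consecutive good samples to fail back than bad samples to fail over is one simple way to dampen flip-flopping on a link with sporadic problems.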
Forward error correction
Even when your links are working fine and you can shape traffic as required, events beyond your control can degrade application performance. A bad link or overloaded router somewhere in the network can cause one or many of your application's packets to be discarded -- i.e., lost -- resulting in session delays or even session failure.
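Forward error correction (FEC) is a proactive approach to making sure packets arrive. In the form described here, which some vendors call packet duplication, copies of each packet are sent via different routes on the assumption that at least one will arrive at the other end. Should multiple copies arrive, the extras are discarded. Strictly speaking, FEC adds redundant parity data to the stream so the receiver can reconstruct lost packets without retransmission, and some products implement it that way instead.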
FEC solves a problem, but at a cost. Bandwidth is potentially wasted by carrying extra copies of packets that the destination may not ultimately need because the original packets arrived without a problem.
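To put rough numbers on that cost, here is a minimal Python sketch. Note that it does not implement the duplicate-packet scheme described above. It swaps in XOR parity, a classic textbook form of FEC, so the overhead of the two approaches can be compared. The function names and the four-packet group size are assumptions, and payloads are assumed to be of equal length.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length payloads."""
    return bytes(x ^ y for x, y in zip(a, b))

def add_parity(group):
    """Return the data packets plus one XOR parity packet."""
    parity = reduce(xor_bytes, group)
    return list(group) + [parity]

def recover(received):
    """Rebuild the data packets from a group in which at most one
    packet (data or parity) was lost; a lost packet is given as None."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can repair only one loss per group")
    if not missing:
        return list(received[:-1])
    # XOR of all surviving packets equals the missing one.
    present = [p for p in received if p is not None]
    rebuilt = reduce(xor_bytes, present)
    out = list(received)
    out[missing[0]] = rebuilt
    return out[:-1]  # drop the parity slot, keep the data packets

def overhead_pct(data_packets: int, redundant_packets: int) -> float:
    """Extra bandwidth consumed, as a percentage of the useful data."""
    return 100.0 * redundant_packets / data_packets
```

With one parity packet per four data packets, `overhead_pct(4, 1)` is 25 percent, while sending a full copy of every packet, `overhead_pct(4, 4)`, is 100 percent. The trade-off is that a single parity packet repairs only one loss per group.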
Step one. Ask your vendor for specifics. What triggers FEC? Is it triggered dynamically in response to detected packet loss? Or is it a static option that is either on or off? Can you specify FEC only for certain important applications or do you need to select it for all apps? Because of the overhead involved, you want to be able to control how and when you use this type of feature.
Step two. Yet again, ask your vendor about analytics. You need to be able to see statistics about FEC use. This can also give insight into the quality of your links and the service provider links your traffic traverses. Ideally, those service providers will provide sufficient bandwidth so user packets aren't lost. FEC stats might help you locate a problem that needs to be fixed rather than simply patching the problem by sending extra packets.
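The two questions above, how FEC is triggered and what statistics you can see, can be sketched together in Python. This is a hypothetical model, not a description of any product: the loss thresholds, the application names and the classes `FecPolicy` and `FecController` are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class FecPolicy:
    # Assumed policy: only these applications are eligible for FEC.
    enabled_apps: frozenset = frozenset({"voip", "video"})
    loss_on_pct: float = 2.0    # switch FEC on at or above this loss rate
    loss_off_pct: float = 0.5   # switch FEC off at or below this loss rate

@dataclass
class FecController:
    policy: FecPolicy = field(default_factory=FecPolicy)
    active: bool = False
    stats: Counter = field(default_factory=Counter)

    def update_loss(self, loss_pct: float) -> bool:
        """Dynamic trigger: react to the measured packet-loss rate."""
        if not self.active and loss_pct >= self.policy.loss_on_pct:
            self.active = True
            self.stats["activations"] += 1
        elif self.active and loss_pct <= self.policy.loss_off_pct:
            self.active = False
        return self.active

    def protect(self, app: str) -> bool:
        """Decide per packet whether to add redundancy, and count it."""
        use = self.active and app in self.policy.enabled_apps
        self.stats["protected" if use else "unprotected"] += 1
        return use
```

A rising `activations` count in a model like this would be the kind of signal that points to a link problem worth fixing rather than masking with extra packets.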
Final thoughts on SD-WAN QoS
Ultimately, a vendor's grand vision of application delivery can only be delivered if the technology at the core is solid and sophisticated. The only way you can find that out is by understanding how the technology is supposed to work and then using your proof-of-concept phase to confirm it works as advertised. Remember the Russian proverb "doveryai, no proveryai" -- trust, but verify.