I recently spent an afternoon debugging a VoIP issue with what should have been a simple SIP connection. I'll spare you the gory details, but the upshot was that something was producing asymmetric call quality issues as calls were forwarded from the carrier through the gateway out to the SIP endpoints.
Actually, that's putting it rather gently. In reality, this produced a 4,500-ms latency and 75% packet drop, but only in one direction. Now normally, this would have been a piece of cake to troubleshoot -- you'd simply find a spot to grab traffic with apps specializing in packet sniffing, like Wireshark, to see what's going on. You could determine if it's the phone or the gateway and troubleshoot from there. The challenge, however, is that it was an SDN route.
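To make the "piece of cake" case concrete: with a capture in hand, one-directional loss like this falls out of the RTP sequence numbers. Below is a minimal, hypothetical sketch (not from the incident itself) that pulls the sequence number from a raw RTP payload per RFC 3550 and estimates loss from the gaps; it ignores sequence-number wraparound for brevity.

```python
import struct

def rtp_seq(payload: bytes) -> int:
    """Extract the 16-bit sequence number from a raw RTP payload.

    RTP (RFC 3550) carries the sequence number in bytes 2-3 of the
    fixed 12-byte header, in network byte order.
    """
    if len(payload) < 12 or payload[0] >> 6 != 2:  # version field must be 2
        raise ValueError("not an RTP v2 packet")
    return struct.unpack("!H", payload[2:4])[0]

def loss_percent(seqs) -> float:
    """Estimate percent loss from observed sequence numbers (no wrap handling)."""
    expected = max(seqs) - min(seqs) + 1
    return 100.0 * (expected - len(set(seqs))) / expected

# Simulated one-directional capture: 4 of 16 packets never arrived.
seen = [s for s in range(1000, 1016) if s not in (1003, 1007, 1008, 1012)]
print(loss_percent(seen))  # 25.0
```

Run the same tally on captures from each direction and an asymmetry like the one described above becomes obvious -- provided, of course, you can find somewhere to capture.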
Okay, it wasn't technically SDN in the pure sense, but it was software-defined in that the network adjacent to the gateway was being programmatically managed by the controller. And the controller didn't just route calls -- route definitions, ACLs and other elements that allow the VoIP traffic to change on an as-needed basis were also under its direction.
Getting a handle on where to grab traffic took some time because yesterday’s packet-sniffing tools aren't designed to sort out Layer 4-7 virtualization. So, in a software-defined network, what will replace our most trusted and indispensable resources? Manual packet reassembly by Virtual MAC, anyone?
Deep packet inspection for virtualized networks
At first glance, this shouldn't even be an issue. Many SDN solutions tout application-level traffic monitoring that combines the aggregate utility of NetFlow with the discrete analytics of application firewalls. In practice, however, there are some new complexities not present in admin-defined networks.
Yes, more network devices offer deep packet inspection (DPI) inline, but fire-hosing that data off for analysis somewhere else multiplies the data storage headache started by NetFlow. You shouldn't need big data to get application awareness on a virtual network, and your SDN controllers shouldn't have to run MapReduce to find the Skype needle in the vHaystack.
At a minimum, our networks should have been aware of applications for some time and, more recently, focused on them. And in the world of SDN, our networks -- and the tools to manage them -- must become both application- and policy-aware to thrive. Where is the v0.1 release of vWireshark that can differentiate between application traffic that is acceptable under one policy but not another? How will admins create traffic filters that divide captures based on policy definitions for virtual tunnels? Will we direct SDN controllers to create huge virtual port mirrors? At 10G and beyond, that will be a hoot.
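What a policy-aware capture filter might look like, in miniature: the sketch below partitions captured packets into per-policy buckets by DSCP codepoint. The codepoints (EF=46, AF41=34, CS0=0) are standard, but the policy table and names are invented for illustration -- a real tool would pull these from the controller's policy definitions.

```python
# Hypothetical policy table: DSCP codepoint -> policy name.
POLICIES = {46: "voice-rtp", 34: "video", 0: "best-effort"}

def classify(dscp: int) -> str:
    """Map a DSCP codepoint to its (hypothetical) policy name."""
    return POLICIES.get(dscp, "unclassified")

def partition(packets):
    """Split (dscp, payload) tuples into per-policy buckets -- the
    kind of divide-by-policy filter the questions above ask for."""
    buckets = {}
    for dscp, payload in packets:
        buckets.setdefault(classify(dscp), []).append(payload)
    return buckets
```

Even this toy version shows the gap: today's sniffers filter on ports and addresses, not on the policy objects the controller actually thinks in.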
Undiscovered country: monitoring at the app server NIC
It's easy to overlook the one safe place for application-aware network monitoring: the app server's NIC. Today, when we're troubleshooting applications and it's really tricky -- something ugly down on the endpoint -- packet sniffing right at the back of the box, or in a virtual machine right where the guest OS meets the host virtual switch, can help. This doesn't change in SDN networks. In fact, SDN makes DPI at the server level even more attractive because at the NIC, virtualization is no longer a factor. There, you'll find raw traffic with all the access you're accustomed to.
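"Raw traffic with all the access you're accustomed to" means frames arrive with no tunnels or overlays left to decode. A minimal sketch, assuming a Linux host: peel the 14-byte Ethernet header off a frame and you're looking at exactly what the guest OS sent.

```python
import struct

def strip_ethernet(frame: bytes):
    """Peel the 14-byte Ethernet header off a raw frame captured at
    the NIC, returning (ethertype, payload). At this layer there is
    no virtualization left to unwind."""
    if len(frame) < 14:
        raise ValueError("truncated frame")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return ethertype, frame[14:]

# On Linux, the capture itself would come from a raw socket
# (requires root, so shown only as a comment):
#   import socket
#   s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(0x0003))
#   frame, _ = s.recvfrom(65535)
frame = bytes(12) + struct.pack("!H", 0x0800) + b"\x45" + bytes(19)
ethertype, payload = strip_ethernet(frame)
print(hex(ethertype))  # 0x800 -> IPv4
```

In a VM, the same trick works one hop up, on the host's vnet/tap interface where the guest meets the virtual switch.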
As you transition to SDN, remember that your existing network performance monitoring solution likely already offers server-side DPI out of the box, along with a dozen other features you might not know about. There's no need to lose all the metrics you've come to depend on, especially if your application-aware networking approach is mature.
As the network becomes more opaque, moving DPI sensing to the server can ensure your application service delivery metrics remain clearly in view. This is doubly true for cloud and hybrid networks, where often the only reliable access is at the application server.
In the case of my VoIP issue, it turned out to be a classic misfeature of a QoS traffic map. A botched policy was tagging the return traffic as best effort while pairing it with an odd combination of queuing and hard discards. I was fortunate it was downstream of the gateway controller. With no easy way to inspect traffic in the software-controlled gateway, I might still be on the phone with support right now, and no admin likes resorting to that.
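For the record, that mistagging is trivial to spot once you can see raw packets: DSCP is just the top six bits of the second byte of the IPv4 header. A quick illustrative check (the headers below are fabricated, not from the actual capture):

```python
def dscp(ip_header: bytes) -> int:
    """Return the DSCP codepoint from a raw IPv4 header: the top six
    bits of the second byte (the old ToS field, per RFC 2474)."""
    return ip_header[1] >> 2

# Forward leg tagged EF (46, expedited forwarding, as voice should be)...
forward = bytes([0x45, 46 << 2]) + bytes(18)
# ...return leg botched down to best effort (0).
ret = bytes([0x45, 0x00]) + bytes(18)
print(dscp(forward), dscp(ret))  # 46 0
```

Comparing that one field across the two directions would have exposed the asymmetric policy in minutes -- if only there had been somewhere in the software-defined path to capture it.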