Troubleshooting SDN? You'll need a packet time machine

SDN promises automation without old configurations. But when networks dynamically change, troubleshooting SDN will require a packet time machine.

My pool is a cruel mistress. Oh sure, to the public it appears to be sparkling and warm; nestled into a mature backyard landscape, it seems like an oasis on a trouble-free summer weekend. But in private, it's a tangled mix of crumbling 30-year-old infrastructure along with last year's fresh, bleeding-edge upgrades.

In theory, it should be getting better over time as replacement parts are more robust and sometimes higher tech. In reality, however, the pool is no more reliable than it was years ago. In fact, troubleshooting is more complex than ever with equipment, technologies and brands of different vintages all tied together with generic 2" schedule 40 PVC.

I recently spent an hour sweating and bent over the plumbing, fiddling once again with the suction manifold as I tried to isolate why the pump refused to prime. After much cursing, I finally sat, waiting patiently for water to flow. I imagined the water trying to pass through the maze of fittings -- flowing here, blocked there, re-routed somewhere else. Perhaps it was the hot Texas sun, or maybe I stood up too quickly, but I started imagining packets trying to flow through PVC pipes. Before long, I realized the pool was a great model for the challenges of troubleshooting SDN.

While the dream of an all-programmable network is indeed as enchanting as a summer breeze, like everything in life, the reality is more complex. Once we begin implementing SDN, we're going to end up with a collection of related parts: legacy technology, plus new controllers, plus yesterday's first-generation products that were once bleeding edge. Hanging over all of this equipment is a new problem: unwinding the dynamic, fleeting configurations that automation produces. When something goes wrong and we have to troubleshoot, we'll have to examine these configurations after the fact.

SDN troubleshooting means searching through automation's maze

Like the PVC on the pump pad, today's network configuration technology is the plumbing behind our infrastructure. Our current device configurations are unions, slip joints, T-connectors, reducers, etc. They're more or less fixed and designed to move traffic in a specific way, from point to point. SDN's rule-based policies are like the valves with powered controllers. They enable automation-driven control on top of the fixed plumbing beneath. In combination with other controller-adapted gear, they have become truly autonomous.

The automation controller tries to make them all work together in harmony, but when that doesn't happen, troubleshooting involves considerable guesswork about the effective configuration that was in place at the time of failure. By the time an issue is noticed, the configuration changes made in the interim can make finding the root cause challenging.

When configurations get replaced by policy and automation

With SDN, one fundamental change above all others will drive this complexity: Configurations as we know them are going away. They're being replaced by policies. On devices themselves, they're either collapsing into a single layer as in Cisco's Application Centric Infrastructure model, or they will be created and managed by a new service layer on top, as is the case with VMware's NSX. Either way, we can't just SSH in and untangle a configuration.

Therein lies the challenge: Configurations are relatively stable. When we replace fixed configurations with policies empowered for autonomous change, it's like adding programmable valves to the system that will be activated outside the control of traditional static configurations. How will we untangle the effect when a configuration changes from minute to minute outside of our traditional control? When we're untangling a ticket from 48 hours ago, how will we re-assemble a snapshot of all the policies active at the time of the issue?

Magic packet time machine

With traditional networking, it's easy-ish to recreate an effective configuration. You back up your configurations every night and configure SNMP traps so your network configuration management solution automatically pulls backups after local changes. Looking at the configuration backups from any given day and time tells you exactly what all the rules were at that time.
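
To make that last step concrete, here is a minimal Python sketch of the lookup. It assumes the backups have already been collected as (timestamp, text) pairs sorted oldest first. The function name `backup_in_effect` and the sample device configurations are made up for illustration and are not taken from any particular configuration management product.

```python
from bisect import bisect_right
from datetime import datetime


def backup_in_effect(backups, when):
    """Return the (timestamp, config_text) backup in effect at `when`.

    `backups` must be sorted by timestamp, oldest first.
    Returns None if no backup exists at or before `when`.
    """
    timestamps = [ts for ts, _ in backups]
    idx = bisect_right(timestamps, when)  # count of backups taken at or before `when`
    return backups[idx - 1] if idx else None


# Hypothetical backup history for one switch.
backups = [
    (datetime(2014, 6, 5, 2, 0), "hostname core-sw1\nvlan 10"),             # nightly
    (datetime(2014, 6, 5, 14, 30), "hostname core-sw1\nvlan 10\nvlan 20"),  # pulled after a trap
    (datetime(2014, 6, 6, 2, 0), "hostname core-sw1\nvlan 10\nvlan 20"),    # nightly
]

taken_at, config = backup_in_effect(backups, datetime(2014, 6, 6, 1, 41))
print(taken_at)
print(config)
```

With nightly backups plus trap-triggered pulls, the answer for 1:41 a.m. Friday is simply the most recent backup before that moment, here the one pulled Thursday afternoon.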

But with SDN, it's not so easy. With multiple layers of configurations, access and QoS policies, and automated network service delivery optimization, the change volume will be substantially higher. Backtracking a hypothetical packet's path from 1:41 a.m. last Friday will require the ability to recreate the computed configuration effects of all related elements.

What's needed is essentially a configuration time machine with a scroll bar you can scrub back and forth, not only to recreate a snapshot, but to watch the dynamic changes that produced that unique moment of configuration. With such a time machine, my guess is that we'd find many problem resolutions in the dynamics of temporarily colliding policies, not in preset, expected configuration states.
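
One way to picture such a time machine is as a replayable log of policy changes. The sketch below is an assumption-heavy illustration in Python, not a description of how any controller actually stores its state: the `PolicyEvent` record, the policy names and the `apply`/`withdraw` actions are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PolicyEvent:
    at: datetime   # when the controller made the change
    policy: str    # hypothetical policy name
    action: str    # "apply" or "withdraw"


def _step(active, ev):
    """Apply one change event to the running set of active policies."""
    if ev.action == "apply":
        active.add(ev.policy)
    elif ev.action == "withdraw":
        active.discard(ev.policy)
    else:
        raise ValueError(f"unknown action: {ev.action!r}")


def active_policies(events, when):
    """Snapshot: replay every event up to and including `when`."""
    active = set()
    for ev in sorted(events, key=lambda e: e.at):
        if ev.at > when:
            break
        _step(active, ev)
    return active


def scrub(events, start, end):
    """Scroll bar: yield (timestamp, active policies) after each change in [start, end]."""
    active = set()
    for ev in sorted(events, key=lambda e: e.at):
        if ev.at > end:
            break
        _step(active, ev)
        if ev.at >= start:
            yield ev.at, sorted(active)


# Hypothetical change log around the time of the ticket.
events = [
    PolicyEvent(datetime(2014, 6, 6, 1, 0), "qos-voice", "apply"),
    PolicyEvent(datetime(2014, 6, 6, 1, 40), "acl-quarantine", "apply"),
    PolicyEvent(datetime(2014, 6, 6, 1, 43), "acl-quarantine", "withdraw"),
    PolicyEvent(datetime(2014, 6, 6, 1, 50), "path-optimize", "apply"),
]

print(active_policies(events, datetime(2014, 6, 6, 1, 41)))
for at, policies in scrub(events, datetime(2014, 6, 6, 1, 30), datetime(2014, 6, 6, 1, 45)):
    print(at, policies)
```

In this made-up log, a snapshot taken at 2 a.m. would show only `qos-voice` and `path-optimize`. Scrubbing through the window is what reveals that a short-lived `acl-quarantine` policy was active at 1:41 a.m.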

As for debugging the pool, it turned out I only needed to sit a spell -- somewhat defeated -- on my overturned, faded Homer bucket before hearing the familiar sound of a nearly overheated pump finally gulping on cool water from the manifold. Was it replacing the impeller, lubing the spider-valve or just patience that got the flow going again? Without a time machine, it's hard to tell.

About the author:
Patrick Hubbard is a head geek and senior technical-product marketing manager at SolarWinds. With 20 years of technical expertise and IT customer perspective, his networking management experience includes work with campus, data center, storage networks, VoIP and virtualization, with a focus on application and service delivery in both Fortune 500 companies and startups in the high tech, transportation, financial services and telecom industries. He can be reached at

This was last published in June 2014

Join the conversation

The analogy here is awesome, and the point you are making is spot on.

One way to think about configuration is as persistent state. All of networking is driven by state. Whether that state exists as configuration that is readable and restorable, or whether it exists more ephemerally based on network conditions is an important detail.

The future of networking will require that we grab this state, offload it somewhere for analysis, and have a way of unwinding state to correlate it with events and conditions on the network. This is why I would expect analytics to play a much heavier role in some of the control efforts (think: OpenDaylight).

I wrote about this a while ago, actually. For reference:

Michael Bushong (@mbushong)