Sitting in the restaurant after wrapping up VMworld, I wiggled my feet slowly against the bar's foot rail, happy to finally be off my dogs. Humans are only designed to stand in a booth for so many hours, and no matter how many great conversations I'd had with attendees, it was good to be done. It also freed my mind to notice, again, that in many places in downtown San Francisco, dimmed lighting tends to pulse randomly, often and persistently. The bartender said she thought it was the dimmer, not the mains, but I noticed it wasn't just the pendants over the bar; the chillers behind it were brightening and dimming in sync, too.
I thought about fluctuating current's unhealthy effects on anything plugged into these circuits. I thought about virtual machines and data centers. I thought about infrastructure health. And after a few hundred conversations about virtualization, hybrid cloud and software as a service (SaaS), I kept thinking back to one particularly dangerous area of underinvestment in infrastructure: our wide area networks (WANs).
WAN infrastructure has become our grid
Not so long ago, our enterprises functioned almost normally when the connection to the Internet dropped out. Phones still rang, and sales, order management and even fulfillment services continued to operate. We just weren't dependent on the outside world. We ran all our own services on premises, or in the cases where we didn't, we had old-school, telco-grade frame relay between campuses. Slow perhaps, but steady.
Today, that's all changed. First, we accepted telcos' tantalizing new, high(er) bandwidth Internet-based links that allowed us to create viable LAN-ish WAN links for the first time. We took advantage of that bandwidth to centralize previously campus-local enterprise services of every type. Once the approach was somewhat proven, we even replaced our tried-and-true dedicated PBXs with voice over IP (VoIP), also running over WANs.
Now we've gone even further, migrating locally hosted services to SaaS and pushing our physical racks to the cloud and infrastructure as a service (IaaS). We do this because we bank on the WAN being as reliable as the local twisted-copper LAN once was. We've basically bet the farm on apps like Salesforce, and our businesses would quickly grind to a halt if the WAN infrastructure were unreliable. But we're OK because the WAN is made of wondrous magic beans, right?
No, it's not. Never has been. Trusting business blindly to WAN links is a fool's paradise.
Hold ISPs to the 'agreement' in SLA
Old-school frame relay and the point-to-point technologies that followed it had one major advantage: visibility. We could monitor every aspect of the connection to ensure quality and troubleshoot, even if it could become a frustrating hairball. Then came the wonders of multiprotocol label switching (MPLS), and we could outsource 90% of the connection complexity, especially for multi-campus WANs. But there's a problem with that: Once you're running Border Gateway Protocol (BGP) and pushing Layer 2-3 routing out of the building, the details of your connections -- and your monitoring points -- are lost. MPLS is largely opaque to network performance monitoring. Its primary benefit is also its biggest challenge.
At VMworld, I had a number of conversations about MPLS with customers, and I was in awe of one in particular. I can't say who, but he runs the app delivery network for a big brand you know, and he used the size of his monthly WAN contract to bully Enormous International ISP™ into opening up monitoring ports on some of its internal plumbing. He could actually see what was going on. I'm in awe because if I try that, Enormous International ISP will tell me to pound sand. What I can do, however, is use the tools he used to get past the velvet rope and into the VIP section of his carrier network.
His solution? Beat Enormous International regularly about the head and shoulders with detailed WAN infrastructure performance reports. Not anecdotal reports of underperforming networks, but pages of charts and graphs measuring service-level agreement (SLA) failures in reachability, bandwidth, latency and jitter. In fact, his automated monthly report emails had two other great features. First, they only fired if the SLA wasn't met. Second, they always attached a PDF of the relevant SLA from the ISP contract. After a few months of cutting $100,000 refund checks, Enormous International ISP™ decided it might be better to allow enough monitoring to remedy performance problems quickly and thus avoid issuing painful service refunds.
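The report-only-on-breach logic is simple enough to sketch. What follows is a hypothetical illustration, not his actual tooling: the metric names, data shapes and SLA thresholds are all assumptions, and a real report would also attach the charts and the contract PDF he describes.

```python
# Hypothetical sketch of breach-only SLA reporting: compare a month of
# rolled-up WAN measurements against contracted thresholds and produce a
# report only when the SLA was missed. Thresholds and keys are invented.

SLA = {
    "reachability_pct_min": 99.9,   # assumed contracted floor
    "latency_ms_max": 80.0,         # assumed contracted ceiling
    "jitter_ms_max": 10.0,
    "bandwidth_mbps_min": 450.0,
}

def sla_breaches(monthly):
    """Return (metric, measured, threshold) tuples for every missed target."""
    checks = [
        ("reachability", monthly["reachability_pct"], SLA["reachability_pct_min"], "min"),
        ("latency", monthly["latency_ms_p95"], SLA["latency_ms_max"], "max"),
        ("jitter", monthly["jitter_ms_p95"], SLA["jitter_ms_max"], "max"),
        ("bandwidth", monthly["bandwidth_mbps_min"], SLA["bandwidth_mbps_min"], "min"),
    ]
    misses = []
    for name, measured, threshold, kind in checks:
        failed = measured < threshold if kind == "min" else measured > threshold
        if failed:
            misses.append((name, measured, threshold))
    return misses

def monthly_report(isp, monthly):
    """Build a report body, or return None so no email fires when the SLA held."""
    misses = sla_breaches(monthly)
    if not misses:
        return None  # the key feature: silence when the SLA is met
    lines = [f"SLA report for {isp}: {len(misses)} breach(es)"]
    for name, measured, threshold in misses:
        lines.append(f"  {name}: measured {measured}, contracted {threshold}")
    return "\n".join(lines)
```

Wiring the returned string into an email with the SLA PDF attached is left out; the point is the conditional, so the ISP only hears from you when it owes you money.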
Use traditional monitoring in new ways
Best of all, it wasn't that difficult. He was doing IP SLA monitoring across the MPLS cloud on gear he already had, watching NetFlow and sFlow. He also ran a little quality-of-experience application monitoring on selected servers as part of the routine observation. Then he just rolled it all up into a report that fired on the first of each month for each ISP.
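The roll-up step can be sketched the same way, assuming round-trip probe results have already been collected (by IP SLA operations or similar). The sample format and the simplified jitter statistic below are assumptions for illustration; real IP SLA gear reports richer per-operation statistics.

```python
# Hypothetical roll-up of raw round-trip-time probe samples (in ms) into
# the summary numbers a monthly WAN report would chart. Jitter here is the
# mean absolute difference between consecutive RTTs, a common simplification.

def summarize_probes(rtts_ms):
    """Summarize one month of RTT samples; None marks a lost probe."""
    answered = [r for r in rtts_ms if r is not None]
    if not answered:
        return {"reachability_pct": 0.0}  # nothing came back at all
    reachability = 100.0 * len(answered) / len(rtts_ms)
    latency = sum(answered) / len(answered)
    diffs = [abs(b - a) for a, b in zip(answered, answered[1:])]
    jitter = sum(diffs) / len(diffs) if diffs else 0.0
    return {
        "reachability_pct": round(reachability, 2),
        "latency_ms_mean": round(latency, 2),
        "jitter_ms_mean": round(jitter, 2),
    }
```

Feed one of these summaries per ISP into the breach check each month and the reporting loop is complete.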
I still have no idea why the lights seem to flicker in downtown San Francisco, but one other thing the bartender said made me think it was the grid and not something on premises. According to her, sometimes all the power goes out in the neighborhood. Nobody knows why and it's common enough that most people don't call PG&E anymore. They just wait a bit and the lights come back on. That might be an endearing neighborhood-ism of the Embarcadero, but the days of putting up with that kind of service from our WAN link providers are over.
About the author:
Patrick Hubbard is a head geek and senior technical product marketing manager at SolarWinds. With 20 years of technical expertise and IT customer perspective, his networking management experience includes work with campus, data center, storage networks, VoIP and virtualization, with a focus on application and service delivery in both Fortune 500 companies and startups in high tech, transportation, financial services and telecom industries. He can be reached at Patrick.Hubbard@solarwinds.com.