After a good bit of teeth-gnashing and some cursing, yesterday I finally debugged a long-bedeviling access issue with a hybrid cloud network.
The virtual private cloud (VPC) supports a training program with free virtual machine (VM) time so students can follow along with a live instructor without the delay of product setup and install. But the swarms of short-lived Amazon Web Services (AWS) VMs were resisting every effort to bring them reliably under the dominion of my botmaster. They couldn't seem to access the Amazon Simple Queue Service (SQS) with which I smite them -- verily in their hundreds -- to keep my Elastic Compute Cloud (EC2) light bill in check.
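The botmaster's reaper pattern can be sketched roughly like this: a worker polls a queue for "smite" commands and terminates the instances named in each message. The message shape and the `parse_smite_message` helper are illustrative assumptions, not my exact setup, though the boto3 calls in `reap` are the real SQS and EC2 client APIs.

```python
import json

def parse_smite_message(body):
    """Extract instance IDs to terminate from a (hypothetical) queue message.

    Expected body: {"action": "terminate", "instance_ids": ["i-...", ...]}
    Returns [] for anything that isn't a terminate command.
    """
    try:
        msg = json.loads(body)
    except ValueError:
        return []
    if msg.get("action") != "terminate":
        return []
    return [i for i in msg.get("instance_ids", []) if i.startswith("i-")]

def reap(sqs, ec2, queue_url):
    """Poll SQS and terminate the instances each message names (sketch).

    `sqs` and `ec2` are boto3 clients; the queue URL and message format
    are assumptions for illustration.
    """
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10,
                               WaitTimeSeconds=20)
    for m in resp.get("Messages", []):
        ids = parse_smite_message(m["Body"])
        if ids:
            ec2.terminate_instances(InstanceIds=ids)
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=m["ReceiptHandle"])
```

The point of the queue is decoupling: the reaper runs anywhere that can reach SQS -- which, as it turned out, was exactly the problem.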
The VPC's private address space was linked, as any good hybrid cloud should be, back to my data center using AWS Direct Connect to present a single, coherent network. The frustration was that all the systems in the data center could see SQS fine, but VMs in Amazon's own EC2 could not. There seemed to be no magic combination of VM instance access permissions to make it work, despite the nagging feeling that surely they'd automatically have visibility to core AWS services.
The solution ultimately involved routing. In resolving it, I've come to realize a new level of network complexity when it comes to cloud management. Not only are there such things as legacy cloud networks, there is something else I'm now calling hybrid-hybrid networks (HHN).
Hybrid-hybrid networks? What are those?
The current cloud adoption push by enterprise IT is Infrastructure as a Service (IaaS). Basically, IT is moving existing servers from physical racks in data centers to VMs in the cloud (co-lo cages in the sky). Of course, we need to make sure our Exchange servers can still route authorization requests back to on-premises Active Directory, so admins set up persistent links using AWS Direct Connect, Azure Virtual Network or other similar services. Once you've done that, you're using a hybrid cloud, pure and simple.
But what if you were an early adopter of real cloud services, like storage, cloud databases, queuing, transcoding, or others? For you, you were in the cloud even when all your servers were still in your racks. For example, if you know your database name, but have no insight to its host, then you're one of these users. Once you start moving your cloud service-consuming servers to VMs in the cloud, you enter a new level of network complexity and create something different: hybrid-hybrid cloud.
More networking, not less, is the result
For early adopters, IaaS should be a no-brainer. Your business already takes advantage of rich cloud services, so moving the bulk of your systems to hybrid cloud and IaaS should improve service delivery by better colocation of services and consumers. That should leave you to worry only about last-minute user interface issues. The heavy network traffic between servers and backends, you assume, will magically improve. Unfortunately, it doesn't always work out that way. Worse, our favorite debugging tools may not work at all with IaaS infrastructure. (I'm looking at you, NetFlow.)
So first, remember that just because all your services are with one vendor doesn't mean they're all in the same place, or that network complexity will be any less of a headache. When your cloud service-backed apps were in your racks, they took advantage of the service's geo-routing front-ends. No matter where your data center was, the service created the most efficient route. Now, if you decide to move a server to a VM in Sydney, it may take a performance hit for services running in Virginia. Yes, that's an unexpected cloud networking bummer: You still have to make service topology Visio diagrams.
Second, use the tools you have on hand. You may not have access to the vSwitch of the IaaS platform, but your IaaS servers still have operating systems and NICs, and you can install deep packet inspection sensors to watch traffic from the cloud servers' perspectives.
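Even without a full packet sensor, there's a classic bit of old-school detective work you can run from any cloud VM: connecting a UDP socket sends no packets, but it does force the kernel to do a route lookup, so `getsockname()` tells you which local source address (and therefore which route) the OS would pick for a given destination. This is a minimal sketch using only the standard library; the destination IPs you'd test against (say, your SQS endpoint versus an internal host) are up to you.

```python
import socket

def local_source_ip(dest_ip, dest_port=443):
    """Return the local source address the OS routing table selects
    for traffic to dest_ip.

    Connecting a UDP socket transmits nothing; it just binds the socket
    per the kernel's route lookup, so this works even when the
    destination is unreachable or filtered.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((dest_ip, dest_port))
        return s.getsockname()[0]
    finally:
        s.close()
```

Comparing the result for a service endpoint against the result for an internal host quickly shows whether traffic is leaving by the interface and route you expected, or hairpinning somewhere it shouldn't.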
And finally, you still maintain total visibility of the VPC demarcation point into your data center, and more times than you think, you'll spot issues there.
Curing routing with this one weird trick
In my case of HHN, I added special VM-instance role restrictions to protect the safety of the network from the student VMs. Namely, I was routing all traffic directly back to my data center, where I could manage outbound Internet requests with the Palo Alto firewall I trust. This wouldn't normally be an issue, but I had also limited AWS service calls to a particular subnet in the data center. The firewall saw a swarm of incoming requests for those services, but originating from new -- and untrusted -- private subnets in the AWS VPC space. The firewall did what it was supposed to do -- it blocked the traffic. The resolution was actually easy after the exasperation of exploring red herrings in the EC2 dashboard.
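The firewall's side of that story reduces to a simple membership test: is the source address inside a trusted CIDR block or not? A minimal sketch with the standard `ipaddress` module, using made-up example ranges (the actual Palo Alto policy and my real subnets are not shown here):

```python
import ipaddress

# Hypothetical policy: CIDRs trusted for outbound AWS service calls.
# In my setup this was a data-center subnet; the VPC's new private
# subnets never made the list.
TRUSTED_SUBNETS = [ipaddress.ip_network("10.10.0.0/16")]  # illustrative

def is_trusted(source_ip):
    """True if source_ip falls inside any trusted subnet."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in net for net in TRUSTED_SUBNETS)
```

A data-center host like 10.10.4.7 passes; a VPC address like 172.31.5.9 doesn't -- which is exactly why the firewall dropped the student VMs' hairpinned service calls until the VPC subnets were added to the trusted list.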
As you move VMs to the cloud, just remember the magic is in not having to think about infrastructure. You'll still have plenty of planning and troubleshooting to get there. Solid networking and old-school detective work will be more important than ever, especially once we get to hybrid-hybrid-hybrid networks. That will still arrive before we implement IPv6.
About the author:
Patrick Hubbard is a head geek and senior technical product marketing manager at SolarWinds. With 20 years of technical expertise and IT customer perspective, his networking management experience includes work with campus, data center, storage networks, VoIP and virtualization, with a focus on application and service delivery in both Fortune 500 companies and startups in high tech, transportation, financial services and telecom industries. He can be reached at Patrick.Hubbard@solarwinds.com.