When Nicira founder Martin Casado was pursuing his doctorate in computer science at Stanford five years ago, he set out to transform the operational model of networking so it could keep pace with the automation that server virtualization brought to the data center.
Casado thought his invention, OpenFlow, would solve that problem by itself, but now he says he was wrong. OpenFlow hardware control, now all the rage in the networking industry, isn't the answer. He decided to take a different approach: overlay software for network virtualization. VMware thought so highly of that strategy that it spent $1.2 billion to acquire Nicira.
"The problem is, we actually got it wrong, and I think a lot of the industry hasn't realized how wrong it was," Casado said during a whiteboard session he hosted this week with several journalists in VMware's Cambridge, Mass., office.
Casado created OpenFlow as a means of decoupling the control plane from the data plane in network hardware and moving control into a central "brain": the OpenFlow controller. This separation would make networks programmable and completely transform network operations. "This was my thesis at Stanford, that this is the way to automate networking," he said. "So, the first three engineers at Nicira wrote this protocol ... and we did a lot of the early work in understanding the limitations of SDN [software defined networking]."
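To make the control/data plane split concrete, here is a minimal Python sketch under simplifying assumptions. It models the idea, not the OpenFlow wire protocol, and the names (`Switch`, `Controller`, `install_flow`, `place_host`) are hypothetical. Each switch only performs a table lookup; the controller alone decides what goes into the tables.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Switch:
    """Data plane only: a table lookup, no routing logic of its own."""
    name: str
    flow_table: Dict[str, int] = field(default_factory=dict)  # dst MAC -> out port

    def install_flow(self, dst_mac: str, out_port: int) -> None:
        # Invoked by the controller, never decided locally.
        self.flow_table[dst_mac] = out_port

    def forward(self, dst_mac: str) -> Optional[int]:
        # None models a table miss, which would be punted to the controller.
        return self.flow_table.get(dst_mac)


class Controller:
    """Control plane: the central 'brain' that programs every switch."""

    def __init__(self, switches: List[Switch]) -> None:
        self.switches = {s.name: s for s in switches}

    def place_host(self, dst_mac: str, ports_by_switch: Dict[str, int]) -> None:
        # One decision, pushed from one place to many devices.
        for name, port in ports_by_switch.items():
            self.switches[name].install_flow(dst_mac, port)


s1, s2 = Switch("s1"), Switch("s2")
ctl = Controller([s1, s2])
ctl.place_host("00:00:00:00:00:01", {"s1": 3, "s2": 1})
print(s1.forward("00:00:00:00:00:01"))  # 3
```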
OpenFlow still makes sense in many use cases, particularly for traffic engineering, Casado said. Google's data center interconnect deployment is a perfect example. But when it comes to network virtualization in the data center, OpenFlow for control of hardware forwarding is the wrong way to go, he said.
Virtual switches instead of OpenFlow hardware
"Within the first year, we realized that something really important was happening," Casado said. Server virtualization had transformed the network access layer in data centers. The virtual switches embedded in hypervisors, particularly VMware's vSwitch, had become the new network edge. If the new edge was in software on the servers, why bother using OpenFlow to control physical switches? A virtual switch is ideal for data center network virtualization, for two reasons. "First, it runs on x86, and x86 is super-flexible. We know how to program it. It's not like you're chiseling some algorithm in some proprietary ASIC. If I want to change how I do forwarding, I just write a new program.
"Second, it's close to the edge. Networking has a long, sordid history of trying to guess what's happening on machines. If you're there [on the server], you get access to these rich semantics in the edge that you've never had before. What addresses are being listened to? What users are connecting to the machines? The level of visibility you have is like a networking [professional's] dream."
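Casado's first point, that forwarding on x86 is just a program, can be illustrated with a few lines of Python. This is a hedged sketch of classic MAC-learning behavior, not the actual logic of VMware's vSwitch; the `LearningSwitch` class is hypothetical. Changing how forwarding works means editing `receive`, not designing a new ASIC.

```python
from typing import Dict, Union

FLOOD = "flood"  # sentinel: send out every port except the ingress one


class LearningSwitch:
    """Minimal MAC-learning switch, the kind of logic a software switch can run."""

    def __init__(self) -> None:
        self.mac_table: Dict[str, int] = {}  # source MAC -> port it was seen on

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> Union[int, str]:
        # Learn where the sender lives.
        self.mac_table[src_mac] = in_port
        # Known destination: unicast to its port; unknown destination: flood.
        return self.mac_table.get(dst_mac, FLOOD)


sw = LearningSwitch()
print(sw.receive(1, "aa:aa", "bb:bb"))  # flood (bb:bb not seen yet)
print(sw.receive(2, "bb:bb", "aa:aa"))  # 1
print(sw.receive(1, "aa:aa", "bb:bb"))  # 2
```

On a hypervisor, a switch could skip learning entirely, because the hypervisor already knows which MAC address belongs to which VM port; that is the kind of edge visibility Casado describes.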
These realizations caused Casado and his team to re-evaluate and take a different approach. "We had this aha moment," he said.
Nicira would still use OpenFlow for network virtualization, but it would shift its focus from hardware to software control: it would control virtual switches. To Casado, this made perfect sense. After all, packet forwarding is not the problem in today's networks. Legacy networking is still extremely good at moving packets to the right destinations. It's all the policy and operational layers on top of traditional networking that cause problems and slow down operations. Access control lists (ACLs), VLANs, network isolation, billing and accounting were once functions that networking professionals could set and forget in a static environment. When server virtualization accelerated the provisioning of new compute workloads and enabled virtual machine mobility, those manual processes suddenly became unwieldy.
Casado figured these operational headaches didn't need to take place on the physical network hardware, but could instead be moved to virtual switches that could easily be controlled in software. That's how Nicira's Network Virtualization Platform was born, and that's why virtual network overlays have become such a hot topic alongside SDN.
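A rough sketch of what moving isolation and ACLs into the virtual switch means in practice follows. It is a simplified model, not Nicira's Network Virtualization Platform: all names are hypothetical, and the outer header is reduced to a virtual network ID plus a hypervisor address, standing in for a real tunneling protocol such as GRE or VXLAN.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Frame:
    src_vm: str
    dst_vm: str
    payload: bytes


@dataclass(frozen=True)
class Tunneled:
    vni: int        # virtual network ID carried in the (abstracted) outer header
    outer_dst: str  # IP address of the hypervisor hosting the destination VM
    inner: Frame


class OverlayVSwitch:
    def __init__(self, vm_network: Dict[str, int], vm_host: Dict[str, str],
                 acl: Set[Tuple[str, str]]) -> None:
        self.vm_network = vm_network  # VM -> virtual network ID
        self.vm_host = vm_host        # VM -> hypervisor IP; rewritten when a VM moves
        self.acl = acl                # allowed (src_vm, dst_vm) pairs

    def send(self, frame: Frame) -> Optional[Tunneled]:
        vni = self.vm_network[frame.src_vm]
        # Isolation: traffic never crosses virtual networks.
        if self.vm_network.get(frame.dst_vm) != vni:
            return None
        # ACL enforced in software at the edge, not on physical switches.
        if (frame.src_vm, frame.dst_vm) not in self.acl:
            return None
        # The physical network only sees hypervisor-to-hypervisor IP traffic.
        return Tunneled(vni, self.vm_host[frame.dst_vm], frame)


vsw = OverlayVSwitch(
    vm_network={"web1": 5001, "db1": 5001, "other": 5002},
    vm_host={"web1": "10.0.0.1", "db1": "10.0.0.2", "other": "10.0.0.3"},
    acl={("web1", "db1")},
)
print(vsw.send(Frame("web1", "db1", b"query")).outer_dst)  # 10.0.0.2
print(vsw.send(Frame("web1", "other", b"query")))          # None (different network)
vsw.vm_host["db1"] = "10.0.0.9"                            # db1 migrates
print(vsw.send(Frame("web1", "db1", b"query")).outer_dst)  # 10.0.0.9
```

The migration at the end only updates a mapping in software; nothing on the physical switches changes, which is the operational point Casado makes.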
The problem with direct control of OpenFlow hardware
Still, many vendors and network practitioners are interested in implementing OpenFlow hardware to enable network virtualization in the data center. But there are a couple of reasons why it won't work, Casado said. The first obstacle is the network vendor ecosystem. "You're asking switch vendors to put OpenFlow in their switches, and there is not an enormous amount of incentive for them to do that, because in some way you're divesting them of value," he said. "I wrote the first OpenFlow protocols in 2007, and since then people have announced stuff, but you only have a couple of useful OpenFlow switches. Anyone who has a useful OpenFlow switch also has a controller, and I'm certain they use their controller and their switches in a way that binds them together so they can maintain control [over a customer's environment]. As far as creating an active community here, it's just too difficult because of business relationships."
Plenty of network vendors have enabled OpenFlow on their switches, so what does Casado mean by "useful OpenFlow switches"? Most vendors have not built switches with enough general-purpose forwarding table capacity to be truly useful in a data center, he explained. In a typical switch ASIC [application-specific integrated circuit], there is an ACL table, a Layer 2 table, a Layer 3 table, "all these special-purpose tables," he said. None of those tables can handle data center-class OpenFlow.
"OpenFlow says the world should look like this: You have this table that has an 11-tuple look-up, which is this super-general thing, and you have a whole bunch of them," Casado said. "In order to get the OpenFlow checkbox, a lot of vendors will simply overload one of these tables, which will have maybe 5,000 entries. And they try to shoehorn OpenFlow there. These chips were actually not made to do that. OpenFlow is still trying to adjust to this, but it's going to be a very difficult thing."
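The general-purpose table Casado describes can be modeled in software as a list of prioritized entries, each matching on any subset of header fields, with a hard capacity. This is a hedged sketch: the field names are an illustrative subset rather than the exact OpenFlow match fields, and the linear scan stands in for hardware (typically TCAM) that compares all entries at once, which is a large part of why such tables are small.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FlowEntry:
    priority: int
    match: Dict[str, object]  # field -> required value; omitted fields are wildcards
    action: str


class FlowTable:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: List[FlowEntry] = []

    def add(self, entry: FlowEntry) -> None:
        # A hardware table has a fixed number of slots; there is no spilling over.
        if len(self.entries) >= self.capacity:
            raise OverflowError("flow table full")
        self.entries.append(entry)

    def lookup(self, packet: Dict[str, object]) -> str:
        # Highest-priority entry whose specified fields all equal the packet's.
        hits = [e for e in self.entries
                if all(packet.get(f) == v for f, v in e.match.items())]
        if not hits:
            return "send_to_controller"
        return max(hits, key=lambda e: e.priority).action


table = FlowTable(capacity=5000)  # the overloaded-table size quoted above
table.add(FlowEntry(10, {"eth_type": 0x0800, "ip_dst": "10.0.0.5"}, "output:3"))
table.add(FlowEntry(20, {"eth_type": 0x0800, "ip_dst": "10.0.0.5", "tcp_dst": 22}, "drop"))
pkt = {"in_port": 1, "eth_type": 0x0800, "ip_dst": "10.0.0.5", "tcp_dst": 80}
print(table.lookup(pkt))                     # output:3
print(table.lookup({**pkt, "tcp_dst": 22}))  # drop
```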
The flow forwarding tables available on most OpenFlow switches today are fine for research and experimentation, Casado said. They work for traffic engineering, too. "But the amount of flows and traffic that happens in the data center means that you have to do something like Layer 3," he said. "OpenFlow will not work for building the forwarding fabric for switches in the data center."
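Some back-of-the-envelope arithmetic shows why flow-level entries break down at data center scale while Layer 3 aggregation does not. Every input below is a hypothetical figure chosen for illustration, not a measurement; only the 5,000-entry table size comes from Casado's remark.

```python
TABLE_CAPACITY = 5_000  # overloaded-table size Casado cites

# Hypothetical data center; all four numbers are illustrative assumptions.
racks = 40
servers_per_rack = 40
vms_per_server = 20
flows_per_vm = 10

vms = racks * servers_per_rack * vms_per_server  # 32,000 VMs
per_flow = vms * flows_per_vm                    # 320,000 exact-match flow entries
per_prefix = racks                               # one aggregated route per rack subnet

print(per_flow > TABLE_CAPACITY)     # True
print(per_prefix <= TABLE_CAPACITY)  # True
```

Per-flow state grows with every VM and every connection, while one route per rack subnet grows only with the physical layout.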
Is there room for OpenFlow hardware in the Nicira-VMware solution?
Does this mean that Nicira and its parent VMware are content to focus solely on software? Casado says there are three areas where his technology will need to interface with hardware, and this will require something other than standard OpenFlow. "The first one is QoS [quality of service]," he said. "More queues is better. The more you have in hardware, the more layers of QoS you can provide to customers. If I have eight queues, I can only provide eight SLAs [service-level agreements]. If I have a million queues, I can do an SLA per tenant."
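The queue arithmetic in that quote is a pigeonhole argument: tenants mapped to the same hardware queue get the same treatment, so the number of distinct SLAs can never exceed the number of queues. A minimal sketch, with a hypothetical round-robin assignment policy:

```python
from typing import Dict, List


def assign_queues(tenants: List[str], num_queues: int) -> Dict[str, int]:
    # Hypothetical policy: spread tenants over the available queues round-robin.
    return {t: i % num_queues for i, t in enumerate(tenants)}


def distinct_slas(assignment: Dict[str, int]) -> int:
    # Tenants sharing a queue share its scheduling, so they share an SLA.
    return len(set(assignment.values()))


tenants = [f"tenant-{i}" for i in range(100)]
print(distinct_slas(assign_queues(tenants, 8)))          # 8
print(distinct_slas(assign_queues(tenants, 1_000_000)))  # 100, one per tenant
```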
QoS and similar hardware-based features will also require a simpler model for Ethernet Operations, Administration and Management (OAM), so that Nicira's platform and other technologies can troubleshoot and debug these capabilities across both physical and virtual workloads.
Network virtualization technology will also need to interface with top-of-rack switches for legacy workloads that have not been virtualized. "You need to control the top-of-rack in order to get those physical workloads incorporated into virtual networks and that requires an OpenFlow-like interface," Casado said.
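The top-of-rack requirement amounts to state kept on the physical switch: traffic arriving from a non-virtualized server on a given port and VLAN is placed into a particular virtual network. The sketch below is hypothetical (class, method, and identifier values are invented for illustration) and says nothing about how any vendor implements it; it only shows the kind of state such an interface would need to set.

```python
from typing import Dict, Optional, Tuple


class TopOfRackBinding:
    """Hypothetical ToR state: (physical port, VLAN) -> virtual network ID."""

    def __init__(self) -> None:
        self.bindings: Dict[Tuple[int, int], int] = {}

    def bind(self, port: int, vlan: int, vni: int) -> None:
        # Set remotely by the network virtualization controller.
        self.bindings[(port, vlan)] = vni

    def classify(self, port: int, vlan: int) -> Optional[int]:
        # Unbound traffic is kept out of every virtual network.
        return self.bindings.get((port, vlan))


tor = TopOfRackBinding()
tor.bind(port=12, vlan=100, vni=5001)  # a bare-metal server joins virtual network 5001
print(tor.classify(12, 100))           # 5001
print(tor.classify(12, 200))           # None
```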
Finally, network virtualization controllers need to interface with network appliances (firewalls, application delivery controllers, and so forth), and the "OpenFlow-like" interface will be needed there as well.
"I think OpenFlow is too low-level for this," Casado said. "So, we've been proposing a new one: OVSdb-config. It's what we use to manage the Open vSwitch along with OpenFlow. It allows us to manage higher-level state. That's what we're hoping people will use for these things, but it doesn't really matter."
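The difference between OpenFlow's low-level flow entries and "higher-level state" is easier to see in an actual management request. The sketch below builds an Open vSwitch database request in the JSON-RPC format documented in RFC 7047; treating this as what Casado calls OVSdb-config is our assumption, and the helper name `ovsdb_select` is hypothetical. Instead of match fields and actions, the request names configuration tables and rows (here, the `Interface` table of the `Open_vSwitch` database).

```python
import json
from typing import List


def ovsdb_select(table: str, column: str, value: str,
                 columns: List[str], request_id: int = 0) -> dict:
    """Build a 'transact' request containing a single 'select' operation."""
    return {
        "method": "transact",
        "params": [
            "Open_vSwitch",  # database name
            {
                "op": "select",
                "table": table,
                "where": [[column, "==", value]],  # condition: [column, function, value]
                "columns": columns,
            },
        ],
        "id": request_id,
    }


msg = ovsdb_select("Interface", "name", "eth0", ["name", "ofport"])
print(json.dumps(msg))
```

A client would send this over a connection to `ovsdb-server` and receive the matching rows in the reply; connection handling is omitted here.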
Why doesn't it matter? Any protocol will do, as long as it's open, encourages innovation, and gets the job done, Casado says.
Let us know what you think about the story; email Shamus McGillicuddy, News Director.