It's been three years since Facebook launched the Open Compute Project (OCP), an initiative designed to share data center and server designs with the IT community. Since then, the OCP's mandate has branched out, and in 2013, OCP said it would entertain proposals for OS-agnostic switches. The goal: to bring the same openness and disaggregation of software from hardware to network devices as now exists in servers.
In recognition of that effort, OCP is this month's winner of SearchNetworking's Network Innovation Award. To get a sense of where OCP is with that initiative and others, SearchNetworking spoke with Omar Baldonado, Facebook's manager for infrastructure software engineering. Baldonado also serves as networking co-lead for OCP's project leads group.
What's the status of the Open Compute Project's hardware open source initiative?
Omar Baldonado: It's just over a year old, and over that year we've seen amazing progress. As of August there were a number of companies that contributed designs for our top-of-rack [ToR] switches. And when we say design, it's the whole concept of open source for hardware. This includes schematics, bill of materials, everything someone would need to build that switch. This is a level of openness on the hardware side that nobody has ever seen in networking.
The second notion is the whole idea of disaggregating the software from the hardware. You will also have the ability to load other software onto those switches. So, the entire idea of getting software from someone other than the vendor of the hardware is pretty exciting, too.
What's the overall goal for OCP in the months to come?
Baldonado: We are not trying to be a standards committee in terms of saying there is only one ToR switch and that will be the standard. We expect a lot of folks to contribute -- and they have -- designs to optimize different things. Some focus on modularity, some on costs, and that will be great. We are not necessarily pushing one standard, and it's the same on the software side. We are not going to dictate that it must be OpenFlow or centralized SDN [software-defined networking] protocols, nor would we say it must be a Layer 2 stack or a Layer 3 stack. But what we will support is that it must be open and you can load whatever software you want -- whether it is centralized or distributed.
What specifically are you looking for?
Baldonado: We aren't just accepting everything. We make sure to evaluate each proposal across a few different lines: Is it built for highly scalable, efficient operations for working in a data center? Can this work without a lot of human intervention? Are the designs open enough to take advantage of advances in the rest of the compute infrastructure? So, for example, if someone has a particular chip, we might ask, 'Can you expose some of the interfaces so we can plug in more extensible flash modules into it?'
We do push back. But once we feel the bar has been passed, and that is the same bar we use across server, rack and storage, then we can take that and accept that as a contribution to OCP.
When do you expect to release your findings about the ToR proposals?
Baldonado: A couple of those switches are in the final stages of review, and those results will be announced pretty soon. Others are earlier in the process, but they're in the pipeline too.
Looking back at the past year, have there been any surprises?
Baldonado: First, I was surprised by the number of people who were being open and contributing. This is different for the networking industry to say, 'Let's go ahead and share these designs,' so I was pleasantly surprised by how many people were willing to contribute and how many were excited about that. When you think of the history of networking, the last two or three decades, the history of IP networking, it was, 'Here is the switch and here is the entire design,' so this [openness] is fairly unusual.
I was also surprised by how much people wanted to quickly branch out. We initially focused on top of rack, because that is the easiest way to drop in new technologies and to evaluate new switches. But we've already had requests at the last workshop: 'Can we look at out-of-band management, or spine switches?' The number of different things that people want to discuss is also quite surprising. I think there has been a pent-up demand for openness in the networking community.
What about challenges?
Baldonado: I think one of the challenges we've looked at has evolved over the past year. It's on the software side. Initially, we said we don't want to get too much [into] software. Let's start with hardware, the lowest layer, and we deferred the software discussions. What we've seen is that there are many users in other communities, not just Facebook, who said we need to push more on the software side. So, that's sort of a shift in strategy we've been going through in the past few months. OK. We've gotten some of the hardware designs. They are in the pipeline. We know how to work those and how to give feedback to make them more open. Let's start focusing on software to enable more of a software ecosystem on top of the hardware.
What do you say to critics who allege open source software isn't robust enough to perform necessary functions?
Baldonado: Within OCP we are not actually saying that all the software has to be open source, but what we are saying is that you should have a choice. It should be disaggregated -- I can get this software or commercial software -- so it's separate from the hardware. So, I think there will be different packages for different customers. Some might prefer software from a commercial vendor; others will take from open source, and some will take a combination.
How does OCP's initiative differ from initiatives, such as network functions virtualization, that attempt to make networking more programmable?
Baldonado: I do think that network functions virtualization [NFV] is sort of also pushing on this idea of how do I get more programmability in the network? How do I compose network elements as services better? How can I avoid being constrained to a closed hardware appliance as my most basic building block? So, in that way, we have some similar goals. But for OCP, we are focusing on this: Here is some hardware you can have as an open design, and here is software you can run on top of it. It's our expectation that this hardware would support NFV or other SDN technologies, but if somebody also wanted to develop their own distributed protocol stack and release that for traditional data center networking, we'd support that too.
Do you remain focused solely on data centers?
Baldonado: Generally, it started at the data center and the scale required by large operators. That remains a focus across OCP, but some of the technologies might be applicable elsewhere.
What about other areas of networking, such as the WAN or security?
Baldonado: Not yet. There is an expansion beyond ToR where we started, but there is a lot more data center work to examine. There is all the way up the aggregation layers, the management network, the out-of-band network. Potentially, there is the routed portion of the network that meets the edge of the data center, and I think a lot of that is where large Web operators like Facebook and Microsoft are engaged in getting the most efficiency.
Have the initial goals and objectives of OCP changed over the past year?
Baldonado: OCP as a whole is about bringing the concept of open source to the hardware community, so we opened up our rack and our storage and our data center power and cooling designs. When the networking project started, that had the same charter, but also specifically within the networking domain, it was about disaggregating the hardware from the software.
For example, on the server side, there were definitely quote, unquote 'white box servers' available, but Open Compute really tried to make them open in terms of the designs, layouts and schematics.
On the networking side, a few people have been working on white box switches, but really not in an open way. It's really about disaggregating the network software from the hardware.
When you look at your crystal ball, what do you see?
Baldonado: As more companies embrace the notion of cloud computing and add scale and cloud data centers, you'll see this focus on how can we get as much efficiency and scalability out of our infrastructure. So, I think that networking is a critical part of that. A lot of the interest we see going forward in OpenStack and cloud providers, the network is sort of the underlying fabric and infrastructure for that. As that grows, as more cloud computing happens, the networking piece will be part of that.
How have objections changed regarding OCP's strategy?
Baldonado: I think a year ago there were objections: Why would you ever get software from anybody [other] than your hardware vendor? I think there have been a number of pieces of education and changes in mindset around this. If I have a server and I want to write software for it, I don't go and talk to the hardware vendors about it. If I bought it from Dell or HP, I don't say, 'Hey, Dell, how do I write server software?' Nor do I talk to the chip manufacturer. Some of that discussion has taken place, and I think people are understanding that that's the level of programmability that lets folks understand this more easily -- rather than trying to push the whole centralized SDN-controller model.
I also think that a lot of people, coming from the SDN side, are asking, 'What's the problem you are trying to solve?' A lot of the education has been around: We are really trying to solve the management plane and monitoring and troubleshooting, along with maybe getting more programmability on the control plane side. But to operate at scale, you need more support to make it more automated to make the management of the network easier. And I think people are coming around to that. If you are going to operate networks at this scale, you need better software to automate and address those problems.
For us -- not just Facebook but other folks within OCP -- [we] are saying that we want to manage those networking devices a lot like we manage our servers. So, at Facebook, we have hundreds of thousands of servers, and those networking devices stand out as exceptions because they don't look like servers and we can't manage them the same way. Making the network feel more software-like makes it feel a lot more like [managing] servers.