
Cumulus Networks CEO: How Linux-based switches change network ops

Cumulus launched a Linux OS for bare-metal switches this week. We spoke to CEO J.R. Rivers about how this platform will affect network operations.

Cumulus Networks emerged from stealth mode this week with a version of Linux that it has developed to run as the network operating system on bare-metal switches. Cumulus Linux is tailored to appeal to Web companies, cloud providers and very large enterprises with dynamic data centers that are constrained by mainstream network technology.

With Cumulus Linux and a collection of supply chain, distribution and sales channel partners, the company hopes to become the Red Hat of networking by cracking open the vertically integrated switching market, much like Linux originally broke open the compute market with x86 servers. SearchNetworking spoke with Cumulus Networks CEO J.R. Rivers about how a Linux-based open switching platform will affect network operations in a data center.

You claim the Linux toolchain is ready to support networking. Can you elaborate?

J.R. Rivers: We've had a few customers come in and say, 'I want to use [Linux tools] for configuration management.' We made sure it installed and did appropriate things, and then we let customers go to town. More often than not, they were able to use their existing recipes they had built for their server environments pretty much intact on their network devices. They had to augment it some because things are a little bit different. But something like Chef is really about building out a server and making sure all of its packages are put together correctly. A lot of that is based on standard internal policy -- setting up your user authentication mechanisms correctly, setting up your IP table filter lists, tying back to your AAA servers, setting up your NTP servers. All of these are the same whether you are running on a networking device or a server.

The part that is slightly different from the server on the networking device is that you might be looking for a specific version of Quagga, or you might want to download a specific configuration file for the Linux bridge. But you'd be doing the same thing for an Apache Web server. So if you were running a Web server, you would put down the specific version of Apache and a certain configuration file. So operations are the same. You're just applying it on a different package.
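The pattern Rivers describes, one shared baseline policy plus a role-specific package and configuration file, can be sketched in a few lines of Python. This is a minimal illustration, not Chef code and not anything Cumulus ships: the host names, version strings, file names and the `build_desired_state` helper are all hypothetical.

```python
import copy

# Shared policy applied to every device, server or switch alike.
# All host names here are hypothetical placeholders.
BASE_POLICY = {
    "ntp_servers": ["ntp1.example.com", "ntp2.example.com"],
    "aaa_server": "aaa.example.com",
    "firewall_rules": ["allow ssh from mgmt", "drop everything else"],
}

# The only part that differs by role: which package is pinned and
# which configuration file is dropped in. Versions are placeholders.
ROLE_OVERLAYS = {
    "web_server": {"package": "apache2", "version": "1.0", "config_file": "httpd.conf"},
    "switch": {"package": "quagga", "version": "1.0", "config_file": "Quagga.conf"},
}


def build_desired_state(role: str) -> dict:
    """Merge the shared baseline with the settings for one role."""
    if role not in ROLE_OVERLAYS:
        raise ValueError(f"unknown role: {role}")
    state = copy.deepcopy(BASE_POLICY)  # never mutate the shared baseline
    state.update(ROLE_OVERLAYS[role])
    state["role"] = role
    return state


web = build_desired_state("web_server")
switch = build_desired_state("switch")
print(web["package"], switch["package"])
```

The baseline keys are identical for both roles; only the overlay changes, which is the sense in which the operations are the same and only the package differs.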

On the monitoring side, a customer might wake up one day and say, 'Today is the day I set up a monitoring framework.' And next thing you know, they are [monitoring the network with] a whole bunch of tools that they were using on their servers that just work.

How does a bare-metal switch running Linux affect network operations?

Rivers: The other day I talked to [an executive at a] big bank and asked, 'How many servers do you have in your data centers?' He said he had 60,000 servers. I asked, 'How many server admins do you have?' Eighty-five. He absolutely knew these numbers crisp and cold -- [roughly 700] servers per admin. Then I asked, 'What's the story on the networking side?' He stopped and smiled and said, 'I'm embarrassed. I don't know the answer, but I can tell you this: It's not anywhere as efficient.'

The deal is that networking equipment, as it's built today, is locked up and architected to make margin for suppliers. It is not architected to be easy to use for the customer.

[In the data centers of] one of our big customers, the systems based on our software account for 40% of their [infrastructure]. The other 60% is still rolling off incumbent [networking equipment]. Of the network operations team, 85% of the people are either manually maintaining or writing scripts and tools to maintain the incumbent equipment. The other 15% are maintaining and running tools to manage equipment using our software. As they continue to phase out [the incumbent equipment], they'll be able to take that 85% of the network operations team and redeploy them to other things that are more important to the company, [things that] they've already identified.
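The percentages Rivers quotes imply a rough staffing ratio, worked through below. This is a back-of-the-envelope sketch that assumes the infrastructure shares and team shares are directly comparable, which the interview does not establish; the function name is hypothetical.

```python
def staff_share_per_infra_share(staff_pct: float, infra_pct: float) -> float:
    """Share of the ops team consumed per unit share of infrastructure."""
    return staff_pct / infra_pct


# Figures quoted in the interview.
incumbent = staff_share_per_infra_share(staff_pct=85, infra_pct=60)
linux_based = staff_share_per_infra_share(staff_pct=15, infra_pct=40)

# How many times more staff effort the incumbent gear takes per unit
# of infrastructure, under the comparability assumption above.
ratio = incumbent / linux_based
print(f"{ratio:.1f}x")  # prints "3.8x"
```

On these numbers, the incumbent equipment takes a little under four times as much of the team per unit of infrastructure.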

Working with Linux like this on switches will require a lot of programming skills. Do you think network engineers are ready for that?

Rivers: Some number of network engineers I've worked with actually are reasonably good programmers. They've cut their teeth on screen scraping Cisco CLI and building Perl-based middleware. They do have some pretty good scripting skills. They just don't have platforms that enable them to do much with them. With that said, there is a much larger number who don't know how to do it.

There is a certain class of customers that get this immediately. So we're very focused on that set of customers because it's an easy area of adoption. The other set of customers is big enterprises that have recognized the problem. Their perspective is, 'My sys admins are phenomenally productive, and in some cases they know as much or sometimes more about networking than my network engineers do. I need to do something about that. I need a much more efficient operational environment and have a very functional skill set among my people.' The more forward-thinking ones we have worked with are going through the exercise of sending their network engineers back to scripting school, or they are doing a proof of concept with us and making their network engineers write the scripts to get them over the hump. In many cases it's not like network engineers are dumb; they just haven't had the opportunity.

We have an extremely large customer where someone high up in the organization said we are going to change the way we do networking. They decided to buy hardware from the hardware supply chain and use Linux-based software systems to run their networks. They did an evaluation and chose us to work with. On that team they had a pretty rich set of people: application developers, some sys admins and a bunch of network admins. The network admins realized they had the cover to go ahead and learn all these things, so they [dove] in head first. It was utterly amazing to see what they did with the platform. They solved the problem they've been trying to solve for almost a decade, and it ended up being pretty easy.

They wanted a very fine-grained perspective on the overall health of the network so application developers or people running applications over the infrastructure could look at issues in the application layer and see if there was any network event that might have caused it. So [the network engineers] spent some time, looked at the hardware and identified two specific metrics they wanted to capture at very fine-grained timing -- every 100 milliseconds. They took that back and had each networking device capture the data and push it back to a collector. Then they scripted a post-processor that ran across that every minute or so and did histograms and timelines. Then they published that as a self-help for people using the infrastructure. We heard the network engineers were just bouncing off the walls because they had always wanted to do this but just never had the ability. It's really exciting to see that kind of thing happen.
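The post-processing step described above, turning a minute of 100-millisecond samples into a histogram and a timeline, might look something like the following. It is a sketch under stated assumptions: the interview does not name the two metrics or the collector, so the sample format `(timestamp_ms, value)`, the bucket width, the per-second peak timeline and the synthetic data are all hypothetical.

```python
from collections import Counter


def histogram(samples, bucket_width):
    """Count samples per value bucket; keys are bucket lower bounds."""
    counts = Counter()
    for _, value in samples:
        counts[(value // bucket_width) * bucket_width] += 1
    return dict(sorted(counts.items()))


def timeline(samples, window_ms=1000):
    """Peak value per time window; keys are window start times in ms."""
    peaks = {}
    for ts_ms, value in samples:
        window_start = (ts_ms // window_ms) * window_ms
        peaks[window_start] = max(peaks.get(window_start, value), value)
    return dict(sorted(peaks.items()))


# Synthetic stand-in for one minute of data from a single device:
# one sample every 100 ms, 600 samples in total.
samples = [(i * 100, (i * 7) % 50) for i in range(600)]

hist = histogram(samples, bucket_width=10)
peaks = timeline(samples)
print(len(peaks), "one-second windows,", sum(hist.values()), "samples")
```

Publishing the two resulting dictionaries, for example as charts on an internal page, would give application teams the kind of self-help view the network engineers built.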

Cumulus Networks' initial launch focuses on top-of-rack switching. Are there plans for a modular platform? Do you need to evolve your Linux platform for an aggregation switch?

Rivers: It depends on your philosophy. In a traditional chassis you do. That's why people like Cisco or Arista build a lot of their software up in their proprietary user space application. A lot of that stems from the old expectations of what an aggregation layer switch does. The aggregation switch was not just a high bandwidth interconnect between things, but also served as a gateway in and out of the data center. It was a 'god box.' It was incredibly brittle, and that's where Cisco got lock-in on the network.

Now, a lot of the traffic is contained in the [data center], and that gateway in and out is not the aggregation platform anymore, but kind of off on the edge. The aggregation boxes are really just about high bandwidth. In our software implementation, we make each line card run a separate instance of the software. By doing that, we're able to use exactly the same software on the top-of-the-rack switch as we use on the chassis -- the same operating principles and everything.

Editor's note: Rivers added that Cumulus and its partner ecosystem plan to offer customers a modular aggregation switch soon.

Let us know what you think about the story; email Shamus McGillicuddy, news director.

Join the conversation


When I first read about Cumulus Networks' release two days ago, I thought the Cumulus platform was meant for L2/L3 switch vendors.

It appears from this article that the end network operators are the customers of this solution. If so, which HW platforms does Cumulus support?

Is it fair to say that it is trying to emulate Red Hat, but for L2/L3 switches?

Some challenges I see are:
1. Drivers for various switch chip sets.
2. Convincing chip vendors to develop driver software on Cumulus Linux.
3. How are the various features of switch ASICs exposed?

A further question: J.R. Rivers says he's running the same software on each line card in a modular chassis. How, then, does that become one switching system?