Facebook revealed an internally developed operating system and a hardware design for a bare-metal, top-of-rack data center switch that it plans to submit to the Open Compute Project.
The Facebook switch is based on a modular hardware design, code-named Wedge, and a Linux-based switch operating system called FBOSS.
FBOSS is a Linux distribution with multiple modules for protocol handling and control logic, configuration management, statistics packages and environmental management. There is also an application programming interface (API) layer in FBOSS to allow it to communicate with firmware on a switch's networking chipset.
For the most part, the Wedge design is for a basic top-of-rack switch, with a merchant application-specific integrated circuit (ASIC), sixteen 40 Gigabit Ethernet (GbE) ports, dual power supplies and fans in a 1 rack-unit enclosure. The design can be expanded to 32 ports. Wedge will also be modular: Customers will have the option of swapping out components, such as the embedded server module or the ASIC, for alternative silicon.
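To put the port counts in perspective, the short Python sketch below multiplies the port count by the 40 GbE port speed to get the aggregate front-panel bandwidth of the base and expanded configurations. The function and constant names are hypothetical and chosen only for illustration; the figures count one direction of traffic and say nothing about what the ASIC can actually forward.

```python
# Illustrative arithmetic only. The names below are hypothetical and do not
# come from Facebook's Wedge specification.
PORT_SPEED_GBPS = 40  # each Wedge port is 40 Gigabit Ethernet


def aggregate_capacity_gbps(port_count: int, port_speed_gbps: int = PORT_SPEED_GBPS) -> int:
    """Return the total one-direction port bandwidth in Gb/s."""
    if port_count < 0 or port_speed_gbps < 0:
        raise ValueError("port count and port speed must be non-negative")
    return port_count * port_speed_gbps


base_gbps = aggregate_capacity_gbps(16)      # base 16-port design
expanded_gbps = aggregate_capacity_gbps(32)  # expanded 32-port design

print(f"16 ports: {base_gbps} Gb/s")
print(f"32 ports: {expanded_gbps} Gb/s ({expanded_gbps / 1000} Tb/s)")
```

Under these assumptions, the base design offers 640 Gb/s of port bandwidth and the 32-port expansion doubles that to 1.28 Tb/s.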
Where Wedge differs from other switches is the presence of an Open Compute "Group Hug" micro server board next to the ASIC. The server board runs FBOSS.
"[The Group Hug chip] turns this switch into another server," said Jay Parikh, vice president of infrastructure engineering at Facebook. By running FBOSS on a micro server chip, Facebook can integrate the management and operation of the switch into its broader data center operation, he said.
"[FBOSS] was designed to allow us to leverage the software libraries and systems we currently use for managing our server fleet, including initial turn-up and decommissioning, upgrades and downgrades, and draining and undraining," Facebook said in a related blog post. "By controlling the programming of the switch hardware, we can implement our own forwarding software much faster. [W]e can also leverage existing Facebook tools for environmental monitoring that give us insight into the systems' performance, like cooling fan behavior, internal temperatures and voltage levels."
It's unclear why Wedge needs a micro server to run FBOSS. Cumulus Networks' own Linux-based network operating system runs on standard, bare-metal switch architecture. Despite the absence of a server processor, data center operators can run a Cumulus switch with the Linux management tools they use for servers. Other vendors have experimented with running server chips inside of switches, but for very different reasons. Pluribus Networks, for instance, sells a Freedom "Server-Switch," a switch with an on-board Intel Xeon server processor. Pluribus primarily uses the Xeon processor to embed applications and Layer 4-7 network services.
Facebook is testing FBOSS and Wedge internally, Parikh said, adding that the social media giant will formally contribute both to Open Compute at a yet-to-be-determined time.
While the modular software and hardware stacks are an impressive achievement, Facebook still has a lot of work to do, said Christian Renaud, senior analyst for 451 Research.
"The true test of [Wedge and FBOSS] will be how well it scales in production, and to the degree that scalability is dependent on internally-developed and tethered code within Facebook -- analogous to service provider OSS/BSS systems," he said.
In some respects, building the software and hardware stack of a switch is the easy part, Renaud said. The hard work begins when the switch is in production. That's when "you discover the flaws in your design, or traffic fundamentally changes in ways you didn't anticipate in your hardware design."
"I think this is great for innovation and changing the way we're doing things," said Andre Kindness, principal analyst with Forrester Research. "But network equipment like top-of-rack switches are meant to fit in a lot of different environments and carry a lot of different protocols."
To that end, Kindness said he believes Wedge and FBOSS are designed specifically for Facebook environments. Other data centers will not be able to take the technology and simply drop it into their own environments, he said. For instance, Facebook probably doesn't run security features like ARP protection because its network environment is "clean and vanilla," he said. "But other environments will need that and you don't know if you can turn all that stuff on and still have it run at line rate."
Facebook's switch concept comes as Open Compute sorts through a broad array of code contributions from Cumulus and Big Switch Networks and hardware specifications from Broadcom, Intel, Mellanox and Accton. It's unclear how all these contributions will coalesce, Kindness said.
Regardless of how Open Compute assesses various networking contributions, Facebook clearly wants to encourage the ecosystem of silicon and systems makers to innovate on top of Wedge and FBOSS.
"It's like Android," Kindness said. "They're putting that out there and seeing what people build on top of it."