When network architects build a network, they need to know how data will flow across the infrastructure. When big data analytics traffic traverses the network, that knowledge can be elusive.
Network architects have always been good at optimizing infrastructure for predictable data flows. Client-server applications demanded north-south flows from the server to the user, so architects built tiered network designs with spanning tree. In the age of cloud and virtualization, architectures such as leaf-spine designs emerged to handle the east-west flows between servers. Big data traffic, however, resists this kind of advance planning.
"One of the big systems challenges with this complex data is moving it and having it where you need it to do the analytics," said Laura Haas, technology and operations director at IBM's Accelerated Discovery Laboratory (ADLab), a lab that hosts customers' big data applications. "Our goal is to find things we didn't know before; to be making discoveries from this data. You don't know in advance where the communications patterns will be; where the data will need to be. You can't necessarily place it in exactly the right location."
For this reason, Haas believes low-latency, single-hop, point-to-point connectivity is critical to big data networking. IBM has installed QFabric from Juniper Networks in ADLab to address these networking requirements, she said.
The nature of big data networking traffic
In most networks, applications and services will follow either a north-south or east-west pattern, and those flows persist, said Dhritiman Dasgupta, senior director of product marketing at Sunnyvale, Calif.-based Juniper. An architect can build a network with confidence that it will serve the requirements of a data center for years. With big data, the network will see bursts of north-south and east-west traffic that is harder to predict.
With big data, "you see a lot of rapid microbursts. A lot of the data is being processed and brought back to a certain node. And from that point, there might be another burst and traffic is sprayed out to dozens or hundreds of nodes. And then it comes back again," Dasgupta said.
Big data nodes work together in a cluster to run distributed algorithms. Then they burst traffic out to other nodes that analyze the data they dig up. Engineers can't anticipate which nodes will burst and which nodes will receive those bursts.
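The scatter/gather pattern Dasgupta describes can be sketched in a few lines. This is a simplified illustration (not code from IBM or Juniper): a coordinator "sprays" work out to worker nodes, each worker processes its slice, and partial results converge back on one node, producing the fan-out/fan-in traffic bursts described above. The `analyze` function and node counts are hypothetical stand-ins.

```python
# Minimal sketch of the scatter/gather traffic pattern behind big data
# microbursts: fan-out to many worker nodes, then fan-in to one node.
from concurrent.futures import ThreadPoolExecutor

def analyze(shard):
    # Stand-in for a worker node processing its slice of the data
    return sum(shard)

def scatter_gather(data, num_nodes=4):
    # Scatter: split the data set across the worker nodes.
    shards = [data[i::num_nodes] for i in range(num_nodes)]
    # Each map/collect cycle corresponds to one burst of east-west
    # traffic: work sprayed out to nodes, results pulled back in.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(analyze, shards))
    # Gather: combine the partial results on the coordinating node.
    return sum(partials)

print(scatter_gather(list(range(100))))  # → 4950
```

Any node can initiate the next burst, which is why engineers cannot predict in advance which links will carry the traffic.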
"For the network, that means you've got to have high buffers to be able to sustain microbursts as traffic comes in and out of the different nodes," Dasgupta said. "A lot of these [big data] applications are extremely sensitive to latency -- not just latency, but the unpredictability of the latency. What you're looking for from a network is the ability to place nodes in an independent way. The physical locations of the server can make a difference to the overall performance of a big data application."
Dasgupta said the any-to-any connectivity offered by QFabric, which Juniper describes as a single-tier network architecture, serves these big data requirements well. QFabric operates like a modular switch that has been disaggregated into multiple discrete devices. Each top-of-rack QFX switch functions as a line card, uplinking into a QFabric Interconnect chassis that operates as a modular switch backplane.
"Any server is always exactly one hop away from other nodes in the network with QFabric," he said.
While IBM is still learning how to support the ever-evolving needs of big data in ADLab, Haas said one critical lesson has become extremely obvious: Engineers need a network that can move data from one cluster of nodes to another quickly and reliably.
"We're still very much learning [about big data]," she said. "We [have], at this point, only begun to tap the potential of it. We have clients who are doing some very cool projects, but they are mostly playing with more manageable data slices. We have learned that the point-to-point connectivity that QFabric offers is very critical for us because of the difficulty of planning [traffic flows]. We're looking for those unanticipated discoveries. Often there is a leap -- a new insight -- that gets generated as you're working with the data and tools."