
Mulling network device types: Two switches better than many?

Using two large chassis switches instead of multiple ToR devices might be a good alternative for data center performance.

At a certain scale, it just makes sense to deploy a couple of large switches as a data center fabric. For instance, assume a fabric must support 140 physical servers, each supporting a modest 10 virtual servers -- or containers -- and each requiring dual-homed connections for resilience. These requirements suggest 280 10 Gigabit Ethernet (GbE) ports, a port count a number of large chassis switches can easily support.
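A quick back-of-the-envelope check of those numbers -- the server, link and workload counts are the ones given above:

```python
# Rough port-count estimate for the fabric described above.
physical_servers = 140
links_per_server = 2    # dual-homed for resilience
vms_per_server = 10     # virtual servers or containers per physical server

server_ports = physical_servers * links_per_server   # 10 GbE ports needed
workloads = physical_servers * vms_per_server        # workloads behind those ports

print(server_ports)  # 280
print(workloads)     # 1400
```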

Deploying two data center switches provided by a single vendor seems, on the surface, to be a very simple option. The vendor's platform is going to provide a strong suite of vertically integrated services, such as Layer 2 overlays and link aggregation. There will also be only two devices, of a single type, to manage.

This means a smaller chance of having more than one version of code, or more than one set of command-line interface commands, to master. In fact, at this scale, there is likely a platform whose management can be reduced to a GUI and a wizard -- a simple-to-automate, vendor-driven platform designed to be administered through the occasional visit from a consultant or vendor rep.

What would it take to replicate this two-switch design using a spine-and-leaf fabric built out of 1 RU white boxes? If you assume each top-of-rack (ToR) switch can support 24x10 GbE ports down and 6x40 GbE ports up, you are going to need 12 ToR switches in your fabric. The simplest spine configuration is six of the same switches, each contributing 12x40 GbE ports, for a total of 72x40 GbE ports in the spine, as shown below.
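The sizing arithmetic behind that paragraph can be sketched in a few lines. This is a back-of-the-envelope calculation, not a design tool; the port figures are the ones given above:

```python
import math

# Sizing sketch for the white-box spine-and-leaf fabric described above.
# Port figures (24x10 GbE down, 6x40 GbE up per 1 RU box) come from the text.
server_ports_needed = 280   # 140 dual-homed servers
leaf_down_ports = 24        # 10 GbE server-facing ports per leaf
leaf_up_ports = 6           # 40 GbE uplinks per leaf

leaves = math.ceil(server_ports_needed / leaf_down_ports)  # 12 ToR switches
spines = leaf_up_ports      # one uplink from every leaf to every spine -> 6 spines
spine_ports_each = leaves   # each spine takes one link per leaf -> 12x40 GbE
total_spine_ports = spines * spine_ports_each              # 72x40 GbE in the spine

# 240 Gbps down vs. 240 Gbps up per leaf -> a 1:1, nonblocking design
oversubscription = (leaf_down_ports * 10) / (leaf_up_ports * 40)

print(leaves, spines, total_spine_ports, oversubscription)
```

Note that these particular port counts happen to produce a nonblocking (1:1) fabric; many real designs deliberately oversubscribe the leaf uplinks to reduce spine cost.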

Leaf-spine approach requires many more ToR switches
A spine-and-leaf approach would require 12 ToR switches as opposed to using only two large chassis switches.

Software will need to be considered in addition to the hardware. The first question a designer is likely to ask is, which of these two options is going to be less expensive? For the sake of argument, assume both costs are about the same over a five- or 10-year period, so the issue of cost cannot be used as a real differentiator.

Or, maybe the desire is for one throat to choke? This seems to be a primary draw for most companies: If something fails, the vendor -- which, in theory, has far more resources -- can both send the right people on site to fix the problem and take the blame. There is, of course, another name for one throat to choke: If the vendor fails to support you, that one throat becomes a single point of failure.

On what basis, then, can you decide which of these approaches is correct? Perhaps the best way to decide is to return to the basics. Note the specifications laid out above -- port counts and speeds and feeds. These are the kinds of questions engineers tend to ask, because they are the kinds of questions with which engineers tend to be comfortable.

What has not been asked is anything relating to the business. And that's one of the most important questions to answer. Here are some to consider.

What is the growth pattern for the business?

If the business tends to have increasingly larger requirements, then buying two large chassis switches means buying two devices large enough to handle the growth projected over the expected life of the purchased equipment. The chassis must be able to support enough blade slots and traffic to handle the largest load anticipated before the equipment is aged out. The two-large-switch model is based on scaling the network up -- increasing the capability of a small set of large devices over time to keep up with requirements.

A spine-and-leaf design built out of a single kind of device, on the other hand, can be designed in a way that permits growth by scaling the network out. In the scale-out model, more components are added in parallel to increase capacity; the system is made larger, rather than increasing the density or performance of the individual devices. A well-designed spine and leaf can be scaled in a number of ways -- for instance, by increasing the speed of the fabric, by increasing the number of spine switches or by moving from a simple three-stage spine and leaf to a more complex five-stage design using a Benes or butterfly topology.
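As a rough illustration of the scale-out property -- the leaf and spine counts and the 40 GbE link speed here are illustrative assumptions, not figures from the design above:

```python
# Scale-out sketch: capacity grows by adding spine switches in parallel,
# leaving the leaf layer untouched. Counts and speeds are assumptions.
def fabric_uplink_capacity_gbps(leaves, spines, link_speed_gbps=40):
    """Aggregate leaf-to-spine capacity when every leaf links to every spine."""
    return leaves * spines * link_speed_gbps

before = fabric_uplink_capacity_gbps(leaves=12, spines=4)  # 1920 Gbps
after = fabric_uplink_capacity_gbps(leaves=12, spines=6)   # 2880 Gbps
print(before, after)  # adding two spines grows capacity 50% in place
```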

Scale-out models tend to support variable workloads at a lower cost, and higher efficiency, than scale-up models.

What is the replacement model for the system?

No system is truly future-proof. Equipment and software eventually reach end of life, and vendors replace older, outdated architectures with new ones in order to offer new features.

In the case of a two-switch data center fabric, replacing the system means either building a new fabric and moving the workload when the equipment reaches end of life, or taking a downtime hit to replace the two switches in place. If one device fails close to the end of the switches' life cycle, the cost of replacing the failed switch with a new one of the same model must be weighed against the cost of replacing the entire system.

A more disaggregated option, using a larger number of smaller devices, will require more ownership on the part of the business, but this also means the components of the network can be replaced over time through a natural cycle.

What features are required of the network?

When the application is owned by the business, or the business is willing to push back on the application developer to make the interaction between the application and the network a two-way street, the network needs to support fewer features. In an environment where the network is focused on doing a smaller set of things very well, a more white-box-oriented approach is going to more easily meet short-term needs, while providing the flexibility for future network requirements.

Remember: There is a direct tradeoff between feature requirements, complexity and flexibility; adding more features will always add more complexity and reduce longer-term flexibility.

The bottom line is this: Whether the two-switch option is the best idea for your data center fabric depends on a lot of different things -- the most important of which is understanding the business requirements to which you're designing. Either choice can be valid; there is much more to consider here than how many hosts, servers and other devices you need to connect to the fabric.

This was last published in December 2017


Join the conversation

Hi Russ,

Thanks for sharing these inputs. I asked a similar question of an OEM team a month back while planning a large-scale data center fabric, but they simply couldn't answer basic things. Here are a few more things I would like to hear your thoughts on.

Outside the webscalers and established cloud players, I still don't find a large market of customers that really know:

+ How their applications really work
+ The technical details behind the scenes from the application's perspective
+ The real technical requirements on the network, beyond what the app vendors tell them to do
+ Realistic SLA requirements -- these are usually missing, everyone in the business has a different view of them, or they rest on individuals' assumptions from past experience
+ Realistic business growth projections -- let's admit that, as humans, we are pretty bad at these in reality

Now, from a technology standpoint, without going too deep or steering this thread toward fixed vs. chassis platforms: Why shouldn't we pick a minimum of three spines, instead of two, to begin with?

In the event of one spine failure with two spines, half the capacity is gone and the oversubscription ratio doubles (even if we had started at 3:1). Unless I can really do a load or stress test -- which is usually not possible -- it's hard to predict how a virtualized data center application stack would behave, or what kind of issues would arise at the application layer.

Usually the pushback I have seen is increased cost, rather than technical grounds. Your thoughts?

Regards,
Deepak Arora
Evil CCIE