At a certain scale, it just makes sense to deploy a couple of large switches as a data center fabric. For instance, assume a fabric requires 140 physical servers, each supporting a modest 10 virtual servers (or containers) and each requiring dual-homed connections for resilience. These requirements suggest 280 10 Gigabit Ethernet (GbE) ports, a port count that a number of large chassis switches can easily support.
Deploying two data center switches provided by a single vendor seems, on the surface, to be a very simple option. The vendor's platform is going to provide a strong suite of vertically integrated services, such as Layer 2 overlays and link aggregation. There will also be only two network devices, of a single type, to manage.
This means there is less chance of having more than one version of code, or more than one set of command-line interface commands, to master. In fact, at this scale there is likely a platform whose management can be reduced to a GUI and a wizard: a simple-to-automate, vendor-driven platform designed to be administered through the occasional visit from a consultant or vendor rep.
What would it take to replicate this two-switch design using a spine-and-leaf fabric built out of 1 RU white boxes? If you assume each top-of-rack (ToR) switch can support 24x10 GbE ports down and 6x40 GbE ports up, you are going to need 12 ToR switches in your fabric (280 server-facing ports divided by 24 ports per ToR, rounded up). The easiest spine configuration is going to be six of the same switches, each configured with 12x40 GbE ports (one per ToR), for a total of 72x40 GbE ports in the spine, matching the 72 uplinks coming from the 12 ToRs.
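The sizing arithmetic above can be checked with a short Python sketch that uses only the figures from the example. It assumes the usual spine-and-leaf wiring, in which every leaf runs one uplink to every spine; `size_fabric` is a hypothetical helper for illustration, not a vendor tool.

```python
import math

# Figures taken from the example in the text.
SERVERS = 140
LINKS_PER_SERVER = 2      # dual-homed for resilience
LEAF_DOWN_PORTS = 24      # 10 GbE ports per ToR, facing servers
LEAF_UP_PORTS = 6         # 40 GbE ports per ToR, facing the spine
DOWN_GBPS, UP_GBPS = 10, 40


def size_fabric(servers, links_per_server, leaf_down, leaf_up):
    """Return (server ports, leaves, spines, ports per spine) for a three-stage fabric.

    Assumes each leaf sends exactly one uplink to every spine, so the spine
    count equals the uplinks per leaf and each spine needs one port per leaf.
    """
    server_ports = servers * links_per_server
    leaves = math.ceil(server_ports / leaf_down)
    spines = leaf_up
    ports_per_spine = leaves
    return server_ports, leaves, spines, ports_per_spine


server_ports, leaves, spines, ports_per_spine = size_fabric(
    SERVERS, LINKS_PER_SERVER, LEAF_DOWN_PORTS, LEAF_UP_PORTS)

# Ratio of server-facing bandwidth to spine-facing bandwidth on each leaf.
oversubscription = (LEAF_DOWN_PORTS * DOWN_GBPS) / (LEAF_UP_PORTS * UP_GBPS)

print(f"server-facing 10 GbE ports: {server_ports}")                     # 280
print(f"ToR (leaf) switches: {leaves}")                                  # 12
print(f"spine switches: {spines}, each with {ports_per_spine}x40 GbE")   # 6, 12
print(f"total spine ports: {spines * ports_per_spine}")                  # 72
print(f"leaf oversubscription: {oversubscription:.1f}:1")                # 1.0:1
```

Because 24x10 GbE down matches 6x40 GbE up (240 Gbps in each direction), the leaves in this example are not oversubscribed.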
Software will need to be considered in addition to the hardware. The first question a designer is likely to ask is, which of these two options is going to be less expensive? For the sake of argument, assume both costs are about the same over a five- or 10-year period, so the issue of cost cannot be used as a real differentiator.
Or maybe the desire is for one throat to choke? This seems to be the primary draw for most companies: If something fails, the vendor, which in theory has far more resources, can both send the right people on site to fix the problem and take the blame. There is, of course, a flip side to one throat to choke: If the vendor fails to support you, that one throat becomes a single point of failure.
On what basis, then, can you decide which of these approaches is correct? Perhaps the best way to decide is to return to the basics. Note the specifications laid out above: port counts, speeds and feeds. These are the kinds of questions engineers tend to ask, because they are the kinds of questions engineers are comfortable with.
What has not been asked is anything about the business, and business questions are among the most important to answer. Here are some to consider.
What is the growth pattern for the business?
If the business tends to have ever-larger requirements, then buying two large chassis switches means buying two switches large enough to handle the projected growth over the expected life of the equipment. The chassis must support enough blade slots and enough traffic to handle the largest load anticipated before the equipment is aged out. The two-large-switch model is based on scaling the network up, which means increasing the capability of a small set of large devices over time to keep up with requirements.
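To make the scale-up sizing concrete, the following sketch projects per-chassis port demand to end of life. Only the 140-server figure comes from the example; the 15% annual growth rate, five-year equipment life and 48-port blade are hypothetical placeholders, and `chassis_sizing` is an illustrative helper.

```python
import math


def chassis_sizing(servers_today, annual_growth, years, ports_per_blade):
    """Size one of the two chassis switches for the load expected at end of life.

    Each dual-homed server lands one link on each chassis, so per-chassis
    10 GbE port demand equals the server count.
    """
    servers_at_eol = math.ceil(servers_today * (1 + annual_growth) ** years)
    blades_today = math.ceil(servers_today / ports_per_blade)
    blades_at_eol = math.ceil(servers_at_eol / ports_per_blade)
    return servers_at_eol, blades_today, blades_at_eol


# 140 servers is from the text; growth, lifetime and blade density are hypothetical.
eol_servers, blades_now, blades_eol = chassis_sizing(
    servers_today=140, annual_growth=0.15, years=5, ports_per_blade=48)

print(f"servers at end of life: {eol_servers}")
print(f"10 GbE blades per chassis: {blades_now} today, {blades_eol} at end of life")
```

Under these assumptions each chassis needs three blades on day one but must be bought with slots and backplane capacity for six, since the scale-up model commits to the end-of-life load up front.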
A spine-and-leaf design built out of a single kind of device, on the other hand, can be designed to grow by scaling the network out. In the scale-out model, more components are added in parallel to increase capacity; the system as a whole grows larger, rather than the density or performance of individual devices increasing. A well-designed spine-and-leaf fabric can be scaled in a number of ways: for instance, by increasing the speed of the fabric, by increasing the number of spine switches, or by moving from a simple three-stage spine and leaf to a more complex five-stage fabric using a Benes or butterfly topology.
Scale-out models tend to support variable workloads at a lower cost, and with higher efficiency, than scale-up models, because capacity can be added in small increments as demand appears rather than purchased up front.
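A rough sketch of where scaling out a three-stage fabric runs into its limit, using the leaf described earlier (24x10 GbE down, dual-homed servers). Because each leaf has one uplink to each spine, the number of leaves, and therefore the server count, is capped by the number of leaf-facing ports on each spine. The 32-port spine is a hypothetical figure, and `leaves_needed` and `max_servers` are illustrative helpers.

```python
import math

LEAF_DOWN_PORTS = 24   # 10 GbE server-facing ports per leaf (from the text)
LINKS_PER_SERVER = 2   # dual-homed servers (from the text)


def leaves_needed(servers):
    """Leaves required to terminate every server link."""
    return math.ceil(servers * LINKS_PER_SERVER / LEAF_DOWN_PORTS)


def max_servers(spine_ports):
    """Largest dual-homed server count a three-stage fabric can hold when each
    spine has `spine_ports` leaf-facing ports (one port per leaf)."""
    max_leaves = spine_ports
    return max_leaves * LEAF_DOWN_PORTS // LINKS_PER_SERVER


# Hypothetical spine port count; substitute the real figure for your hardware.
SPINE_PORTS = 32

for servers in (140, 200, 384, 400):
    n = leaves_needed(servers)
    print(f"{servers} servers -> {n} leaves, fits in 3-stage fabric: {n <= SPINE_PORTS}")
print(f"3-stage ceiling with {SPINE_PORTS}-port spines: {max_servers(SPINE_PORTS)} servers")
```

Up to the ceiling, growth is just a matter of adding leaves. Once the leaf count exceeds the spine port count (400 servers here), the fabric has to grow in one of the other ways described above: faster links, more spines or a five-stage topology.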
What is the replacement model for the system?
No system is truly future-proof. Equipment and software eventually reach end of life; vendors replace older, outdated architectures with new ones in order to offer new features.
In the case of a two-switch data center fabric, replacing the system means either building a new fabric and moving the workload when the equipment reaches end of life, or taking a downtime hit to replace the two switches. If one device fails close to the end of the switches' life cycle, the cost of replacing the failed switch with a new one of the same model must be weighed against the cost of replacing the entire system.
A more disaggregated option, using a larger number of smaller devices, will require more ownership on the part of the business, but it also means the components of the network can be replaced gradually, through a natural refresh cycle.
What features are required of the network?
When the application is owned by the business, or the business is willing to push back on the application developer to make the interaction between the application and the network a two-way street, the network needs to support fewer features. In an environment where the network is focused on doing a smaller set of things very well, a more white-box-oriented approach is going to more easily meet short-term needs, while providing the flexibility for future network requirements.
Remember: There is a direct tradeoff between feature requirements, complexity and flexibility; adding more features will always add more complexity and reduce longer-term flexibility.
The bottom line is this: Whether the two-switch option is the best choice for your data center fabric depends on many factors, the most important of which is understanding the business requirements you are designing to. Either choice can be valid; there is much more to consider than how many hosts, servers and other network devices you need to connect to the fabric.