
Metro network complexity: Time to cut the Gordian knot?

Service providers have built multi-service metro networks with high-functioning equipment, but metro networks are now complex and expensive to operate. This IDC report looks at why carriers need to consider alternatives to get the service velocity they need at the network edge, yet keep an open mind about how to deploy intelligence closer to the edge without adding complexity and cost.

Note: The Gordian knot is a legend associated with Alexander the Great. It is often used as a metaphor for an intractable problem solved by a bold stroke.

In 2008, the worldwide Carrier Ethernet switching and routing equipment market became a $5.4 billion market. IDC is predicting that by the end of 2012, the market will grow to $7.5 billion.

Clearly, there is ample opportunity for incumbent vendors as well as new vendors entering the market to take advantage of this growth. Service providers are beginning to see success in rolling out IP services, whether they are wireline providers competing for television services or cable operators adding VoIP and streaming media to their existing high-speed Internet offerings.

Building on this success, service providers now need to scale their IP services, which are often media-rich applications that are bandwidth hungry and require stringent guarantees for that bandwidth. At the same time, they must increase the speed of offering these services while reducing the cost of operating the overall network.

Metro networks are now complex, expensive to operate and don't deliver the service velocity providers need.
Eve Griliches, Program Director, IDC

To achieve the service quality needed to deliver media-rich applications, service providers have had to compromise their original infrastructure goals of building simple and cheap metro Ethernet edge/aggregation networks. Instead, they have built multi-service metro networks with high-functioning equipment. Adopting this approach has gotten the job done, but metro networks are now complex, expensive to operate and don't deliver the service velocity providers need. Service providers need to consider a few alternative solutions.

  • Rethink how multi-service networks are built. Instead of incrementally adding service delivery features to expensive equipment, perhaps the time is right to extract the critical service management features into a purpose-designed session layer. This would have two benefits:
      • First, by simplifying the requirements on routers and switches, service providers would have more options to reduce the cost of packet transport.
      • Second, it would encourage innovation and discussion on how best to deliver critical service management functions. It might also give rise to a new category of product, which could deliver the dynamic, adaptive session-by-session Quality of Experience (QoE) required to support a world rich in media-heavy IP services.
  • Break the current paradigm. This will require a new approach, as well as a new vision of how to manage the metro edge. We believe that vendors and service providers open to new approaches will become more competitive and will be able to deliver services with velocity and quality as never before.

Examining the carrier edge

The past 10 years have seen increasing pressure on service providers to cost-reduce their networks. Their top line has been threatened by the accelerating decline in the traditional voice market, and their bottom line has been challenged because the price of bandwidth has been declining faster than the cost to produce that bandwidth.

In response, service providers came up with a two-part plan. First, increase bandwidth and reduce costs by rebuilding the metro network with Ethernet to take advantage of Ethernet's lower cost base. And second, introduce new IP services over this high-speed metro Ethernet network, generating more than enough revenue to fill the gap left by declining legacy voice revenue. Because these new services are bandwidth-hungry video services (multi-channel television, Video on Demand (VoD) and other media-rich services, such as interactive gaming and video conferencing), they require an entirely new network.

This looked like a good approach, but the twin elements of new revenue from new services and bandwidth cost reduction have proved more difficult to combine than expected. The basic problem is the nature of these new services.

Media-rich services are extremely sensitive to packet loss, jitter and latency. Many of them need constant bandwidth. And, most important, user satisfaction with these services is sensitive to all of these factors. In a video network, each and every video session must be delivered with well-defined bandwidth determined at session setup, and with zero packet loss. In a gaming session, there is less need for constant bandwidth, but it is increasingly important that latency and jitter be minimized.

In addition, the way consumers use IP services has begun to change. IP services were initially "source" driven, offered by service providers, and pushed to consumers. A rapid shift is occurring, however. As the number and variety of IP services are expanding, the focus is shifting from the provider delivering these services to the end user "pulling" them -- choosing and invoking different applications, on-demand, and depending on the topicality and content offered. In a world of pull-based, unicast applications, it is difficult or impossible to predict how much bandwidth users will require, much less where and when.

This problem is actually compounded now because of the behavior of new software applications that measure and take every bit of bandwidth available. It takes just a few individual users running applications like Move Networks' HD Adaptive Streaming, or any peer-to-peer application, to consume all the excess bandwidth in the network, affecting everyone's performance. To date, no satisfactory solution has been implemented that can evenly provide bandwidth across concurrent users.

Until now, almost every provider in the world has dealt with this problem by over-provisioning the network. In today's world, this strategy simply won't work. Over-provisioning is not practical because users choose for themselves from a wide range of bandwidth-intensive applications, and many of these applications take much of the available bandwidth. Because today's applications are media rich and quality sensitive, the degree of over-provisioning would need to rise considerably to deliver QoE; otherwise, video traffic could easily get stuck behind a burst of gaming or peer-to-peer traffic.

Analyzing metro Ethernet networks today

The factors discussed above have had considerable impact on how service providers build multi-service metro networks. Instead of being able to deploy simple, inexpensive Ethernet switches, service providers have been forced to select much higher-functioning networking gear. Though the problems of multi-service metro networks are nothing like those of the Internet backbone, many service providers have felt forced to deploy the gear originally designed for the Internet core simply because the sophistication is there to successfully support multi-play services.

As service providers achieve penetration and success with their multi-play services, the strains between their original vision of a simple, cost-effective metro network and the network they actually built have become more apparent. Because the price of bandwidth continues to fall faster than the pace set by Moore's law, the high cost of metro networks remains a critical issue for service providers.

In addition to the capital expenditure outlay, operational expenditures in today's multi-service networks are also an issue. The most commonly used architectural approach for metro networks is a combination of MPLS and DiffServ quality of service (QoS).

Each service is engineered into the network via a web of MPLS tunnels, and each tunnel is sized to the maximum expected load for each service from each subnet. Each tunnel is then carefully configured onto the network node by node and link by link to ensure that each link has the requisite capacity, with the DiffServ bits used to prioritize services against each other.
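To make the bookkeeping behind this static approach concrete, here is a minimal sketch in Python. The node names, link capacities and tunnel sizes are all hypothetical, and DiffServ prioritization is left out. It only shows how peak-sized tunnel reservations add up on every link they cross and how an engineer might check each link against its capacity.

```python
from collections import defaultdict

# Hypothetical link capacities in Mbit/s, keyed by (node_a, node_b).
LINK_CAPACITY = {
    ("access1", "agg1"): 1000,
    ("access2", "agg1"): 1000,
    ("agg1", "edge1"): 2000,
}

# Hypothetical tunnels: (service, path as a list of links, peak load in Mbit/s).
# Each tunnel is sized to the maximum expected load of one service from one subnet.
TUNNELS = [
    ("iptv", [("access1", "agg1"), ("agg1", "edge1")], 600),
    ("voip", [("access1", "agg1"), ("agg1", "edge1")], 100),
    ("iptv", [("access2", "agg1"), ("agg1", "edge1")], 700),
    ("internet", [("access2", "agg1"), ("agg1", "edge1")], 400),
]


def reserved_per_link(tunnels):
    """Sum the peak-sized tunnel reservations crossing each link."""
    reserved = defaultdict(int)
    for _service, path, peak in tunnels:
        for link in path:
            reserved[link] += peak
    return dict(reserved)


def oversubscribed_links(tunnels, capacity):
    """Return links whose reservations exceed capacity, with the shortfall."""
    reserved = reserved_per_link(tunnels)
    return {
        link: load - capacity[link]
        for link, load in reserved.items()
        if load > capacity[link]
    }


if __name__ == "__main__":
    print(reserved_per_link(TUNNELS))
    print(oversubscribed_links(TUNNELS, LINK_CAPACITY))
```

With these made-up numbers, the link between access2 and agg1 is short by 100 Mbit/s, so one of its tunnels would have to be resized or rerouted by hand. Every new node, trunk or service means repeating this exercise across the affected links.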

This approach is somewhat complex to configure and provision because it requires link-by-link engineering of services and tunnels. Perhaps worse, it is also static. Simple network operations like adding access nodes and trunks are difficult because they require re-engineering of the tunnels; this technique also affects the service providers' ability to deliver new services quickly. Prior to rolling out a new service, there must be a careful estimate of bandwidth needed for each subnet; tunnels need to be engineered for the maximum expected bandwidth, and then those tunnels have to be configured into the network.

This approach has worked for providers as long as each service on the network has been allocated enough bandwidth per service. But what happens when there is a snow day and everyone is working from home? The network suddenly cannot handle the bandwidth requested for the service, and quality of experience goes right out the window.

The problem here is that the bandwidth for each of these services has already been configured, and there is no protection if a particular session within that tunnel needs more bandwidth for its application. There is little opportunity to manually adjust, in real time, to optimize for this. So ultimately, a single user, time of day or event can affect the bandwidth available to these applications, and the provider has no real ability to adjust for the impending congestion, which often materializes.

Looking at the network or policy management approach

Another approach to the problem of managing services is through network management or policy management. Many vendors have supported this approach, but not for large networks where network state needs to be "sensed" and services should be configured in real time. The reason is that services traverse the network in multiple directions, not just from provider to consumer anymore.

Consumers are rapidly moving from accepting the "push" sourced model of consuming services to a "pull" model. The centralized management or policy administrator simply does not scale to handle the constant pings and requests from all the network elements. If the policy manager runs the network, it still has no real-time knowledge of bandwidth changes and new requirements within the network.

If you want the network to be aware of these changes, you have to be able to configure the network in real time. Yet most network managers are understandably reluctant to have a policy manager constantly changing the state of key network elements, for fear of destabilizing the entire network.

Also, the policy manager does not really scale as subscribers are added and changes are reflected in the network. Large networks in general have problems with real-time data changes, and what has been optimized for today will be out of date tomorrow.

Rethinking the problem

Let's rethink the crux of the problem. Media-rich applications need session-by-session service-specific bandwidth and QoS. Embedding service-aware functionality in switches and routers does not really solve the problem. Switches and routers are designed to forward packets, hop by hop. They don't provide full-function session management capabilities. The session management features the switches come equipped with have been added incrementally, without the whole problem really being thought through.

Embedding service awareness makes them more complex and expensive, and dilutes what they do best -- forwarding packets from origin to destination. In an ideal world, maybe the right place to start would be by separating the delivery of services from the transport of packets. If we started like this, then:

  • The transport layer would focus on packet delivery and no longer be service aware. Service providers could "flatten" the transport layer and eliminate protocols and complex configurations. Service providers would then be able to purchase hardware optimized for price/performance metrics in a cost-effective manner. This would actually stimulate investment and innovation as well as new approaches to the transport layer.
  • We could extract service creation and delivery to a separate session layer architected from the ground up rather than as a series of incremental band-aids forced on current equipment. The right session layer would no longer be link by link (which is how transport equipment functions today) but would be dynamic, with no preconfigured tunnels, and it would function end to end to secure bandwidth and QoS enforcement.

Perhaps most important is the ability to create and deliver new services quickly. To do this, the network must be flexible to the needs of the actual services. The session layer would not provide packet transport but would provide session processing of all kinds: session initiation, management of the quality of service, and scaling these services to an increasing number of subscribers. With session management, network occupancy levels can rise while simultaneously preserving quality of experience.

This introduces sophisticated congestion management to the network, a way to reduce calls, ratchet back errant usage, and provide "fair usage" to the huge number of subscribers to ensure they get what they've paid for. If the lower layers handle the transport and traffic management, then simple and cost-effective processing power can be used for session management. Interestingly, it almost sounds like a job for an off-the-shelf general-purpose processing computer or server.

Session management: How it might work

A new and intriguing way to potentially solve the problem is to provide a session-by-session-based management approach. Session-by-session service delivery would address or consider the entire bandwidth in the network and allocate that bandwidth based on policy and individual customer and service profiles -- assuming it is prioritized. Yet it could still allocate leftover bandwidth to lower-level customers, so that no consumer is ever cut off.
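One way to picture this kind of allocation is the short Python sketch below. It is an illustration under stated assumptions, not a description of any product. The `Session` fields, the "lower number means higher priority" convention and the per-session `floor` that keeps a session from being cut off are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Session:
    name: str
    priority: int     # assumed convention: lower number = higher priority
    requested: float  # Mbit/s the application asks for
    floor: float      # Mbit/s minimum, so the session is never cut off


def allocate(sessions, capacity):
    """Give every session its floor, then top up in priority order."""
    total_floor = sum(s.floor for s in sessions)
    if total_floor > capacity:
        raise ValueError("capacity cannot cover the minimum floors")

    alloc = {s.name: s.floor for s in sessions}
    remaining = capacity - total_floor

    # Higher-priority sessions are topped up to their request first;
    # whatever is left over flows down to lower-priority sessions.
    for s in sorted(sessions, key=lambda s: s.priority):
        extra = max(min(s.requested - s.floor, remaining), 0.0)
        alloc[s.name] += extra
        remaining -= extra
    return alloc


if __name__ == "__main__":
    sessions = [
        Session("vod", priority=0, requested=8.0, floor=1.0),
        Session("game", priority=1, requested=4.0, floor=1.0),
        Session("p2p", priority=2, requested=50.0, floor=1.0),
    ]
    print(allocate(sessions, capacity=20.0))
    print(allocate(sessions, capacity=10.0))
```

With 20 Mbit/s available, the video and gaming sessions get what they asked for and the peer-to-peer session takes the 8 Mbit/s left over. With only 10 Mbit/s, the video session is still served in full while the other two fall back to their floors rather than being dropped.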


In essence, there is a single point of management for each session rather than centralized management for all sessions. This actually decouples and extracts the service from the transport layer, allowing services to be delivered faster and cheaper and right in the data path. Video on Demand sessions can be admitted to the network based on actual network usage and availability. To maximize utilization, dual-rate scheduling is applied so that if the Video on Demand session cannot be initiated at that time owing to lack of bandwidth, a time when it can be initiated will be communicated to the customer. Also, selective suspension or suppression of individual sessions is possible so that ongoing authorized sessions are not affected.
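The admission step can be sketched in a few lines of Python. This is a simplified illustration and does not attempt to reproduce the dual-rate scheduling mentioned above. It assumes a single shared link, constant-rate sessions with known end times and no future reservations. The function name and the data layout are hypothetical.

```python
def earliest_start(active, capacity, rate, now):
    """Return the earliest time >= now at which a session of `rate` fits.

    active   -- list of (rate, end_time) for sessions currently on the link
    capacity -- link capacity, in the same units as the rates
    Returns None if the request can never fit on this link.
    """
    if rate > capacity:
        return None

    # Only sessions that are still running occupy bandwidth.
    live = [(end, r) for r, end in active if end > now]
    used = sum(r for _end, r in live)
    if used + rate <= capacity:
        return now  # admit immediately

    # Otherwise, release sessions in order of their end time until it fits.
    for end, r in sorted(live):
        used -= r
        if used + rate <= capacity:
            return end
    return None  # defensive fallback; not reached with exact arithmetic


if __name__ == "__main__":
    # Hypothetical link: 100 Mbit/s, three sessions ending at minutes 30, 10, 50.
    active = [(40, 30), (40, 10), (15, 50)]
    print(earliest_start(active, capacity=100, rate=5, now=0))
    print(earliest_start(active, capacity=100, rate=8, now=0))
    print(earliest_start(active, capacity=100, rate=90, now=0))
```

In this example, a 5 Mbit/s request is admitted at once, an 8 Mbit/s request is told to start at minute 10 when the first session ends, and a 90 Mbit/s request has to wait until minute 50. Sessions already on the link are never touched.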

Ultimately, a session-by-session approach enables dynamic traffic patterns to avoid congestion in real time, which is exactly what every provider is looking for. This ensures that the network has a much higher utilization rate without over-provisioning. It also enables the idea of fair usage, in which everyone on the network gets a fair share of bandwidth based on their service profile. This involves no new protocols and no complex software, and it helps simplify and bring significant cost reduction to the carrier edge.
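The report does not spell out how fair usage would be computed. One well-known technique that fits the description is weighted max-min fairness, sketched below in Python as an assumption rather than as the method any vendor uses. The weights stand in for service profiles, and the user names and numbers are hypothetical.

```python
def fair_share(demands, weights, capacity):
    """Weighted max-min fair allocation of `capacity` among users.

    demands -- {user: bandwidth the user is trying to consume}
    weights -- {user: positive weight taken from the service profile}
    """
    alloc = {u: 0.0 for u in demands}
    active = {u for u in demands if demands[u] > 0}
    remaining = float(capacity)

    while active and remaining > 1e-9:
        total_w = sum(weights[u] for u in active)
        share = remaining / total_w  # bandwidth per unit of weight this round

        # Users whose whole demand fits inside their share are satisfied.
        satisfied = {u for u in active if demands[u] <= share * weights[u]}
        if not satisfied:
            # Everyone left wants more than a fair share: split by weight.
            for u in active:
                alloc[u] = share * weights[u]
            break

        # Serve the satisfied users and redistribute what they did not use.
        for u in satisfied:
            alloc[u] = demands[u]
            remaining -= demands[u]
        active -= satisfied
    return alloc


if __name__ == "__main__":
    demands = {"light": 10, "video": 40, "p2p": 200}
    equal = {"light": 1, "video": 1, "p2p": 1}
    print(fair_share(demands, equal, capacity=100))
    print(fair_share({"gold": 100, "basic": 100}, {"gold": 2, "basic": 1}, capacity=90))
```

In the first case, the light and video users get everything they ask for and the heavy peer-to-peer user is held to the 50 Mbit/s that remains, rather than crowding the others out. In the second case, two users who both want more than is available split the link 60/30 according to their profile weights.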

Applications for session management

Perhaps the most intriguing application for session management is implementation of the "fair usage model." This is applicable for any MSO network as well as any wireless network, which will be required to manage the increasing data traffic. As we all know, it is often only 5% of consumers who hog the bandwidth, often doing it with P2P upstream loads, which inherently reduce downstream capacity. Attempts at throttling have succeeded but have also resulted in customer churn and FCC issues.

This approach offers a way for the provider to monetize new applications and ensure their delivery, all on the same network. This is a huge issue today for MSOs and will become one for any of the wireless operators, especially with the proliferation of the iPhone, mobile maps and session-based one-to-one mobile gaming applications.

IDC's Wireless Infrastructure service estimates that about 25% of all cell sites today account for well over half of the traffic. That means the heavy concentration in urban sites is putting huge demands on cell site capacity, which, of course, is physically limited. A big benefit of this fair-usage model is that it can actually be deployed for upstream and downstream sessions, so that as traffic becomes less and less predictable, the fair-usage model provides even and smooth coverage at much lower cost points and keeps customers satisfied with transmission in both directions.

For mobile operators just building out their backhaul networks, the cost to deploy these new networks is huge, and it is not clear, with the flat-rate services to date, whether the incoming ARPU will cover the capital and operational expenditures and provide reasonable payback in the near term.

In addition, P2P traffic has begun to increase on mobile networks, so the P2P users clearly are crowding out other mobile subscribers, leaving them with little to no bandwidth and, often, dropped calls. And what level of video quality will really be possible on the mobile network when the bandwidth from each cell site has clear limitations?

Here again, a network that can guarantee service quality session-by-session would enable providers to monetize the services. This approach would ensure that the limited bandwidth allocated to each cell site was being fully utilized for the priority customers and would still serve the other customers with fair amounts of working bandwidth. In this example, the mobile operator can increase revenue as well as contain network costs.

Outlook for metro networks

Service providers will need to rethink their approach to metro networks to simplify and speed service delivery and to cut costs. We believe traditional approaches work to some extent but ultimately do not and will not scale. Bandwidth usage has changed, becoming much more dynamic, which requires a shift in thinking about how to solve the congestion problem and how to implement the fix. Separating service delivery into a separate session layer will benefit operators in several ways:

  • By simplifying the functionality needed in the transport network, providers could turn to less expensive solutions.
  • By flattening the network, providers reduce operational expenses.
  • Providers can deliver a better experience by protecting QoS session-by-session and run their network "hotter" by supporting more subscribers on the network.
  • Service creation can be simplified and rolled out faster, with less overhead and time spent on engineering studies of the network.
  • A focus on service creation and delivery could enable more sophisticated services over a wider footprint, irrespective of the network elements deployed.

We are intrigued by new approaches and believe some of them have a disruptive opportunity in the growing carrier edge market to help providers deliver services faster with the quality they deserve at much lower costs. Sometimes it's hard to assume that a major network can be deployed in a radically different way, but tough economic times do bring some of the best ideas to market.

This also reflects the shift from proprietary networks to more standards-based hardware, which leverages chip enhancements and declining cost structures. We expect service provider networks to look more and more like IT networks and large data centers, with standard processing equipment in clusters or grids, with smart software and advanced algorithms running the network.

Essential guidance

Today, we do need to think differently and have open visions on how to deploy intelligence closer to the edge without adding complexity and cost. Life in telecom is sinusoidal -- the edge used to be somewhat simple and easy to deploy. But life has provided us extensive applications and shifted the power to the users, enhancing our work and home life. Change must occur in carrier networks to meet this demand, and change is always hard. We know the edge has become the problem again. This time, let's solve it quickly and painlessly with a new low-cost approach.

The answer does seem to be clear -- start by separating the service from the transport. If this can be done, a lower-cost network can be deployed, and faster, richer service creation and delivery will be possible. This also opens up the opportunity for major innovation at both the service and transport levels.

As discussed, vendors that can break the paradigm of simple and cost-effective versus intelligent and high-cost will deliver true differentiation and innovation and will enjoy a competitive advantage. The ideas laid out in this study give some clues as to how this might be done.

About the author:
Eve Griliches is a program director within IDC's Telecommunications Equipment group. She provides in-depth insight and analysis on service provider routers and switches, as well as the optical networking market. Griliches also provides critical business intelligence on emerging technology trends and their impact on the overall telecom market space. She joined IDC in 2005, after 10 years in product management for a number of network equipment vendors.

This was last published in March 2009
