It takes two to tango -- an old expression that offers insight into tackling application performance issues in today's large enterprises. The dance partners here are networks and applications, which spin complicated steps together. Your IT staff needs to find and solve application performance problems and predict the consequences of rolling out new applications. Myopic troubleshooting and planning procedures that examine just the network or just the application are like dancing a tango without a partner: time-consuming and incomplete.
In other words, your network and application teams must view the enterprise IT infrastructure as a holistic entity, collaborating to address performance issues by accurately understanding network and application behaviors, assisted by state-of-the-art performance-management systems.
Thinking outside the network
The limitations of traditional performance management have become more apparent as enterprises consolidate data, voice and video applications onto single, multiservice networks. Network convergence streamlines operations and reduces costs, but also adds complexity as disparate applications compete for resources. Differences in latency tolerance, packet size, messaging architectures, and transport technologies -- between a Gigabit Ethernet local-area network (LAN) and a Frame Relay wide-area network (WAN), for example -- all affect application performance. Complicating things further are multitier applications deployed across several sets of servers.
Quality-of-service (QoS) technologies enable the network to allocate bandwidth and prioritize queuing, yet balancing these mechanisms for the optimal performance of every application is tricky: it requires an understanding of the unique interactions among applications, networks and servers. Traditional methods of monitoring network performance -- thresholds and alarms -- do not offer the insight an IT staff needs to understand application behavior, making it difficult to allocate resources to optimize application performance.
Application or network?
Application environments lack the visibility that network operators take for granted. Tracing the causes of application performance issues has historically required a combination of expertise, experience and luck. The default attitude blames performance problems on the network when application design or coding may be the cause.
Help-desk personnel and IT administrators need holistic performance-management systems that can help them quickly identify the source of a problem, fix it, and prevent its recurrence. Such a performance-management system takes advantage of network visibility to gain a detailed understanding of application behaviors from the macro to micro level.
This holistic system should include online traffic and bandwidth monitoring, offline modeling tools, and the integration of multiple capabilities into workflows that support both rapid troubleshooting and predictable application deployment. It spans the entire technology lifecycle, from planning and design through implementation and operations, helping IT groups achieve consistently high availability and optimal performance.
Planning and design
Adequate predeployment preparation forestalls a multitude of problems. Both network and application planning teams usually follow established practices for performance testing and validation. However, many of the methods that have come to be accepted as standard and adequate cannot predict how a new application or network service will behave in complex networks. If your production network has lots of routers, several dozen international WAN links, and multiple data centers, can it support new voice-over-IP (VoIP) services or a new enterprise resource management application?
The risks and consequences of inadequate testing prior to deployment are too great to trust to a set of tests that do not mimic reality. Building a test network in a lab and running dummy traffic over it is not the same as modeling the specific network where an application will be deployed.
New offline modeling tools can create a "virtual" network that represents the actual enterprise environment, enabling highly accurate performance analysis as a basis for sound planning and design. Such a tool builds its virtual environment from data collected by existing network-analysis devices, sniffers and traffic-collection tools, and draws on databases of specific network-element configurations.
Planners can input proposed changes -- such as implementing an Internet Protocol (IP) telephony service or deploying a new application -- and then observe behavior changes in the model. They can identify potential conflicts or problems and run "what if" scenarios that adjust components, architectures and designs until they achieve acceptable tradeoffs between budget, service-level-agreement (SLA) compliance, and other practical considerations.
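As a rough illustration of such a what-if check, the sketch below estimates whether a WAN link can carry a given number of VoIP calls alongside existing data traffic. All figures -- an 8-kbps G.729-style codec, 40 bytes of IP/UDP/RTP headers, a 30 percent priority queue -- are illustrative assumptions, not the output of any particular modeling tool.

```python
# Rough what-if model: does a WAN link have room for VoIP plus data?
def voip_call_bandwidth_bps(payload_bps=8000, packet_interval_s=0.02,
                            header_bytes=40):
    """Per-call IP bandwidth for one direction of a VoIP stream.

    Defaults approximate a G.729-style codec (8 kbps payload, 20 ms
    packetization) with uncompressed IP/UDP/RTP headers (40 bytes).
    """
    payload_bytes = payload_bps * packet_interval_s / 8   # bytes per packet
    packets_per_second = 1 / packet_interval_s
    return (payload_bytes + header_bytes) * 8 * packets_per_second


def link_supports(link_bps, data_bps, calls, voice_priority_fraction=0.30):
    """Check that the calls fit in the voice priority queue and that
    voice plus data traffic still fits on the link overall."""
    voice_bps = calls * voip_call_bandwidth_bps()
    fits_priority = voice_bps <= link_bps * voice_priority_fraction
    fits_link = voice_bps + data_bps <= link_bps
    return fits_priority and fits_link


# What-if: can a 1.544 Mbps T1 carrying 1 Mbps of data also take 12 calls?
print(link_supports(1_544_000, 1_000_000, 12))
```

A real modeling tool would of course account for layer-2 overhead, burstiness and queuing delay rather than simple averages; the point is only the shape of the what-if calculation.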
Thorough predeployment planning and design should yield a set of proposed configuration changes to support the new service or application. For example, services based on VoIP technology call for adjustments to QoS settings, particularly over low-speed WAN links. In some cases, enterprises may need to add bandwidth over certain links. The offline modeling tool can validate these configurations before IT sends them into the live network.
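For example, the kind of QoS configuration such a tool might validate could resemble the following Cisco IOS-style sketch, which places voice traffic marked DSCP EF into a priority queue on a WAN interface. The class names, percentage and interface are hypothetical, not a recommended design.

```
class-map match-any VOICE
 match dscp ef
policy-map WAN-EDGE
 class VOICE
  priority percent 30
 class class-default
  fair-queue
interface Serial0/0
 service-policy output WAN-EDGE
```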
Troubleshooting and operations
Troubleshooting application performance in today's complex networks can be an arduous task, but newly available tools take much of the pain and guesswork out of it, enabling help-desk personnel and IT administrators to identify and solve problems rapidly. New network-analysis software aggregates and interprets the massive amounts of data collected in the network, simplifying the task of determining whether an issue resides in the network or the application.
To analyze application performance issues further, another offline modeling tool captures usage data from agents embedded throughout the IT infrastructure. Advanced analysis capabilities allow administrators to visually track all messages passed between application tiers, servers and users, with micro-level views of all delay components.
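A minimal sketch of that kind of delay decomposition, assuming per-message timestamps are available from capture agents (the data structure and trace below are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Hop:
    """One captured message exchange between two tiers (seconds)."""
    sent: float      # sender put the message on the wire
    received: float  # receiver got the message
    replied: float   # receiver sent its next message

def delay_breakdown(hops):
    """Split end-to-end response time into network transit time and
    processing time spent inside the receiving tier."""
    network = sum(h.received - h.sent for h in hops)
    processing = sum(h.replied - h.received for h in hops)
    dominant = "network" if network > processing else "application/server"
    return {"network_s": network, "processing_s": processing,
            "dominant": dominant}

# Hypothetical trace: the app server queries the database, and one
# query takes 205 ms of server-side time.
trace = [Hop(0.000, 0.020, 0.025),
         Hop(0.025, 0.045, 0.250),
         Hop(0.250, 0.270, 0.275)]
print(delay_breakdown(trace))
```

In this trace the network accounts for roughly 60 ms while server-side processing accounts for roughly 215 ms, so the tool would point the specialist at the application tier rather than the network.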
Consider this scenario:
- When a user calls the help desk to complain, "My application is slow," the operator can immediately survey the relevant network segments to identify and address network problems.
- If the network tests OK, the operator escalates the help-desk ticket to a specialist.
- The specialist asks the user to duplicate the action and captures it in the offline tool.
- The specialist reviews a map of application actions, identifies latency issues, and recommends ways to fix them. For example, the map may show hundreds of messages between application and database servers before a response is sent to the user.
- The specialist may discover that the application team applied a patch the night before, and now everyone is experiencing slower performance.
- Because the patch introduced unacceptable delays through poor coding, the specialist recommends that the application team fix the code.
While this scenario is unfortunately common, isolating the problem now takes minutes instead of days or weeks. The performance-management system uses automated capabilities to help operators understand all the variables that influence application behavior. Previous approaches required sifting through thousands of alarms or data points by hand; such manual troubleshooting cannot scale to the complexity of today's application environments and networks.
Automated, holistic performance-management capabilities overcome traditional finger-pointing between network and application teams, fostering collaboration instead of competition. The results? Applications run smoothly, users are more productive, and the entire IT staff has more time and budget to devote to strategic projects.
Dave Wetzel is in product technology marketing, network management, at Cisco Systems, Inc.