One of the many challenges facing network operators in the next-generation services market is finding tools to test service components. So far, the majority of network testing has focused on transmission and connection resources, which are critical for delivering next-gen services.
But service features are more often created by IT components like servers and storage arrays. Because cloud computing is both a service and a platform for hosting next-gen service features and content, cloud testing is a major priority. Yet finding guidance on how to do it can be difficult.
As with any complex network service, operators can take one of two basic approaches to cloud testing:
- Component testing: Focus on verifying the operation and performance of each element of the service, working on the theory that if each part works, then the combination of the parts will work correctly or at least will be easier to troubleshoot.
- Service/system testing: Test the entire service as a unit, emulating what the user would experience.
While the service/system testing approach seems more intuitive, component testing may be the preferred approach, mainly because testing a complex system as a whole has two demanding prerequisites. First, it requires a set of predictable outcomes that can be generated and validated. Second, it requires a mechanism for simulating the cloud computing services to create those outcomes in a properly functioning cloud infrastructure.
Since whole-system testing is difficult, some operators are looking at a new cloud testing model based on the almost-ageless concept of protocol layering, which divides network designs into functional layers and assigns protocols to perform each layer's task. The essential goal of protocol layering is to make successive layers dependent on the services of adjacent layers without becoming dependent on variations in how the services are performed.
Three-layer cloud services structure focuses on boundary testing
Cloud services can be defined as three-layer structures, with the network layer at the top, the server pool layer in the middle, and the database/storage layer at the bottom.
The first step in using a protocol layer model for cloud testing is to define the essential services at the boundary points between layers. The testing goal is then to validate the delivery of those boundary services. So the three layers in the cloud services structure have two boundary points. Cloud service and infrastructure testing would logically focus on the connection between the delivery network and the server resource pool, then on the connection from the server pool to the storage pool. For each, the cloud testing process requires first that there be a service-level goal established at the test point, then that there be an organized procedure to test against that goal.
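The pairing of a service-level goal with a test procedure at each boundary can be sketched in a few lines of Python. This is a minimal illustration only: the metric names, the 95th-percentile criterion and the threshold values are assumptions chosen for the example, not figures taken from any operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryGoal:
    boundary: str        # "network/server" or "server/storage"
    metric: str          # hypothetical metric name
    p95_limit_ms: float  # assumed service-level goal at the test point


def percentile_95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of samples."""
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def check_goal(goal, samples_ms):
    """Return (passed, observed_p95) for one boundary test point."""
    observed = percentile_95(samples_ms)
    return observed <= goal.p95_limit_ms, observed


# Two boundaries for the three-layer structure; limits are placeholders.
goals = [
    BoundaryGoal("network/server", "round_trip_ms", 80.0),
    BoundaryGoal("server/storage", "read_latency_ms", 5.0),
]
```

Calling `check_goal(goals[0], samples)` with measurements collected at the network/server test point returns whether the assumed goal was met along with the observed value, which keeps the goal and the procedure separate, as the layered model suggests.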
Network/server boundary testing issues. The biggest testing issue operators report for the network/server boundary is managing the variations in connection performance that occur because of the relative location of the user and the server resources assigned to support the user's applications.
The design goal for cloud network connectivity is to keep these variations small. Reducing the number of hops between possible user locations and possible resource locations lowers packet loss and delay for as many user/resource combinations as possible. If large differences remain, assigning server resources to a user would have to take the communications impact of the server's location into account, which complicates the problem of making optimum use of server resources.
Since testing the network's ability to deliver consistent services to users anywhere within the target geography is critical, cloud testing will require comparing network delivery performance between various geographic points and the server resource pool. If the pool is widely distributed, it will be important to test the delivery variations across a combination of server and user locations. This can be done with traditional test equipment or with load generators and network monitoring tools.
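As a rough sketch of that comparison, the Python below takes latency samples keyed by (user location, server location) and reports the median for each pair, the spread between the best and worst pair, and which pair is worst. The data layout and the choice of median latency as the comparison metric are assumptions for illustration; the samples themselves would come from whatever test equipment or monitoring tools are in use.

```python
from statistics import median


def delivery_spread(samples_by_pair):
    """samples_by_pair maps (user_site, server_site) -> list of latency samples in ms.

    Returns (medians, spread_ms, worst_pair), where spread_ms is the gap
    between the highest and lowest per-pair median.
    """
    medians = {pair: median(values) for pair, values in samples_by_pair.items()}
    worst_pair = max(medians, key=medians.get)
    best_pair = min(medians, key=medians.get)
    return medians, medians[worst_pair] - medians[best_pair], worst_pair
```

A small spread suggests server assignment can ignore location; a large one points to the user/server combinations that need attention first.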
Server/storage boundary testing. The "service" that the server layer delivers is that of assigning and sustaining a server on demand. In most cases, the issue here will be the delay between an application request for service and the delivery of the needed resources, which is a test of the resource pool's management process. While it may be possible to generate test simulators for the management protocols, most operators prefer to test the responses using load generators or live requests entered for test purposes. What's critical here is to be able to correlate the request and the response, which is most easily done if you start by measuring the application response time at the point of request.
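The correlation step can be illustrated with a short Python sketch. It assumes, hypothetically, that each test request carries an identifier that also appears in the response record and that both are timestamped in seconds at the point of request; real management systems may expose this differently.

```python
def provisioning_delays(requests, responses):
    """Match responses to requests by ID and compute the delay for each.

    requests and responses are iterables of (request_id, timestamp_seconds).
    Returns (delays, unanswered): a dict of request_id -> delay in seconds,
    and a sorted list of request IDs that never received a response.
    """
    sent = dict(requests)
    delays = {}
    for request_id, timestamp in responses:
        # Only the first response for a known request is counted.
        if request_id in sent and request_id not in delays:
            delays[request_id] = timestamp - sent[request_id]
    unanswered = sorted(set(sent) - set(delays))
    return delays, unanswered
```

Tracking unanswered requests alongside the delays matters because a request that is never satisfied would otherwise vanish from the response-time figures.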
Testing server/storage boundary points is normally easier than testing network/server pool boundaries because most operators rely on storage located in the same data center as the servers. The performance variations are therefore limited to those caused by the data center network structure, notably the use of "deep" (i.e., multi-layer) storage networks, which introduce variable latency because the number of switches transited between server and storage device varies.
Where the structure of any of the three layers is complex, it may be essential to test within the layers to identify sources of performance variations or increased fault risk. This is most likely the case where there is a large geographic scope of users or resources. Conventional testing tools can be used to test servers or network equipment because only the behavior of a single set of resources is being tested, not that of a complex cloud.
Testing resource scheduling for cloud efficiency
A final variable in cloud infrastructure is the "recommission" performance of the cloud -- the time it takes to put something back into service when it's failed. When a resource fails, or when resource scheduling and assignment suggests that an application be moved to an alternative resource to improve overall cloud efficiency, it will be necessary to reload machine images and possibly reconnect internal data paths. This process is difficult to decompose for testing in a live cloud and is difficult to simulate except with massive load generation. Most operators prefer to use systems management data on virtual machine switchover as a starting point and refine activity and data path monitoring only if they see a consistent issue developing.
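Using systems management data as a starting point might look like the Python sketch below, which derives recommission times from a list of state-change events. The event format and the state names "failed" and "in_service" are assumptions made for the example, not the schema of any particular management system.

```python
def recommission_times(events):
    """Compute how long each failed resource took to return to service.

    events is an iterable of (timestamp_seconds, resource_id, state), where
    state is "failed" or "in_service" (hypothetical state names).
    Returns a list of (resource_id, seconds_out_of_service) in recovery order.
    """
    failed_at = {}
    durations = []
    for timestamp, resource_id, state in sorted(events):
        if state == "failed":
            # Keep the earliest failure time if failures repeat before recovery.
            failed_at.setdefault(resource_id, timestamp)
        elif state == "in_service" and resource_id in failed_at:
            durations.append((resource_id, timestamp - failed_at.pop(resource_id)))
    return durations
```

If these durations show a consistent problem for particular resources, that is the signal to add finer-grained activity and data path monitoring.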
Overall, the optimal test strategy for cloud infrastructure is much the same as for testing network infrastructure: Try to test each layer independently; test the boundary conditions; and test the special conditions that form the basis for inter-layer service-level assumptions. That approach makes the best use of the skill sets of operations personnel and of existing test and measurement tools.
About the author: Tom Nolle is president of CIMI Corporation, a strategic consulting firm specializing in telecommunications and data communications since 1982. He is the publisher of Netwatcher, a journal addressing advanced telecommunications strategy issues. Check out his SearchTelecom.com networking blog Uncommon Wisdom.