Network and performance monitoring supplier ThousandEyes, winner of this month's SearchNetworking Network Innovation Award, made its industry debut last summer with an impressive list of customers that included Twitter, Equinix and Priceline.com. The San Francisco company promises a better way to monitor network performance, regardless of whether problems stem from inside or outside of the network. SearchNetworking site editor Chuck Moozakis spoke to CEO Mohit Lad to get an idea of what ThousandEyes does differently and how it can allow users to track performance problems when they occur outside of their environments.
What was the market need ThousandEyes was created to fill?
Mohit Lad: We started in 2010. The founding team was from UCLA. We did research on Internet routing with a focus on troubleshooting and started figuring out how to understand a network as complex as the Internet. After graduation I went to work at Nokia and my cofounder, Ricardo Oliveira, interned at Juniper. During this time we realized that existing products were providing very limited network visibility. It was hard to look at the entire network, regardless of whether parts were deployed by us or not, to determine where problems might be. That was a huge thing for us with Nokia providing various mobile services, but also as a distributed IT organization. Once you add the cloud phenomenon, where one is more dependent on the network, you are not sure where the issues are, so the idea is to build something that will really help you look at networks like never before and not be restrained by, 'This is your boundary.'
What were some of the goals you were trying to accomplish?
Lad: There were a few things we wanted to do. First, we wanted to make sure it's something that is really easy to set up, and intuitive to use. We wanted to make sure we could identify the root cause of observed performance issues, even if the problem is outside the customer environment. Finally, we wanted to build something that lets you collaborate with other parties. That is very important in my mind. There was also something that's important for enterprises in particular. When you think about the performance management space, a lot of it has been focused on the server side. With applications becoming black boxes, you can't do a server-side implementation. In my mind you need similar metrics, but they have to be measured on the client side, and legacy technologies are unable to provide that. At a high level we wanted to build a solution that did not require server-side cooperation, making it more relevant in the modern IT environment.
How do you do that?
Lad: We use agents, but they are client-side agents. We don't need any implementation on the applications. We put an agent in each major branch office site and you get an infrastructure view. The agents are easy to install -- it's a virtual appliance download and you're done. Once you set them up, you can log into ThousandEyes and set up tests from the agents and see the performance of a specific application as well as the underlying network. (Editor's note: ThousandEyes also has a public set of agents that enterprises can use to assess network performance and to determine if a problem is inside or outside of a customer's network.)
Is this network performance or application performance?
Lad: It's network performance, but it's application-aware. My view is if you are just doing network performance without correlating the application, it's not as meaningful. We want to be able to tell you what the application impact is; first, tell you if it's an application issue or network issue, and if it's a network issue, where the problem is.
How does your approach differ from other network monitoring vendors?
Lad: The fundamental difference is that nobody we've seen understands the entire end-to-end network, particularly the Internet part. Others can't tell you where things are broken. The correlation between all the network and application layers is also unique to ThousandEyes. Finally, we deliver this as a Software-as-a-Service (SaaS) product, while most legacy providers have on-premises products that are more difficult to manage.
How does the client agent know where the problem might lie?
Lad: We perform tests at various layers (HTTP, Domain Name System, network, etc.) simultaneously, and with our correlation techniques we are first able to establish which layer is broken. There are three unique aspects here: intelligent ways to collect all the metrics by playing some tricks with different protocols; inference algorithms to isolate where the issue might be; and interactive visualization to digest data in a simple and intuitive manner.
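(Editor's note: The idea of testing several layers at once and then deciding which one is broken can be illustrated with a minimal Python sketch. It is not ThousandEyes' inference algorithm, which the interview does not describe; the layer names, their assumed dependency order and the `first_broken_layer` function are hypothetical and exist only for illustration.)

```python
from typing import Dict, Optional

# Assumed dependency order, lowest layer first. This ordering is an
# illustrative assumption, not something stated in the interview.
LAYER_ORDER = ["network", "dns", "http"]


def first_broken_layer(results: Dict[str, bool]) -> Optional[str]:
    """Return the lowest layer whose test failed, or None if all passed.

    `results` maps a layer name to True (test passed) or False (test failed).
    Layers missing from `results` are treated as passing.
    """
    for layer in LAYER_ORDER:
        if not results.get(layer, True):
            return layer
    return None


# Hypothetical test outcomes: the network path is fine, but name
# resolution fails, so the HTTP test fails as a consequence.
sample = {"network": True, "dns": False, "http": False}
suspect = first_broken_layer(sample)
print(suspect)
```

Reporting the lowest failing layer reflects the assumption that a failure lower in the stack explains the failures above it.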
Let's say, for example, the network is the problem. In this case, what you want to know is whether the problem is in your side of the network or somewhere else. With ThousandEyes, you can see performance data at different layers through an interactive network graph that shows all of your offices on one side of the screen, the servers on the other side and all the intermediate hops in between. The graph includes visual indications of where the issues are, similar to Google Maps visually showing where traffic congestion is. In our particular example, different branch offices for a large enterprise are having availability issues with an application.
By going to the network layer, you can see two servers, out of different data centers, being used for this app -- one on Internap, the other on NeoSpire. The visualization also shows that one of the hops is experiencing heavy packet loss. When you hover over that node, you can see there is a 92% packet loss on this particular router managed by NeoSpire. So basically within seconds you can figure out that your enterprise is having an issue because of a network problem at a specific hop in one of the cloud providers' data centers. Without any cooperation from anybody along the path, you can map out the dependencies and determine where the issues are.
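(Editor's note: As a rough illustration of the hop-by-hop reasoning in this example, the Python sketch below picks out the hop with the heaviest packet loss along a path. Only the 92% figure and the NeoSpire name come from the interview; the `Hop` structure, the other hop names and loss values, and the 10% threshold are hypothetical, and this is not how ThousandEyes itself computes or displays the result.)

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    name: str        # hypothetical label for the router
    owner: str       # network that operates the hop
    loss_pct: float  # measured packet loss, in percent


def worst_hop(path: List[Hop], threshold: float = 10.0) -> Optional[Hop]:
    """Return the hop with the highest loss at or above `threshold`, if any."""
    candidates = [hop for hop in path if hop.loss_pct >= threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda hop: hop.loss_pct)


# Hypothetical path from a branch office to the application server.
path = [
    Hop("branch-gw", "enterprise", 0.0),
    Hop("isp-edge", "isp", 1.0),
    Hop("dc-router", "NeoSpire", 92.0),
]

culprit = worst_hop(path)
if culprit is not None:
    print(f"{culprit.name} ({culprit.owner}): {culprit.loss_pct}% loss")
```

The `owner` field is what answers the question raised in the interview: whether the problem sits on your side of the network or in someone else's.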
Talk about the Share the Screen function within ThousandEyes.
Lad: Troubleshooting in IT is becoming more and more of a multi-party effort with elements of the infrastructure -- as well as apps -- moving to the cloud. We spent a lot of time and effort thinking through collaboration by sharing data interactively between users and non-users of ThousandEyes. The sharing functionality not only lets you as an enterprise collaborate internally, but it also lets you share the information you see with other parties you might be dependent upon, so they can actually use this information to solve a common problem that's affecting both parties.
How does Share the Screen differ from an online forum or other communications channel that IT administrators might already be using?
Lad: There are a couple of aspects that make it unique. First, from the enterprise perspective, during an ongoing issue, time is everything: You need to share that exact data [illuminating the problem] immediately with another party responsible for the problem, and the snapshot sharing capability lets other parties -- even those without a ThousandEyes account -- see a limited time slice of what you are seeing and also interact with it. The snapshot sharing is a great way to collaborate during an event and the links can be revoked once the problem is solved. Now if you consider SaaS companies, they are proactively looking to build network operations centers where they get more intelligence from the customer's side -- more live intelligence.
The other aspect of the collaboration is what we call 'live sharing.' If you are an enterprise using Salesforce.com, for example, you can initiate live sharing for the performance data collected on Salesforce.com within ThousandEyes. When the receiving party logs into their ThousandEyes account, they would get this live feed of data in a continuous manner and be able to integrate this into their operations console as well as set up alerts. SaaS companies always want to know of customer problems before their customers call them, and our live sharing would let them achieve that goal.
What are you working on now?
Lad: We have quite a few exciting things coming up on the road map, but I cannot go into too many details. At a high level, we are working on adding a user-view component and we are also adding the ability to track performance of other critical applications, such as voice. With our voice module, you will not just see the degradation, but find out why. It's the 'why' part that is the most difficult for voice troubleshooting.