Retrospective network analysis might have found Google's lost billions

By Michael Morisy, News Writer
16 Oct 2008 | SearchNetworking.com

When $62 billion of Google's market cap disappeared in four minutes, it fell to IT sleuthing, and possibly some sophisticated retrospective network analysis, to find out what happened and to repair the damage. Times have been tough lately on Wall Street, but on Oct. 1, Google shareholders had a particularly wild ride: In the last four minutes of trading that day, Google's stock price swung between $488.43 and $25.80.

Financial times are tough, but not quite that bad: The ricocheting stock price was the result of misrouted trades, and the task of fixing the mess would inevitably fall to NASDAQ's IT staff, whose job is to keep the markets running smoothly, bullish or bearish.

While NASDAQ would not comment in detail on what caused the Google stock glitch, a spokesperson told Reuters that "erroneous orders" caused the abrupt plunge, "triggered by orders routed from another exchange."

Such misrouting is rare but not unheard of, according to Jim Frey, a senior analyst with Enterprise Management Associates (EMA) who has worked with financial institutions.

"In this case, financial institutions are similar to a lot of others in that the challenges they need to address are intermittent problem syndrome," Frey said.

In a situation like the Google stock-price meltdown, he said, it is important for IT to be able to find a record of the original error in order to make sure it never happens again. "Looking for that small blip of something, you want to look back and see what was leading up to this," he said.

IT has a variety of tools at hand to deal with these types of problems as they occur, but the key is choosing the one that will most quickly provide the necessary depth of information.

That includes everything from the trading software itself, which generally keeps detailed logs and audit records, to network-analysis tools that log every IP transaction on the network.

"When it's not apparent where a problem is, the most complete and objective point of view is that detailed record of packets," Frey said. "There really isn't a way to get that level of detail without retrospective analysis."

Retrospective network analysis (RNA) technology records network traffic over a period of time, allowing IT to sift through transactions in search of anomalies and trace those anomalies back to their origins. RNA vendors have been pushing how far back, and in how much detail, networking professionals can look. Network Instruments offers an appliance, the GigaStor, with 288 terabytes of storage, which can store more than a month's worth of traffic from a 10 Gigabit Ethernet connection, depending on how busy the link is.
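
As a rough sanity check on storage figures like these, retention time is simply capacity divided by the average rate at which traffic arrives. The sketch below is a back-of-the-envelope calculation, not vendor data: it assumes decimal units and ignores indexing overhead, compression and packet slicing, and the `retention_days` function is hypothetical. At full 10 Gbps line rate, 288 TB fills in under three days, so a month or more of history implies an average link utilization below roughly 9 percent.

```python
SECONDS_PER_DAY = 86_400

def retention_days(storage_tb: float, link_gbps: float, utilization: float) -> float:
    """Days of traffic a capture store can hold before the oldest data is overwritten.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bit/s) and
    ignores indexing overhead, compression, and packet slicing.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    storage_bits = storage_tb * 10**12 * 8
    arrival_bits_per_second = link_gbps * 10**9 * utilization
    return storage_bits / arrival_bits_per_second / SECONDS_PER_DAY

# Full line rate: 288 TB on one 10 GbE link lasts under three days.
print(f"{retention_days(288, 10, 1.0):.2f} days at 100% utilization")
# A month or more of history requires a much lower average load.
print(f"{retention_days(288, 10, 0.08):.1f} days at 8% utilization")
```

Running it prints 2.67 days at full line rate and 33.3 days at 8 percent utilization.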

Paul Langway, principal at solution provider PVP Sales, which works closely with financial institutions, outlined one scenario of how IT might have handled the Google trading errors using just that device, a popular seller for his company.

Typically, the networking professionals embedded in the heart of finance are focused on optimization: On the electronic trading floors, every millisecond counts. Before the disaster struck Google's stock price, IT was probably regularly checking trade times and looking for growing latency to optimize out of the network.

"It's all about time. That's what customers need right now, especially in the financial industry," Langway said. "We're talking milliseconds today, as opposed to years ago, when [latency] wasn't as critical."
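
Checking trade times for creeping latency can be as simple as comparing a recent window of measurements against the window just before it. A minimal sketch, assuming per-trade latency samples are already being collected; the function name, window sizes and 1.5x threshold are hypothetical choices for illustration, not tuned values from any trading shop.

```python
from statistics import mean

def latency_creeping(samples_ms, baseline_n=20, recent_n=5, threshold=1.5):
    """Return True if recent trade latency has drifted above its baseline.

    samples_ms: per-trade latency in milliseconds, oldest first.
    The baseline is the baseline_n samples just before the most recent
    recent_n samples. Window sizes and threshold are illustrative only.
    """
    if len(samples_ms) < baseline_n + recent_n:
        return False  # not enough history to judge
    window = samples_ms[-(baseline_n + recent_n):]
    baseline = mean(window[:baseline_n])
    recent = mean(window[baseline_n:])
    return recent > threshold * baseline

steady = [2.0] * 25
creeping = [2.0] * 20 + [3.5] * 5
print(latency_creeping(steady))    # False
print(latency_creeping(creeping))  # True: 3.5 ms vs. a 2.0 ms baseline
```

Comparing against a trailing baseline rather than a fixed number lets the check adapt as normal latency shifts over a trading day.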

From one source or another, complaints would have started streaming in at 3:57 p.m., just as the NOC was about to get through another day of trading.

Perhaps a stream of bewildered email was the first notification, but the first alert was more likely something more personal, and more stress-inducing.

"The guys I'm working with … actively involved in trading, typically get a phone call," Langway said.

If they were lucky, the networking pros would have seen a traffic anomaly before the first complaints came in, giving them at least the ability to say they had begun corrective action.

At this point on Oct. 1, trading is almost done for the day, but seconds still count: Cancelling trades is embarrassing and, on NASDAQ, almost never done.

If signs pointed to a deeper network issue, a networking professional would log into the GigaStor and start zeroing in on Google trades made within that time frame.

It's now, when every second counts, that having a complete packet record pays off. Rather than waiting for the problem to recur in order to track down the root cause, the networking team can scan through the past few moments as if they had a network DVR, looking for the problem.
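
The "network DVR" idea amounts to a rolling buffer that always holds the most recent traffic and can be replayed by time range. Below is a minimal in-memory sketch; the `Packet` and `CaptureBuffer` names are hypothetical, and this version caps the buffer by packet count, whereas a disk-based appliance would be bounded by its storage capacity.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Packet:
    ts: float       # capture timestamp in seconds
    src: str
    dst: str
    payload: bytes

class CaptureBuffer:
    """Rolling capture that keeps only the newest max_packets packets."""

    def __init__(self, max_packets: int):
        # A deque with maxlen silently discards the oldest item on overflow.
        self._ring = deque(maxlen=max_packets)

    def record(self, pkt: Packet) -> None:
        self._ring.append(pkt)

    def replay(self, start: float, end: float) -> list:
        """Return retained packets with start <= ts <= end, in capture order."""
        return [p for p in self._ring if start <= p.ts <= end]

buf = CaptureBuffer(max_packets=3)
for t in range(5):
    buf.record(Packet(float(t), "192.0.2.10", "192.0.2.20", b""))
# Packets at t=0 and t=1 have already been overwritten.
print([p.ts for p in buf.replay(0, 10)])  # [2.0, 3.0, 4.0]
```

Because the buffer is always recording, the moments before a problem are already on hand when someone starts looking, which is exactly what waiting for a recurrence cannot offer.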

"The worst thing that can happen is you can miss the event, and then you have to wait around for it to happen again," Langway said.

Using the GigaStor, he said, IT would probably start by filtering header information for the past few minutes, narrowing the search to the trades NASDAQ knew had gone awry.
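
A first filtering pass might look something like this: restrict the capture to the minutes in question, then to the affected symbol and the orders already flagged as bad. This is a sketch under stated assumptions, not how the GigaStor's own filters work; the `TradeRecord` fields are hypothetical stand-ins for whatever a capture tool decodes from packet headers and order messages, and times are seconds since midnight, with 3:57 p.m. to the 4:00 p.m. close as the window.

```python
from dataclasses import dataclass

@dataclass
class TradeRecord:
    ts: float        # seconds since midnight, from the capture timestamp
    src_ip: str      # header fields
    dst_ip: str
    symbol: str      # decoded from the order message (hypothetical)
    order_id: str

def suspect_trades(records, start, end, symbol, bad_order_ids):
    """Keep one symbol's known-bad orders captured in [start, end], oldest first."""
    bad = set(bad_order_ids)
    hits = [r for r in records
            if start <= r.ts <= end and r.symbol == symbol and r.order_id in bad]
    return sorted(hits, key=lambda r: r.ts)

CLOSE = 16 * 3600              # 4:00 p.m.
WINDOW_START = CLOSE - 3 * 60  # 3:57 p.m.

records = [
    TradeRecord(WINDOW_START + 10, "192.0.2.1", "192.0.2.9", "GOOG", "A1"),
    TradeRecord(WINDOW_START + 20, "192.0.2.1", "192.0.2.9", "GOOG", "A2"),
    TradeRecord(WINDOW_START - 600, "192.0.2.1", "192.0.2.9", "GOOG", "A3"),
]
print([r.order_id for r in suspect_trades(records, WINDOW_START, CLOSE, "GOOG", {"A2", "A3"})])
# ['A2']: A3 is flagged but falls outside the window
```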

Once the erroneous trades were located, it was time to start following the trail: Trades can go through 10 or more hops from router to router, and tracing them back properly is critical to finding where the misdirected trades came from.
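
One way to follow a trade across hops, assuming copies of it were captured at several points along the path and can be matched by an order ID, is to order those sightings by timestamp; the earliest sighting is the capture point closest to where the trade entered. This only works if the capture points' clocks are tightly synchronized. `Sighting`, `trace_path` and the capture-point names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Sighting:
    order_id: str
    capture_point: str  # hypothetical tap or SPAN port name
    ts: float           # capture timestamp; clocks assumed synchronized

def trace_path(sightings, order_id):
    """Return the capture points that saw order_id, earliest first."""
    hops = sorted((s for s in sightings if s.order_id == order_id),
                  key=lambda s: s.ts)
    return [s.capture_point for s in hops]

sightings = [
    Sighting("X1", "nasdaq-gw", 1.004),
    Sighting("X1", "edge-a", 1.000),
    Sighting("X1", "core-2", 1.002),
    Sighting("Y9", "edge-b", 0.500),
]
path = trace_path(sightings, "X1")
print(path)                     # ['edge-a', 'core-2', 'nasdaq-gw']
print("entered via:", path[0])  # entered via: edge-a
```

Hops with no capture point simply do not appear, so in practice the reconstructed path is only as complete as the monitoring coverage.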

In this case, Langway said, a possible culprit was a corrupted route table that sent NASDAQ trades meant for another market.
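
Routers forward by longest-prefix match, which is why a single bad route table entry can quietly pull traffic away from its intended destination. The sketch below uses Python's standard `ipaddress` module and documentation-range addresses; the table contents and next-hop names are hypothetical. It shows how an injected, more specific route captures part of the traffic bound for another market's gateway.

```python
import ipaddress
from typing import Dict, Optional

def next_hop(table: Dict[str, str], dst: str) -> Optional[str]:
    """Longest-prefix match: the most specific route containing dst wins."""
    addr = ipaddress.ip_address(dst)
    best_net, best_hop = None, None
    for prefix, hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, hop
    return best_hop

healthy = {
    "0.0.0.0/0": "upstream",
    "198.51.100.0/24": "market-b-gw",   # orders bound for another market
}
# A corrupted entry that is more specific than the legitimate route.
corrupted = {**healthy, "198.51.100.0/25": "nasdaq-gw"}

print(next_hop(healthy, "198.51.100.7"))      # market-b-gw
print(next_hop(corrupted, "198.51.100.7"))    # nasdaq-gw (misdirected)
print(next_hop(corrupted, "198.51.100.200"))  # market-b-gw (outside the /25)
```

Only destinations inside the bad /25 are affected, which is one reason such faults can look intermittent from the outside.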

"You may need to flush that router," he said. Corrupted or overrun routers are not uncommon, particularly since financial markets rely heavily on multicast technologies that can, if the slightest element goes awry, overwhelm a router's memory or otherwise create a bottleneck in the network.
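
How a burst can overrun a router is easier to see with a toy model: treat the router's buffer as a fixed-capacity queue that forwards a set number of packets per tick and drops anything that does not fit. This is a simplified tail-drop illustration with hypothetical numbers, not a model of how any particular router manages its memory.

```python
def simulate_buffer(arrivals, service_rate, capacity):
    """Toy tail-drop queue.

    Each tick: enqueue that tick's arrivals, dropping whatever exceeds
    capacity, then forward up to service_rate packets.
    Returns (total_dropped, final_queue_depth).
    """
    depth = dropped = 0
    for n in arrivals:
        accepted = min(n, capacity - depth)
        dropped += n - accepted
        depth += accepted
        depth -= min(depth, service_rate)
    return dropped, depth

print(simulate_buffer([8] * 10, service_rate=10, capacity=50))            # (0, 0)
print(simulate_buffer([30] * 5 + [0] * 5, service_rate=10, capacity=50))  # (60, 0)
```

Traffic that stays under the forwarding rate never builds a queue, while five ticks at three times that rate fill the buffer by the third tick and drop 60 packets before the backlog drains.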

Once the router is flushed, restarted, and operating again, the real work has just begun. It's back to RNA to track down the source of the failure: What caused the router to become overwhelmed? While a single failure can wreak havoc on the market, a repeat can cost a networking professional his job.

Generally, Langway said, most of the problems do end up in the domain of networking, as both traders and the markets themselves try to minutely optimize their networks to reduce latency, going so far as to rearrange blades in a rack for peak performance.

No matter the optimizations, though, unpredictable surges in market activity create a constant need for improvement.

"If you look at what happened Monday, when the market rallied, I was trying to put through trades in the morning," Langway said. "They were taking 45 seconds to execute. They just can't keep up."

Ultimately, without further clarification from NASDAQ, it's impossible to know whether RNA or another tool in the IT tool belt revealed why the Google trades went awry. Eventually, though, NASDAQ was able to track down and cancel all the erroneous trades. That was not the ideal outcome, but it was better than arbitrarily wiping out billions of dollars of the search giant's market cap.

"There are a lot of other ways for tracking and auditing the data coming in, and [RNA] probably wouldn't be the shortest way," Frey said. "But if it is an addressing issue, you might have to go back to the network data to see that problem."

Even Network Instruments was coy about its role, if any, in fixing the problem.

"Being able to track those trades -- who made those trades, how they were sent erroneously through the systems -- is the kind of thing these systems are designed to do," said Charles Thompson, manager of systems engineering at Network Instruments. "If I did know anything about that case, unfortunately, I wouldn't be able to talk about it publicly without the NASDAQ's written authorization."
