When $62 billion of Google's market capitalization disappeared in four minutes, it was up to IT sleuthing, and possibly some sophisticated retrospective network analysis, to find out what happened and to fix the damage. Times have been tough lately on Wall Street, but on Oct. 1, Google shareholders experienced a particularly wild ride: In the last four minutes of trading that day, Google's stock price wavered between $488.43 and $25.80.
Financial times are tough, but not quite that bad: The ricocheting stock price was the result of misrouted trades, and the task of fixing the mess would inevitably fall to NASDAQ IT, whose job it is to ensure that the markets run smoothly, bullish or bearish as they might be.
While NASDAQ would not comment in detail on what caused the Google stock glitch, a spokesperson told Reuters that "erroneous orders" caused the abrupt plunge, "triggered by orders routed from another exchange."
Such misrouting is rare but not unheard of, according to Jim Frey, a senior analyst with Enterprise Management Associates (EMA) who has worked with financial institutions.
"In this case, financial institutions are similar to a lot of others in that the challenges they need to address are intermittent problem syndrome," Frey said.
In a situation like the Google stock-price meltdown, he said, it is important for IT to be able to find a record of the original error in order to make sure it never happens again. "Looking for that small blip of something, you want to look back and see what was leading up to this," he said.
IT has a variety of tools at hand to deal with these types of problems as they occur, but the key is choosing the one that will most quickly provide the necessary depth of information.
That includes everything from the trading software itself, which generally keeps detailed logs and audit records, to network-analysis tools that log every IP transaction on the network.
"When it's not apparent where a problem is, the most complete and objective point of view is that detailed record of packets," Frey said. "There really isn't a way to get that level of detail without retrospective analysis."
Retrospective network analysis (RNA) technology records packet logs over a period of time, allowing IT to sift through transactions in search of anomalies and trace those anomalies back to their origins. RNA vendors have gotten more aggressive about how far back, and how detailed, networking professionals can look. Network Instruments is offering an appliance, the GigaStor, with 288 terabytes of storage, which can store more than a month's worth of traffic from a 10 Gigabit Ethernet connection.
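A back-of-the-envelope check helps put that capacity figure in context. The sketch below is only an illustration: it assumes decimal terabytes, ignores any capture overhead or compression, and treats the link's average utilization as an unknown input. The function name and the sample utilization are hypothetical and do not come from Network Instruments.

```python
SECONDS_PER_DAY = 86_400

def retention_days(capacity_bytes: float, link_bps: float, utilization: float) -> float:
    """Days of traffic that fit in storage at a given average link utilization."""
    bytes_per_second = link_bps / 8 * utilization
    return capacity_bytes / bytes_per_second / SECONDS_PER_DAY

CAPACITY_BYTES = 288e12   # 288 TB, assuming decimal terabytes
LINK_BPS = 10e9           # 10 Gigabit Ethernet line rate

# A link saturated around the clock fills the store in under three days.
full_rate_days = retention_days(CAPACITY_BYTES, LINK_BPS, 1.0)

# Average utilization at or below which 30 days of traffic fits.
max_utilization_for_30_days = full_rate_days / 30

# Assumed example: a link that averages 8 percent utilization.
days_at_8_percent = retention_days(CAPACITY_BYTES, LINK_BPS, 0.08)

print(f"{full_rate_days:.2f} days at line rate")
print(f"{max_utilization_for_30_days:.1%} average utilization for 30 days")
print(f"{days_at_8_percent:.1f} days at 8% utilization")
```

Under these assumptions, a month or more of retention corresponds to a link that averages roughly 9 percent utilization or less, which is plausible for a connection that is busy in bursts rather than continuously.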
Paul Langway, principal at solution provider PVP Sales, which works closely with financial institutions, outlined one scenario of how IT might have handled the Google trading errors using just that device, a popular seller for his company.
Typically, the networking professionals embedded in the heart of finance are focused on optimization: On the electronic trading floors, every millisecond counts. Before the disaster struck Google's stock price, IT was probably regularly checking trade times and looking for growing latency to optimize out of the network.
"It's all about time. That's what customers need right now, especially in the financial industry," Langway said. "We're talking milliseconds today, as opposed to years ago, when [latency] wasn't as critical."
From one source or another, complaints would have started streaming in at 3:57 p.m., just as the NOC had almost survived another day of trading and the stock price began to yo-yo.
The first notification might have been a stream of bewildered email, but it would more likely have been something more personal, and more stress-inducing.
"The guys I'm working with … actively involved in trading, typically get a phone call," Langway said.
If they were lucky, the networking pros would have seen a traffic anomaly before the first complaints came in, giving them at least the ability to say they had begun corrective action.
At this point on Oct. 1, trading was almost done for the day, but seconds still counted: Cancelling trades is embarrassing and, at NASDAQ, almost never done.
If signs pointed to a deeper network issue, a networking professional would log into the GigaStor and start zeroing in on Google trades made within that time frame.
It's now, when every second counts, that having a complete packet record pays off. Rather than waiting for the problem to recur in order to track down the root cause, the networking team can scan through the past few moments as if they had a network DVR, looking for the problem.
"The worst thing that can happen is you can miss the event, and then you have to wait around for it to happen again," Langway said.
Using the GigaStor, he said, IT would probably first start filtering header information for the past few minutes, and only those trades that NASDAQ knew went awry.
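The filtering step Langway describes can be sketched in a few lines. This is not the GigaStor's interface; it is a minimal illustration that assumes captured headers have already been exported as records, and the `HeaderRecord` fields, the `order_id` identifier, the addresses, and the sample timestamps are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class HeaderRecord:
    captured_at: datetime
    src_ip: str
    dst_ip: str
    order_id: str  # hypothetical application-level identifier

def filter_suspect_headers(records, window_end, window, suspect_order_ids):
    """Keep only records inside the time window that belong to known-bad orders."""
    window_start = window_end - window
    return [
        r for r in records
        if window_start <= r.captured_at <= window_end
        and r.order_id in suspect_order_ids
    ]

# Illustrative data only; the date components are placeholders.
close = datetime(2008, 10, 1, 16, 0, 0)
records = [
    HeaderRecord(close - timedelta(minutes=10), "10.0.0.5", "10.0.9.1", "A1"),
    HeaderRecord(close - timedelta(minutes=3), "10.0.0.5", "10.0.9.1", "A2"),
    HeaderRecord(close - timedelta(minutes=2), "10.0.0.7", "10.0.9.1", "A3"),
    HeaderRecord(close - timedelta(minutes=1), "10.0.0.5", "10.0.9.1", "A4"),
]
suspects = {"A1", "A2", "A4"}

matches = filter_suspect_headers(records, close, timedelta(minutes=4), suspects)
print([r.order_id for r in matches])
```

Narrowing by time first and by known-bad trades second keeps the search small, which matters when the full capture spans terabytes.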
Once the erroneous trades were located, it was time to start following the trail: Trades can go through 10 or more hops from router to router, and tracing them back properly is critical to finding where the misdirected trades came from.
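One way to picture that trail-following is to rebuild each trade's path from per-hop sightings and compare it with the path the trade should have taken. The sketch below is a simplified illustration, not a description of how NASDAQ or any vendor tool works: it assumes every capture point shares a synchronized clock, and the record type, router names, and expected path are invented for the example.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

class HopSighting(NamedTuple):
    order_id: str
    router: str       # hypothetical router name at the capture point
    seen_at_ms: int   # assumes synchronized clocks across capture points

def reconstruct_paths(sightings):
    """Group sightings by order and sort each group by time to get its path."""
    by_order = defaultdict(list)
    for s in sightings:
        by_order[s.order_id].append(s)
    return {
        order_id: [s.router for s in sorted(hops, key=lambda s: s.seen_at_ms)]
        for order_id, hops in by_order.items()
    }

def last_good_router(path, expected) -> Optional[str]:
    """Return the last router shared with the expected path before the two diverge."""
    for i, (actual, wanted) in enumerate(zip(path, expected)):
        if actual != wanted:
            return path[i - 1] if i > 0 else None
    return None  # no divergence within the compared hops

sightings = [
    HopSighting("A2", "nasdaq-gw", 30),
    HopSighting("A2", "edge-1", 10),
    HopSighting("A2", "core-2", 20),
]
expected_path = ["edge-1", "core-2", "exchange-b-gw"]

paths = reconstruct_paths(sightings)
suspect_router = last_good_router(paths["A2"], expected_path)
print(paths["A2"], suspect_router)
```

In this toy case the trade follows the expected path through `core-2` and then turns toward the wrong gateway, so `core-2` is the first place to look for a bad forwarding decision.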
In this case, Langway said, one possible culprit was a corrupted routing table that sent trades meant for another market to NASDAQ instead.
"You may need to flush that router," he said. Corrupted or overrun routers are not uncommon, particularly since financial markets rely heavily on multicast networking technologies that can, if the slightest element goes awry, overwhelm a router's memory or otherwise cause a bottleneck in the network.
Once the router is flushed, restarted, and operating again, the real work has just begun. It's back to RNA, tracking down the source of the failure. What caused the router to become overwhelmed? While a single failure can play havoc with the market, a repeat can cost a networking professional his job.
Generally, Langway said, most of the problems do end up in the domain of networking, as both traders and the markets themselves try to minutely optimize their networks to reduce latency, going so far as to rearrange blades in a rack for peak performance.
No matter the optimizations, though, unpredictable market surges in activity provide a constant need for improvement.
"If you look at what happened Monday, when the market rallied, I was trying to put through trades in the morning," Langway said. "They were taking 45 seconds to execute. They just can't keep up."
Ultimately, without further clarification from NASDAQ, it's impossible to know whether RNA or another tool in the IT belt discovered why the Google trades went awry. Eventually, though, NASDAQ was able to track down, and cancel, all the erroneous trades. That is not the ideal outcome, but it is at least better than arbitrarily wiping out billions of dollars of the search giant's market cap.
"There are a lot of other ways for tracking and auditing the data coming in, and [RNA] probably wouldn't be the shortest way," Frey said. "But if it is an addressing issue, you might have to go back to the network data to see that problem."
Even Network Instruments was coy about its role, if any, in fixing the problem.
"Being able to track those trades -- who made those trades, how they were sent erroneously through the systems -- is the kind of thing these systems are designed to do," said Charles Thompson, manager of systems engineering at Network Instruments. "If I did know anything about that case, unfortunately, I wouldn't be able to talk about it publicly without the NASDAQ's written authorization."