What is the definition of big data and why should I care?
At its essence, big data is a logical extension of what was once called a data warehouse. It is just what the name implies: a large repository of data, generally with a business focus, that can be used to support business decisions. Where it differs from a conventional database, however, is that big data doesn't have to be structured.
In a typical database, data is organized into standard fields and indexed using specific keys. Anyone familiar with the Microsoft Access application understands this notion perfectly. A customer record can have a first name, a last name, an address and other information organized into fields that have common labels. Every customer record has the same structure, and each can be located by using search terms that key on certain labels: last name, for example.
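As a rough illustration of the fixed-field, keyed layout described above, here is a minimal Python sketch. The `Customer` class, the `build_index` helper and the sample records are all invented for this example; a real database engine manages its indexes internally.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # Every record carries the same labeled fields (hypothetical schema).
    customer_id: int
    first_name: str
    last_name: str
    address: str


def build_index(records, field):
    """Group records by the value of one field so lookups avoid a full scan."""
    index = defaultdict(list)
    for record in records:
        index[getattr(record, field)].append(record)
    return index


# Made-up sample data, for illustration only.
customers = [
    Customer(1, "Pat", "Example", "1 First Street"),
    Customer(2, "Sam", "Sample", "2 Second Street"),
    Customer(3, "Lee", "Sample", "3 Third Street"),
]

# Key on the "last_name" label, as in the text.
by_last_name = build_index(customers, "last_name")
matches = [c.customer_id for c in by_last_name["Sample"]]
print(matches)
```

The lookup works only because every record exposes the same `last_name` field, which is exactly the assumption that unstructured data breaks.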
Now, what happens if you want to link those customer records to, say, a picture of each customer? Or videos of events that the customer attended? How about if you want to incorporate links to all the references to that customer that appear on the Web?
There are good reasons to map such disparate data sources to each other, yet conventional databases generally do not support doing so. Additionally, the amount of data that might be linked can reach staggering levels. This leads to the notion of big data. Big data uses special data architectures to organize and make accessible enormous amounts of data, well into the multiple-exabyte range (an exabyte is 10 to the 18th power bytes). Generally, this calls for parallel computing across many servers and discrete data stores, making such big data repositories difficult for a small business to maintain. However, big data is gradually becoming a service obtained through cloud service providers, thus putting big data applications within range of most companies.
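The idea of parallel computing across discrete data stores can be sketched on a very small scale. In the toy Python example below, each inner list stands in for one data store, and threads stand in for separate servers; the record layout and the names `count_events` and `merge_counts` are assumptions made for this illustration, not part of any particular big data product.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_events(partition):
    """Summarize one partition on its own, with no knowledge of the others."""
    return Counter(record["customer_id"] for record in partition)


def merge_counts(partials):
    """Combine the per-partition summaries into one overall result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Each inner list plays the role of a separate data store (made-up records).
partitions = [
    [{"customer_id": 1, "kind": "photo"}, {"customer_id": 2, "kind": "video"}],
    [{"customer_id": 1, "kind": "web_mention"}],
    [{"customer_id": 3, "kind": "video"}, {"customer_id": 1, "kind": "video"}],
]

# Threads stand in for servers here; a real deployment spans many machines.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(count_events, partitions))

totals = merge_counts(partials)
print(totals)
```

Because each partition is summarized independently, adding more data mostly means adding more partitions and workers rather than building one ever-larger machine.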
The "big" question, though, is why would you need big data? The answer is in the value of correlations. Important information is often available if you can see the relationship between data sets that at first blush may not seem to have anything to do with each other. Let's say you want to know if your company is vulnerable to exploitation by hackers. Doing that successfully may require examining millions of transactions across multiple applications and data centers. This is virtually impossible to do without big data techniques and associated analytics.
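To make the value of correlation concrete, here is a small Python sketch that relates events from several applications by a shared attribute. The event fields, the application names and the threshold are hypothetical; real security analytics would work on far larger volumes and richer signals.

```python
from collections import defaultdict


def sources_hitting_many_apps(events, min_apps):
    """Return sources whose failed logins span at least min_apps applications."""
    apps_by_source = defaultdict(set)
    for event in events:
        apps_by_source[event["source"]].add(event["app"])
    return sorted(
        source for source, apps in apps_by_source.items() if len(apps) >= min_apps
    )


# Made-up failed-login events gathered from unrelated applications.
failed_logins = [
    {"source": "10.0.0.5", "app": "webmail"},
    {"source": "10.0.0.5", "app": "vpn"},
    {"source": "10.0.0.9", "app": "webmail"},
    {"source": "10.0.0.5", "app": "payroll"},
]

suspicious = sources_hitting_many_apps(failed_logins, min_apps=3)
print(suspicious)
```

No single application's log looks alarming on its own; the pattern only appears once the data sets are viewed together.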
Ultimately, big data's definition will likely come to characterize most database applications as the amount of data available and important to business continues to expand. IT professionals should become conversant in big data concepts and terminology now in order to prepare for that eventuality.
This was first published in September 2013