The Rise of Smart Data: The Next Step in the Evolution of Big Data

During the frenzy over the rise of Big Data and its promise to transform just about everything, an interesting backlash has also emerged: Big Data skepticism. After all, it’s only logical to ask the simple but very important question: Does all this data actually mean anything? In a recent interview with journalist Charlie Rose, Freakonomics authors Steven Levitt and Stephen Dubner made the point that ideas, talent, and good questions are the key elements that turn data into useful insights. They went on to say that these three elements remain in remarkably short supply, particularly within conventional thinking – including around Big Data. And they are not alone.

Big Data Hubris: The Need for a Gut Check

In a recent article in Science, “The Parable of Google Flu: Traps in Big Data Analysis,” David Lazer, Ryan Kennedy, and colleagues from Northeastern University and the Harvard Kennedy School coined a new phrase for the almost magical quality that many have ascribed to Big Data; they call it “Big Data Hubris.” The phrase describes the tendency to believe that traditional methods of data collection and analysis can simply be replaced by sheer volume of data. Statistically and intuitively, we tend to believe that large sets of data create accuracy simply because the n is large. But what happens when we don’t stop to validate the information?

In February 2013, Google Flu Trends (GFT) predicted more than double the number of physician visits for flu-like illness estimated by the Centers for Disease Control and Prevention (CDC), whose estimates rely on traditional surveillance reports from U.S. laboratories. GFT was built on an algorithm designed to predict those CDC reports themselves. Unfortunately, this meant the algorithm was tuned to be highly sensitive to particular search terms, placing greater weight on the growing number of searches for terms like “cough” and “fever.” These inferred results were never validated against any other source or against traditionally collected and analyzed information.
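The underlying statistical point is that a large n reduces random noise but does nothing about systematic bias. The short Python sketch below (with invented numbers; it is not GFT’s actual data or model) shows how a biased proxy converges ever more confidently to the wrong answer as the sample grows:

import random

random.seed(0)
TRUE_RATE = 0.02   # hypothetical true rate of flu-like illness in the population
BIAS = 0.025       # the proxy systematically over-counts (e.g., generic "winter" searches)

def proxy_estimate(n):
    """Estimate the rate from n observations of a biased proxy signal."""
    hits = sum(random.random() < TRUE_RATE + BIAS for _ in range(n))
    return hits / n

for n in (1_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}: estimate = {proxy_estimate(n):.4f} (truth = {TRUE_RATE:.4f})")

# As n grows, the estimate becomes more precise -- but it converges to ~0.045,
# more than double the truth. Only validation against an independently collected
# source (as the CDC's surveillance data was for GFT) exposes the bias.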

Big Data Skepticism: The Need to Ask Critical Questions

Shortly after the Science piece, the Financial Times published another article on March 28, 2014: “Big data: are we making a big mistake?” In it, Professor Patrick Wolfe of University College London points out that the term “Big Data” is vague and used primarily by people looking to sell something, suggesting that it is not in itself a defined method for generating insights. The article concludes that “Big Data has arrived, but big insights have not. The challenge now is to solve new problems and gain new answers – without making the same old statistical mistakes on a grander scale than ever.”

This was followed by an op-ed piece in the New York Times on April 6th, entitled “Eight (No, Nine!) Problems With Big Data,” in which the authors point out many of the same issues, with particular emphasis on the lack of judgment applied to Big Data. They make the point that Big Data analysis works fine when the analysis is relatively straightforward, but that when things get more complex or subtle, Big Data will often lose or misinterpret nuance. The risk of assumed or accidental correlations, amplified by the sheer volume of data, can lead to large-scale miscalculations, as happened with GFT.

The common theme across these articles is that Big Data is not the enemy, but it is also not something that can be expected to provide true insights without human judgment, expertise, and traditional analysis methods. We were one of the earliest companies to raise similar questions about Big Data. In 2013 we wrote a post on our blog, “What’s Wrong with This Picture? Big Data, Insight, and Asking the Right Questions.” Our premise was that the presence of data alone (even in vast quantities) is not inherently valuable. We emphasized the need to ask the right questions to gain real value from data of any size. We also questioned resource allocation: in recent years there has been enormous investment in gathering data, but not a comparable investment in the talent and critical questions needed to interpret it.

We would like to go one step further and put forth the term Smart Data as an approach that addresses both Big Data Skepticism and Big Data Hubris, along with our concerns about allocating the right kind of resources to extracting insights.

Smart Data: Priming Big Data to be Insightful

The process of making Big Data smarter involves collecting, curating, and standardizing the data, with the intent of creating consistency and appropriate context. It essentially means asking the right questions and ensuring that the data used to answer those questions actually measures something related to them. Big or small, you have to know that the data you’re using is clean, accurate, and consistent, and that it provides the right context.

Smart Data has been curated, filtered, and standardized to create a consistent set of information that answers a defined set of questions. But something magical does happen within this approach, something that often goes beyond the initial questions: the data starts to show patterns that beget even better questions, and often reveals insights within insights, particularly when the information is visualized. In essence, Smart Data is the end result of a process that starts with a question and yields a clean, consistent, and contextually relevant set of data that can be subjected to both traditional and new analytical techniques. Smart Data seeks to limit the number of steps between the data and the questions you are trying to answer. Smart Data becomes the most valuable subset of Big Data.
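As a concrete, purely illustrative example, the Python sketch below shows what one small piece of that process might look like: raw records from hypothetical sources, with inconsistent naming and types, are standardized and then filtered down to the subset that can actually answer a defined question. The field names, sources, and values are invented; this is not Context Matters’ actual pipeline.

# Hypothetical raw records from two sources, with inconsistent naming and types.
RAW_RECORDS = [
    {"source": "registry_a", "drug": "Drug X", "phase": "Phase III", "enrollment": "450"},
    {"source": "registry_b", "drug": "drug x", "phase": "3", "enrollment": 430},
    {"source": "registry_b", "drug": "Drug Y", "phase": "II", "enrollment": None},
]

PHASE_MAP = {"phase iii": "3", "iii": "3", "3": "3", "phase ii": "2", "ii": "2", "2": "2"}

def standardize(record):
    """Return a record with consistent naming and types, or None if it is unusable."""
    drug = str(record.get("drug", "")).strip().lower()
    phase = PHASE_MAP.get(str(record.get("phase", "")).strip().lower())
    try:
        enrollment = int(record.get("enrollment"))
    except (TypeError, ValueError):
        return None  # cannot help answer an enrollment question
    if not drug or phase is None:
        return None
    return {"drug": drug, "phase": phase, "enrollment": enrollment, "source": record["source"]}

# The defined question: what is the average enrollment of Phase 3 trials for "drug x"?
smart_data = [r for r in (standardize(rec) for rec in RAW_RECORDS) if r is not None]
phase3_x = [r["enrollment"] for r in smart_data if r["drug"] == "drug x" and r["phase"] == "3"]
print(sum(phase3_x) / len(phase3_x) if phase3_x else "no usable records")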

In fact, that is exactly how we at Context Matters have come to treat data that comes from multiple domains and is often unfiltered. We start by determining what we wish to accomplish, search for the appropriate data sources, and then expend the time, expertise, and energy to transform the unfiltered information into curated, filtered, and standardized data sets. From there we apply visualization techniques to these “smart” data sets, converting them into meaningful information and insights that our customers can apply right away. Without taking any shortcuts, Context Matters transforms Big Data into Smart Data.

It is important to understand that no one is saying Big Data is useless or of no value. However, a new approach to Big Data is needed: one that does not take for granted that the data will do all the heavy lifting on its own.

In the end, big data or small, the world still needs the big talents of people who know how to ask critical questions to generate truly data-driven insights.