When the New York Times recently published the article "For Big Data Scientists 'Janitor Work' Is Key Hurdle to Insights," it struck a chord with us. Context Matters is not a "Big Data" company, but in the process of curating and standardizing disparate sources of drug development data, we confront many of the same issues and challenges.
The Times' reporter, Steve Lohr, correctly asserts that curating, cleaning, and standardizing data from multiple sources is no simple task, and that it requires real investments of time and resources. But describing it as "janitor work," with the implication that it is dull and dirty, mischaracterizes what happens when data are transformed into meaningful insight: hardly "mundane," and not easily relegated to automation.
Part of the Process
Our data scientists are experts in their fields, holding advanced degrees in Public Health and specializing in subjects such as Diabetes, Multiple Sclerosis, all areas of Oncology, Patient Reported Outcomes, and Pompe disease. In turning raw data into information, they use their subject-matter expertise to break the data apart into normalized variables, allowing for apples-to-apples comparisons and a clearer understanding of the often-complex drug development process.
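To picture what that normalization involves, consider a hypothetical sketch in which two sources report the same trial measure under different field names and units. (The field names, units, and conversion factors below are invented for illustration; they are not Context Matters' actual schema.)

```python
# Hypothetical sketch: normalizing drug-trial records from two sources
# into one shared schema so values can be compared apples-to-apples.
# Field names, units, and mappings are invented for illustration.

def normalize(record, source):
    """Map a raw record from a known source onto a shared schema."""
    if source == "source_a":
        # Source A reports duration in weeks and drug name as free text.
        return {
            "drug": record["drug_name"].strip().lower(),
            "phase": record["phase"],
            "duration_days": record["duration_weeks"] * 7,
        }
    elif source == "source_b":
        # Source B reports duration in months under a 'compound' field.
        return {
            "drug": record["compound"].strip().lower(),
            "phase": record["trial_phase"],
            "duration_days": round(record["duration_months"] * 30.4),
        }
    raise ValueError(f"unknown source: {source}")

# Two raw records describing comparable trials, now directly comparable:
a = normalize({"drug_name": " Metformin ", "phase": 3, "duration_weeks": 12}, "source_a")
b = normalize({"compound": "metformin", "trial_phase": 3, "duration_months": 3}, "source_b")
```

The judgment calls hidden in those few lines (which fields correspond, which units to standardize on, how to treat free-text names) are exactly where subject-matter expertise comes in.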
It goes without saying that collecting data is a critical part of our data scientists' role; equally important is the expertise that comes from intimate knowledge of the data, both as it is uploaded and as it is retrieved from our application for analysis. That process enables them to speak in fine detail about the data as specialized members of client teams, and to grasp nuances that only this level of interaction can reveal.
Smart Data Explained
The act of data cleaning is a systematized way of making critical discoveries. Our team is responsible for harmonizing and relating data produced by agencies worldwide; the process of scouring that data is a learning curve unto itself as we develop the taxonomies that best represent all parties. Our customers gain immense value from that process, not least the ease of use it creates as they perform comprehensive searches across myriad data sources at once. And that is only the beginning: by investing in this standardization, we create the opportunity for Smart Data, a concept we have previously described. At its core, it is the idea that to get the most from Big Data, one needs to understand enough about the data's context to draw meaningful insights. We believe it is well worth the effort.
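One way to picture the payoff of a shared taxonomy is a sketch like the following, in which each source's label for a disease area is mapped onto one canonical term, so a single query sweeps every source at once. (The terms, trial identifiers, and mappings here are invented for illustration.)

```python
# Hypothetical sketch: a shared taxonomy maps each source's label for a
# disease area onto one canonical term, so one search spans all sources.
# Terms, identifiers, and mappings are invented for illustration.

TAXONOMY = {
    "type 2 diabetes": "diabetes",
    "t2dm": "diabetes",
    "diabetes mellitus": "diabetes",
    "multiple sclerosis": "multiple sclerosis",
    "ms": "multiple sclerosis",
}

records = [
    {"source": "agency_1", "condition": "T2DM", "trial": "NCT-001"},
    {"source": "agency_2", "condition": "Type 2 Diabetes", "trial": "NCT-002"},
    {"source": "agency_3", "condition": "MS", "trial": "NCT-003"},
]

def search(term):
    """Return trials whose condition maps to the canonical term."""
    return [
        r["trial"]
        for r in records
        if TAXONOMY.get(r["condition"].lower()) == term
    ]

hits = search("diabetes")  # matches both diabetes trials despite label variants
```

Building and maintaining that mapping across thousands of real-world labels is the slow, expert part; the comprehensive search it enables is the value our customers see.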
Machines can be automated to produce results or to collect certain types of data, as the Times' article suggests, but not to carry on intelligent dialogue about complex and nuanced situations in their correct context. We would be hard-pressed to find a program that could work through the same process as our data scientists and then lend itself to conversation and Q&A about the output and findings.
Therefore, to brand as tedious and low-level our data scientists' highly skilled, thinking work, and the process behind it, which yields pattern recognition that would otherwise stay hidden, undermines both its value and theirs.