Understanding Predictive Modeling

With the rise of “big data,” predictive analytics has become increasingly popular. Predictive modeling, the process of using data to predict an unknown event, allows for predictions to be made based on multiple complex factors. Predictive modeling has been touted as the key that unlocks performance increases in everything from customer service, crime prevention, and financial services to population health. Within the pharmaceutical industry, “big data” shows promise for improving the drug development process (e.g., identification of genetic targets and the development of immunotherapies) and demystifying reimbursement decision-making.

There is a growing body of research focused on building models to predict the decision-making of HTA agencies. Recently, the Context Matters team presented an analysis of our predictive model for SMC oncology health technology assessment decisions. Our model uses variables such as the incremental cost-effectiveness ratio (ICER), the presence of a patient access scheme, and whether the assessment was a resubmission to predict SMC’s reimbursement decision.

While demand for models to predict reimbursement decision-making is high, it’s critical to recognize that not all predictive models are created equal, and companies need to be cautious when making decisions based on the results of these models. So how do you tell a good model from a bad one? Three components are critical to evaluating the quality of a model:

  1. Methodology
  2. Quality of data
  3. Meaningful results

Methodology
A good model starts with a good hypothesis. The outcome of interest has to be something that can systematically be predicted using measurable variables. For example, in our model, we used ICERs as one of the measurable variables to predict the outcome, which was the SMC decision.

One must also have a reasonable rationale for including variables in the model. In many cases, researchers include variables in their models simply because the data exists, with no reasonable rationale for why a given variable can help predict the outcome of interest. This is one of the pitfalls of big data and its abundance of information. Researchers must show restraint and provide a rationale for each variable they include in the model.

The ability to validate a model is also a very significant indication of its quality. We have seen many instances in academic literature where a model is built on a specific set of data but not validated in an independent data set. The authors claim that their model successfully predicted the outcome, but a model that predicts well only within the single data set it was built on has not proven its validity or predictive power. Until the model is tested on a separate set of data, its generalizability, and consequently its usefulness, is unknown.
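To make the validation point concrete, here is a minimal sketch in Python of checking a model against data it was not built on. Everything in it is an assumption for illustration: the data is synthetic, the 30,000 cutoff and acceptance probabilities are made up, and the one-variable "accept if the ICER is at or below a cutoff" rule is a stand-in, not our SMC model.

```python
import random

def make_synthetic_assessments(n, seed):
    """Generate made-up (icer, accepted) pairs; this is NOT real SMC data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        icer = rng.uniform(5_000, 80_000)  # hypothetical cost per QALY
        # Hypothetical pattern: lower ICERs are accepted more often, with noise.
        p_accept = 0.9 if icer < 30_000 else 0.2
        rows.append((icer, rng.random() < p_accept))
    return rows

def accuracy(rows, threshold):
    """Share of rows where 'accept if ICER <= threshold' matches the outcome."""
    hits = sum((icer <= threshold) == accepted for icer, accepted in rows)
    return hits / len(rows)

def fit_threshold(rows):
    """Pick the observed ICER value that maximizes accuracy on the given rows."""
    candidates = sorted(icer for icer, _ in rows)
    return max(candidates, key=lambda t: accuracy(rows, t))

build = make_synthetic_assessments(200, seed=1)    # data used to build the model
holdout = make_synthetic_assessments(100, seed=2)  # independent data, never used in fitting

threshold = fit_threshold(build)
in_sample = accuracy(build, threshold)
out_of_sample = accuracy(holdout, threshold)
print(f"cutoff={threshold:.0f} in-sample={in_sample:.2f} holdout={out_of_sample:.2f}")
```

The in-sample figure is the number a paper without validation would report. The holdout figure is the one that speaks to generalizability, because the cutoff was chosen without ever seeing those rows.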

Quality of data
Predictive modeling must rely on quality data. As the saying goes, garbage in, garbage out. The data must be cleaned and standardized in a reliable way in order to produce a reliable model. One factor that often goes unnoticed is how data coding can influence the results. For example, in our SMC model, SMC did not always directly report the ICER. Cases where we had to infer the ICER were slightly associated with SMC decisions. This means that the way we coded the data could influence the results of the analysis. While this is not detrimental to the model, it is an important point to consider in future analyses. Quality modeling tests for these types of biases.
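One simple way to look for this kind of coding bias is to compare outcomes between records whose value was reported directly and records whose value was inferred. The sketch below does that in Python. The records are made up for illustration and are not our SMC data, and the function name is hypothetical.

```python
from collections import Counter

# Made-up records for illustration only: (icer_was_inferred, accepted).
records = [
    (False, True), (False, True), (False, False), (False, True),
    (False, False), (True, False), (True, True), (True, False),
    (True, False), (False, True),
]

def acceptance_rate_by_flag(rows):
    """Return {flag: share of accepted decisions} for each coding flag."""
    totals, accepted = Counter(), Counter()
    for inferred, was_accepted in rows:
        totals[inferred] += 1
        accepted[inferred] += was_accepted  # True counts as 1, False as 0
    return {flag: accepted[flag] / totals[flag] for flag in totals}

rates = acceptance_rate_by_flag(records)
gap = rates[False] - rates[True]
print(f"reported ICER: {rates[False]:.2f}, inferred ICER: {rates[True]:.2f}, gap: {gap:.2f}")
```

A noticeable gap between the two rates suggests the coding choice is tied to the outcome and deserves a closer look. On real data, a formal test of association would be needed before drawing any conclusion, especially with small counts.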

In addition to quality data, we believe “smart data” (data that has been curated, filtered, and standardized to create a consistent set of information to answer a set of defined questions) improves the reliability of the model and the meaningfulness of the results. Having a deep understanding of how each HTA agency operates and how each data point relates to the HTA agency’s process allows us to make judicious choices about what data to include in the model and how to reliably code that data. Without this deep understanding of the data, it is easy to miss important details and misinterpret the results. Therefore, we believe modelers must have significant knowledge of the agency’s process and its data, and must collect and manage that data in a reliable and consistent way in order to build a meaningful predictive model.

Meaningful results
Finally, a validated model with a good hypothesis that is based on quality data will deliver actionable results and insights. If the predictive model is not able to inform the initial question or provide a path forward, then it is useless. That does not mean the model has to find an effect: even a quality model with null results is informative.

At Context Matters, we are using our “smart data” to pursue predictive modeling in order to provide meaningful insights to our customers. We are currently in the process of validating our SMC oncology HTA decisions model. As a data and technology company, we are dedicated to high-quality, smart data and sound methodology.

Stay tuned for our next blog on data outliers and the insights they can provide.