As renowned statistician George Box once wrote, "Essentially, all models are wrong, but some are useful." Imagine that after months of dedicated work, your analytical project culminates in a model that predicts a key performance indicator (KPI) with 90% accuracy. You enthusiastically present your findings to the decision-makers. Impressed, they hastily implement the model, yet as time passes, the KPI flounders despite the resources and time devoted to improving the explanatory variables in your model.
This scenario can be avoided by thorough model verification and validation, a process that will uncover how “wrong” and “useful” the model can be. In other words, we are identifying the model’s capabilities, limitations, and appropriateness to answer the underlying business question.
A model verification process systematically tests a model to find weaknesses and errors. The first step in model verification is a peer review of the code, testing the algorithms for bugs and accuracy. In this step, it is important to focus on programmatic errors rather than code optimization or clarity. Next, correct any errors and repeat the testing until all programming problems are fixed.
Analytical models are functions that return outputs based on inputs, much like the basic functions taught in math classes. In fact, the most common statistical model, a simple linear regression, is f(x) = Ax + b, the equation of a line. It is reasonable to expect analytical models to behave like most functions: continuously.
You can verify your model for continuity by varying the inputs slightly. In theory, small changes in the inputs should not drastically change the outputs; in practice, however, you'll find that many complex models do. This may signal an erroneous model, but if the results still make business sense, there may be no reason to change anything.
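As a concrete illustration, here is a minimal sketch of such a perturbation check, assuming a scikit-learn-style model and synthetic data (both are stand-ins for your own model and training set):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data; substitute your own training set and model.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)
model = LinearRegression().fit(X, y)

# Perturb each input by ~1% of its standard deviation and compare
# predictions against the unperturbed baseline.
baseline = model.predict(X)
perturbed = model.predict(X + 0.01 * X.std(axis=0) * rng.normal(size=X.shape))

# Large relative changes from tiny input changes warrant a closer look.
rel_change = np.abs(perturbed - baseline) / (np.abs(baseline) + 1e-9)
print(f"Max relative change in predictions: {rel_change.max():.4f}")
```

A smooth model such as linear regression will show only small changes here; a heavily overfit model often will not.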
You can also review extreme case scenarios to see whether the model reacts wildly to very high or very low inputs. Based on the findings, you may need to redesign your model or specify bounds on the input variables. Be wary of extrapolation, or predicting outside of the data the model was built on, which can happen with both response and predictor variables. You can't expect a model to accurately predict something it hasn't seen, although it may be tempting to try.
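One simple safeguard is to flag new inputs that fall outside the range the model was trained on. Here is a sketch; the data and the `flag_extrapolation` helper are hypothetical, for illustration only:

```python
import numpy as np

def flag_extrapolation(X_train, X_new):
    """Flag rows of X_new where any feature falls outside the training range."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return ((X_new < lo) | (X_new > hi)).any(axis=1)

# Hypothetical training inputs on [-1, 1]; the second new row extrapolates.
X_train = np.random.default_rng(0).uniform(-1, 1, size=(100, 2))
X_new = np.array([[0.1, 0.2], [5.0, 0.0]])
print(flag_extrapolation(X_train, X_new))  # [False  True]
```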
Another model verification tactic is the Sub-Models or Simplified Models Approach. In this method, the analyst breaks complex models into smaller, simpler parts, or substitutes simpler models for complicated ones. Given similar predictive power, if there is ever a choice between model complexity and comprehension, you should choose comprehension every time.
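To make that trade-off concrete, a quick cross-validated comparison can tell you whether the extra complexity is earning its keep. A minimal sketch, assuming scikit-learn and synthetic, mostly linear data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data with a mostly linear signal.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(300, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=300)

# Score a complex model and a simple one on the same folds.
for name, est in [("complex (boosting)", GradientBoostingRegressor(random_state=1)),
                  ("simple (linear)", LinearRegression())]:
    scores = cross_val_score(est, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}")

# If the simple model scores within a small tolerance of the complex one,
# the more comprehensible model is usually the better business choice.
```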
The model validation process ensures that the model addresses the right business problem, meets analytical standards, and provides accurate information about the underlying system being modeled. It gives the decision-makers confidence and reason to implement the analytical techniques.
The first step is to check assumptions. For any analytical project, all critical business-related assumptions should be clearly stated. Further, any assumptions necessary for statistical modeling should be confirmed and documented. Some common examples of these assumptions are normality of errors, independence of observations, and so forth.
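Two of those assumptions can be checked in a few lines. This sketch assumes you have residuals from a fitted regression model and uses SciPy's Shapiro-Wilk test and statsmodels' Durbin-Watson statistic; the residuals here are synthetic placeholders:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.stattools import durbin_watson

# Placeholder residuals; use the residuals from your fitted model instead.
residuals = np.random.default_rng(7).normal(size=200)

# Normality of errors: a small Shapiro-Wilk p-value (< 0.05) suggests
# the residuals are not normally distributed.
_, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value: {p_value:.3f}")

# Independence of observations: a Durbin-Watson statistic near 2 suggests
# little autocorrelation; values near 0 or 4 suggest strong correlation.
print(f"Durbin-Watson statistic: {durbin_watson(residuals):.2f}")
```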
In business scenarios, it is rare to satisfy all modeling assumptions, so you must review the implications of each violation. Depending on the goals of model building, some assumptions are more important than others. For example, if the goal is to explain which variables influence a target, you'll want to ensure the assumptions that affect p-values and standard errors are met. On the other hand, if the goal is accurate prediction, you may be more lenient about these types of assumptions.
The next step in model validation is expert review, or intuition. In this stage, you bring a subject matter expert into the project, preferably someone who was not directly involved in the model building process. You are asking for a conceptual look at the explanatory variables, coefficients, model outcomes, and overall process.
Encourage the reviewer to ask questions and voice concerns. This will help identify problems with your models. If you can pass an expert review, you are one step closer to model implementation.
Another method of model validation is a model backtest. This method uses the model to predict previously observed data points in an effort to assess how it would perform on new data. To accomplish this, be certain to set aside a portion of these prior observations as a validation dataset before estimating the model parameters. I will speak more about this topic in a future post.
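For now, here is a minimal sketch of that holdout split, assuming scikit-learn and synthetic stand-in data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for historical observations.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.5, -0.7, 2.0]) + rng.normal(scale=0.3, size=500)

# Hold out 20% of observations BEFORE fitting any parameters.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=3)

# Fit on the training portion only, then score against the holdout.
model = LinearRegression().fit(X_train, y_train)
print(f"Validation MAE: {mean_absolute_error(y_val, model.predict(X_val)):.3f}")
```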
A common best practice for large businesses is independent V&V, or completely separating verification and validation efforts from the model development team. Similar to double-blind studies, this method reduces bias and emotional attachment. In fact, in light of the recent financial crisis, the Dodd-Frank Act requires large banking institutions to implement independent model review. In many cases, you'll find a dedicated model validation team assigned to the tasks described in this article.
Model validation and verification closely accompany each other in the implementation of an analytical project. These crucial steps may be the deciding factor between influencing business decisions and simply performing research. By taking these steps seriously, you can drastically improve the model building process and, consequently, stakeholder decision making.
Are there any steps that I missed? Be sure to let me know in the comments section below!