How do we have confidence when working with data sets that are too large for us to manually check everything?
By Spencer Allee, VP Data Science
We live in an uncertain world. This is something Compliance and Risk teams know all too well.
We often hear from our customers about the anxiety and chaos that uncertainty causes in the world of regulatory compliance — uncertainty in how rules are changing, uncertainty in what rules are important and likely to be enforced, uncertainty in whether they are tracking all the right obligations, uncertainty in whether their business is properly complying with rules.
Unfortunately, reducing that uncertainty traditionally costs a lot of money. The only lever most companies have to pull is to hire more people — compliance officers, lawyers, consultants — to keep track of obligations. These costs don’t scale well and have, at best, unclear ROI.
At Ascent, our goal is to insulate our customers from some of that uncertainty that has traditionally plagued them. To do this, we are building the largest programmatically accessible body of regulatory knowledge in the world, and we are building the tools to scale this knowledge set as fast as regulators change their information, all while maintaining the quality and accuracy required for our customers to succeed.
But just like our customers, we too face our own uncertainty challenge: How can we be certain, especially when working with datasets that are far too large to be checked manually, that our information is correct?
Rather than running from this problem, though, we embrace it — and use technology to help solve it. We design our tools and strategies in a way that treats uncertainty as a reality that we can manage. Everything — from our knowledge production processes and internal and external product decisions to the technology that powers our scale and the governance around our machine learning modeling — provides levers we can pull to more effectively manage quality and scale for our customers.
Knowledge Risk Framework
The first tool we use to manage quality and scale in the face of uncertainty is a simple knowledge risk framework: for any given step of our knowledge production process, what is the accuracy our customers need to be successful, and what is the most efficient way of maintaining that accuracy given our portfolio of tools?
For example, consider the technology that powers self-driving cars. The accuracy the technology requires varies depending on the action the car is completing. If the task is parallel parking without hitting a bumper, 95% accuracy is probably sufficient. If it’s turning left into oncoming traffic, accuracy will need to be much, much closer to 100%.
One of the key capabilities of our solution is the ability to analyze regulatory text, extract the obligations from within it, and automatically determine which of those obligations apply to our customers’ business. Making sure that this process is complete and error free is absolutely critical for our customers. Missing an obligation is like messing up that left turn — it’s not an option.
So for this process, we do not rely purely on machine learning models, which always have some error rate. Instead, we combine machine learning with domain expert review and internal tooling, allowing us to dramatically accelerate the rate at which we conduct this decomposition while maintaining extremely high quality. Think of it as having a human driver in that self-driving car to supervise left turns.
By taking this approach we have eliminated more than 80% of the effort it takes to do this step manually, while still achieving the same or better level of quality than a fully manual process.
In another example that’s less critical than identifying obligations, we have a step at which we classify regulatory documents into internally defined categories to help our customers filter documents. Because we give our customers many different ways to find the right documents, the accuracy requirement for this specific step is much lower, which means we can use a machine learning model exclusively and periodically sample a small subset of predictions to estimate our accuracy statistically.
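To make the sampling idea concrete, here is a minimal sketch of how reviewing a random sample of predictions yields a statistical accuracy estimate. The function name and simulated numbers are illustrative, and the margin of error uses the standard normal approximation for a 95% confidence interval; this is not Ascent’s actual tooling.

```python
import math
import random

def estimate_accuracy(sample_correct, z=1.96):
    """Estimate overall accuracy from a human-reviewed sample of predictions.

    sample_correct: list of booleans, True where a reviewed prediction
    matched the human label. Returns (point_estimate, margin_of_error)
    using the normal approximation for a 95% confidence interval.
    """
    n = len(sample_correct)
    p = sum(sample_correct) / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, margin

# Review a random sample of 400 predictions out of a much larger set.
random.seed(0)
reviewed = [random.random() < 0.94 for _ in range(400)]  # simulated review outcomes
p, moe = estimate_accuracy(reviewed)
print(f"estimated accuracy: {p:.1%} \u00b1 {moe:.1%}")
```

The key point is that the margin of error shrinks with the square root of the sample size, so a modest review effort can bound the accuracy of a very large set of predictions.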
By applying this knowledge risk framework, we know that we’re spending our resources to eliminate uncertainty where it matters the most for our customers, while scaling the value we provide much more quickly than most customers can do themselves.
Probabilistic Predictions and Measured Uncertainty
We also use math and statistics as a way of managing quality in the face of uncertainty. Our solutions are powered by machine learning models — essentially, algorithms that are trained to complete a task using large sets of data. We give our algorithms a task — for example, determine whether this line of text within this regulatory document is an obligation or is supporting text. Our algorithms draw on the vast archives of regulatory text on which we’ve trained them to produce an answer to that prompt — what’s known as a prediction.
Using probabilistic predictions, our machine learning models can give us a measurement of how “uncertain” they are about each prediction. Think of it like a Jeopardy! contestant labeling each answer with a score of how confident she is that she’s right. If the model consistently predicts a similar answer with a very high probability, we can interpret the model as being more certain that its prediction is correct for that data point. This gives us the opportunity to break our predictions into measurable confidence “tranches,” each with its own accuracy at its level of confidence.
For example, if we decide as a business that our risk tolerance for a particular step is very low — that it’s a “left turn into oncoming traffic” kind of step and we need 99% accuracy — we can choose a confidence threshold above which we consistently achieve 99% accuracy. Any predictions above that threshold can be fast-tracked efficiently, whereas any predictions below that threshold can go into a queue for further human review.
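As a sketch of how such a threshold can be chosen, the hypothetical function below scans a labeled validation set from most to least confident and returns the lowest confidence level at which the predictions above it still meet the target accuracy. This is a simplified illustration, not the production method.

```python
def pick_threshold(confidences, correct, target_accuracy=0.99):
    """Find the lowest confidence threshold at which predictions at or
    above it meet the target accuracy on a labeled validation set.

    confidences: model confidence per validation prediction (0..1)
    correct: whether each prediction matched the human label
    Returns the threshold, or None if no threshold reaches the target.
    """
    # Sort validation examples from most to least confident.
    paired = sorted(zip(confidences, correct), reverse=True)
    hits = 0
    best = None
    for i, (conf, ok) in enumerate(paired, start=1):
        hits += ok
        # Cumulative accuracy of everything at or above this confidence.
        if hits / i >= target_accuracy:
            best = conf
    return best

threshold = pick_threshold([0.99, 0.98, 0.95, 0.6],
                           [True, True, True, False],
                           target_accuracy=0.99)
```

In this toy example the threshold lands at 0.95: predictions at or above it can be fast-tracked, while anything below it goes to the human review queue.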
Initially, this could require a fair amount of manual labor on our part. But the power of machine learning models is that they continue to learn. So as we accumulate more human-reviewed data, our models continue to improve and the size of our “high confidence tranche” increases, driving up our overall efficiency while maintaining our quality.
Correcting for Model Drift
Another source of uncertainty is one all predictive models must inevitably contend with: model drift.
Machine learning models use historical data to make predictions on new data. Sometimes the relationship between historical and new data is relatively static — for example, making a left turn now isn’t materially different than making a left turn five years ago. Other times it can be much more dynamic — like comparing sunscreen sales in August to those in December. As our regulatory scope continues to expand, a possible drift between patterns in historical data and patterns in new data is something we have to guard against.
To do this we rely on process, technology, and some clever sampling techniques. We have built and continue to invest in a modern machine learning infrastructure that makes it easy for our data scientists to monitor model performance, retrain models with new data, compare multiple models against each other, and quickly deploy the models that perform the best. We also maintain a stream of human labeling to compare against our model labeling, even for models that are performing well. This allows us to constantly collect quality metrics, identify error modes and drift, and generate additional training data.
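One simple way to picture the human-labeling comparison stream is a rolling monitor that tracks agreement between model labels and human labels and raises a flag when agreement dips below an expected baseline. The class below is a minimal, hypothetical sketch of that idea; the names, window size, and tolerance are assumptions for illustration.

```python
from collections import deque

class DriftMonitor:
    """Track rolling agreement between model predictions and human labels,
    flagging possible drift when agreement falls below a baseline."""

    def __init__(self, baseline=0.95, tolerance=0.03, window=200):
        self.baseline = baseline      # agreement rate expected from validation
        self.tolerance = tolerance    # allowed dip before flagging drift
        self.window = deque(maxlen=window)  # most recent comparison outcomes

    def record(self, model_label, human_label):
        self.window.append(model_label == human_label)

    def agreement(self):
        return sum(self.window) / len(self.window) if self.window else None

    def drift_suspected(self):
        rate = self.agreement()
        return rate is not None and rate < self.baseline - self.tolerance

monitor = DriftMonitor(baseline=0.95, tolerance=0.03, window=200)
# As human-reviewed labels arrive, compare them against the model:
# monitor.record(model_label, human_label)
```

A monitor like this turns the ongoing human-review stream into a continuous quality metric, so a drop in agreement triggers investigation and retraining rather than going unnoticed.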
We’ve designed our internal tooling to take advantage of smart sampling techniques to apply our domain-expert labeling time to the most information-rich data points, so that if we label even a fraction of a percent of a dataset we can maximize the value of that ground truth information and propagate it across the broader dataset. All of these strategies increase our confidence that the models we have deployed are the best they can be with the resources we have; in other words, we are able to maximize the leverage of our data science and domain expert labor across the uncertainty-quality tradeoff.
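One common family of “smart sampling” techniques is uncertainty sampling from active learning: send experts the examples the model is least sure about, since those labels carry the most information. The sketch below assumes a binary classifier and is only an illustration of the general technique, not Ascent’s internal tooling.

```python
def select_for_labeling(probabilities, budget):
    """Uncertainty sampling: pick the `budget` examples whose predicted
    probability is closest to 0.5 (i.e., the model is least sure), so
    expert labeling time goes to the most informative data points.

    probabilities: predicted probability of the positive class per example.
    Returns the indices of the examples to send for human labeling.
    """
    # Distance from the decision boundary; smaller means more uncertain.
    margins = [(abs(p - 0.5), i) for i, p in enumerate(probabilities)]
    margins.sort()
    return [i for _, i in margins[:budget]]

# With a labeling budget of 2, the two least confident predictions
# (0.52 and 0.47) are selected.
chosen = select_for_labeling([0.99, 0.52, 0.03, 0.47, 0.85], budget=2)
```

Labeling even a small, well-chosen fraction of a dataset this way tends to improve the model far more than labeling the same number of random examples.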
Managing Enterprise Risk
Finally, as a business we also think about uncertainty from the perspective of enterprise risk and the internal control frameworks we have in place to manage that risk. Even as a growing startup, we have invested time and resources into building out a robust Model Risk Management framework, established on many of the same guiding principles that financial institutions follow when using quantitative models like credit risk models.
We have well-documented processes for reducing risk during all stages of model development:
- When we develop models, we follow documented policies around our development standards, testing procedures, and stakeholder review.
- We validate our models by using independent teams within the company and human review of model outputs.
- To help govern our modeling practice and overall data generation approach, we use a model inventory, follow a detailed change management process, and have clearly identified roles and responsibilities.
These operational investments reduce the risk that we inadvertently let entropy creep into our production system and give us confidence that our process is working correctly.
We live in an uncertain world.
For a self-driving car to be a safe option, how safe does it need to be? 90% accident free? 95%? 100%?
Logically, the answer is that it needs to be safer than the safest human driver. No self-driving car will ever be 100% accident free, but neither will any human driver.
The same holds for regulatory compliance. The body of regulatory data is too large, and the world changes too quickly, to ever know with 100% certainty every detail of every word across all regulators. Even if a business could afford to pay enough humans to read through every document multiple times, no amount of spending could rule out human error.
But by acknowledging that we live in an uncertain world and by using state-of-the-art technology, smart process and frameworks, and the power of machine learning, Ascent can help financial institutions navigate the world of regulatory compliance more quickly and efficiently — and with a lot more certainty.