BLOG | OFFICE OF THE CTO

Data Masking and How It Differs from Data Leak Prevention

Lori MacVittie
Published November 20, 2023

Many technologies emerged in 2023 that deserve a place on any technologist’s watch list. Among them is data masking. Although data masking and data leak prevention are similar in implementation, the two have very different use cases.

The latter has been a capability of every leading web app and API security solution for years. Data masking, however, is only now coming into demand, thanks to the rise of technologies like generative AI.

What is data masking?

Data masking is a technique used to protect sensitive information by replacing or obfuscating the original data with fictitious or scrambled data that maintains a similar structure and format. This method is commonly used in situations where data must be shared or used for testing, training, or analysis purposes, but the actual sensitive information should remain confidential. Data masking helps organizations comply with data privacy regulations, reduce the risk of data breaches, and protect the privacy of individuals whose information is contained within the datasets.
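As a concrete illustration, here is a minimal Python sketch of structure-preserving masking. The mask_value helper and the sample record are assumptions for illustration only; production tools typically use deterministic, referentially consistent substitution rather than random replacement:

```python
import random
import string

# A minimal sketch of structure-preserving data masking (illustrative only).
def mask_value(value: str) -> str:
    """Replace each letter/digit with a random one of the same kind,
    keeping punctuation and layout intact so formats still validate."""
    masked = []
    for ch in value:
        if ch.isdigit():
            masked.append(random.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            masked.append(random.choice(pool))
        else:
            masked.append(ch)  # keep dashes, spaces, @, etc.
    return "".join(masked)

record = {"name": "Jane Doe", "email": "jane.doe@example.com", "ssn": "123-45-6789"}
masked = {field: mask_value(value) for field, value in record.items()}
print(masked)
# e.g. {'name': 'Qwzr Xkp', 'email': 'mfkq.rlw@tgvcnbq.xwz', 'ssn': '804-91-2375'}
```

Because the masked values keep the same shape as the originals, they still flow through validation, test suites, and analytics pipelines without exposing the real data.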

What is data leak prevention?

Data Leak Prevention (DLP) is a set of strategies, policies, and tools designed to protect sensitive information from unauthorized access, disclosure, or misuse. The primary goal of DLP is to prevent the accidental or intentional leakage of confidential data, such as personal information, intellectual property, or trade secrets, outside of an organization's network or systems.
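A heavily simplified sketch of what a DLP enforcement point does might look like the following. The card-number pattern and the redact_outbound function are illustrative assumptions; real DLP products combine many detectors, policies, and enforcement points:

```python
import re

# A simplified sketch of a DLP-style outbound filter: scan content leaving
# the system for sensitive patterns and mask them before release.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12}(\d{4})\b")

def redact_outbound(text: str) -> str:
    """Mask all but the last four digits of anything that looks like a card number."""
    return CARD_PATTERN.sub(lambda m: "****-****-****-" + m.group(1), text)

response = "Your card 4111 1111 1111 1111 was charged $42.00."
print(redact_outbound(response))
# Your card ****-****-****-1111 was charged $42.00.
```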

Apples to apples?

It may seem like the market is embracing pedantry, claiming that green apples are different from red apples. After all, both data masking and DLP tend to rely on the same technologies to “mask” or “obfuscate” sensitive data fields used by applications and APIs.

The difference is twofold.

First, the primary users of data masking are developers, data scientists, and MLOps teams: employees or partners who need to test, train, or analyze with real customer data. That puts at risk customers who would rather remain anonymous and may have been assured by a corporate privacy policy that they would. The users of DLP are ultimately the business. It is a corporate responsibility to comply with regulations that require masking sensitive information such as account and credit card numbers, and the business suffers when data is leaked. It can be argued that organizations employ DLP to protect consumers, and they do, but the primary driver is usually regulation.

Second, DLP identifies and masks only a specific subset of personal information. When I get a bill, my account number is masked, but my name and address aren’t. With data masking, it is often the case that names, addresses, and other identifying information are obfuscated to ensure customers remain anonymous. This is particularly true when the use case is analysis, where patterns and relationships are sought across customers for marketing or forecasting purposes but there is reason not to identify specific customers.
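The contrast is easy to see side by side. The sketch below, whose field names and surrogate values are hypothetical, shows the same record under DLP-style redaction, which masks only the regulated field, versus data masking for anonymized analysis, which obfuscates identity:

```python
# Illustrative contrast; field names and surrogate values are hypothetical.
customer = {"name": "Jane Doe", "address": "42 Elm St", "account": "9876543210"}

# DLP: mask only the regulated field; the customer's identity stays visible.
dlp_view = {**customer, "account": "******" + customer["account"][-4:]}

# Data masking: obfuscate identifying fields so the customer stays anonymous;
# a consistent surrogate key preserves relationships for analysis.
masked_view = {"name": "Customer-1042", "address": "[REDACTED]", "account": "ACCT-1042"}

print(dlp_view)     # {'name': 'Jane Doe', 'address': '42 Elm St', 'account': '******3210'}
print(masked_view)  # {'name': 'Customer-1042', 'address': '[REDACTED]', 'account': 'ACCT-1042'}
```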

Data masking should be on your watch list

If you’re making a “watch list” of technologies for 2024, then data masking definitely deserves a place in the top ten. 

This is because of its broad applicability to many efforts, especially those leaning toward analysis and training ML models to glean insights about customer behavior or uncover patterns that inform business strategy.

As generative (and traditional) AI has begun to seep into every product and service on the planet, consumers have become increasingly aware of the need for privacy. Being able to mask sensitive data will allow a business to both push forward with AI initiatives and satisfy its customers’ need for privacy.