by Romain Alberca
For many years, data leaks have been the number one threat in cybersecurity.
According to the CyberEdge report, 86% of organizations experienced at least one successful cyberattack in 2020.
Verizon's Data Breach Investigations Report further estimates that, over the past four years, internally generated leaks have accounted for 34% of corporate data theft.
What is a data leak and what are the risks involved?
A data leak is characterized by an unauthorized disclosure of data, malicious or not, intentional or not, compromising the confidentiality or availability of personal data.
These data leaks can be grouped into two main categories: external leaks (mainly caused by malware) and internal leaks (access to data by unauthorized persons, but also internal mistakes or malice).
Unfortunately, external leaks are now well known and increasingly frequent. They typically result from technological vulnerabilities exploited by hackers. Whether through phishing (a technique that lures a user by impersonating a trusted third party), malware, or spyware (software that lets cybercriminals secretly access personal data), these techniques now affect all types of organizations throughout the world.
What happens to organizations if their data is leaked?
The risks are numerous:
First of all, there is a financial risk linked to a loss of activity and of competitiveness, not to mention the potential consulting or equipment costs needed to resolve these leaks. According to the Cost of a Data Breach report published by IBM, the global average cost of a data breach is $4.24 million.
Less quantifiable but just as important is the risk of loss of reputation and brand image. Companies that are victims of a data leak are obliged to report it to a supervisory authority, and sometimes even to the people or customers concerned.
Finally, there is the risk of complaints from the people concerned, which arise in 20% of cases according to the firm Pinsent Masons.
Aware of these risks, organizations put in place numerous protections against external attacks, but they very often neglect the internal risks.
Fortunately, there are different mechanisms to protect against them:
- DLP (Data Loss Prevention), a mechanism to prevent users from sending sensitive or critical data outside the organization.
- Information classification, to prevent unauthorized users from exposing or disclosing confidential information.
- User policy (principles of least privilege and need-to-know), which involves restricting the access rights of a specific user within the organization.
- Encryption, through a dedicated data encryption process.
- Anonymization, a process that secures data outside of production. This is the mechanism we will focus on here.
Prevent internal data leaks with anonymization
What is anonymization, and how does it differ from pseudonymization?
The CNIL (French administrative regulatory body whose mission is to ensure that data privacy law is applied to the collection, storage, and use of personal data) describes data anonymization as processing that uses a set of techniques to make it impossible, in practice, to identify a person by any means, in an irreversible manner.
Pseudonymization, on the other hand, allows data to be used but remains reversible: the data can still be linked back to a person. Pseudonymized data is therefore still personal data and remains subject to the GDPR (General Data Protection Regulation).
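To make the difference concrete, here is a minimal Python sketch. The record, its field names, and the functions `pseudonymize`, `reidentify`, and `anonymize` are all hypothetical and chosen for illustration. The point it shows is simple: pseudonymization keeps a mapping that allows the original person to be found again, while anonymization keeps none. In a real project, removing direct identifiers alone may not be enough to guarantee irreversibility.

```python
import secrets

# Hypothetical record; the field names are illustrative only.
record = {"name": "Alice Martin", "email": "alice.martin@example.com", "salary": 52000}


def pseudonymize(rec, lookup):
    """Replace direct identifiers with a random token.

    The lookup table keeps the link between token and identity,
    which is exactly what makes the operation reversible.
    """
    token = secrets.token_hex(8)
    lookup[token] = {"name": rec["name"], "email": rec["email"]}
    return {"id": token, "salary": rec["salary"]}


def reidentify(pseudo, lookup):
    """Recover the original record from a pseudonymized one."""
    identity = lookup[pseudo["id"]]
    return {**identity, "salary": pseudo["salary"]}


def anonymize(rec):
    """Drop direct identifiers and generalize the salary into a band.

    No mapping is kept, so there is nothing to reverse. This is a
    simplification: real anonymization must also consider indirect
    identifiers and combinations of fields.
    """
    band_start = (rec["salary"] // 10000) * 10000
    return {"salary_band": f"{band_start}-{band_start + 9999}"}


lookup_table = {}
pseudo = pseudonymize(record, lookup_table)
anonymous = anonymize(record)
```

Anyone holding `lookup_table` can call `reidentify(pseudo, lookup_table)` and get the original record back, which is why pseudonymized data stays within the scope of the GDPR.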
When we think about data in companies, we obviously think about the production environment. This environment hosts many applications used by end users, such as a bank advisor in a financial institution. Data in production is in most cases very secure and does not represent a major risk of leakage.
But sometimes it is necessary to take data out of the production environment to ensure the proper running of a business, and this is when cybersecurity issues arise.
This is the case, for example, in training environments for new employees, or for analysis purposes such as BI (Business Intelligence), which is now considered the nerve center of a company's marketing activities.
Other data are also extracted from the production environment and transferred to service providers, which are among the most targeted by attacks (targeting a service provider potentially allows hackers to affect hundreds of companies).
The same goes, of course, for software development and testing environments, which need realistic test data.
All this data is exposed without being secured.
Anonymization addresses these issues by transforming real data into fictional but realistic data. This data looks real to users but does not reveal any personal or identifying information. Any leakage of this data would therefore have no impact, since it is not real data.
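As an illustration, the idea of "fictional but realistic" data could be sketched as follows in Python. Everything here is an assumption made for the example: the customer fields, the small name lists, the phone format, and the function `anonymize_customer`. A real tool would rely on much richer dictionaries and rules.

```python
import random

# Small illustrative dictionaries of replacement values (assumed).
FIRST_NAMES = ["Camille", "Louis", "Emma", "Hugo", "Sarah", "Noah"]
LAST_NAMES = ["Durand", "Moreau", "Lefevre", "Garcia", "Roux", "Fontaine"]


def anonymize_customer(customer, rng):
    """Return a fictional customer with the same fields and formats.

    Identifying fields are replaced by random but plausible values.
    No link to the original values is stored.
    """
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    return {
        "first_name": first,
        "last_name": last,
        "email": f"{first}.{last}@example.com".lower(),
        # Ten digits starting with "06": same shape as the original field.
        "phone": "06" + "".join(str(rng.randint(0, 9)) for _ in range(8)),
        # Assumed non-identifying here, so kept as-is for testing purposes.
        # In practice this value might need to be altered as well.
        "balance": customer["balance"],
    }


real_customer = {
    "first_name": "Alice",
    "last_name": "Martin",
    "email": "alice.martin@corp.example",
    "phone": "0612345678",
    "balance": 1520.75,
}

fake_customer = anonymize_customer(real_customer, random.Random())
```

The result has the same fields and formats as the original, so an application, a training session, or a test suite can use it normally, while the name, email, and phone number no longer belong to a real person.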
But anonymization is not limited to these use cases. It can also be applied on a need-to-know basis.
Indeed, it is possible to integrate the anonymization mechanism into your tools and to use, or not, anonymization depending on the profile.
Let's imagine the case of a developer in charge of an HR application. They need access to consistent data, but in no way to real data, since it is personal and identifying. Anonymization meets this need. For the business profile (Human Resources Manager), access to the data would remain unchanged.
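A profile-based approach of this kind might look like the Python sketch below. The profile names, the `Employee` fields, and the masking rules are hypothetical. The sketch only shows the principle: the same record is returned unchanged or masked depending on who asks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    name: str
    social_security_number: str
    salary: int


# Hypothetical profiles allowed to see real data.
PROFILES_WITH_REAL_DATA = {"hr_manager"}


def mask_employee(employee):
    """Return a copy with identifying values replaced.

    Formats are preserved (same number of characters, same types)
    so the application keeps behaving consistently.
    """
    masked_ssn = "".join(
        "0" if ch.isdigit() else ch for ch in employee.social_security_number
    )
    return Employee(
        name="Jane Doe",
        social_security_number=masked_ssn,
        salary=round(employee.salary, -3),  # rounded to the nearest thousand
    )


def view_employee(employee, profile):
    """Return real data only to authorized profiles, masked data otherwise."""
    if profile in PROFILES_WITH_REAL_DATA:
        return employee
    return mask_employee(employee)


employee = Employee("Alice Martin", "2850775123456", 52300)
hr_view = view_employee(employee, "hr_manager")
developer_view = view_employee(employee, "developer")
```

Note the design choice: any profile that is not explicitly listed receives masked data, which is in line with the need-to-know principle mentioned earlier.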
Anonymization serves many use cases in cybersecurity. Its effectiveness also means that properly anonymized data falls outside the scope of data protection legislation (GDPR), since the distribution or reuse of anonymized data has no impact on the privacy of the persons concerned.