The Best of all Worlds

Jan 25, 2021

Keep it safe

We always hear about sensitive information and the need to keep it hidden and safe. It seems logical that organizations need to secure their clients’ data, and strict regulations have made it perfectly clear that any leak of sensitive information will prompt fines and restrictions. Advances in regulation and standardization in these areas have produced a myriad of solutions to the problem, but most organizations look at only one side of the coin: while they enforce restrictions and limitations on their clients’ data, their own sensitive data is sometimes left unchecked.

An organization’s own sensitive data may include information about its employees, information that is valuable for phishing attacks, and even samples of its source code. Attackers can use this information to further their attacks on the organization and, in extreme cases, even impersonate its officials to attack its clients, causing reputational and financial damage. Most organizations know that this information should be secured and protected, and they are actively doing so, but sometimes blanket protection causes other, unforeseen problems.

Needles in haystacks

The first problem, which sometimes overwhelms organizations, is the sheer volume of sensitive data that has accumulated. Sensitive data is located everywhere in the organization's systems, from personal workstations to development and production environments. The organization needs to classify each file according to whether it contains sensitive information. Finding every shred of sensitive data seems like a daunting task for an organization of 100 people, so what happens when we add another 100 hosts, and what about 1,000?

The vast network of hosts makes it almost impossible to search each and every computer for sensitive information. The usual answer is to simply protect everything, and herein lies our next problem.

The second problem is that by protecting everything, we sometimes harm our own business. Organizations have limited resources, and every security control they deploy consumes resources that cannot be spent elsewhere. Strict, global enforcement across the system may protect the sensitive data, but other business flows may become slow or incomplete. Protection must therefore be matched to where sensitive data is actually found. To do that, we need to automate the search for sensitive information by using rules and patterns, which lets organizations focus their protection only on the hosts and environments that require it.
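A rule-and-pattern search of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule names, the regular expressions, the keyword list, and the host names are all hypothetical and are not taken from any particular product.

```python
import re

# Hypothetical rule set: names, regexes, and keywords are illustrative assumptions.
RULES = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "keyword": re.compile(r"\b(?:password|salary|confidential)\b", re.IGNORECASE),
}


def scan_text(text):
    """Return the sorted names of all rules that match the text."""
    return sorted(name for name, pattern in RULES.items() if pattern.search(text))


def hosts_to_protect(files_by_host):
    """Map each host to the rules its files matched, skipping hosts with no hits."""
    flagged = {}
    for host, files in files_by_host.items():
        hits = set()
        for content in files:
            hits.update(scan_text(content))
        if hits:
            flagged[host] = sorted(hits)
    return flagged


if __name__ == "__main__":
    # Hypothetical inventory: host name -> contents of the files found on it.
    inventory = {
        "dev-1": ["-----BEGIN RSA PRIVATE KEY-----"],
        "kiosk": ["lunch menu"],
    }
    print(hosts_to_protect(inventory))
```

Only hosts with at least one match are returned, so stricter controls can be limited to those hosts instead of being applied everywhere.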

This is the tricky part. Creating rules and dictionaries of keywords is an arduous job that requires pinpointing all the necessary keywords and their permutations. Creating those dictionaries may take several months, and even after the job is finished, research shows that dictionaries and rules may find only up to 60% of sensitive data, since those rules and patterns do not “understand” the data. So it seems that we still need a human to go over the files, and we have neatly returned to square one. How do we solve this problem?

Technology to the Rescue

The problem can be solved by using the best of all worlds and our understanding of AI. To search vast amounts of data, we must automate the process, but the system must also understand the meaning of each file and discern whether it contains sensitive information. To do this, solutions may use Natural Language Processing (NLP) algorithms that enable the system to go over the files and, just like a human, decide whether a file contains sensitive information. Since sensitive information comes in all shapes and sizes, the solution must also use machine learning to add more information to its databases and assessments and gradually reduce tagging errors, so that it fits a myriad of organizations and their data.
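To make the idea of a system that learns from labeled examples a little more concrete, here is a minimal sketch. It uses a tiny bag-of-words naive Bayes classifier as a stand-in; this is a deliberately simple technique chosen for illustration, not the NLP approach of any specific product, and the class name, labels, and training sentences are all hypothetical.

```python
import math
import re
from collections import Counter


class SensitiveTextClassifier:
    """Tiny multinomial naive Bayes over word counts (illustrative only)."""

    def __init__(self):
        self.word_counts = {"sensitive": Counter(), "benign": Counter()}
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    def learn(self, text, label):
        """Add one labeled example; reviewer feedback can be fed in the same way."""
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def predict(self, text):
        """Return the label with the higher smoothed log-probability."""
        vocab = set(self.word_counts["sensitive"]) | set(self.word_counts["benign"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Smoothed log prior, so a label with no examples never yields log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(counts.values())
            for word in self.tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing out a label.
                score += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            scores[label] = score
        return max(scores, key=scores.get)


if __name__ == "__main__":
    clf = SensitiveTextClassifier()
    clf.learn("employee salary and home address list", "sensitive")
    clf.learn("customer passwords and account numbers", "sensitive")
    clf.learn("lunch menu for the office party", "benign")
    clf.learn("meeting notes about the office picnic", "benign")
    print(clf.predict("salary list for each employee"))
```

Each call to `learn` with a corrected label plays the role of the feedback loop described above: a text the classifier first gets wrong can be added as a new example, and later predictions shift accordingly.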

Want to talk about my story?
Don’t be afraid to reach out at haviv@suridata.ai

Written by Haviv Ohayon