4/26/17 · Research

A system is being developed to automatically conceal confidential information in text documents

The method offers a degree of accuracy comparable to the manual process currently carried out by security experts and is even more exhaustive
Photo: Unsplash/Lluis Llerena


The researchers David Sánchez, from the CRISES-UNESCO Chair in Data Privacy of the Universitat Rovira i Virgili (URV) Department of Computer Engineering and Mathematics, and Montserrat Batet, from the UOC's KISON research group, have designed a system that automatically detects and conceals confidential information in text documents. This allows documents to be sent to third parties without compromising privacy, while preserving the anonymity of the parties (individuals, organizations, etc.) the documents refer to.

Nowadays, personal data are of great use in many spheres, including research, commerce and planning. For example, the data stored in medical records are essential for medical research, banking operations are the basis for financial analysis, and the analysis of commercial transactions helps tailor the services provided. Given that a large part of these data are confidential, the documents that contain them have to be protected before they are sent to the researchers who will use them. Appropriate protection mechanisms are therefore essential to guarantee the privacy and/or anonymity of individuals.

Although European legislation is very strict about the transfer of personal data without the consent of those involved, in other countries, such as the United States, it is commonplace to request and provide confidential documents in connection with legal matters, employee dismissals, insurance policies, etc. In all cases, however, it is necessary to guarantee that the documents provided do not disclose confidential information that could be used, for example, for discriminatory purposes.


The system deletes or replaces

Until now, the protection of confidential documents has required that one or more experts manually identify and delete words, phrases or sentences that could disclose sensitive or potentially discriminatory information. This process takes into account both sensitive terms, such as the name of a contagious disease, and groups of terms that allow the former to be deduced indirectly, for example, combinations of drugs or treatments that are only used for specific diseases. Applying such criteria is an arduous task and, given its complexity, not infallible.

The method that has been developed automates the entire process, making it possible to efficiently handle and protect the large volume of data currently used in research. To do this, the system analyses the information available on the Internet, which is what a third party could use as a knowledge base for deducing the confidential information in a protected document. It then protects the terms that could enable such deductions.
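
As a rough illustration of how web-scale information could be used to spot terms that give away a sensitive one, the sketch below scores each term by how much more often it appears alongside the sensitive term than chance would predict. The article does not describe the researchers' actual scoring, so the measure used here (pointwise mutual information), the page counts, the threshold and all function names are assumptions made for the example.

```python
import math

# Hypothetical page counts standing in for statistics gathered from the Web.
# All numbers are invented for illustration only.
TOTAL_PAGES = 1_000_000
PAGE_COUNTS = {"HIV": 2_000, "zidovudine": 500, "aspirin": 20_000}
JOINT_COUNTS = {("zidovudine", "HIV"): 400, ("aspirin", "HIV"): 50}


def pmi(term, sensitive):
    """Pointwise mutual information (in bits) between a term and a sensitive term."""
    p_term = PAGE_COUNTS[term] / TOTAL_PAGES
    p_sensitive = PAGE_COUNTS[sensitive] / TOTAL_PAGES
    p_joint = JOINT_COUNTS.get((term, sensitive), 0) / TOTAL_PAGES
    if p_joint == 0:
        # The terms never co-occur, so one reveals nothing about the other.
        return float("-inf")
    return math.log2(p_joint / (p_term * p_sensitive))


def risky_terms(terms, sensitive, threshold):
    """Return the terms whose association with the sensitive term exceeds the threshold."""
    return [t for t in terms if pmi(t, sensitive) > threshold]


# With these invented counts, only the disease-specific drug is flagged.
flagged = risky_terms(["zidovudine", "aspirin"], "HIV", threshold=3.0)
print(flagged)
```

In this toy setting a drug used almost exclusively for one disease scores far higher than a common painkiller, which mirrors the kind of indirect deduction described above.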

Tests have demonstrated that this method is more exhaustive than a human expert and just as accurate. Furthermore, unlike the experts, the system not only deletes dangerous terms, but also tries, wherever possible, to replace them with more general and ambiguous concepts. For example, instead of specifying that a patient has pneumonia, it would state that the patient has a respiratory illness. This makes the protected document easier to understand and more useful in subsequent analyses than one in which the terms have simply been deleted.
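
The replace-or-delete behaviour can be pictured with a minimal sketch. It assumes a small hand-written lookup table of generalizations, whereas a real system would presumably draw on a much larger knowledge source; the table, the `[REMOVED]` marker and the function name are hypothetical.

```python
import re

# Hypothetical lookup table mapping a specific term to a more general concept.
GENERALIZATIONS = {"pneumonia": "a respiratory illness"}


def sanitize(text, terms_to_protect):
    """Replace each protected term with a generalization, or remove it if none is known."""
    for term in terms_to_protect:
        replacement = GENERALIZATIONS.get(term.lower(), "[REMOVED]")
        # Match the whole word only, regardless of case.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        text = pattern.sub(lambda match: replacement, text)
    return text


print(sanitize("The patient has pneumonia and takes zidovudine.",
               ["pneumonia", "zidovudine"]))
```

Here the disease name is generalized because the table has an entry for it, while the drug name, which has no entry, falls back to plain removal.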


Implementation for research purposes

So far, the method has been implemented in prototype software that has been tested on clinical documents in English. In the near future, there are plans to apply it to other spheres of knowledge and establish it as a professional tool that will be particularly useful for research.

This research project is part of the European CLARUS project on the privacy of cloud data, which is being coordinated by the URV and funded by the European Union's Horizon 2020 programme for 2015–2017. It is also part of the UOC's SmartGlacis project: “Security and privacy technologies for smart cities”, funded by the Ministry of Economy, Industry and Competitiveness.
