Juan F. Samaniego
In addition to cookies, there are other widely used web-tracking techniques that are not well known to the public, such as web beacons.
The General Data Protection Regulation was approved by the European Parliament in 2016.
Only a small percentage of the 500 most visited websites in Spain (which include everything from government sites to streaming and adult content platforms) correctly fulfil the requirements set out in the General Data Protection Regulation (GDPR). This is one of the main findings of a study involving researchers from the Universitat Oberta de Catalunya (UOC), the University of Girona and the Center for Cybersecurity Research of Catalonia (CYBERCAT).
The results, which are published in open access in the scientific journal Computers & Security under a Creative Commons licence, were reached using novel automated methods for analysing web-tracking techniques and compliance with internet privacy regulations.
Widespread non-compliance with privacy laws
The European Parliament's approval of the General Data Protection Regulation in 2016 was set to forever change how companies, websites and digital platforms manage users' personal data. The European regulation, which was transposed in Spain as the Organic Law on the Protection of Personal Data and Guarantee of Digital Rights in 2018, was supposed to mark a turning point in the protection of citizens' privacy. However, six years later, the actual implementation of this regulation is progressing at a faltering pace.
Cristina Pérez-Solà, a UOC researcher and an expert in web security and privacy, explained the aim of these web-tracking techniques: "The purpose of all these techniques is usually to track the online behaviour of web users in order to create profiles that can then be used to adjust the advertising that will be shown or the prices that will be offered for services or products." The analysis carried out by the researchers from the UOC (Pérez-Solà and Albert Jové, course instructor in the UOC Faculty of Computer Science, Multimedia and Telecommunications) and the University of Girona (David Martínez and Eusebi Calle) shows that only 8.91% of the websites that obtain users' consent as required actually apply that consent correctly in practice.
New algorithms to analyse compliance with the GDPR
Beyond the analysis results, the importance of this research lies in the algorithms used to study compliance with online privacy laws. The sheer number of pages and platforms on the internet makes automation essential, since studying each case manually would be infeasible. Moreover, some web-tracking techniques are extremely hard to detect because they leave no clear markers of their presence. To overcome these challenges, the researchers developed their own method, consisting of four algorithms and a measure, the Websites Level of Confidence (WLoC), to assess the state of regulatory compliance.
Each of the algorithms used by the researchers has a well-defined function:
- The Consent Inspector Algorithm (CIA) captures images of each website's cookie banner and identifies the buttons that should allow users to customize the use of these tracking elements.
- The Website Evidence Collector (WEC) gathers information on the different web-tracking techniques used on each website.
- The Cookies Detector Algorithm (CDA) uses the data provided by the WEC to categorize the cookies that websites set in users' browsers without their consent.
- The Web Beacons Detection Algorithm (BDA) not only extracts the web beacons detected by the WEC but also identifies browser fingerprinting techniques.
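The article does not publish the code of the CDA or the BDA. As a rough illustration of the kind of rule-based checks involved, the Python sketch below labels cookies observed before any consent was given and flags a common web-beacon pattern (a 1x1 image loaded from another domain). Everything in it is an assumption: the `Cookie` and `ImageRequest` records stand in for evidence that a crawler such as the WEC might collect, the small `KNOWN_COOKIES` table is hypothetical, and the domain comparison is deliberately naive. It does not attempt to detect browser fingerprinting.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical lookup table: a few well-known cookie names and the purpose
# they are commonly associated with. A real classifier would rely on a much
# larger, maintained cookie database.
KNOWN_COOKIES = {
    "_ga": "analytics",     # Google Analytics
    "_gid": "analytics",    # Google Analytics
    "_fbp": "advertising",  # Meta (Facebook) Pixel
    "IDE": "advertising",   # Google DoubleClick
}


@dataclass
class Cookie:
    # A cookie observed before the user interacted with the consent banner.
    name: str
    domain: str


@dataclass
class ImageRequest:
    # An image loaded by the page, with its rendered size in pixels.
    url: str
    width: int
    height: int


def registrable_host(host: str) -> str:
    # Naive approximation: keep the last two labels ("www.example.com" -> "example.com").
    # This is wrong for suffixes such as "co.uk"; real tools use the Public Suffix List.
    return ".".join(host.lstrip(".").split(".")[-2:])


def classify_pre_consent_cookies(site: str, cookies: list[Cookie]) -> dict[str, str]:
    """Label each pre-consent cookie as first/third-party plus a likely purpose."""
    site_host = registrable_host(urlparse(site).hostname or "")
    labels = {}
    for c in cookies:
        purpose = KNOWN_COOKIES.get(c.name, "unknown")
        party = "first-party" if registrable_host(c.domain) == site_host else "third-party"
        labels[c.name] = f"{party}/{purpose}"
    return labels


def looks_like_web_beacon(site: str, img: ImageRequest) -> bool:
    """Common heuristic: a tiny (at most 1x1 px) image served from another domain."""
    site_host = registrable_host(urlparse(site).hostname or "")
    img_host = registrable_host(urlparse(img.url).hostname or "")
    return img.width <= 1 and img.height <= 1 and img_host != site_host
```

For example, for `https://www.example.com`, a `_ga` cookie on `.example.com` is labelled `first-party/analytics`, while an `IDE` cookie on `.doubleclick.net` is labelled `third-party/advertising`. Any such cookie found before consent would be a candidate compliance issue. A production tool would replace the two-label domain shortcut with the Public Suffix List and the toy table with a maintained classification source.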
"Our study focuses on analysing compliance with the General Data Protection Regulation by the most visited websites in Spain," Pérez-Solà added. "We selected the 500 most visited websites according to the Alexa ranking and analysed their use of these web-tracking techniques, as well as the information they give users and the options they offer them. Finally, we compiled the results of this analysis into a measure, the Websites Level of Confidence, which makes it possible to assess the current state of compliance."
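The article does not give the formula behind the Websites Level of Confidence. Purely to illustrate what compiling several results into a single measure can look like, the Python sketch below turns a set of per-site pass/fail checks into one score between 0 and 1. The check names, their number and the equal weighting are hypothetical choices made for this example; this is not the published WLoC.

```python
from dataclasses import dataclass, fields


@dataclass
class SiteChecks:
    # Hypothetical pass/fail checks for one website; the names are illustrative only
    # and do not reproduce the criteria used in the study.
    has_cookie_banner: bool
    offers_reject_option: bool
    no_tracking_cookies_before_consent: bool
    no_web_beacons_before_consent: bool
    respects_user_choice: bool


def hypothetical_confidence_score(checks: SiteChecks) -> float:
    """Return the share of checks passed (0.0 to 1.0), with every check weighted equally."""
    results = [getattr(checks, f.name) for f in fields(checks)]
    # True counts as 1 and False as 0 when summed.
    return sum(results) / len(results)
```

A site that passes every check except `no_tracking_cookies_before_consent` would score 0.8. Recomputing the same checks on a later crawl and comparing the scores is one simple way to follow changes over time, in the spirit of the over-time comparison the researchers describe for their own measure.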
"Understanding the details of the regulations that apply at any given time, and knowing how to tell which techniques a website is using, are beyond the grasp of most users," she concluded. "Our proposed Websites Level of Confidence (WLoC) measure provides users with insight into the compliance status of the most popular websites and lets them see how it changes over time without the need for legal or technical knowledge."
This research supports Sustainable Development Goal (SDG) 9: build resilient infrastructure, promote sustainable industrialization and foster innovation.
David Martínez, Eusebi Calle, Albert Jové, Cristina Pérez-Solà, Web-tracking compliance: websites' level of confidence in the use of information-gathering technologies, Computers & Security, Volume 122, 2022, 102873, ISSN 0167-4048, https://doi.org/10.1016/j.cose.2022.102873.
The UOC's research and innovation (R&I) is helping overcome pressing challenges faced by global societies in the 21st century by studying interactions between technology and the human and social sciences, with a specific focus on the network society, e-learning and e-health.
Over 500 researchers and more than 50 research groups work in the UOC's seven faculties, its eLearning Research programme and its two research centres: the Internet Interdisciplinary Institute (IN3) and the eHealth Center (eHC).
The university also develops online learning innovations at its eLearning Innovation Center (eLinC), and supports entrepreneurship and knowledge transfer within the UOC community through the Hubbik platform.
Open knowledge and the goals of the United Nations 2030 Agenda for Sustainable Development serve as strategic pillars for the UOC's teaching, research and innovation. More information: research.uoc.edu.
Cristina Pérez-Solà
Researcher and professor in the Faculty of Computer Science, Multimedia and Telecommunications
Albert Jové
Course instructor in the UOC Faculty of Computer Science, Multimedia and Telecommunications