Abstract
1- Introduction
2- Background
3- Related work
4- Methodology
5- Measurement parameters
6- Results
7- Discussions
8- Conclusions
References
Abstract
Tracking is pervasive on the web. Third-party trackers acquire user data through information leaks from websites, and collect user browsing history using cookies and device fingerprinting. In response, several privacy protection techniques (e.g. the Ghostery browser extension) have been developed. To the best of our knowledge, our work is the first study that proposes a reliable methodology for comparing privacy protection techniques and extensively compares a wide set of them. Our contributions are the following. First, we propose a robust methodology to compare privacy protection techniques when crawling many websites, and to quantify measurement error. To this end, we reuse the privacy footprint and apply the Kolmogorov–Smirnov test to browsing metrics. The same test is applied to HTML-based metrics to assess webpage quality degradation. To complement HTML-based metrics, we also design a manual analysis. Second, we study the overlap in blocked resources among the most popular browser extensions, and compare their performance using the proposed methodology. We show that protection techniques differ vastly in performance, and that the best of them exhibit a wide overlap. Next, we analyze the impact of privacy protection techniques on webpage quality. We show that automated HTML-based analysis sometimes fails to expose quality reduction perceived by users. Finally, we provide a set of usage recommendations for end users and research recommendations for the scientific community. Ghostery and uBlock Origin provide the best trade-off between protection and webpage quality. Ghostery, however, requires a configuration step that is difficult for users. The RequestPolicy Continued and NoScript extensions exhibit the best performance but reduce webpage quality. Ghostery and uBlock Origin use manually built blocking lists, which are cumbersome to maintain. Research efforts should focus on improving existing approaches that do not rely on blocking lists (such as Privacy Badger or MyTrackingChoices), and on automatically building reliable blocking lists.
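To make the statistical comparison mentioned above more concrete, the following is a minimal sketch of a two-sample Kolmogorov–Smirnov test on a per-page browsing metric. It is an illustration under stated assumptions, not the exact procedure used in this study: the function names, the example metric (third-party requests per page) and the sample values are hypothetical, and the decision rule uses the standard large-sample approximation at the 5% level.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(sample_a, sample_b):
    """Return the two-sample KS statistic D: the largest absolute gap
    between the empirical CDFs of the two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The gap between two step functions can only change at observed values.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_rejects(sample_a, sample_b, c_alpha=1.358):
    """Large-sample decision rule: reject 'same distribution' when
    D > c(alpha) * sqrt((n + m) / (n * m)).
    c_alpha = 1.358 corresponds to alpha = 0.05 (asymptotic approximation)."""
    n, m = len(sample_a), len(sample_b)
    threshold = c_alpha * sqrt((n + m) / (n * m))
    return ks_statistic(sample_a, sample_b) > threshold


# Hypothetical data: third-party requests per crawled page,
# without any protection and with an extension enabled.
baseline = list(range(100, 200))
protected = list(range(0, 100))
differs = ks_rejects(baseline, protected)
```

Running the same test on two crawls made with an identical configuration gives a rough sense of measurement error: if the test already separates two such runs, differences observed between protection techniques should be read with caution.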
Introduction
The huge growth of the Internet comes along with an ever-increasing advertising market. Internet users access content provided for free by publishers, who in turn monetize their audience through advertising. Companies thus buy online exposure to promote their products. To maximize advertising efficiency, advertisers tailor ads to users according to their interests. To this end, advertisers leverage context (e.g. the visited website) or previous browsing interests. Advertisers use techniques such as cookies to identify users across websites and build their browsing history. Other techniques have also been developed to allow advertising actors to communicate with each other (such as cookie syncing [1]), or to circumvent cookie removal by respawning cookies using diverse types of data storage inside the browser (e.g. using Flash [2]). Browser fingerprinting [3] allows a tracking entity to follow a user across websites without any in-browser data storage. In response to these techniques, several countermeasures have been designed. One example is the Do Not Track HTTP header [4], by which a user can ask not to be tracked. Browsers can also block some or all cookies. Finally, many browser extensions hinder third-party tracking by preventing cookie creation and/or blocking requests to tracking services.