By Gabor Takacs
Web tracking is the practice by which websites and third-party companies collect information about users’ online activity. The basis of tracking is the accurate identification of users – you are detected and identified even when you’re just passing through a random website that you are not signed in to. The conventional way to implement identification and tracking is to save web cookies in the user’s browser.
How does cookie-based tracking work?
Imagine that user Alice visits an online store and puts a T-shirt in her basket. At this moment, Alice’s user ID and the T-shirt’s product ID are saved to the browser as a cookie, so that the contents of Alice’s basket are known at the checkout page. Alternatively, it is enough to save only the user ID in the browser if the user ID/product ID pair is stored in the online store’s database.
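The second variant, where only the user ID lives in the cookie and the basket is kept on the server, can be sketched in Python as follows. This is a minimal illustration, not a real store's code: the cookie name user_id, the in-memory baskets dictionary and the example IDs are all assumptions.

```python
from http.cookies import SimpleCookie
from typing import Dict, List, Optional

# Hypothetical server-side storage: user ID -> list of product IDs.
baskets: Dict[str, List[str]] = {}

def build_set_cookie_header(user_id: str) -> str:
    """Return a Set-Cookie header value that stores the user ID in the browser."""
    cookie = SimpleCookie()
    cookie["user_id"] = user_id
    cookie["user_id"]["path"] = "/"
    return cookie["user_id"].OutputString()

def read_user_id(cookie_header: str) -> Optional[str]:
    """Extract the user ID from the Cookie header sent back by the browser."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("user_id")
    return morsel.value if morsel else None

def add_to_basket(cookie_header: str, product_id: str) -> None:
    """Record a product for whichever user the cookie identifies."""
    user_id = read_user_id(cookie_header)
    if user_id is not None:
        baskets.setdefault(user_id, []).append(product_id)

# The store sets the cookie once, then recognizes Alice on later requests.
print(build_set_cookie_header("alice-123"))
add_to_basket("user_id=alice-123", "tshirt-42")
print(baskets)
```

At checkout, the store only needs to read the cookie again and look up the basket under the same user ID.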
The previous scenario sounds pretty normal, but cookies can be used for tracking purposes too. Imagine that Alice reads about antidepressants on a medical website. A third-party advertising company that controls a small section of the website puts a cookie in Alice’s browser and records that she has read about product XY at time T. Suppose Alice then visits a totally unrelated website that also has a contract with the same advertising company. Her previous activity can be tracked through the cookie, and as an unpleasant surprise, antidepressant ads pop up on the unrelated website.
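The mechanism can be simulated in a few lines of Python. The sketch below is a toy model under stated assumptions: the AdTracker class, the site names and the topic labels are hypothetical, and a real advertising network would receive the cookie through HTTP requests for its embedded content rather than through a direct function call.

```python
import uuid
from collections import defaultdict
from typing import List, Optional, Tuple

class AdTracker:
    """Toy model of a third party embedded on several unrelated websites."""

    def __init__(self) -> None:
        # cookie ID -> list of (site, topic) pairs seen so far
        self.history = defaultdict(list)

    def handle_request(
        self, cookie_id: Optional[str], site: str, topic: str
    ) -> Tuple[str, List[str]]:
        """Log the visit and return the cookie ID plus previously seen topics."""
        if cookie_id is None:
            # First contact: issue a fresh identifier to store as a cookie.
            cookie_id = uuid.uuid4().hex
        previous_topics = [t for _, t in self.history[cookie_id]]
        self.history[cookie_id].append((site, topic))
        return cookie_id, previous_topics

tracker = AdTracker()
# Alice reads about antidepressants; her browser has no tracker cookie yet.
cookie_id, seen = tracker.handle_request(None, "medical.example", "antidepressants")
# Later she opens an unrelated site that embeds the same tracker.
_, seen = tracker.handle_request(cookie_id, "news.example", "sports")
print(seen)  # topics the tracker can use to pick ads on the second site
```

The only link between the two visits is the cookie ID, which is why blocking third-party cookies breaks this kind of tracking.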
The previous example shows why the use of third-party cookies is considered a questionable practice that violates users’ privacy. Major browsers have already started to take action against this practice. Safari has blocked third-party cookies by default since 2017. Firefox has done so since 2019, and Chrome plans to join them too.
Cookies can be blocked – what’s next?
As cookie-based tracking becomes more difficult, the tracking business is moving toward different techniques such as browser fingerprinting. The idea behind browser fingerprinting is to collect information about the browser and its environment for the purpose of identification. These attributes include the browser type and version, operating system, language, time zone, active plugins, installed fonts, screen resolution, CPU class, device memory and various other settings. The attributes are concatenated into a long string, and the fingerprint is defined as a hash value of the string.
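The concatenate-and-hash step described above can be sketched in Python. The attribute names and values, the "|" separator and the choice of SHA-256 are all assumptions made for illustration; real fingerprinting scripts run as JavaScript in the browser and choose their own attributes and hash function.

```python
import hashlib
from typing import Dict

def browser_fingerprint(attributes: Dict[str, object]) -> str:
    """Concatenate browser attributes into one string and hash it."""
    # Sort the keys so the same attributes always give the same string.
    parts = [f"{key}={attributes[key]}" for key in sorted(attributes)]
    fingerprint_string = "|".join(parts)
    return hashlib.sha256(fingerprint_string.encode("utf-8")).hexdigest()

# Hypothetical attribute values for one browser.
attributes = {
    "user_agent": "Mozilla/5.0 (example)",
    "language": "en-US",
    "timezone_offset": -60,
    "screen": "1920x1080",
    "device_memory": 8,
}
print(browser_fingerprint(attributes))
```

Changing any single attribute, for example the screen resolution, yields a completely different hash value, which is what makes the fingerprint useful as an identifier.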
One might ask how unique these browser fingerprints are. It turns out that they tend to be unique in the majority of cases. Curious readers can check it for their own browser at amiunique.org. If a browser fingerprint happens to be non-unique, it can probably be made unique by combining it with the device’s IP address. In other words, browser fingerprints are capable of fully or partially identifying users when cookies are turned off.
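The effect of adding the IP address can be shown with a small Python example. The visitor list, the fingerprint labels and the combine function are hypothetical; the point is only that two browsers with the same fingerprint become distinguishable once a second attribute is mixed in.

```python
import hashlib
from collections import Counter
from typing import List

def combine(fingerprint: str, ip_address: str) -> str:
    """Derive a combined identifier from a fingerprint and an IP address."""
    return hashlib.sha256(f"{fingerprint}|{ip_address}".encode("utf-8")).hexdigest()

def unique_fraction(identifiers: List[str]) -> float:
    """Fraction of visitors whose identifier is shared with nobody else."""
    if not identifiers:
        return 0.0
    counts = Counter(identifiers)
    return sum(1 for i in identifiers if counts[i] == 1) / len(identifiers)

# Three hypothetical visitors; the first two share a browser fingerprint.
visitors = [
    ("fp_a", "203.0.113.5"),
    ("fp_a", "198.51.100.7"),
    ("fp_b", "203.0.113.5"),
]
fingerprints_only = [fp for fp, _ in visitors]
combined = [combine(fp, ip) for fp, ip in visitors]

print(unique_fraction(fingerprints_only))  # only one of three is unique
print(unique_fraction(combined))           # all three are unique
```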
Browser fingerprinting uncovered
To catch real-life browser fingerprinting in action, let’s analyze some websites. Specifically, I will use Chrome in Incognito mode so that all extensions are turned off. Although I try to present reproducible experiments, keep in mind that browser fingerprinting can be browser- or location-dependent, or it can be turned on only for a random subset of IP addresses. Also, the fingerprinter scripts sometimes get a version update. Therefore, 100-percent reproducibility cannot be guaranteed.
On mobile.de, the fingerprinting script is loaded from https://script.ioam.de/iam.js. Here is the source code of its fingerprint() function:
The fingerprint string is accumulated in the variable t. The components of the fingerprint are the User-Agent string, installed plugins with version number, MIME types recognized by the browser and, if the browser is Internet Explorer, ActiveX-related information.
If we put a breakpoint on line 22 and reload the page, we can observe the final value of t. It is the following for my browser:
After applying the hash() function, the fingerprint becomes “94qaxn”. And it isn’t just mobile.de using this fingerprint() function. For example, immobilienscout24.de, spiegel.de and wetteronline.de also embed and run it.
First, the components of the fingerprint are calculated. Then, on line 16, a string is created from the fingerprint components and an integer hash code is computed using the function k. On my machine, the fingerprint string is
and the computed hash code is 641572758.
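An integer hash code can be derived from a fingerprint string in many ways. The Python sketch below uses the well-known polynomial hash with multiplier 31 purely as an illustration; it is not the site's function k, and the component values and the ";" separator are assumptions.

```python
def string_hash_code(text: str) -> int:
    """32-bit polynomial string hash with multiplier 31 (illustrative only)."""
    h = 0
    for ch in text:
        # Mask after each step so the value stays within 32 bits.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h

# Hypothetical fingerprint components joined with a hypothetical separator.
components = ["Mozilla/5.0 (example)", "en-US", "1920x1080", "-60"]
fingerprint_string = ";".join(components)
hash_code = string_hash_code(fingerprint_string)
print(hash_code)
```

A 32-bit hash like this is compact to transmit, at the cost of occasional collisions between different fingerprint strings.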
By placing a breakpoint on line 1 and reloading the website, we can double-check that this code is indeed executed. Then step-by-step execution allows us to investigate what happens here. Lines 1 to 11 prepare the dictionary f and fill it with various browser attributes. At line 12 the function Er initiates a chain of function calls. The first parameter of Er is a complex data structure a that was created before. One of its attributes is the array a.B that already has 40 elements when Er is called. The main effect of Er from our perspective is that it appends all key-value pairs of f to array a.B. Then the rest of the code augments a.B with other attributes. For example, line 38 queries the device memory from the navigator object and appends it to the end of a.B.
After line 38 is executed, the content of a.B is the following:
The first 40 elements of a.B contain attributes not related to fingerprinting, and they are not shown here. I have found no sign of computing a hash code from the fingerprinting-related elements of a.B. However, this could easily be done on the server side after the data is transferred to DoubleClick.
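To make the last point concrete, here is a Python sketch of how a server could turn such a transferred array into a fingerprint. Everything in it is an assumption: the flat key, value layout of the appended pairs, the placeholder contents, the separator and the use of SHA-256. It only illustrates that no client-side hashing is needed.

```python
import hashlib
from typing import Dict, List

def append_attributes(target: List[object], attributes: Dict[str, object]) -> None:
    """Append every key and value of `attributes` to the end of `target`."""
    for key, value in attributes.items():
        target.append(key)
        target.append(value)

def server_side_hash(elements: List[object], skip: int = 40) -> str:
    """Hash only the elements that follow the first `skip` unrelated ones."""
    relevant = [str(e) for e in elements[skip:]]
    return hashlib.sha256("\x1f".join(relevant).encode("utf-8")).hexdigest()

# Stand-in for a.B: 40 unrelated placeholder elements are already present.
collected: List[object] = [f"unrelated_{i}" for i in range(40)]
# Stand-in for the dictionary f of browser attributes (hypothetical values).
append_attributes(collected, {"language": "en-US", "timezone_offset": -60})
# Stand-in for the device memory appended afterwards (hypothetical value).
collected.append(8)

print(server_side_hash(collected))
```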
DoubleClick-based fingerprinting is also present, for example, on lequipe.fr, news.com.au, t-online.de and also on all previously mentioned websites (mobile.de, immobilienscout24.de, spiegel.de, wetteronline.de and lemonde.fr).
The browser fingerprinting landscape
Of course, the landscape of browser fingerprinting is diverse. Here are a few other fingerprinting functions along with websites that apply them:
Countermeasures against browser fingerprinting
Like most users, we believe that anyone should have the right to opt out of any form of web tracking, including browser fingerprinting. That is why we are working on algorithms that detect browser fingerprinting activities.
We collect and analyze known cases of browser fingerprinting and identify patterns based on them. The conventional method for detection would be to exactly match these patterns against websites and find the ones that apply known fingerprinting methods. However, more can be done with artificial intelligence. An AI-based fingerprinting detector is able to perform inexact pattern matching and detect novel fingerprinting methods. Users therefore get a stronger defense against browser fingerprinting.
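The difference between exact and inexact matching can be illustrated with a deliberately simple Python sketch. It is not the detector described above: the signature string, the list of browser APIs and the 0.5 threshold are hypothetical, and a plain keyword score stands in for a trained model.

```python
# Hypothetical code fragment taken from a known fingerprinting script.
KNOWN_SIGNATURES = [
    "navigator.plugins[i].name + navigator.plugins[i].version",
]

# Browser APIs that fingerprinting scripts commonly read (illustrative list).
FINGERPRINTING_APIS = [
    "navigator.userAgent",
    "navigator.plugins",
    "navigator.mimeTypes",
    "navigator.deviceMemory",
    "navigator.language",
    "screen.width",
    "screen.height",
    "getTimezoneOffset",
]

def exact_match(script_source: str) -> bool:
    """Conventional detection: the script contains a known code fragment."""
    return any(sig in script_source for sig in KNOWN_SIGNATURES)

def inexact_score(script_source: str) -> float:
    """Share of fingerprinting-related APIs that the script refers to."""
    hits = sum(1 for api in FINGERPRINTING_APIS if api in script_source)
    return hits / len(FINGERPRINTING_APIS)

def looks_like_fingerprinting(script_source: str, threshold: float = 0.5) -> bool:
    """Flag a script on an exact match or on a high inexact score."""
    return exact_match(script_source) or inexact_score(script_source) >= threshold

# A script that matches no known signature but reads many attributes.
novel_script = """
var s = navigator.userAgent + navigator.language + screen.width + screen.height
      + new Date().getTimezoneOffset() + navigator.deviceMemory;
"""
benign_script = "console.log(navigator.language);"

print(exact_match(novel_script), looks_like_fingerprinting(novel_script))
print(looks_like_fingerprinting(benign_script))
```

The novel script slips past the exact matcher but is still flagged by the score, which is the kind of gap inexact matching is meant to close.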
About the author:
Gabor Takacs has an MSc in computer science and a PhD in machine learning. He has participated in several data science competitions, including the Netflix Prize and GE Flight Quest. He was a founder of Yusp – a recommender system company. Currently he is an associate professor at the University of Gyor and chief data scientist at CUJO AI.