Manifest

Vision

Pixelscan is a one-and-done solution to detect internet bots. 99.5% of bots are detected instantaneously (in less than one millisecond). The remaining 0.5% are detected in less than one second through additional security checks. We strive for 99.99% accuracy in bot detection so you can rest easy knowing that only real users are present on your platform.

Along with detecting outright bots, Pixelscan can recognize manually-controlled browsers whose browser fingerprint parameters are inconsistent with one another. For example, if a visitor's user-agent reports Windows, but the rest of the parameters point to macOS, Pixelscan will pick up on that inconsistency.
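As an illustration, a cross-check of this sort can be sketched as a simple pure function. The parameter names, expected values, and logic below are ours, for illustration only, and are not Pixelscan's actual implementation:

```python
# Map OS tokens found in a User-Agent string to the navigator.platform
# values a genuine browser on that OS would typically report.
# (Illustrative table, not an exhaustive one.)
EXPECTED_PLATFORMS = {
    "Windows": {"Win32", "Win64"},
    "Macintosh": {"MacIntel"},
    "Linux": {"Linux x86_64", "Linux i686"},
}

def ua_platform_consistent(user_agent: str, platform: str) -> bool:
    """Return False when the OS claimed in the User-Agent contradicts
    the platform value reported elsewhere in the fingerprint."""
    for os_token, platforms in EXPECTED_PLATFORMS.items():
        if os_token in user_agent:
            return platform in platforms
    return True  # unrecognized OS token: no basis to flag anything

# A visitor whose User-Agent claims Windows but whose platform says macOS:
ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
print(ua_platform_consistent(ua, "MacIntel"))  # → False
print(ua_platform_consistent(ua, "Win32"))     # → True
```

In practice, a real check would cover many more parameters (fonts, WebGL renderer strings, time zones) and weigh them together rather than returning a single boolean.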

Inspiration

As firm advocates of internet privacy, we were originally inspired by basic tools which allow a user to check his own internet privacy profile, such as the Electronic Frontier Foundation’s Panopticlick test. However, although helpful, tools such as the Panopticlick test have various flaws. For example, the Panopticlick test takes ages to run, which is unrealistic in a commercial environment. Moreover, because the Panopticlick test relies on open-source browser fingerprinting scripts, its results are often simplistic and inaccurate. Such tools should be considered more as a jumping-off point for understanding internet privacy than as a viable solution for commercial applications.

Problems with existing device identification tests

Almost all popular bot detection systems contain various flaws which lead to a variety of issues, including but not limited to inconsistent detection, resource overload, and easy exploitation.

Harvesting useless data

Many parameters can be revealed from a user’s browser, but not all of them contain enough uniquely identifiable information to be useful for bot detection, neither in a standalone fashion, nor when combined with other parameters to detect browser fingerprint inconsistencies. On a small scale, overcollection of this unnecessary data is not a big deal, but in a commercial environment, a few bits of unnecessary data multiplied by millions of stored results can easily result in a large data storage bill for any commercial system.

Examples: browserleaks.com, deviceinfo.me, f.vision, whoer.net
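The notion of "enough uniquely identifiable information" can be made concrete with Shannon entropy: a parameter whose observed values carry close to zero bits distinguishes almost nobody and is arguably not worth storing. A minimal sketch, using hypothetical observation data:

```python
from collections import Counter
from math import log2

def identifying_bits(observed_values: list) -> float:
    """Shannon entropy (in bits) of a parameter across observed visitors.
    A parameter near zero bits distinguishes almost nobody, so storing it
    mostly inflates the data storage bill."""
    counts = Counter(observed_values)
    total = len(observed_values)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# One dominant value => almost no identifying information.
print(identifying_bits(["true"] * 999 + ["false"]))        # close to 0 bits
# 16 equally likely values => log2(16) bits of information.
print(identifying_bits([str(i % 16) for i in range(160)]))  # 4.0 bits
```

A collection pipeline could apply a cutoff like this per parameter before deciding what to persist, trading a little accuracy for a much smaller storage footprint.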

Computational power abuse

Tests that put heavy load on a visitor’s computer can be quite precise, but in commercial traffic and analytics environments, they are not viable for two reasons. First, executing resource-heavy client-side scripts can ruin user experience, especially for users on less-powerful devices. Second, unauthorized use of visitor resources may result in public relations troubles, especially if a platform is well-known or heavily trafficked. For example, if you visited Facebook and then had to wait 10 seconds before the page loaded, that would be terrible from a user-experience standpoint, and Facebook users may start to question what the heck is going on during those 10 seconds.

Example: uniquemachine.org

Methods which are not universally applicable

Many device-specific identification methods provide accurate results for the set of users on such devices, but are useless outside of that set. For example, solutions revolving around WebGL 2.0 parameters will not work at all on machines which support only WebGL 1.0.

Similarly, font-list discovery through Adobe Flash will not work on browsers that do not have the Flash plugin installed. Even if the Flash plugin were hypothetically installed on every user’s browser, activating it would require user consent, much like a geolocation check run through the browser API.

So, while these specific tests may provide accurate results, they are not viable in commercial environments, since developing many different tests for many different sets of users would result in enormous overhead costs. Almost all publicly-known tests contain at least a few of these problematic device-specific identification methods.

Example: uniquemachine.org

Pinpointing flaws in a specific solution

Systems which detect bots and browser fingerprint inconsistencies contain bugs. Similarly, privacy solutions aimed to bypass such systems also contain bugs. In other words, all software contains bugs. Some tests rely on finding and exploiting these bugs in order to detect bots, which may be a viable short-term solution in some cases. However, in commercial environments, such an approach is unrealistic, as the R&D costs to find these bugs would be far too high to be economically viable.

At the moment, all such bug-exploiting tests are offline, though new ones pop up every once in a while. One such test was once hosted at anonymity.space/hellobot.php.

Prone to easy manipulation

Many bot detection tests discover only parameter values, and at discovering parameter values they are sometimes quite accurate. However, these tests do not compare parameters with one another, which makes them easy to exploit: if the creator of a bot can find a combination of standalone parameter values that raises no red flags, he can fly under the test’s radar with ease.

Example: https://oswg.oftn.org/projects/core-estimator/demo/
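The difference between checking values in isolation and comparing them with one another can be sketched as follows. The parameter names and rules here are illustrative assumptions, not any real test's logic:

```python
# Hypothetical fingerprint: every individual value looks legitimate,
# but two of them contradict each other.
fingerprint = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
    "platform": "MacIntel",        # contradicts the User-Agent
    "hardware_concurrency": 8,
}

def standalone_check(fp: dict) -> bool:
    """Checks each value in isolation -- easy to satisfy with any
    plausible combination of independently faked parameters."""
    return (
        "Mozilla/5.0" in fp["user_agent"]
        and fp["platform"] in {"Win32", "Win64", "MacIntel", "Linux x86_64"}
        and 1 <= fp["hardware_concurrency"] <= 128
    )

def relational_check(fp: dict) -> bool:
    """Also compares parameters with one another, so faking each value
    independently is no longer enough to pass."""
    if not standalone_check(fp):
        return False
    if "Windows" in fp["user_agent"] and fp["platform"] == "MacIntel":
        return False
    return True

print(standalone_check(fingerprint))   # → True: every value looks fine alone
print(relational_check(fingerprint))   # → False: the relationship fails
```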

Illegal or controversial methods

Methods such as port scanning are illegal in certain jurisdictions (publicly-available port scanners often do not mention this fact). Even if port scanning is technically legal in a certain jurisdiction, commercial applications which use such tactics may still run into legal headaches, or simply public-relations headaches. The best approach is to use these controversial methods only when absolutely necessary, or, ideally, to avoid them altogether.

Similarly, tests which aim to reverse-engineer a user’s browser history cannot be employed by reputable websites, as such methods directly breach the EU’s General Data Protection Regulation (GDPR), which may lead to legal troubles. Reputable companies must be very careful with the methodologies they employ when bot detection is the topic of conversation.

Examples: whatleaks.com, f.vision, vektort13.space

Honeypots

Maximum profit with minimal effort is, of course, the goal of any commercial company. In the realm of bot detection, the easiest way to achieve that goal is by enticing the user to give up his online identity, then sending his data to an affiliated third party to perform the security check. If a company is caught doing this without user consent, legal troubles may abound.

Misleading interpretation

Parameters being misinterpreted leads to confusion. As an example, we can turn to whatleaks.com. In this test, if a user’s browser provides a public WebRTC IP, whatleaks.com marks it as a red flag and recommends the user block IP leakage altogether. In reality, presenting a clearly-fake IP address is better for online privacy than blocking IP leakage altogether.
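A less misleading interpretation compares the WebRTC-revealed IP against the IP the visitor actually connected from, rather than flagging its mere presence. A hypothetical sketch of such logic (the function name and verdict strings are ours):

```python
def webrtc_verdict(connection_ip: str, webrtc_ip) -> str:
    """Hypothetical interpretation logic: a public WebRTC IP is not a
    red flag by itself -- what matters is whether it contradicts the
    IP the visitor actually connected from."""
    if webrtc_ip is None:
        # Blocking WebRTC outright is itself unusual and noteworthy.
        return "webrtc blocked (unusual, but not proof of anything)"
    if webrtc_ip == connection_ip:
        return "consistent (no red flag)"
    return "mismatch (possible proxy or VPN in front of the browser)"

print(webrtc_verdict("203.0.113.7", "203.0.113.7"))
print(webrtc_verdict("203.0.113.7", "198.51.100.4"))
print(webrtc_verdict("203.0.113.7", None))
```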

Another example of misleading interpretation can be found with the Canvas Uniqueness parameter on browserleaks.com. This parameter adequately shows the relationship between a user’s canvas fingerprint and his browser’s User-Agent, but it should not be used as a standalone parameter to reliably determine his level of internet privacy. For example, when a new version of Chrome comes out for MacBook devices, browserleaks.com will often show 100% Canvas Uniqueness for anyone running the new version of Chrome on a MacBook. This score simply means that a MacBook’s typical canvas fingerprint perfectly matches the User-Agent value for that version of Chrome. The user now believes his fingerprint to be unique, but in reality, millions of other people could be using the exact same fingerprint at any given moment, since all MacBook users share the same canvas signature.
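A more honest uniqueness score would be computed within the cohort of visitors sharing the same User-Agent, not across all visitors globally. A sketch with hypothetical observation data (this is our illustration, not browserleaks.com's actual computation):

```python
from collections import Counter

def prevalence_within_cohort(canvas_hash: str, user_agent: str,
                             observations: list) -> float:
    """Share of visitors with the SAME User-Agent who also produced this
    canvas hash. A '100% unique' global score right after a new Chrome
    release often just means the cohort is new, not that the visitor
    stands out."""
    cohort = [h for h, ua in observations if ua == user_agent]
    if not cohort:
        return 0.0
    return Counter(cohort)[canvas_hash] / len(cohort)

# Hypothetical data: every MacBook on the new Chrome build renders the
# same canvas, so the hash is completely common within its own cohort.
obs = ([("abc123", "Chrome/120 Mac")] * 50
       + [("def456", "Chrome/119 Mac")] * 50)
print(prevalence_within_cohort("abc123", "Chrome/120 Mac", obs))  # → 1.0
```

A prevalence of 1.0 means everyone in the cohort shares the fingerprint, i.e. the visitor is not unique at all, even though a global comparison against older Chrome builds would report the hash as brand new.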

Problems with bot detection applications

Many commercial bot detection systems claim to use advanced technological tactics, such as artificial intelligence and browser fingerprinting. From our analysis, over 90% of commercial bot detection systems do not actually use these tactics, and instead rely on outdated concepts which simply do not work anymore.

Blacklists and whitelists

Some commercial bot detection systems use parameter whitelists and blacklists to provide results. This approach is inherently flawed and often does more harm than good. For example, if an IP address becomes blacklisted and is later reassigned to a legitimate user, that user will now be seen as a bot. Even worse, if the system goes so far as to blacklist a mobile IP address, thousands of legitimate users may be flagged as bots, as one mobile IP address may be used by thousands of users at any given time. Whitelists and blacklists, though perhaps once effective, are dinosaur-era technology today.
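Even a team committed to blacklists could soften the IP-reassignment problem by expiring entries automatically. A minimal sketch (Pixelscan itself does not use blacklists; this is purely illustrative):

```python
import time

class ExpiringBlacklist:
    """A time-limited blacklist: entries lapse automatically, so an IP
    address later reassigned to a legitimate user is not flagged forever."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._entries = {}  # ip -> timestamp of the ban

    def ban(self, ip: str, now=None) -> None:
        self._entries[ip] = now if now is not None else time.time()

    def is_banned(self, ip: str, now=None) -> bool:
        now = now if now is not None else time.time()
        banned_at = self._entries.get(ip)
        return banned_at is not None and now - banned_at < self.ttl

# Timestamps injected explicitly so the example is deterministic.
bl = ExpiringBlacklist(ttl_seconds=3600)
bl.ban("198.51.100.4", now=0)
print(bl.is_banned("198.51.100.4", now=1800))  # → True: within the TTL
print(bl.is_banned("198.51.100.4", now=7200))  # → False: entry lapsed
```

Note that expiry only limits the damage window; it does nothing for the mobile-IP case, where thousands of legitimate users sit behind one address for the entire lifetime of the entry.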

Easily-manipulated parameters

Most bot detection systems rely on collecting and analyzing browser parameters through a set of conditional filters. Savvy bot developers may be able to figure out these conditional patterns in order to bypass them. To be more resilient than the average bot detection system, Pixelscan goes beyond simple conditional filters and focuses on relationships between easy-to-manipulate and hard-to-manipulate parameters.

Massive redundancy

A few bot detection systems which get excellent and accurate results do exist. They all share the same problem, though: they take up too much time and too many resources! This approach inevitably leads to a hefty bill, which is never ideal.

Our solution

We aim to create a universally scalable algorithm for bot detection. We certainly don't want to copy how most modern bot detection systems work, as most of these systems are already obsolete, or will become obsolete within months to years. We want Pixelscan to be viable in 2020, 2030, and beyond.

The goal of Pixelscan is to be fast, cheap, and resilient to manipulation, while also providing an unambiguous evaluation of any website visitor. Anyone, from the casual privacy enthusiast to the mega-corporation, will be able to implement Pixelscan for their own purposes. If you are already using an alternative bot detection solution, you may compare your results to those found by Pixelscan to see if your current solution is hitting the mark.

More than anything, we want Pixelscan to become an educational resource: a place where we publish and discuss new concepts related to internet privacy, browser fingerprinting, bot detection, and more. We have a long backlog of ideas we’d like to implement and we will be working tirelessly to make Pixelscan the #1 resource in the world for everything related to online privacy.

Your privacy

We do not track our users. Period. The only tracking we employ is Clicky analytics to get a general idea of our website traffic statistics. If you don’t want to be included, feel free to blacklist clicky.com using your operating system’s network configuration file or through a browser add-on. The tool will still function with all of its features (and we can still be friends).
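For example, on Linux or macOS the blacklisting can be done by null-routing Clicky’s hostnames in the hosts file (the exact hostnames your browser contacts may differ; check your network inspector):

```shell
# Append to /etc/hosts (requires root); on Windows, the same lines go in
# C:\Windows\System32\drivers\etc\hosts.
0.0.0.0 clicky.com
0.0.0.0 static.getclicky.com
0.0.0.0 in.getclicky.com
```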

Who we are

Pixelscan was created by a group of data scientists and engineers who have a distinct interest in frictionless digital identification. Like many privacy hobbyists, we have day jobs, but we contribute most of our free time to the development of Pixelscan. We love a challenge and we want to develop a cutting-edge technology that does not exist yet. What will we do once Pixelscan is fully functional? Frankly, we’re not sure. Perhaps we will make it open-source, or perhaps we will make it available for a small fee. Regardless, should you have any questions or suggestions, please feel free to contact us directly. We’re always looking to talk and collaborate with fellow privacy enthusiasts.