Browserprint: Browser fingerprint tool now can guess client OS even when spoofed (browserprint.info)
163 points by jerheinze on April 26, 2017 | 89 comments


Guessing OS is pretty simple though, I recommend the book "Silence on the wire" [0] for a thorough explanation of passive network fingerprinting.

TL;DR is that each TCP stack has unique characteristics that are hard to spoof (you'd have to bypass the OS TCP stack and build your own that mimics another) and definitely out of reach for tools that run in sandboxed environments (like browser extensions).

edit: Also, the author of that book, Michal Zalewski, made the open source tool p0f [1], which implements some of those techniques to identify spoofed user agents.

  [0]: https://www.amazon.com/gp/product/1593270461
  [1]: http://lcamtuf.coredump.cx/p0f3/
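
To make the idea concrete, here is a minimal sketch of one passive signal: the TTL a packet arrives with. Operating systems start from a small set of common initial TTLs, so rounding the observed value up to the nearest one hints at the sender. The family labels below are a rough rule of thumb I am assuming for illustration, not p0f's actual signature database, and real tools combine many more fields (window size, TCP options and so on).

```python
# Common initial TTL values. The family labels are an illustrative assumption,
# not taken from p0f's signature database.
INITIAL_TTL_FAMILIES = {
    64: "Linux/macOS/BSD-like",
    128: "Windows-like",
    255: "network device or older Unix",
}


def infer_initial_ttl(observed_ttl: int) -> int:
    """Round an observed TTL up to the nearest common initial value."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be in 1..255")
    for initial in sorted(INITIAL_TTL_FAMILIES):
        if observed_ttl <= initial:
            return initial
    raise AssertionError("unreachable: 255 is the maximum TTL")


def guess_family(observed_ttl: int) -> str:
    """Map an observed TTL to a coarse, hypothetical OS family label."""
    return INITIAL_TTL_FAMILIES[infer_initial_ttl(observed_ttl)]


def looks_spoofed(claimed_family: str, observed_ttl: int) -> bool:
    """True when the family claimed by the User-Agent disagrees with the TTL guess."""
    return claimed_family != guess_family(observed_ttl)
```

A packet arriving with TTL 117 most plausibly started at 128, so `looks_spoofed("Linux/macOS/BSD-like", 117)` returns `True`. TTL alone is a weak signal, which is why it is only one input among several.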


Or just looking at what set of fonts the system has: that's pretty OS dependent. There are so many fairly trivial proxies for OS that detecting OS seems… uninteresting.


Font enumeration is basically done by trying to use a certain font and then measuring the div it's written to. It can also be done a bit fancier by drawing to a canvas element and then taking a fingerprint of it (but presumably slower?).

It can be spoofed from a browser extension by messing with the results from the measurement or hooking into core APIs.

Plus you need a font list to begin with; you can't just look up the fonts the system has installed from JavaScript.
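
The measurement trick can be modelled roughly like this. In a browser the measuring would be done on a div or canvas; here `measure` is a stand-in callable I made up so the logic can run anywhere, and `make_fake_measure` is a toy replacement for the rendering engine. A candidate counts as installed if text rendered with it in front of a generic fallback comes out a different width than the fallback alone.

```python
from typing import Callable, Dict, Iterable, List

# (text, font stack) -> rendered width; a hypothetical stand-in for the browser
Measure = Callable[[str, List[str]], float]

TEST_STRING = "mmmmmmmmmmlli"
FALLBACKS = ["monospace", "sans-serif", "serif"]


def detect_fonts(candidates: Iterable[str], measure: Measure) -> List[str]:
    """Return the candidates whose presence changes the measured width."""
    baseline = {fb: measure(TEST_STRING, [fb]) for fb in FALLBACKS}
    detected = []
    for font in candidates:
        # Several fallbacks are tried so that a font whose metrics happen to
        # match one generic family is still caught against another.
        if any(measure(TEST_STRING, [font, fb]) != baseline[fb] for fb in FALLBACKS):
            detected.append(font)
    return detected


def make_fake_measure(installed: Dict[str, float]) -> Measure:
    """Toy renderer: uses the first font in the stack that is 'installed'."""
    def measure(text: str, stack: List[str]) -> float:
        for name in stack:
            if name in installed:
                return installed[name] * len(text)
        raise LookupError("no font in the stack is available")
    return measure
```

With a fake system that has the three generic families plus Calibri, `detect_fonts(["Calibri", "Menlo"], measure)` returns `["Calibri"]`. Note that the caller has to supply the candidate list, which is exactly the limitation described above.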


Yes, indeed. But you know what fonts each OS ships out of the box, so all you need is the set that's the union of those and then you have your fingerprinting. (Canvas will probably be slower, but I expect not by as much as you might suspect.)

I don't see how you can mess with the results of measuring successfully, though, at least not without breaking things. You'd have to make CSSOM lie all over the place to avoid it.
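
A rough sketch of the "union of default fonts" idea: probe for every font in the union, then score each OS by how much of its default set turned up. The font lists here are short illustrative samples I picked, not complete or authoritative lists of what each OS ships.

```python
from typing import Optional, Set

# Illustrative samples only; real default font sets are much longer
# and vary by OS version.
OS_DEFAULT_FONTS = {
    "Windows": {"Segoe UI", "Calibri", "Consolas", "Tahoma"},
    "macOS": {"Helvetica Neue", "Menlo", "Lucida Grande", "Geneva"},
    "Linux": {"DejaVu Sans", "Liberation Sans", "Ubuntu", "Noto Sans"},
}


def candidate_fonts() -> Set[str]:
    """The union of all per-OS defaults: the list a fingerprinter would probe."""
    return set().union(*OS_DEFAULT_FONTS.values())


def guess_os(detected: Set[str]) -> Optional[str]:
    """Pick the OS whose default fonts overlap most with what was detected."""
    scores = {
        name: len(fonts & detected) / len(fonts)
        for name, fonts in OS_DEFAULT_FONTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, `guess_os({"Segoe UI", "Calibri", "Arial"})` returns `"Windows"`, and an empty detection set returns `None` rather than a guess.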


The simple thing would be for the browser to taint Javascript values derived (at whatever remove) from the CSSOM, and then block all network APIs from accepting such values.


I can think of fun ways around that. For example, using setTimeout() and then Date.now deltas to communicate numeric values. Or communicating data via UI events (you have to be able to send network requests in response to UI events, for obvious reasons).

It wouldn't be possible to do that anyway without breaking important things like infinite scroll. Infinite scroll fundamentally requires network requests to be issued when an element is scrolled into view, but whether an element is in view depends on the results of layout, which depends on the user's installed fonts…
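
The setTimeout() and Date.now idea can be sketched as a tiny timing channel. This is a Python model of the concept rather than browser code: the sender only ever delays, the receiver only ever reads a clock, and the value still gets across. The 50 ms slot size and the function names are my own assumptions.

```python
import time
from typing import Callable

UNIT_SECONDS = 0.05  # assumed time slot per unit of value


def send(value: int, sleep: Callable[[float], None] = time.sleep) -> None:
    """'Tainted' side: encode the value purely as a delay."""
    sleep(value * UNIT_SECONDS)


def receive(sender: Callable[[], None],
            clock: Callable[[], float] = time.monotonic) -> int:
    """'Untainted' side: never touches the value, only times the sender."""
    start = clock()
    sender()
    return round((clock() - start) / UNIT_SECONDS)
```

Calling `receive(lambda: send(7))` recovers 7 from elapsed time alone, as long as scheduling jitter stays well under half a slot. The `sleep` and `clock` parameters are only there so the channel can be exercised with a fake clock.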


> but whether an element is in view depends on the results of layout, which depends on the user's installed fonts…

It'd be kind of interesting if you could only ask about the CSSOM in terms of what the page would look like if rendered with a known set of {fonts, visited links, whatever else is a security leak} rather than asking what it does actually look like—with the browser keeping two render-trees in memory for metrics (the real one, and your hypothetical one) but only actually rendering the real one.

Then, you could synchronize page-manipulation events between the two render-trees, by trying to re-synthesize things like viewport/scroll-offsets and mouse positions, such that everything "will have been" in the right position in one model to end up clicking on whatever element ended up being clicked on in the other model.

Very inefficient, but kind of interesting.


If you have multiple columns of text, which column are you matching the scroll position up to? I'm pretty sure even something that inefficient isn't going to work. :)


>important things like infinite scroll

No.


> TL;DR is that each TCP stack has unique characteristics that are hard to spoof (you'd have to bypass the OS TCP stack and build your own that mimics another) and definitely out of reach for tools that run in sandboxed environments (like browser extensions)

If you are behind a NAT, the TCP/IP stack of the NAT machine will probably present some of its characteristics too.

It is also possible to modify your TCP/IP stack settings so it behaves like something else; a simple search for "defeat TCP fingerprinting" or similar will be a good place to start.

I remember reading about a few universities whose networks would, via fingerprinting, identify your OS and only Windows machines would be required to install some --- intrusive, invasive, and flaky --- additional monitoring software, while Linuxes were allowed completely open access. The solution was obviously to make your machine look like Linux, and this was not hard to do with a few registry tweaks, if I remember correctly.


> you'd have to bypass the OS TCP stack and build your own that mimics another

So, the Snabb Switch sort of thing?

I'm guessing that active layer-4-or-above proxies would also ruin your fingerprinting ability (so people behind corporate firewalls would be un-fingerprint-able.)

And, possibly, API clients running on VM instances in clouds that use software-defined networking, might "look like" the SDN infrastructure, rather than like their VM.



$24 for me, perhaps I have a richer-looking browser ...


Thanks for the heads-up, I fixed the link to point to the Kindle edition!

Amazon has indeed gotten called out for these types of shenanigans in the past but that was a long time ago! https://en.wikipedia.org/wiki/Amazon.com_controversies#Diffe...

I missed this related discussion last month: The High-Speed Trading Behind an Amazon Purchase | https://news.ycombinator.com/item?id=13963743


$13.60 Kindle. That's a lot of variation.


$24 paperback, $31.95 kindle :)


$19.55 kindle for me


$21.22 from Lisbon


$18 on kindle


$17.25! Amazon thinks I'm cheap. It's not wrong.


Sniff Firefox traffic to an HTTP website with Wireshark on Ubuntu vs. OS X and you'll notice there are extra null flags unique to OS X.

Can't imagine why..?


Thanks for the book recommendation!


I had a project I did for university a few years back, and we'd identify the browser or application just by looking at the timing information between packets (without looking at ports, source/destination, etc.).

We could identify malware with around 85% accuracy, which was pretty good without any other marker.
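
The comment doesn't say how the classifier worked, so the following is only a guess at the general shape of such an approach, not the project's method: reduce each flow to statistics of its inter-packet gaps and pick the nearest known profile. The features, labels and centroid values are all hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Dict, Sequence, Tuple

Features = Tuple[float, float]  # (mean gap, standard deviation of gaps)


def gap_features(timestamps: Sequence[float]) -> Features:
    """Summarise a flow by the gaps between consecutive packet timestamps."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if not gaps:
        raise ValueError("need at least two packets")
    return (mean(gaps), pstdev(gaps))


def classify(timestamps: Sequence[float], centroids: Dict[str, Features]) -> str:
    """Return the label of the nearest centroid in feature space."""
    features = gap_features(timestamps)
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))
```

With made-up centroids `{"steady": (1.0, 0.0), "bursty": (1.0, 1.5)}`, a flow at `[0, 1, 2, 3]` classifies as `"steady"` and one at `[0, 0.1, 0.2, 3.0]` as `"bursty"`. A real system would learn the profiles from labelled traffic and use far richer features.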


That sounds really interesting, do you have any publicly available documentation or articles about it?



Wrote my thesis about passive and active fingerprinting. It's very easy to do: most operating systems' network stacks have different default values for things like window size, TTL, etc. p0f [1] was pretty good back then.

[1] http://lcamtuf.coredump.cx/p0f3/


My fonts gave me away... Damn fonts, I need those for various design files I open. Any way to limit my browser's access to my system fonts?


Version 52 of Firefox lets you use a whitelist http://www.ghacks.net/2016/12/28/firefox-52-better-font-fing...


This works. (Don't forget to disable Flash.)

Canvas and Character Sizes are still making me fairly unique... Any ideas there?


Canvas is trickier! There are some add-ons out there that disable it in various ways, so they're your best bet I'd say (assuming canvas not working gives away less information than it does normally; it's hard to tell). For character sizes, I'm not sure there are many useful defences against it. I would have thought it depends on a number of things. The Tor Browser might defend against it well; you'd have to give it a look.


Yeah, I can't find much that gave any meaningful protection.

I tried various Firefox and Chrome extensions, tried Tor...

The problem is that at a certain point with security, everything just stops working.

Wasn't able to get any sort of meaningful protection that still let me do much of anything... including run the Browserprint tool.


Sounds like the same problem I've run into in my college project :). It's really tricky to get the balance right between privacy and usability. Unless the browsers put more work into it, one of the only good options is to either constantly switch browsers or use different browsers for different types of browsing.


"An error has occurred" while trying to fingerprint my browser in iOS, (not with the browser, but their toolset). Guess it failed to fingerprint me technically hah.


Same here, Safari on iOS 10.3.1. I was curious about the result on iOS because there are not many things you can customize on Apple devices.


Is lower-level fingerprinting enough to detect the difference between ARM / x86 linux?

How far would I have to go to set up a truly legit honeypot on a Raspberry Pi? Is anyone already doing this? The following article doesn't get into the userland IP stack:

https://www.redpill-linpro.com/sysadvent/2016/12/19/raspberr...



It guessed I had a variant of Linux, yet I'm running FreeBSD with no spoofing of any kind.

(which is corroborated in both the user agent and the javascript uname sections)


It guessed I had a variant of Windows, yet I'm running a Linux with no spoofing of any kind.

By transitivity FreeBSD is a subvariant of Windows ... or maybe not.


"Your user-agent string specifies your browser as being a variant of FIREFOX. Judging by your fingerprint we believe your browser is a variant of FIREFOX. Your user-agent string specifies your operating system as being a variant of UNKNOWN. Judging by your fingerprint we believe your operating system is a variant of WINDOWS."

And yet

User agent is parsed as "Mozilla/5.0 (X11; OpenBSD amd64; rv:49.0) Gecko/20100101 Firefox/49.0 SeaMonkey/2.46". Which is actually the case.


TorBrowser 7.0a3: indicates it's running on Windows, but my OS was fingerprinted as Linux. I'm actually running it on macOS.


Under a VM, or a BSD jail?


No.


It failed to recognize my browser as Edge, it thinks it's Firefox

http://browserprint.info/view?source1=UUID&UUID1UUID=fa204a9...


Wishful thinking?


I was a little concerned when it said I had a unique fingerprint out of the 25k tested so far, but then I remembered I'm spoofing a new user-agent every few minutes. It still managed to guess my true operating system of course :)


Perhaps I misunderstand you, but I think you're placing too much trust in changing the user agent. The method they are using doesn't depend on the user agent string, and ignoring the user agent could even improve its performance. Further, I think it should be relatively easy to detect spoofing of user agent strings. For instance, if your user agent says you're using a Linux browser but your fonts include nothing but the standard fonts on OS X, it's pretty clear that you're spoofing your user agent.
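
A minimal sketch of that consistency check: pull the claimed OS out of the user-agent string and compare it with whatever the fingerprint (fonts, TCP stack and so on) suggests. The token table is my own simplification, not Browserprint's parser; anything it doesn't recognise comes back as `UNKNOWN`, which matches how the site reported an OpenBSD user agent elsewhere in this thread.

```python
# Substring -> OS name, checked in order. A simplified, assumed table:
# "Android" and "iPhone"/"iPad" come first because those user agents also
# contain "Linux" or "Mac OS X".
UA_OS_TOKENS = [
    ("Windows NT", "Windows"),
    ("Android", "Android"),
    ("iPhone", "iOS"),
    ("iPad", "iOS"),
    ("Mac OS X", "macOS"),
    ("Linux", "Linux"),
]


def claimed_os(user_agent: str) -> str:
    """Return the OS the user-agent string claims, or "UNKNOWN"."""
    for token, name in UA_OS_TOKENS:
        if token in user_agent:
            return name
    return "UNKNOWN"


def spoof_suspected(user_agent: str, fingerprint_os: str) -> bool:
    """True when the user agent names an OS that contradicts the fingerprint."""
    claimed = claimed_os(user_agent)
    return claimed != "UNKNOWN" and claimed != fingerprint_os
```

A user agent containing `X11; Linux x86_64` paired with a fingerprint guess of `"macOS"` is flagged, while an unrecognised OS token is left alone rather than treated as a mismatch.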


I'm okay with the spoofing being detectable. The important thing is obfuscating myself to ad agencies. It's OK if I have a unique fingerprint, if that unique fingerprint is morphing every few minutes.

What is interesting is that my unspoofed user agent is 3x more rare than the spoofed one, even though the spoofed one usually throws browser versions that are out of date.

Unfortunately, my browser is still unique to the set of 25k whether spoofed or not. Enabling javascript helps a little, but then I can be audio fingerprinted which defeats the purpose.

I definitely have an exotic configuration. KVM / Firefox / No 3rd Party Cookies / Blacklisted social media sites / Addons (including NoScript) that take various steps to lock down information leaks and prevent loading of blocked resources. I don't allow web fonts which is probably fairly exotic as well.

If more people used script blockers and ad blockers maybe I wouldn't be unique, but it seems to be a trade-off between privacy and security. And I just kind of assume that privacy is off the table for now, so at least I can work towards having security.

If I have to choose between the two, I'm more concerned with malware and being tracked through 3rd party resources like Google Fonts, Google APIs (I cache them and prevent subsequent resource loading) than I am being fingerprinted.


The point of this project is to not use the user-agent for fingerprinting.


Are you saying it isn't worth spoofing your user agent?


I'm saying that user agents are a completely separate issue from what Browserprint is measuring and using for fingerprinting. Browserprint's fingerprint of your browser doesn't change when you change your user agent.

tmalsburg2 appears to have tried to make the same point.


I think we're all misunderstanding each other. I'm aware Browserprint isn't using user agents as their main source of information. It very clearly outlines the information they are using in the results.

I was just remarking about the uniqueness of my spoofed user agent vs a non-spoofed agent. After my initial post I went back and found I was still unique even without a spoofed agent. That's really all there was to my comment, I'm not insinuating that I was surprised to find I was uniquely fingerprinted by other means like font and plugin enumeration.



Haha I love that one, and of course it's very relevant.


This tool provided some more insight into what the EFF already revealed about my browser: Yeah, my Fedora 24 Chrome UA string is fairly rare, but it's a unique (or nearly unique) font stack that rolls up into 18+ bits of entropy to nail my browser as me. EFF says my browser reveals enough information to be unique, out of their 260K+ tests they've run so far. Depressing. I can spoof the UA, but my fonts and plugins are giving me away.

It's no surprise that Browserprint arrives at the same results with only 65K tests.
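
For anyone wondering where a figure like "18 bits" comes from, one way to read it is as surprisal: if only 1 in N visitors shares your value, that value carries log2(N) bits. A quick check, using the sample sizes quoted above:

```python
import math


def surprisal_bits(matching: int, total: int) -> float:
    """Bits of identifying information when `matching` of `total` visitors share a value."""
    if not 0 < matching <= total:
        raise ValueError("need 0 < matching <= total")
    return math.log2(total / matching)


print(round(surprisal_bits(1, 260_000), 2))  # 17.99
print(round(surprisal_bits(1, 65_000), 2))   # 15.99
```

So being unique among roughly 260K tests corresponds to about 18 bits, and a 65K sample can only demonstrate about 16 bits of uniqueness, however distinctive the browser actually is.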


Yeah, and attempting to obfuscate your plugins gives you away. AFAIK fonts are an unsolved problem. :(


My randomised user-agent happened to tell it the truth, but browserprint 'detected' that I was instead using a different OS and browser.

I'm also using a fingerprint-blocking plugin, which seems to be doing its job!


Randomized U-A makes you stand out.

You want single U-A that many other people use.


I'll double check my settings, but I believe it's randomised among a set of common choices.

There may well be no additional value to that, though.


Both Hulu and Netflix block Ubuntu, so this sucks for people like me who use Linux as their primary media OS.

Hopefully this doesn't catch on or we have to find another way to spoof these sites.


Seems similar to Panopticlick which was released years ago: https://panopticlick.eff.org/


One of the first things listed on the page:

  Browserprint is a free open source project designed to 
  provide the same and better functionality as the original 
  Panopticlick.


What are the legitimate uses for fingerprinting?


We've made an online action website where people can vote on things and the people with most votes win prizes.

We thought it was a good idea to validate the user by email (by sending an email with a unique link, that when clicked, the vote was authenticated.) We thought it was good enough as a "security measure", but we thought wrong!

Some people made disposable email accounts and had the emails sent there, so there were some people with thousands of votes, while most only had a few dozen.

When looking in the database, we were glad that we stored some basic info like IP, agent strings and timestamps. These people were smart enough to (sometimes) change IPs, but when looking at the timestamps and agent strings, we saw that these people were cheaters with disposable email addresses.

We removed all these votes and the "winners" didn't win anything in the end, because they had far fewer votes than "normal" players. They started sending angry emails to the customer, and did not leave them alone.

We are currently making another action for that customer, but this time we have added browser fingerprinting that checks a lot of variables (like fonts, canvas rendering, WebGL, screen sizes, number of monitors, device pixel ratios, ...). This way we'll identify cheaters much more easily, ... BUT you can still spoof it, so it's never a foolproof method of identifying users, it just makes our lives a little bit easier when there's some cheating going on ;)

// EDIT: some typos
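
The kind of after-the-fact check described above could look roughly like this: ignore the IP (since it was being rotated), group votes by agent string, and flag any agent that casts many votes inside a short window. This is my own sketch, not the code that was actually used, and the thresholds are arbitrary.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Vote = Tuple[float, str, str]  # (unix timestamp, IP address, user-agent string)


def suspicious_agents(votes: Iterable[Vote], min_votes: int = 5,
                      window_seconds: float = 3600.0) -> Set[str]:
    """Flag agent strings with at least `min_votes` votes within one time window."""
    stamps_by_agent = defaultdict(list)
    for timestamp, _ip, agent in votes:
        stamps_by_agent[agent].append(timestamp)

    flagged = set()
    for agent, stamps in stamps_by_agent.items():
        stamps.sort()
        start = 0
        for end, stamp in enumerate(stamps):
            # Slide the window start forward until it spans at most window_seconds.
            while stamp - stamps[start] > window_seconds:
                start += 1
            if end - start + 1 >= min_votes:
                flagged.add(agent)
                break
    return flagged
```

A very common agent string shared by many honest voters would also trip this, so the output is a list of things to review by hand, not a verdict. That weakness is presumably part of why a richer fingerprint is attractive.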


I'm very curious to learn what website this is, perhaps via email. This type of thing sounds like a fun way to spend a few minutes. :)


"Legitimate" is a subjective term, obviously, but I've heard of it being used in anti-fraud tools [1] - presumably as one of a suite of inputs into a risk estimation model.

Of course, whether it's effective or just marketing is difficult to prove :)

[1] https://blogs.wsj.com/digits/2010/12/01/evercookies-and-fing...


It's still used pretty extensively but it's becoming less effective to identify fraud as the techniques have become more widely known.

These days the most effective use is guarding against account takeover. When logging in from an unknown device a user may go through additional authentication steps. Fraud rings counter this with man-in-the-browser attacks.

There are a couple of vendors that offer it as a standalone service or bundle it with other offerings.


Mostly ad tracking.


In general, sure, but ad networks don't care about OS spoofing.


No, the point is they can serve up targeted ads using browser fingerprint


Detecting OS despite a spoofed user agent is not relevant to that goal


Banks for example use this to help prevent unauthorized access to your account.


It also helps with user experience. The more confidence we have that the user is actually the real person, the fewer challenges we have to use for authentication. If we see a new device, we're going to present more challenges (MFA, KBA, etc.) to protect our users from account takeovers.


The same reason you would ever want to identify an actor. Possibly he is abusing your system and taking advantage of his relative anonymity.


I'm guessing it could be used for Google's reCAPTCHA.


None. It's blackhat territory.


Heh, my WebGL renderer "ANGLE (AMD Radeon (TM) RX 480 Direct3D11 vs_5_0 ps_5_0)" is unique. And character sizes o_0


Still hasn't guessed my MenuetOS box.


I keep failing the CAPTCHA. Why is this part designed so badly?


That is exactly what a bot would say.


So did I.

Did you have JavaScript disabled? I did.

At least Panopticlick gives me something useful w/o JS. Crickets from this site.


Apparently with Firefox there is a request to enable Flash (which I purposely don't install).

I'm led to assume Adobe Flash is the piece which actually divulges all the secrets about my machine. Not surprising.


Flash makes it easier to get some of the information, like your system fonts, but JavaScript can get the same information; it's just more difficult. It has to cycle through a list of fonts and test for them individually, whereas Flash just gives the full list of fonts you have installed.


I don't think this is the case. It seems to be a part of the overall fingerprinting test and having Flash disabled didn't block me from viewing the final results.


There are a lot more sneaky ways such as http://diracdeltas.github.io/sniffly/ that are often combined


Hmm. Doesn't seem to work on iOS. ;)


damn, it doesn't work on lynx. should i update my browser?


[flagged]


Worked for me on 3 different browsers.

Are you using lynx?


Fails on iOS safari for me.


[flagged]


dammit you tricked me into googling 'palecoon'. suckered.



