1st party ads are not unblockable. They only lack one aspect that helps identify...

saagarjha · on Nov 20, 2019

At some point, you’re going to have to apply spam detection techniques rather than whitelist/blacklist ones.

anoncake · on Nov 20, 2019

As long as (and wherever) labeling ads is mandatory, there will always be a way to identify them.

ekianjo · on Nov 20, 2019

So what is the strategy for telling legitimate 1st-party subdomains apart from tracking/ad subdomains if they use random strings as identifiers? (I am guessing this is where we will need a combination of crawlers and machine learning algorithms.)

bscphil · on Nov 20, 2019

Couldn't you blacklist all subdomains of the 1st party and whitelist the few that are actually real?
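A minimal sketch of that default-deny idea, assuming you already know the site's registrable domain and have hand-picked the allowlist. The function name and the example hosts are hypothetical. Hosts outside the first party are deliberately left to ordinary third-party blocklists.

```python
def is_allowed(host: str, first_party: str, allowlist: set[str]) -> bool:
    """Default-deny policy for one site's subdomains.

    The bare first-party domain is allowed, any subdomain of it is blocked
    unless it is on the hand-maintained allowlist, and hosts outside the
    first-party domain are not judged here (returns True).
    """
    host = host.lower().rstrip(".")
    first_party = first_party.lower().rstrip(".")
    if host == first_party:
        return True
    if not host.endswith("." + first_party):
        return True  # third-party hosts are left to ordinary blocklists
    return host in allowlist
```

For example, `is_allowed("x7f3a.example.com", "example.com", {"www.example.com"})` returns `False`, so a freshly minted random subdomain is blocked until someone adds it to the list.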

Or, assuming they have a small list of subdomains that redirect to ad servers, you could generate a list with a script that checks all their subdomains and creates a block list based on that. For example, the site discussed in the OP has all their subdomains listed here: https://crt.sh/?q=%25.liberation.fr
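The first half of such a script might look like this rough sketch. It assumes crt.sh's JSON output (adding `&output=json` to the query URL) is a list of records whose `name_value` field holds one or more hostnames separated by newlines. The function name is hypothetical, and fetching the URL is left out.

```python
import json


def subdomains_from_crtsh(json_text: str, domain: str) -> list[str]:
    """Collect unique hostnames under `domain` from crt.sh JSON output.

    Assumes each record carries a "name_value" field that may hold several
    names separated by newlines.
    """
    domain = domain.lower()
    names = set()
    for record in json.loads(json_text):
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                continue  # wildcard entries do not name a concrete host
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)
```

Each hostname it returns would then still need its CNAME resolved to see which ones actually point at an ad server.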

Edit: looking at the OP case, it seems like they only have one ad domain. I'm not sure I see this as a serious issue until multiple sites start rolling out thousands of subdomains, some pointing back to the real server, others pointing to the ad server. Maybe that will happen, but it's a pretty big barrier to entry, and just short of proxying everything through the 1st party.

uxp · on Nov 20, 2019

> whitelist the few that are actually real

I'm speculating that the balance tips the other way. Last night I was looking at a file on GitHub that redirected to what looked like an S3 bucket subdomain named with a pattern like "github-production-f7e281a2", which I simply presumed was cache-busting via subdomain instead of appending the hash to the filename. If my assumption is correct, you would have to whitelist a new subdomain every time GitHub deploys a new build.

XorNot · on Nov 20, 2019

Looking for subdomains with suspiciously high entropy compared to one's native language would be one way.
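One way to read "compared to one's native language" is as cross-entropy: build a character-frequency model from ordinary text in that language, then score each subdomain label by its average surprisal in bits per character. The sketch below assumes a unigram model with add-one smoothing over the characters allowed in hostnames. The function names, the reference text and the threshold are all hypothetical and would need tuning on real data.

```python
import math
from collections import Counter

# Characters allowed in a hostname label (lowercased).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"


def char_model(reference_text: str) -> dict[str, float]:
    """Unigram character probabilities from reference text, add-one smoothed."""
    counts = Counter(ch for ch in reference_text.lower() if ch in ALPHABET)
    total = sum(counts.values()) + len(ALPHABET)
    return {ch: (counts[ch] + 1) / total for ch in ALPHABET}


def bits_per_char(label: str, model: dict[str, float]) -> float:
    """Average surprisal of `label` under `model` (its cross-entropy, in bits)."""
    chars = [ch for ch in label.lower() if ch in model]
    if not chars:
        return 0.0
    return -sum(math.log2(model[ch]) for ch in chars) / len(chars)


def looks_random(label: str, model: dict[str, float], threshold: float) -> bool:
    """Flag labels that are much more surprising than ordinary words."""
    return bits_per_char(label, model) > threshold
```

Digits and rare letters push a label's score up, while labels made of real words score low, which is exactly the weakness raised in the next reply.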

TonyTheSlayer · on Nov 20, 2019

Devil's advocate: then instead of using subdomains with randomly generated strings, we use words from a dictionary instead.

roptat · on Nov 21, 2019

That won't work: see for instance https://twitter.com/aeris22/status/1193644687950860289 (securite means security/safety in French, but that subdomain is a CNAME for smartadserver).
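This is why checking the name itself is not enough. What gives it away is where the name's CNAME chain ends up. Below is a minimal sketch, assuming the chain has already been resolved (for example with a DNS library) and that you maintain your own list of ad-server domains. All hostnames and the function name are hypothetical.

```python
from typing import Iterable, Optional


def cloaked_ad_target(cname_chain: Iterable[str], ad_suffixes: set[str]) -> Optional[str]:
    """Return the first CNAME target that falls under a known ad-server domain.

    `cname_chain` lists the targets a first-party hostname resolves through,
    in order. Trailing dots from DNS answers are ignored.
    """
    for target in cname_chain:
        target = target.lower().rstrip(".")
        for suffix in ad_suffixes:
            # Match whole labels so "notadnetwork.example" is not flagged.
            if target == suffix or target.endswith("." + suffix):
                return target
    return None
```

For example, `cloaked_ad_target(["ads.adnetwork.example."], {"adnetwork.example"})` returns `"ads.adnetwork.example"`, no matter how innocent the first-party name pointing at it looks.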

Moru · on Nov 20, 2019

Then we block those words :-)

0xC0ncord · on Nov 20, 2019

You would have to block entire wordlists to combat subdomains like that. It would make more sense to whitelist subdomains instead, but it would take much more effort to determine which subdomains the website needs in order to function. Additionally, if the site ever changed things around, someone would have to catch the breaking change and correct the whitelist for the site to work again.

squiggleblaz · on Nov 20, 2019

How do you know what words to block?

ekianjo · on Nov 20, 2019

Machine learning: analyze what the page displays when different domains are blocked. Bots could do this continuously and update a decentralized database with the results.

philjackson · on Nov 20, 2019

Chrome isn't open source, so not much chance of that happening.

behringer · on Nov 20, 2019

https://www.chromium.org/

colejohnson66 · on Nov 20, 2019

Chromium is not Chrome. Chrome is based on code for Chromium, but that’s where the similarities end.

behringer · on Nov 20, 2019

Chromium is Chrome with only a few proprietary bits removed. It's essentially Chrome everywhere that it matters to this particular discussion.

epapsiou · on Nov 20, 2019

And why would you use Chrome instead of Chromium? Stick to Firefox and Chromium.

colejohnson66 · on Nov 20, 2019

Because one likes the features available only in Chrome? I haven't checked recently and don't know if this is still accurate, but Chromium used to lack the PDF reader and DRM support (for Netflix, etc.).

behringer · on Nov 20, 2019

There are a number of FOSS and proprietary PDF readers for Chromium/Chrome. There are also Netflix apps outside of Chrome. You don't have to use Chrome...

geofft · on Nov 20, 2019

That's like saying that the Ubuntu kernel is based on code for Linux.

rndgermandude · on Nov 20, 2019

Yes, pretty much, except that Canonical is nice enough to open source their patches. And they layer a ton of patches on top of the official kernel trees, mostly backports but also some new features. Their linux_5.0.0-36.39.diff is close to 35MB.

And remember the time when Debian layered some changes on top of OpenSSL? http://faq.caslavka.cz/attachments/196/randomness.png

Now, what changes does Google layer on top of chromium to make Chrome? Do you know exactly?

geofft · on Nov 20, 2019

Yes, it's pretty easy to disassemble it and find out. It's basically auto-updates, some closed-source extensions like Chromecast (although you can manually download the Chromecast bits for Chromium if you'd like), some branding differences as compile-time #defines. https://chromium.googlesource.com/chromium/src/+/master/docs...

(Do you know that all of Chromium is in fact open source? Have you looked at the source and the build process? Are there any parts in it that are actually precompiled binary blobs?)

colejohnson66 · on Nov 20, 2019

Does Chromium contain the DRM needed to play sites like Netflix? I know many here are against DRM, but it’s necessary if I want to use my Netflix.

behringer · on Nov 20, 2019

Netflix will play a 720p version if you don't have a DRM-supported browser. Also, Netflix distributes native apps for all platforms, which means you don't need Chrome to use Netflix in full fidelity on any device, except maybe Linux?

singron · on Nov 20, 2019

Widevine is a DLL bundled with Chrome. You can copy it into a Chromium installation and use Netflix. I don't know if this violates TOS/licenses/law.

JoeSamoa · on Nov 20, 2019

You realize the extensions can track you too?

buraktamturk · on Nov 20, 2019

This problem can be solved easily by using an open-source extension that has reproducible builds. Making sure it doesn't have a built-in tracker is as easy as reading the source code. And we can verify that the final hash of the extension (excluding the signature blob) matches your own build, so it was not tampered with before being uploaded to the extension store.
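A rough sketch of that hash comparison. It assumes the extension ships as a zip archive and that the store adds its signature as extra entries under `META-INF/`. Signed Firefox add-ons store their signature files there, but other stores may differ, which is why the prefix is a parameter. Hashing file names and contents in sorted order, rather than the raw archive bytes, keeps zip timestamps and compression settings from causing false mismatches. The function name is hypothetical.

```python
import hashlib
import io
import zipfile


def content_digest(archive: bytes, skip_prefixes: tuple[str, ...] = ("META-INF/",)) -> str:
    """SHA-256 over an archive's file names and contents, ignoring signature entries.

    Entries are hashed in sorted order, so zip metadata (timestamps,
    compression, entry order) does not affect the result.
    """
    h = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for name in sorted(zf.namelist()):
            if name.startswith(skip_prefixes) or name.endswith("/"):
                continue  # skip signature files and directory entries
            data = zf.read(name)
            # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
            for field in (name.encode("utf-8"), data):
                h.update(len(field).to_bytes(8, "big"))
                h.update(field)
    return h.hexdigest()
```

You would run it on your own build and on the copy downloaded from the store, and expect the two digests to be equal.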

cf141q5325 · on Nov 20, 2019

You can also track people if they install your adware.exe. The emphasis is on install. What software you install is an entirely different threat scenario than visiting a website.