The captcha is basically a question to add two 2-digit numbers, e.g. 23+45. The question is returned as an SVG picture.
The hypothesis is that the need to engage the brain even for 3 seconds is enough to stop 99% of the spammers, so the quality of comments should shoot through the roof.
The way it works is each comment has a sha1 hash and from that hash we can derive the 2 numbers. No state needed.
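To make the "no state needed" part concrete, here is a minimal sketch of how two 2-digit numbers could be derived from a comment's SHA-1. The function names and the byte-to-number mapping are my assumptions for illustration; the actual mapping used by comntr/captcha isn't described in this thread. The sketch also ignores how the real service keeps the client from computing the answer itself.

```python
import hashlib
from typing import Tuple

def captcha_numbers(comment_text: str) -> Tuple[int, int]:
    """Derive two 2-digit numbers (10..99) from the SHA-1 of a comment.

    Hypothetical mapping: the first two bytes of the digest, each
    reduced to the range 10..99.
    """
    digest = hashlib.sha1(comment_text.encode("utf-8")).digest()
    a = 10 + digest[0] % 90
    b = 10 + digest[1] % 90
    return a, b

def check_answer(comment_text: str, answer: int) -> bool:
    """Recompute the numbers from the comment itself; nothing is stored."""
    a, b = captcha_numbers(comment_text)
    return answer == a + b
```

Because the numbers are a pure function of the comment, the server can verify an answer without remembering which question it asked.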
Have you tested that hypothesis? In my experience, the real annoyance comes from people who are very persistent and aren't easily deterred by requiring a small effort. Captchas slow them down (by the time it takes to solve the captcha), but they'll also slow down everyone else.
Adding effort so you have to really want to comment may be a good idea to reduce the number of comments, but it might also deprive you of valuable comments, because people may not care enough about sharing some information to jump through hoops for it.
That's what I'm trying to test. Last time I got 800 mostly meaningless comments.
The idea is to stop numerous zergs: you can't ban them one by one, but a little bump in their way can do the job.
To avoid annoying more serious commenters, the captcha should be as simple as it can be. So asking to add 321+625 would be too much, while a question like 2+4 doesn't need any thinking, so I chose two-digit numbers.
This is a small social experiment. I don't know how it'll go.
Or rather time-delay moderation, as it's easier to implement. Comments are added to the server as usual, but the web client shows them only after 1 hour.
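A rough sketch of the client-side part of that idea, assuming each comment carries a `posted_at` timestamp (the field name and the dict shape are made up for this example):

```python
from datetime import datetime, timedelta, timezone

# The 1-hour delay mentioned above.
VISIBILITY_DELAY = timedelta(hours=1)

def visible_comments(comments, now=None):
    """Return only the comments that are at least VISIBILITY_DELAY old.

    `comments` is a list of dicts with a timezone-aware `posted_at`
    datetime (a hypothetical shape, not the real comntr format).
    """
    now = now or datetime.now(timezone.utc)
    return [c for c in comments if now - c["posted_at"] >= VISIBILITY_DELAY]
```

The server stores everything as usual; only the display is delayed, which leaves a window to remove junk before anyone sees it.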
I think most of the spammers can be deterred by a simple puzzle, like 23+47. If we want to raise the bar, we make the puzzle more and more complex. Obviously, the puzzle is returned as an SVG picture where the characters are "rendered" with little squares. My point is that 99% of the spammers out there are lazy and won't be able to pass this little test, unlike someone who's written a thoughtful comment and can definitely add 23 to 47.
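For illustration, this is roughly what "rendered with little squares" could look like: each character is a 3x5 bitmap and every set bit becomes one `<rect>`. The glyph table, cell size and function name are my own choices for the sketch, not what the actual service uses.

```python
# Hypothetical 3x5 bitmaps for the digits and "+"; "1" marks a filled square.
GLYPHS = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "111"],
    "+": ["000", "010", "111", "010", "000"],
}
CELL = 4  # side of one square, in SVG user units

def render_svg(text: str) -> str:
    """Render a puzzle such as "23+47" as an SVG made of small squares."""
    rects = []
    for i, ch in enumerate(text):
        x0 = i * 4 * CELL  # 3 columns per glyph plus 1 column of spacing
        for row, bits in enumerate(GLYPHS[ch]):
            for col, bit in enumerate(bits):
                if bit == "1":
                    rects.append(
                        f'<rect x="{x0 + col * CELL}" y="{row * CELL}" '
                        f'width="{CELL}" height="{CELL}"/>'
                    )
    width, height = len(text) * 4 * CELL, 5 * CELL
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">' + "".join(rects) + "</svg>"
    )
```

There is no text in the output, only geometry, so a bot has to rasterize the picture before it can run any text recognition on it.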
Em.. "off-the-shelf OCR" sounds neat, but anyone who knows such words isn't an average spammer. The goal of basic SVG puzzles is to block 99% of the spammers who just type dumb comments on keyboards. The remaining 1% can be taken care of by human mods.
TBH, I don't like the reCAPTCHA-like solutions. In my personal experience they are just annoying, and if they rely on any 3rd party service, I'll give them a hard pass for this reason alone. My approach is to use trivial SVG-style captchas with adjustable complexity, e.g. instead of asking "23+34", we can ask "log(32)/log(2)" and effectively filter out everyone except people familiar with math, or "md5(2615), first 7 hex digits" and let in only people familiar with cryptography. Forcing users to detect birds and crosswalks will just make them upset, IMHO.
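As a sketch of what checking those three difficulty levels could look like on the server side (the function names are invented, and hashing the decimal string "2615" is an assumption, since the puzzle doesn't say how the number is encoded):

```python
import hashlib
import math

def answer_addition(a: int, b: int) -> int:
    """Easy level: "23+34"."""
    return a + b

def answer_log2(n: int) -> int:
    """Math level: "log(32)/log(2)", which is the base-2 logarithm of n."""
    return round(math.log2(n))

def answer_md5_prefix(n: int, digits: int = 7) -> str:
    """Crypto level: "md5(2615), first 7 hex digits".

    Assumes the number is hashed as its decimal ASCII string.
    """
    return hashlib.md5(str(n).encode("ascii")).hexdigest()[:digits]
```

For example, `answer_addition(23, 34)` is 57 and `answer_log2(32)` is 5; the md5 level needs a tool, which is exactly the point of that level.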
It's completely dependent on the traffic of the site if a spammer takes the time to break a custom captcha.
I work on a site with 10 million monthly pageviews and spammers register on a form that has recaptcha and email verification... and we tried hidden input fields and other tricks, but each day we have consistently had 5 new spam accounts. With SVG they can just take a screenshot of what a user sees and send that to OCR. Complex math will turn away as many legitimate users as spammers.
The only real way to stop spam is to use a 3rd party API to detect it, or use something like a karma system that builds up over time. I think we're at the point where simple solutions won't work well unless you have a small site.
That's true when we talk about 10M monthly pageviews, but I doubt that this little extension will reach such popularity levels. If this somehow happens, by that time there will be a way to enable 3rd party captchas for any page.
The catch is that the text will be represented as small geometric SVG shapes, so the spammer will first need to render the SVG to PNG and then run text recognition tools. But in that SVG we can easily add some CSS animations that make sure the entire image is never rendered at once, so spammers will need to run an entire browser to take screenshots and will need to assemble the image from multiple frames.
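A rough sketch of the animation trick, under my own assumptions: squares are split into a few phase groups, and a CSS keyframe animation with a different negative `animation-delay` per group makes each group visible only during its own slice of the cycle. The parameters and the function name are hypothetical, and I haven't checked how readable this stays for humans.

```python
def animated_svg(cells, cell=4, phases=3, period_s=1.5):
    """Build an SVG where no single frame shows all of the squares.

    `cells` is a list of (col, row) squares to draw. Square i belongs to
    phase group i % phases and is opaque only for 1/phases of each cycle.
    """
    visible_pct = 100 / phases
    style = (
        "<style>"
        f"@keyframes show{{0%,{visible_pct:.2f}%{{opacity:1}}"
        f"{visible_pct + 0.01:.2f}%,100%{{opacity:0}}}}"
        f"rect{{animation:show {period_s}s linear infinite}}"
        "</style>"
    )
    rects = []
    for i, (col, row) in enumerate(cells):
        # A negative delay starts the animation part-way through its cycle.
        delay = -(i % phases) * period_s / phases
        rects.append(
            f'<rect x="{col * cell}" y="{row * cell}" width="{cell}" '
            f'height="{cell}" style="animation-delay:{delay:.2f}s"/>'
        )
    width = (max(c for c, _ in cells) + 1) * cell
    height = (max(r for _, r in cells) + 1) * cell
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">{style}{"".join(rects)}</svg>'
    )
```

A static rasterizer that ignores CSS animations, or a single screenshot, would capture at most one phase group, so the attacker has to combine several frames.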
Won’t it have to be converted to a raster image before it can be OCR’d?
Granted all you need to do is render it to a canvas but that’s an extra step on top of everything you need for a raster image, I’m not sure it’s easier.
And just rendering to canvas may be very tricky if the captcha is animated with CSS, i.e. it moves a bit and different parts of it appear at different times.
It can in theory be solved, but the more important question is: will it be solved?
I've used a bunch of one-word-answer questions for over a decade now for successful spam prevention. They're trivial to circumvent for a determined attacker with the time and resources (and similarly trivial for me to replace with something else).
This also means that for a decade I haven’t shipped my user data to Google.
Unless you are a really juicy target, fending off the bots is enough.
That's right. My first attempt was to use IPFS or DAT. I figured out it's not quite possible, but we can get very close to that, in theory. Imagine the extension or the iframe could run an ipfs.js or dat.js that would discover all the HTTP servers with comments via a DHT: servers that want to participate publish a unique key to the DHT, and the web clients discover this key and then the IP addresses of the servers. In practice, this doesn't quite work because DHTs are based on the assumption that any node can quickly ping (with a UDP packet) any other node and thus perform the DHT discovery using the Kademlia algorithm in log(N) steps. But on the web, the only way to "ping" someone is to set up a p2p connection with WebRTC: this not only needs a signaling relay, but also implies a multi-step exchange of SDPs and has other costly overhead. And I haven't even approached the symmetric NAT problem. This is why ipfs.js hogs CPU, allocates 1 GB and keeps 4-8 sockets always open (they aren't even p2p now, but rather web sockets to some relay, for perf reasons).
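For readers who haven't looked at Kademlia: the log(N) figure comes from its XOR distance metric, where each lookup hop moves to a node that shares a longer ID prefix with the target. A tiny sketch of the metric follows (helper names are mine; this is not ipfs.js or dat.js code):

```python
import hashlib

def node_id(name: str) -> int:
    """160-bit ID derived from a name, as an integer."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two IDs."""
    return a ^ b

def bucket_index(a: int, b: int) -> int:
    """Index of the highest bit where the IDs differ (-1 if identical)."""
    return (a ^ b).bit_length() - 1

def closest(target: int, ids, k: int = 3):
    """The k known IDs nearest to the target by XOR distance."""
    return sorted(ids, key=lambda i: i ^ target)[:k]
```

Every hop of a lookup is one of those "pings" to the closest known nodes, which is cheap over UDP and expensive when each one requires a WebRTC handshake.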
Also, it's already possible to have your own comntr server: you just need to git clone the comntr/http-server repo, npm install & npm start it, and tell the iframe to use your server with ?srv=https://foobar.com:42751. This would be an "on-prem" solution.
My first post about the web extension idea got some interest (and almost 150 stars on GitHub!), so I've taken the next logical step: an <iframe> that renders the comntr.github.io page and effectively adds comments to your page. The spam problem is partially addressed by the filters: the iframe.src can have a special ?filter=[...] param that can hide some of the comments you don't like. Although it's unlikely one can bypass the security model around those filters (but you are welcome to try!), advanced spammers can still post a lot of garbage to those comments as they can easily generate new ed25519 keys.
Sources of the service: https://github.com/comntr/captcha