
> I figured that they have found an (automated) way to imitate Googlebot really well.

It doesn't matter how clever you are, you can't imitate Googlebot well enough to fool someone who knows what they're doing. The canonical way to verify Googlebot is a reverse-then-forward DNS lookup dance which will only ever succeed for a real Googlebot IP address.
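
For context, here's a minimal Python sketch of that dance, following Google's documented verification steps (reverse-resolve the IP, check the domain, then forward-resolve the hostname to confirm the round trip):

    import socket

    def is_real_googlebot(ip: str) -> bool:
        # Step 1: reverse lookup -- what hostname does this IP claim to be?
        try:
            host, _, _ = socket.gethostbyaddr(ip)
        except socket.herror:
            return False
        # Step 2: only googlebot.com / google.com hostnames count.
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Step 3: forward lookup -- does Google's own DNS map that hostname
        # back to the same IP? A spoofer controls their own rDNS records
        # but not Google's forward records, so this is where fakes fail.
        try:
            _, _, addrs = socket.gethostbyname_ex(host)
        except socket.gaierror:
            return False
        return ip in addrs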


Does Wikipedia really need to outsource this? They already do basically everything else in-house, even running their own CDN on bare metal, I'm sure they could spin up an archiver which could be implicitly trusted. Bypassing paywalls would be playing with fire though.

Hypothetically, any document, article, work, or object could be uniquely identified by an appropriate URI or URN, but in practice, http URLs are how editors cite external resources.

The URLs proved to be less permanent than expected, and so the issue of "linkrot" was addressed, mostly via the Internet Archive, and then through whatever else could bypass paywalls and stash the content.

All content hosted by the WMF project wikis is licensed under Creative Commons or compatible licenses, with narrow exceptions for limited, well-documented Fair Use content.


Archive.org is the archiver; rotted links are replaced with Archive.org links by a bot.

https://meta.wikimedia.org/wiki/InternetArchiveBot

https://github.com/internetarchive/internetarchivebot


Yeah, for historical links it makes sense to fall back on IA's existing archives, but going forward Wikipedia could take their own snapshots of cited pages and substitute them in if/when the original rots. It would be more reliable than hoping IA grabbed it.

Not opposed. Wikimedia tech folks are very accessible in my experience; ask them to make a GET or POST to https://web.archive.org/save whenever a link is added via the wiki editing mechanism. Easy peasy. Example CLI tools are https://github.com/palewire/savepagenow and https://github.com/akamhy/waybackpy
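
If I understand the anonymous flavor of that endpoint correctly, it's a single GET; here's a sketch, assuming the service redirects you to the finished snapshot (the authenticated SPN2 API adds job status and rate-limit headroom):

    import requests

    def save_page_now(url: str) -> str:
        # Anonymous Save Page Now: GET https://web.archive.org/save/<url>
        # The service captures the page and (assumed here) redirects to the
        # new snapshot, so resp.url ends up as a /web/<timestamp>/ link.
        # Captures can take a while, hence the generous timeout.
        resp = requests.get("https://web.archive.org/save/" + url, timeout=120)
        resp.raise_for_status()
        return resp.url

    # e.g. save_page_now("https://example.com/some-cited-article")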

A shortcut is to consume the Wikimedia changelog firehose and make these HTTP requests yourself, performing a CDX lookup to see if a recent snapshot was already taken before issuing a capture request (to be polite to the capture worker queue).
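
Roughly like this, assuming the firehose means the public EventStreams recent-changes feed and the standard Wayback CDX API; extract_new_links() is hypothetical, since pulling newly added links out of an edit needs a diff fetch:

    import json
    import requests

    STREAM = "https://stream.wikimedia.org/v2/stream/recentchange"
    CDX = "https://web.archive.org/cdx/search/cdx"

    def has_recent_snapshot(url: str, since: str = "20240101") -> bool:
        # CDX lookup: does the Wayback Machine already hold a capture of
        # `url` newer than `since` (YYYYMMDD)? Row 0 is the field header.
        params = {"url": url, "output": "json", "from": since, "limit": "1"}
        rows = requests.get(CDX, params=params, timeout=30).json()
        return len(rows) > 1

    # Server-sent events parsed by hand; each payload is one line
    # prefixed with "data: " containing a JSON recent-change event.
    with requests.get(STREAM, stream=True, timeout=None) as resp:
        for line in resp.iter_lines(decode_unicode=True):
            if not line or not line.startswith("data: "):
                continue
            change = json.loads(line[len("data: "):])
            if change.get("type") != "edit":
                continue
            # extract_new_links(change) is hypothetical -- finding which
            # links an edit added requires fetching its diff.
            # for url in extract_new_links(change):
            #     if not has_recent_snapshot(url):
            #         requests.get("https://web.archive.org/save/" + url)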


This already happens. Every link added to Wikipedia is automatically archived on the wayback machine.

TIL, thank you!

Why wouldn't Wikipedia just capture and host this themselves? Surely it makes more sense to DIY than to rely on a third party.

Why would they need to own the archive at all? The archive.org infrastructure is built to do this work already. It's outside of WMF's remit to internally archive all of the data it has links to.


I didn't know you could just ask IA to grab a page before their crawler gets to it. In that case, yeah, it would make sense for Wikipedia to ping them automatically.

Spammers and pirates just got super excited at that plan!

There are various systems in place to defend against them. I recommend against trying it; poor form against a public good is not welcome.

> It was running

It still is. uBlock's default lists are killing the script now, but if it's allowed to load then it still tries to hammer the other blog.


Ah, good to know. My Pi-hole was actually blocking the blog itself, since the uBlock site list made its way into one of the blocklists I use. But I've just been avoiding links as much as possible because I didn't want to contribute.

It's also kind of ironic that a site whose whole premise is to preserve pages forever, whether the people involved like it or not, is seeking to take down another site because they are involved and don't like it. Live by the sword, etc.

Since Wikipedia established that the archiver tampered with stored pages I doubt migrating is on the cards, the trust in those archives has been burned regardless of who hosts them going forward.

AFAIK the only tampered pages in evidence are those involved in this controversy. That archive.xx could tamper with pages has always been possible (and a reason why Wikipedia should have their own archive, and migrate ASAP).

But still, Wikipedia can corroborate any archive.xx page, and if they find a matching source, archive that instead.


I don't understand the need to go "Hey, the only bad things are the ones I already have."

That doesn't lessen anything.


Multi-pigment panels exist but in practice nearly all color e-readers still use the filter-based panels, because they are so much cheaper. There are zero Kindle or Kobo models with the multi-pigment technology.

The ReMarkable devices use an E Ink Gallery 3 multi-pigment display; I have one on my desk.

I did say nearly all, and the price of the ReMarkable Pros reflects how expensive the Gallery panels still are.

Simon's been doing this exact test for nearly 18 months now; if vendors wanted to benchmaxx it, they've had more than enough time to do so already.

Exactly. As far as I'm concerned, the benchmark is useless. It's way too easy and rewarding to train on it.

It's just an in-joke, he doesn't intend it as a serious benchmark anymore. I think it's funny.

Y'all are way too skeptical, no matter what cool thing AI does you'll make up an excuse for how they must somehow be cheating.

Jeff Dean literally featured it in a tweet announcing the model. Personally, it feels absurd to believe they've put absolutely no thought into optimizing this type of SVG output, given the disproportionate amount of attention devoted to this specific test for over a year.

I wouldn't really even call it "cheating" since it has improved models' ability to generate artistic SVG imagery more broadly but the days of this being an effective way to evaluate a model's "interdisciplinary" visual reasoning abilities have long since passed, IMO.

It's become yet another example in the ever-growing list of benchmaxxed targets whose original purpose was defeated by teaching to the test.

https://x.com/jeffdean/status/2024525132266688757?s=46&t=ZjF...


Or maybe you’re too trusting of companies who have already proven to not be trustworthy?

I mean, if you want to make your own benchmark, simply don't make it public and don't run it often. If your salamander on skis or whatever gets better with time, it likely has nothing to do with benchmaxxing.

It can't, and I wouldn't hold my breath for such a small company being able to navigate compliance for contactless payments. The Pebble does use standard watch straps though, so you could get one of the ones with a programmable payment chip embedded inside.

If that's all you use your smartwatch for, you may as well skip the watch and get a payment bracelet or ring though.

e.g. https://www.curve.com/wearables/


I used to use Curve but they became really unreliable, especially when paying for public transport, so I just switched back to a normal card.

Shame, because you can get some nice watch straps with Curve integration, which would neatly solve the missing payment feature on Pebble watches.


Except not in the USA. Too bad, the rings look nice.

> The only time I've seen the (in my mind) confused messaging is on Pebble's own website

Yeah, other wearable manufacturers who use the same display technology usually call it MIP instead. Pebble are pretty much the only ones who call it e-paper, which has led some to think theirs is a distinct thing, but it's just MIP.


The Pebble display isn't e-ink, or unique amongst watches; it's an off-the-shelf MIP LCD from Sharp.

You can get the same thing in watches from Garmin, Coros, Polar, Suunto, Casio and probably more.


I think you're confusing Pebble with something else. All current models on the website, as well as the OG Pebble (according to Wikipedia), use e-ink displays.

https://en.wikipedia.org/wiki/Pebble_(watch)#Hardware

> The watch featured a 32-millimetre (1.26 in) 144 × 168 pixel black and white memory LCD using an ultra low-power "transflective LCD" manufactured by Sharp

Later generations are color, but it's the same tech. If you've ever used actual e-ink then it should be obvious enough that the Pebble displays are something else; e-ink would be nowhere near responsive enough to keep up with pebbleOS's animations.



They're Sharp memory displays, functionally LCDs but with memory for retention under each pixel. They are not and have never been e-ink.
