AnbeSivam · on Jan 26, 2021

Do you know of any blogs or papers that talk about this, i.e. using topology for such interval data types?

AnbeSivam · on Jan 24, 2021

He talks about the Larrabee project here - https://youtu.be/MxZe1i8z-8Y?t=782

Part 1 - https://www.youtube.com/watch?v=JTKkY2kZuEw&t=3086s

AnbeSivam · on Jan 7, 2021

A month back I came across someone else mentioning a similar bandwidth constraint w.r.t. LFBs per core.

https://news.ycombinator.com/item?id=25221968

AnbeSivam · on Dec 10, 2020

Does your spider face the issues with Cloudflare mentioned by the Gigablast founder here?

https://www.gigablast.com/blog.html

ColinHayhurst · on Dec 10, 2020

We are not aware that we have any problem like that.

This might explain why GigaBlast has a problem:

Because of bugs in the original Gigablast spidering code, the Findx crawler ended up on a blacklist in Project Honeypot as being “badly behaved” (fixed in our fork). That meant quite a bit of trouble for us because CDN providers, which are a very powerful hubs for internet traffic, put a lot of weight on this blacklist. Some of the most popular websites and services on the internet run through services like Cloudflare and other CDNs – so if you are in bad standing with them, suddenly a large part of the internet is not available, and we weren’t able index it.

extract from: https://web.archive.org/web/20190921180535/https://privacore...

mikkom · on Dec 11, 2020

> fixed in our fork

Does this mean your spider is a fork of Gigablast? Is there some additional interesting technical information about how your code/infrastructure is set up?

ColinHayhurst · on Dec 11, 2020

I realise this is not addressing your second question but you might find it interesting. Post below on server expansion one year ago. We are adding another 100 servers over Christmas and early new year.

https://blog.mojeek.com/2019/12/100-server-build-and-install...

We'll be writing about our tech stack in our next FAQs series; 3 of 4, this is 1 of 4:

https://blog.mojeek.com/2020/11/frequently-asked-questions-a...

ColinHayhurst · on Dec 11, 2020

No, we have our own spider

kevsim · on Dec 11, 2020

The post was from Findx not Mojeek

deepstack · on Dec 11, 2020

Any insight as to what is considered a "badly behaved crawler"? Or is it something that you work out with the CDN so they don't blacklist your IP?

ricardo81 · on Dec 11, 2020

Hello, I work on the technical side of Mojeek.

Mojeek follows the robots.txt protocol so if a site doesn't want to be crawled by MojeekBot we respect that wish. There is also a generous crawl delay between pages on the same host.

Generally a 'badly behaved bot' will ignore robots.txt or hit a site too hard with requests.

Our bot uses a specific user agent which you can verify via DNS. https://www.mojeek.com/bot.html

wyldfire · on Dec 11, 2020

> There is also a generous crawl delay between pages on the same host.

What's the order of magnitude of this delay? milliseconds? hundreds of milliseconds? seconds? I'm curious what's considered 'polite' in this realm and how the various parties come to form opinions on this.

ricardo81 · on Dec 11, 2020

A minimum of 4 seconds.
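
For illustration only, a per-host delay like this could be sketched as below. This is not MojeekBot's code; `HostThrottle`, its parameters, and the injectable clock are hypothetical names chosen for the example, and only the 4-second figure comes from the comment above.

```python
import time
from urllib.parse import urlsplit

MIN_DELAY_SECONDS = 4.0  # minimum gap between requests to the same host


class HostThrottle:
    """Hypothetical helper: spaces out requests to each host."""

    def __init__(self, min_delay=MIN_DELAY_SECONDS,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock        # injectable so the logic can be tested
        self._sleep = sleep
        self._next_allowed = {}    # host -> earliest time of next request

    def wait(self, url):
        """Block until the URL's host may be hit again; return seconds waited."""
        host = urlsplit(url).hostname or ""
        now = self._clock()
        ready_at = self._next_allowed.get(host, now)
        waited = max(0.0, ready_at - now)
        if waited:
            self._sleep(waited)
        # The next request to this host must be at least min_delay later.
        self._next_allowed[host] = max(now, ready_at) + self.min_delay
        return waited
```

A crawler would call `throttle.wait(url)` right before each fetch; requests to different hosts do not delay each other.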

YeGoblynQueenne · on Dec 12, 2020

I just had a look and there's a non-standard "crawl-delay directive" extension to robots.txt that can be used to ask a spider to take some time between page visits:

  User-agent: bingbot
  Allow: /
  Crawl-delay: 10

https://en.wikipedia.org/wiki/Robots_exclusion_standard#Craw...
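
As a rough sketch of how a crawler might read that directive, Python's standard `urllib.robotparser` exposes it through `crawl_delay()`. The robots.txt lines are the example above; the agent name `otherbot` and the fallback policy are assumptions for illustration, not any particular crawler's behaviour.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, fed in directly instead of fetched over HTTP.
ROBOTS_TXT = """\
User-agent: bingbot
Allow: /
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the delay in seconds for a matching agent,
# or None when no rule applies to that agent.
bing_delay = parser.crawl_delay("bingbot")
other_delay = parser.crawl_delay("otherbot")  # hypothetical agent name

# Assumed policy: fall back to the crawler's own default when nothing is set.
DEFAULT_DELAY_SECONDS = 4
effective_delay = other_delay if other_delay is not None else DEFAULT_DELAY_SECONDS

print(bing_delay, other_delay, effective_delay)
```

Since the directive is non-standard, whether a crawler honours it at all is up to the crawler.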

ricardo81 · on Dec 12, 2020

Hello, MojeekBot doesn't observe the crawl-delay directive but thanks for the reminder of it as it's beneficial for us to know if site owners require more grace between requests.

YeGoblynQueenne · on Dec 12, 2020

Hey. Good job with Mojeek. It seems the crawl-delay directive is not part of the robots.txt standard. It probably should be but that's not up to you!

AnbeSivam · on Dec 10, 2020

Thanks for that link, I haven't come across it before.

AnbeSivam · on May 10, 2017

Anyone following IR-related news: do you know what happened to BitFunnel (the open-sourced rewrite of the Bing search engine)?

https://bitfunnel.org/categories/blog/

https://github.com/bitfunnel/nativejit/

https://github.com/BitFunnel/BitFunnel

markpapadakis · on May 10, 2017

I wasn't familiar with this project. Thanks for mentioning it.

AnbeSivam · on Oct 13, 2016

More information:

https://www.listbox.com/member/archive/247/2016/10/sort/time...

http://boingboing.net/2016/10/11/to-do-in-san-francisco-a-co...

AnbeSivam · on Oct 12, 2016

More information:

https://www.listbox.com/member/archive/247/2016/10/sort/time...

http://boingboing.net/2016/10/11/to-do-in-san-francisco-a-co...

http://www.sweetwatermusichall.com/event/1347933-everyday-mi...

gimballock81 · on Oct 12, 2016

Thanks, I was going to ask what is the back story here.

AnbeSivam · on Sept 23, 2016

From the article -

> They claimed 10x the density of DRAM, it is now 4x

> Latency missed by 100x, yes one hundred times, on their claim of 1000x faster, 10x is now promised

> More troubling is endurance, probably the main selling point of this technology over NAND. Again the claim was a 1000x improvement, Intel delivered 1/333rd of that with 3x the endurance.

From this seminar a few months back - https://www.youtube.com/watch?v=hXurTRtmfWc ,

I think density can be increased, since this is only the initial product, and latency is contributed more by PCIe/OS/application than by the underlying 3D XPoint material. The slides from the article are for the PCIe SSDs; I wonder whether the earlier claimed latency still holds for NVRAM.

I wonder why the endurance is so much lower than the earlier claims.

hga · on Sept 23, 2016

Ouch!

10x latency and 3x endurance might normally satisfy the "must be 10x better" criteria to break into an existing market, but with the maturity of flash, and how memory hierarchies can ameliorate useful sets of latency requirements, this could end up being a damp squib instead of the revolution promised. 1000x endurance would have been great, 3x, who will notice?

Not the first time Intel has grossly mismanaged its technology....

gpderetta · on Sept 23, 2016

"memory hierarchies can ameliorate useful sets of latency requirements"

Not the latency for commits to stable non-volatile storage, unless battery backed RAM is an option.

hga · on Sept 23, 2016

Or just a big enough capacitor to finish the necessary writes to flash memory. Which as I understand it is one of the things that distinguishes enterprise from consumer flash drives, and one of the reasons I use the slowest, smallest Intel enterprise flash drive for system and /home.

diziet · on Sept 23, 2016

It seems to be a case of transitioning marketing claims from those about the potential of the core underlying technology to more real world scenario benefits. Some of the numbers included latency in the kernel/driver, so they are more focused on actual applications.

An initial shipped product is a bit different from the tech's potential. We've been waiting on Zen, err, Bulldozer/Excavator/Piledriver/Steamroller for years now, and while mobile and APUs shipped, it has been a fluke in server and desktop markets.

rbanffy · on Sept 23, 2016

While certainly not breathtaking, it's an initial product on a development path. I was expecting more, but this is still an improvement.

AnbeSivam · on Sept 23, 2016

Transcript - http://bitfunnel.org/strangeloop/

AnbeSivam · on Sept 5, 2016

Related -

http://bitfunnel.org/debugging-nativejit/

https://twitter.com/danluu/status/771622870132809729