We are not aware that we have any problem like that.
This might explain why Gigablast has a problem:
Because of bugs in the original Gigablast spidering code, the Findx crawler ended up on a blacklist in Project Honeypot as being “badly behaved” (fixed in our fork). That meant quite a bit of trouble for us because CDN providers, which are very powerful hubs for internet traffic, put a lot of weight on this blacklist. Some of the most popular websites and services on the internet run through services like Cloudflare and other CDNs – so if you are in bad standing with them, suddenly a large part of the internet is not available, and we weren’t able to index it.
Does this mean your spider is a fork of Gigablast? Is there some additional interesting technical information about how your code/infrastructure is set up?
I realise this doesn't address your second question, but you might find it interesting: the post below is about our server expansion one year ago. We are adding another 100 servers over Christmas and into the early new year.
Mojeek follows the robots.txt protocol so if a site doesn't want to be crawled by MojeekBot we respect that wish. There is also a generous crawl delay between pages on the same host.
Generally a 'badly behaved bot' will ignore robots.txt or hit a site too hard with requests.
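To make that concrete, here is a minimal sketch in Python (emphatically not Mojeek's actual code) of what "polite" fetching looks like: consult robots.txt before each request and leave a gap between requests to the same host. The "ExampleBot" user agent and the 10-second delay are made-up values.

    # Illustrative only; the user agent and the per-host delay are invented.
    import time
    import urllib.request
    from urllib.parse import urlparse
    from urllib.robotparser import RobotFileParser

    USER_AGENT = "ExampleBot"
    PER_HOST_DELAY = 10.0  # seconds between requests to the same host

    def fetch_politely(urls):
        parsers, last_hit = {}, {}
        for url in urls:
            parts = urlparse(url)
            host = parts.netloc
            if host not in parsers:
                rp = RobotFileParser(f"{parts.scheme}://{host}/robots.txt")
                rp.read()                # fetch and parse robots.txt once per host
                parsers[host] = rp
            if not parsers[host].can_fetch(USER_AGENT, url):
                continue                 # robots.txt disallows this URL, so skip it
            wait = PER_HOST_DELAY - (time.time() - last_hit.get(host, 0.0))
            if wait > 0:
                time.sleep(wait)         # generous gap between pages on one host
            request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
            with urllib.request.urlopen(request) as response:
                yield url, response.read()
            last_hit[host] = time.time()

A badly behaved bot, in these terms, is simply one that skips the can_fetch check, the delay, or both.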
> There is also a generous crawl delay between pages on the same host.
What's the order of magnitude of this delay? Milliseconds? Hundreds of milliseconds? Seconds? I'm curious what's considered 'polite' in this realm and how the various parties come to form opinions on this.
I just had a look and there's a non-standard "Crawl-delay" directive extension to robots.txt that can be used to ask a spider to take some time between page visits:
Hello, MojeekBot doesn't observe the crawl-delay directive, but thanks for the reminder; it's useful for us to know when site owners require more grace between requests.
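For anyone curious what the directive looks like, here is a small sketch using Python's standard-library robots.txt parser; the rules and the 5-second value are invented for illustration, and as noted above a crawler is free to ignore the directive since it isn't part of the original robots.txt standard.

    # Invented example rules; Crawl-delay is non-standard but urllib parses it.
    from urllib.robotparser import RobotFileParser

    robots_txt = """\
    User-agent: *
    Crawl-delay: 5
    Disallow: /private/
    """

    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())

    print(rp.crawl_delay("ExampleBot"))   # 5 -> seconds the site asks crawlers to wait
    print(rp.can_fetch("ExampleBot", "https://example.com/private/page"))  # False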
> They claimed 10x the density of DRAM, it is now 4x
> Latency missed by 100x, yes one hundred times, on their claim of 1000x faster, 10x is now promised
> More troubling is endurance, probably the main selling point of this technology over NAND. Again the claim was a 1000x improvement, Intel delivered 1/333rd of that with 3x the endurance.
I think density can be increased; this is only the initial product. And the latency comes more from PCIe/OS/application overhead than from the underlying 3D XPoint material. The slides from the article are for the PCIe SSDs, so I wonder whether the earlier latency claim still holds with NVRAM. I also wonder why the endurance is so much lower than the earlier claims.
10x latency and 3x endurance might normally satisfy the "must be 10x better" criterion for breaking into an existing market, but with the maturity of flash, and the way memory hierarchies can paper over many latency requirements, this could end up being a damp squib instead of the promised revolution. 1000x endurance would have been great; 3x, who will notice?
Not the first time Intel has grossly mismanaged its technology....
Or just a big enough capacitor to finish the necessary writes to flash memory, which, as I understand it, is one of the things that distinguishes enterprise from consumer flash drives, and one of the reasons I use the slowest, smallest Intel enterprise flash drive for system and /home.
It seems to be a case of shifting the marketing claims from the potential of the core underlying technology to more realistic real-world benefits. Some of the numbers included the latency in the kernel/driver, so they are focused more on actual applications.
It is a bit different to ship an initial product vs. talk about the tech's potential. We've been waiting on Zen, er, Bulldozer/Excavator/Piledriver/Steamroller for years now, and while mobile chips and APUs shipped, it hasn't panned out in the server and desktop markets.