
Google scrapes for indexing and for AI, right? I wonder if they will eventually say: OK, you can have me or not; if you don't want to help train my AI, you won't show up in my search results either. That's a tough deal, but it is sort of self-consistent.


Very few people seem to be complaining that Google crashes their sites. Google also publishes its crawler IP ranges, but you really don't need to rate-limit Google; they know how to back off and not overload sites.
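For reference, Google publishes the Googlebot ranges as JSON, so you can verify whether a request really comes from their crawler. A minimal sketch in Python, assuming the "prefixes" / ipv4Prefix / ipv6Prefix shape the file currently uses (re-check the schema before relying on it):

    # Check whether an IP falls inside Google's published crawler ranges.
    # URL and JSON shape are assumptions based on what Google publishes today.
    import ipaddress
    import json
    import urllib.request

    GOOGLEBOT_RANGES = "https://developers.google.com/search/apis/ipranges/googlebot.json"

    def load_networks(url=GOOGLEBOT_RANGES):
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        nets = []
        for entry in data.get("prefixes", []):
            prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
            if prefix:
                nets.append(ipaddress.ip_network(prefix))
        return nets

    def is_googlebot(ip, networks):
        addr = ipaddress.ip_address(ip)
        # Mixed v4/v6 comparisons simply return False here.
        return any(addr in net for net in networks)

    networks = load_networks()
    print(is_googlebot("66.249.66.1", networks))  # likely True: a known Googlebot range

Google also suggests reverse-DNS verification as an alternative, but the published ranges are easier to use in a firewall or rate-limiter.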


In theory. In practice, I've had to limit Google on two large sites at work; I currently have them limited to 10 requests per second for non-cached requests.
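If you do have to cap a crawler yourself, a per-client token bucket is usually enough. This is a hypothetical sketch, not the commenter's actual setup; the 10 req/s figure is taken from the comment above and everything else is illustrative:

    # Hypothetical token-bucket limiter for non-cached crawler requests.
    import time

    class TokenBucket:
        def __init__(self, rate=10.0, burst=20):
            self.rate = rate           # tokens refilled per second
            self.capacity = burst      # maximum burst size
            self.tokens = float(burst)
            self.last = time.monotonic()

        def allow(self):
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False  # caller should respond with HTTP 429

    bucket = TokenBucket()
    if not bucket.allow():
        pass  # e.g. return 429 with a Retry-After header

Returning 429 (rather than dropping connections) matters here: well-behaved crawlers, Googlebot included, generally slow down when they see it.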


Curious whether the content on those sites has high value to Google, such as data that is new or unavailable elsewhere, or whether they're just standard sites and you've been unlucky?

I have had odd bot behavior from some major crawlers, but never from Google. I wonder if there is a correlation with the usefulness of the content, or if certain sites get stuck in a software bug (or some other strange behavior).


Google does value the sites; they have data unavailable elsewhere. At some point we got an automated message saying the site had too many pages and would no longer be indexed, then a human message saying that was a mistake and our site was an exception to that rule.

But as with any contact with these large companies, our contact eventually disappeared.


"Embrace, Extend, Extinguish" Google's mantra. And yes, I know about Microsoft's history with that phrase ;) But Google has done this with email, browsers (Google has web apps that run fine on Firefox but request you use Chrome), Linux (Android), and I'm sure there's others I am forgetting about.

So yeah, I too could see them doing this.



