They are, via the CFAA. It depends on what their robots.txt is set to, or the AI version of that.
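For a concrete idea of what "set to" means here, this is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt contents and the `may_fetch` helper are made up for illustration, and `GPTBot` is just one example of an AI crawler user agent; it only shows how a crawler would read the rules, not what any of that means legally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, may_fetch(ROBOTS_TXT, agent, "https://example.com/post"))
```

Whether a given crawler actually honors these rules is up to the crawler.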
Anyway, the influence of random web text on AI is overrated. They're going to filter out pages that don't contribute, and bad words/topics/personal info will get a page removed.
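As a toy illustration of that kind of filtering (not any lab's actual pipeline), a filter might look something like the sketch below. The blocklist, the word-count threshold, and the email pattern are all placeholder assumptions; real filters are presumably far more involved.

```python
import re

BLOCKLIST = {"badword"}  # hypothetical blocklist entry
MIN_WORDS = 5            # hypothetical "does this page contribute" threshold
# Crude stand-in for personal-info detection: anything shaped like an email.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def keep_page(text: str) -> bool:
    """Return True if the page passes every toy filter."""
    words = re.findall(r"[a-z']+", text.lower())
    if len(words) < MIN_WORDS:
        return False  # too little text to contribute anything
    if BLOCKLIST.intersection(words):
        return False  # contains a blocked word
    if EMAIL_RE.search(text):
        return False  # looks like it contains personal info
    return True

if __name__ == "__main__":
    print(keep_page("This is a perfectly ordinary page about gardening tips."))
    print(keep_page("Contact me at jane@example.com for more gardening tips today."))
```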