> You do a paper showing that problem X can be solved slightly better by downloading and training on a billion tweets.
That's true. Sometimes you might try to tweak the algorithm itself rather than the data, though, or experiment with different kinds of preprocessing, and in those cases it would be helpful to run different experiments on a shared dataset.
My limited experience is from around the time deep learning was only just becoming a big thing, so things may have changed since. Maybe nowadays you just throw more tweets and GPUs at the problem.
You do a paper showing that problem X can be solved slightly better by downloading and training on a billion tweets.
But you don’t have the copyright to those tweets, so you can’t share data.
> don't people do cross-validation or something
A lot of standard problems come with a dataset that is already split into train and test sets.
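A rough sketch of the two evaluation styles being contrasted, using only the standard library (the data here is a hypothetical stand-in, not any real benchmark):

```python
def kfold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size

data = list(range(10))  # stand-in for real examples

# Style 1: the dataset ships with a fixed split, so everyone
# evaluates against the same held-out test set.
fixed_train, fixed_test = data[:8], data[8:]

# Style 2: cross-validation, where each example serves as test
# data in exactly one fold.
for train_idx, test_idx in kfold_indices(len(data), 5):
    pass  # train on data[i] for i in train_idx, evaluate on test_idx
```

With a fixed split, results are directly comparable across papers; with cross-validation, you get a variance estimate but comparability depends on everyone using the same folds.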