Companies are training LLMs on all the data that they can find, but this data is not the world; it is discourse about the world. The rank-and-file developers at these companies, in their naivete, do not see that distinction… So, as these LLMs become increasingly but asymptotically fluent, tantalizingly close to accuracy but ultimately incomplete, developers complain that they are short on data. They have their general-purpose computer program, and if they only had the entire world in data form to shove into it, then it would be complete.
That’s easy. The people profiting from it are pushing it hard.
And other companies that had something half-baked just threw it out, both to say “me too!” and to ingest as much user input as possible as training data in order to catch up.
That’s why “AI” is getting shoved into so many things right now. Not because it’s useful, but because they need to gobble up as much training data as they can in order to play catch-up.