We will use Grok 3.5 (maybe we should call it 4), which has advanced reasoning, to rewrite the entire corpus of human knowledge, adding missing information and deleting errors.
Then retrain on that.
Far too much garbage in any foundation model trained on uncorrected data.
LLMs are prediction tools. What such a system will produce is a corpus that avoids certain phrases, or leans on others more heavily, but still has the same aggregate statistical “shape” as the data it was trained on.
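A toy sketch of that point, using a tiny hypothetical corpus and a bigram Markov model (a stand-in for a real LLM): text sampled from the model ends up with roughly the same aggregate word frequencies as the training data, whatever individual sentences it happens to emit.

```python
import random
from collections import Counter, defaultdict

random.seed(0)

# Tiny made-up "training corpus" for a bigram language model.
corpus = ("the cat sat on the mat and the dog sat on the rug "
          "and the cat saw the dog and the dog saw the cat").split()

# Record which words follow which: sampling from these lists
# approximates P(next word | current word) from the corpus.
successors = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    successors[a].append(b)

# Generate a new "corpus" by repeatedly predicting the next word.
word = corpus[0]
generated = [word]
for _ in range(10_000):
    word = random.choice(successors.get(word, corpus))
    generated.append(word)

def freqs(tokens):
    counts = Counter(tokens)
    return {w: n / len(tokens) for w, n in counts.items()}

# The generated text keeps the training data's unigram "shape":
# per-word frequencies line up closely with the original corpus.
orig, gen = freqs(corpus), freqs(generated)
for w in sorted(orig, key=orig.get, reverse=True):
    print(f"{w:>4}: train={orig[w]:.2f} generated={gen.get(w, 0):.2f}")
```

A model this small can only ever rearrange what it saw, so the mismatch between any per-word frequency in the sample and in the corpus stays small; the commenter's claim is that the same pull toward the training distribution operates, at scale, on a retrained LLM.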
It’ll also be preposterously hard for them to pull off, since the data the model was trained on always has someone eventually disagreeing with the racist, fascist bullshit they’ll get it to focus on. Sooner or later it’ll start saying things that contradict whatever it was supposed to be saying, because statistically some manner of contrary opinion is eventually voiced.
They won’t be able to check the entire corpus for weird stuff like that, or for delights like MLK speeches being rewritten to be anti-integration. So the next version will have the same basic information, just passed through a filter that makes it sound like a drunk incel talking about Asian women.