An MIT study finds that non-clinical information in patient messages, such as typos, extra whitespace, or colorful language, can reduce the accuracy of large language models used to make treatment recommendations. The models were also consistently less accurate for female patients, even when all gender markers were removed from the text.
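The kind of robustness test the study describes is easy to sketch: apply non-clinical perturbations to an otherwise identical message and check whether the model's recommendation changes. Below is a minimal, hypothetical illustration in Python; the perturbation functions are my own simplified stand-ins for the study's edits, and `recommend_treatment` is a placeholder for whatever model call you are testing.

```python
import random

def add_typos(text, rate=0.05, seed=0):
    """Swap adjacent letters at a given rate to simulate typos."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def add_whitespace(text, rate=0.1, seed=0):
    """Randomly insert extra spaces between words."""
    rng = random.Random(seed)
    words = text.split(" ")
    return " ".join(w + (" " if rng.random() < rate else "") for w in words)

baseline = "I have had a sharp pain in my lower right abdomen for two days."
perturbed = add_whitespace(add_typos(baseline))

# A clinically irrelevant edit should not change the recommendation.
# `recommend_treatment` is a hypothetical wrapper around the model under test:
# for msg in (baseline, perturbed):
#     print(recommend_treatment(msg))
```

The study's finding is that recommendations do shift under edits like these, which is exactly what a check of this shape would surface.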
This is not an argument for LLMs (which people are deferring to at an alarming rate), but I'd point out that this seems to be a bias among humans giving medical care as well.
Of course it is. LLMs are inherently regurgitation machines: train on biased data, make biased predictions.