An MIT study finds that non-clinical information in patient messages, such as typos, extra whitespace, or colorful language, can reduce the accuracy of large language models deployed to make treatment recommendations. The models were also consistently less accurate for female patients, even when all gender markers were removed from the text.
Why are they… why are they having autocomplete recommend medical treatment? There are specialized AI algorithms that already exist for that purpose and do it far better (though still not well enough to even assist real doctors, much less replace them).
Because sycophants keep saying it’s going to take these jobs, real scientists and researchers eventually have to come in and show why the sycophants are wrong.
Are there any studies (or benchmarks) that measure the accuracy of treatment recommendations given a medical history and a condition requiring treatment?
I’m currently working on one now as a researcher. It’s a crude tool for measuring the quality of a response, but it’s a start.
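For the curious, here’s a minimal sketch of what a crude scoring tool like that could look like. Everything in it (the field names, the toy records, the token-overlap metric) is made up for illustration, not taken from the study or from my actual tool; it just shows the idea of comparing a model’s free-text recommendation against a reference treatment with both exact match and a looser overlap score.

```python
# Hypothetical sketch of a crude scoring harness for treatment-recommendation
# benchmarks. Field names, records, and the overlap metric are illustrative only.

def token_overlap(reference: str, candidate: str) -> float:
    """Crude quality proxy: fraction of reference tokens present in the candidate."""
    ref_tokens = set(reference.lower().split())
    cand_tokens = set(candidate.lower().split())
    return len(ref_tokens & cand_tokens) / len(ref_tokens) if ref_tokens else 0.0

# Toy records: each pairs a case (history + condition) with a reference
# treatment and the model's free-text recommendation.
records = [
    {
        "history": "45F, type 2 diabetes, no allergies",
        "condition": "uncomplicated urinary tract infection",
        "reference": "nitrofurantoin 100 mg twice daily for 5 days",
        "model_output": "start nitrofurantoin 100 mg twice daily for 5 days",
    },
    {
        "history": "60M, hypertension, on lisinopril",
        "condition": "community-acquired pneumonia, outpatient",
        "reference": "amoxicillin 1 g three times daily for 5 days",
        "model_output": "recommend azithromycin 500 mg daily",
    },
]

exact_matches = 0
overlap_total = 0.0
for rec in records:
    # Exact match is too strict for free text, so also track token overlap.
    if rec["model_output"].strip().lower() == rec["reference"].strip().lower():
        exact_matches += 1
    overlap_total += token_overlap(rec["reference"], rec["model_output"])

print(f"exact-match accuracy: {exact_matches / len(records):.2f}")
print(f"mean token overlap:   {overlap_total / len(records):.2f}")
```

Like I said, crude: a real benchmark would need clinician-validated references and a much better notion of equivalence than token overlap, but you have to start measuring somehow.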
Gotta start somewhere, and it won’t ever improve if we don’t start improving it. So many on Lemmy assume the tech will never be good enough, so why even bother? But that’s why we do things: to make the world that much better… eventually. Why else would we plant literal trees? For those who come after us.
It’s not an assumption, it’s just a matter of practical reality. If we’re at best a decade away from that point, why pretend it could suddenly and unexpectedly improve to the point where it’s unrecognizable from its current state? LLMs are neat, and scientists should keep working on them. If it weren’t for all the nonsense “AI” hype we have currently, I’d expect to see them used rarely but quite successfully, since they’d be getting used on merit, not hype.