OpenAI’s latest artificial intelligence model has almost matched expert doctors in analysing eye conditions, according to research that highlights the technology’s potential in medicine.
The Microsoft-backed start-up’s GPT-4 model matched or surpassed all but the top-scoring specialist medics in assessing ocular problems and suggesting treatments, according to a paper published on Wednesday.
Ophthalmology has been a big focus of efforts to put AI to clinical use and to overcome obstacles to take-up, such as the tendency of models to “hallucinate” by creating fictitious data.
“What this work shows is that the knowledge and reasoning ability of these large language models in an eye health context is now almost indistinguishable from experts,” said Arun Thirunavukarasu, the lead author of a paper on the findings published in the journal PLOS Digital Health.
“We are seeing the ability to answer quite complicated questions,” he added.
The research used 87 different patient scenarios to test the performance of GPT-4 against non-specialist junior doctors and both trainee and expert eye medics. The model outperformed the juniors and achieved similar results to many of the specialists, the paper said.
The study is notable because it compares the AI model’s abilities with those of practising doctors rather than with examination results, the researchers said. It also deploys the broad powers of generative AI, rather than narrower capabilities tested in some previous AI medical studies such as diagnosing cancer risks from patient scans.
The model performed equally well on questions that demanded first-order recall and those requiring higher-order reasoning, such as the ability to interpolate, interpret and process information.
“We are now training in a much more open-ended way and we are discovering abilities in these models that they weren’t explicitly trained for,” said Thirunavukarasu, who carried out the research while studying at the University of Cambridge’s school of clinical medicine.
The model could be refined further by training it on an expanded data set including management algorithms, de-identified patient notes and textbooks, said Thirunavukarasu, who is now based at Oxford university.
He added that this would demand a “tricky balance” between expanding the number and nature of sources and ensuring the information remained of good quality. Potential clinical uses could be in the triage of patients or where access to specialist healthcare professionals was limited.
Interest in deploying AI in a clinical setting has soared with evidence of its contribution to diagnostics, such as flagging early-stage breast cancers that may be missed by doctors. At the same time, researchers are grappling with how to manage serious risks, given the damage that false diagnoses can cause to patients.
The latest study was “exciting” and its idea of using AI to benchmark experts’ performance “super-interesting”, said Pearse Keane, professor of artificial medical intelligence at University College London.
Keane, who is also affiliated with Moorfields Eye Hospital in London, agreed that more work was needed before introducing the techniques in a clinical context.
Keane cited an example from his own research last year in which he asked a large language model about macular degeneration in the eye, only for it to give “made-up” references in its reply.
“We just have to balance our excitement about this technology and the potential massive benefits . . . with caution and scepticism,” he said.