Artificial intelligence is brittle: we need to do better
Published in: Radiol Artif Intell, Vol. 7, No. 3 (April 2025)

Excerpt:
"Anyone who has trained a neural network from scratch has wrestled with a critical question: Will the performance that I see on my dataset generalize to a real-world clinical population?
The question is all the more important to answer with models in radiology because dozens of factors can lead to variations in the input data these models receive. In general, if there is a mismatch in the distribution of characteristics between the training and the evaluation or testing datasets of a model, one can expect a performance decrease. This observation has been demonstrated many times in research and (unfortunately) in real-life patient care settings, where changes in scanner manufacturer, imaging protocols, and patient positioning have led to decreased accuracy of artificial intelligence (AI)-based diagnostics. Thus, as new models are developed, it is critical for authors to perform sensitivity analyses to assess how these models are affected by common, real-world differences in input data."
doi: doi.org/10.1148/ryai.250081 PMID: 40202405