AI Algorithm As Good As Human Readers at Screening Mammograms
Posted on 06 Sep 2023
Screening mammography is valuable, but it does not detect every breast cancer, and false-positive results can lead to unnecessary imaging and biopsies for women who do not have cancer. One way to improve both the sensitivity and the specificity of screening mammography is double reading, in which two readers interpret each mammogram. Double reading has been shown to increase cancer detection rates by 6 to 15% while keeping recall rates low. However, because it is labor-intensive, double reading can be difficult to sustain when readers are in short supply. Now, a study comparing an artificial intelligence (AI) algorithm with human readers of screening mammograms suggests that AI can match human readers' sensitivity and specificity and could potentially serve as a second reader in clinical practice.
Researchers at the University of Nottingham (Nottingham, UK) used a standardized assessment to compare a commercially available AI algorithm with human readers interpreting screening mammograms. The evaluation used test sets from the Personal Performance in Mammographic Screening (PERFORMS) quality assurance assessment, which is used by the UK's National Health Service Breast Screening Program (NHSBSP). Each PERFORMS test set consists of 60 challenging mammographic exams with abnormal, benign, and normal findings. Both the human readers' interpretations and the AI algorithm's results were compared against the ground truth for each case. The study used two consecutive PERFORMS test sets, totaling 120 screening mammograms, to evaluate both the human readers and the AI algorithm.
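Sensitivity is the share of malignant cases a reader recalls, and specificity is the share of non-malignant cases a reader correctly does not recall. The following is a minimal sketch of scoring one reader against the ground truth, assuming each interpretation is reduced to a binary recall decision per breast. The actual PERFORMS scoring scheme and the AI algorithm's output format are not described in this article; for the AI, a binary decision would require some operating threshold, which is an assumption. The function name and the toy inputs are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def sensitivity_specificity(
    truth: Sequence[bool], recalled: Sequence[bool]
) -> tuple[float, float]:
    """Score one reader's binary recall decisions against the ground truth.

    truth[i] is True if breast i is malignant according to the ground truth;
    recalled[i] is True if the reader (hypothetically, the AI at some fixed
    operating threshold) recalled breast i.
    """
    if len(truth) != len(recalled):
        raise ValueError("truth and recalled must have the same length")
    pairs = list(zip(truth, recalled))
    tp = sum(1 for t, r in pairs if t and r)          # cancers recalled
    fn = sum(1 for t, r in pairs if t and not r)      # cancers missed
    tn = sum(1 for t, r in pairs if not t and not r)  # non-cancers correctly cleared
    fp = sum(1 for t, r in pairs if not t and r)      # non-cancers recalled (false positives)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one non-malignant case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (hypothetical data, not from the study): 2 malignant, 3 non-malignant breasts.
sens, spec = sensitivity_specificity(
    truth=[True, True, False, False, False],
    recalled=[True, False, True, False, False],
)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # sensitivity=0.50, specificity=0.67
```

Here benign breasts would count as non-malignant, which is an assumption about how specificity was defined. Computing this per reader and averaging across readers gives a mean sensitivity and specificity of the kind reported for the human readers.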
The research team compared the AI algorithm's performance with that of 552 human readers: 315 (57%) board-certified radiologists and 237 non-radiologist readers (206 radiographers and 31 breast clinicians). Each breast was assessed individually; of the 240 breasts, 67% were normal (161/240), 29% malignant (70/240), and 4% benign (9/240). Among malignant breasts, the most common mammographic feature was masses (64.3%), followed by calcifications (12.9%), asymmetries (11.4%), and architectural distortions (11.4%). The mean size of malignant lesions was 15.5 mm. The study found no significant difference between AI and human readers in detecting breast cancer across the 120 exams: human readers had a mean sensitivity of 90% and specificity of 76%, while the AI had a sensitivity of 91% and specificity of 77%.
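The case-mix and reader figures above are internally consistent, which a few lines of arithmetic can confirm. This is a minimal sketch: the malignant-feature counts are not given in the article and are back-calculated here as the only whole-number counts out of 70 malignant breasts that round to the published percentages. The `share` helper is hypothetical.

```python
def share(n: int, total: int) -> float:
    """Percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


# Breast-level case mix reported in the article (120 exams x 2 breasts).
breasts = {"normal": 161, "malignant": 70, "benign": 9}
n_breasts = sum(breasts.values())
assert n_breasts == 240

# Reader cohort reported in the article.
radiologists, radiographers, breast_clinicians = 315, 206, 31
non_radiologists = radiographers + breast_clinicians
n_readers = radiologists + non_radiologists
assert (non_radiologists, n_readers) == (237, 552)

# Malignant-feature counts back-calculated from the published percentages
# (assumption: the article reports percentages only).
features = {"mass": 45, "calcification": 9, "asymmetry": 8, "architectural distortion": 8}
n_malignant = breasts["malignant"]
assert sum(features.values()) == n_malignant

for name, n in breasts.items():
    print(f"{name}: {n}/{n_breasts} = {share(n, n_breasts)}%")
print(f"radiologists: {radiologists}/{n_readers} = {share(radiologists, n_readers)}%")
for name, n in features.items():
    print(f"{name}: {n}/{n_malignant} = {share(n, n_malignant)}%")
```

The printed shares (for example, 67.1% normal and 57.1% radiologists) round to the whole-number percentages quoted in the article.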
"The results of this study provide strong supporting evidence that AI for breast cancer screening can perform as well as human readers," said Yan Chen, Ph.D., professor of digital screening at the University of Nottingham. "It's vital that imaging centers have a process in place to provide ongoing monitoring of AI once it becomes part of clinical practice. There are no other studies to date that have compared such a large number of human reader performance in routine quality assurance test sets to AI, so this study may provide a model for assessing AI performance in a real-world setting."