Initial interpretation scores of screening mammograms and cancer detection in BreastScreen Norway

Tone Hovda, Silje Sagstad, Nataliia Moshina, Einar Vigeland, Solveig Hofvind
DOI: https://doi.org/10.1016/j.ejrad.2024.111662
Abstract: Purpose: To explore the association between radiologists' interpretation scores, early performance measures and cumulative reading volume in mammographic screening. Method: We analyzed 1,689,731 screening examinations (3,379,462 breasts) from BreastScreen Norway 2012-2020, with each breast scored 1-5 by two independent radiologists. In this scoring system, score 1 was considered negative/benign and score ≥2 positive. We performed descriptive analyses of recall, screen-detected cancer, positive predictive value (PPV) 1, mammographic features and histopathological characteristics by breast-based interpretation scores, and of cumulative reading volume by examination-based interpretation scores. Results: Counting breasts rather than women, 3.9 % (132,570/3,379,462) had a score of ≥2 by one or both radiologists. Of these, 84.8 % (112,440/132,570) were given a maximum score of 2. The total recall rate was 1.6 % (53,735/3,379,462); 69.3 % (37,220/53,735) of recalled breasts were given a maximum score of 2. Among the 0.3 % (9733/3,379,462) diagnosed with screen-detected cancer, 34.6 % (3369/9733) had a maximum score of 3. The percentages of recall, screen-detected cancer and PPV-1 increased with the sum of the scores assigned by the two radiologists (p < 0.001 for trend). Higher proportions of masses were observed among recalls and screen-detected cancers with low scores, and higher proportions of spiculated masses were observed for high scores (p < 0.001). Proportions of invasive carcinoma, histological grade 3 and lymph node positive tumors were higher for high versus low scores (p < 0.001). The proportion of examinations scored 1 increased with cumulative reading volume. Conclusions: We observed higher rates of recall and screen-detected cancer and less favorable histopathological tumor characteristics for high versus low interpretation scores. However, a considerable number of recalls and screen-detected cancers had low interpretation scores.
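The breast-level percentages reported in the Results can be recomputed from the counts given in the abstract. The following is a minimal sketch of that arithmetic, not the authors' analysis code; the dictionary keys and the `percent` helper are illustrative names chosen here, and only the counts themselves come from the abstract.

```python
# Counts copied from the abstract (breast-level, BreastScreen Norway 2012-2020).
# Key names and the helper function are hypothetical, chosen for illustration.
TOTAL_BREASTS = 3_379_462

counts = {
    "score_ge2": (132_570, TOTAL_BREASTS),
    "max_score_2_among_score_ge2": (112_440, 132_570),
    "recalled": (53_735, TOTAL_BREASTS),
    "max_score_2_among_recalled": (37_220, 53_735),
    "screen_detected_cancer": (9_733, TOTAL_BREASTS),
    "max_score_3_among_cancers": (3_369, 9_733),
}


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


percentages = {name: percent(n, d) for name, (n, d) in counts.items()}

for name, value in percentages.items():
    print(f"{name}: {value} %")
```

Each computed value rounds to the figure quoted in the abstract, which shows that the denominators are breasts (or the stated breast-level subgroup) rather than women or examinations.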