Introduction

Digital breast tomosynthesis (DBT) has been claimed to be a more sensitive screening tool than standard digital mammography (DM) because of an increased rate of screen-detected breast cancer [1,2,3,4,5]. The effect of DBT on recall rates has varied, probably because of differences in screening regimes and logistics [2,3,4,5,6,7]. The few studies reporting interval breast cancer have not shown any substantial difference between the two techniques [8, 9]. In addition, the increase in screen-detected breast cancer has raised questions about the tumor characteristics and clinical significance of cancers detected by screening with DBT [8, 10].

In most screening studies, DBT has been used in combination with DM (DM+DBT) [11]. This has raised concerns about radiation dose, as the dose is almost double that of standard DM [1]. To overcome this problem, vendors have developed algorithms for reconstructing synthetic two-dimensional mammograms (SM) from the raw DBT data. In combination with DBT, SM appears to be equivalent to DM with regard to cancer detection [5, 12,13,14].

Most studies on screening with DBT have been performed as paired or unpaired studies [2] with only one DBT examination per woman, the prevalent screen. However, in a screening setting, early performance measures in subsequent screening rounds with DBT are of substantial interest. McDonald et al [15] evaluated the effect of DBT in subsequent rounds and reported sustained rates of recall and screen-detected breast cancer, as well as a reduction in interval breast cancer. The impact of DBT on recall rate is less obvious in studies from Europe than in those from the USA, owing to generally lower recall rates in Europe and differences in study designs and screening techniques [1,2,3,4, 7, 16]. Further studies that include subsequent screening examinations with DBT, also from Europe, are thus needed.

In BreastScreen Norway, DBT has been used in two different study settings in the Oslo region: the Oslo Tomosynthesis Screening Trial (OTST) [4] and the Oslo-Vestfold-Vestre Viken study (OVVV-study) [17]. Taking advantage of these studies, we were able to provide novel information on the impact of DBT used in consecutive screening rounds and for different combinations of prior and subsequent screening techniques. The primary objective of our study was to compare early performance measures, such as recall rate and rates of screen-detected and interval breast cancer, in consecutive screening rounds with DBT and DM, including prognostic and predictive tumor characteristics.

Materials and methods

This retrospective study was based on data from BreastScreen Norway. The screening program is administered by the Cancer Registry of Norway [18] and thus covered by the Cancer Registry Regulation, giving approval with waiver of informed consent to perform studies based on the data collected in the program [19]. The OTST and OVVV-study were covered by approvals from the Regional Ethical Committee for Medical and Health Research Ethics and the Institutional Data Protection Officer, respectively.

BreastScreen Norway is described in detail elsewhere [18]. In short, the program is population-based, offering women aged 50–69 two-view mammographic screening every second year. The screening procedure includes independent double reading with consensus or arbitration. In this study, screen reading was performed by radiologists with varying experience in breast imaging and screen reading, ranging from beginners to more than 20 years of experience (Appendix 1).

The study population included 35,736 women screened in Oslo as part of BreastScreen Norway during the study period, 2008 to 2016. All women with at least two consecutive screening examinations during this period were eligible for inclusion. Each observation (unit of analysis) comprised a pair of two consecutive screening examinations. The result of the first examination in the pair was either negative or positive with a negative assessment (false-positive), while the result of the subsequent examination was the outcome of interest. Some women (n = 19,498) were screened more than twice during the study period and could thus contribute more than one observation. In total, 69,624 observations were included in the study.
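To make the unit of analysis concrete, the pairing rule can be sketched in Python. This is an illustrative sketch, not the study's actual data processing: the `Exam` record, its field names, and the result codes (`"negative"`, `"false_positive"`, `"screen_detected"`) are hypothetical, and a "consecutive" examination is assumed to be simply the next examination in time for the same woman.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Exam:
    woman_id: str
    year: int
    technique: str  # "DM", "DM+DBT" or "SM+DBT"
    result: str     # hypothetical codes: "negative", "false_positive", "screen_detected"


# Hypothetical codes for a prior exam that may open a pair:
# negative, or recalled with a negative assessment (false-positive).
ELIGIBLE_PRIOR_RESULTS = {"negative", "false_positive"}


def build_pairs(exams):
    """Return (prior, subsequent) pairs of consecutive exams, one pair per observation."""
    pairs = []
    ordered = sorted(exams, key=lambda e: (e.woman_id, e.year))
    # groupby requires input sorted by the grouping key, which `ordered` is.
    for _, history in groupby(ordered, key=lambda e: e.woman_id):
        history = list(history)
        for prior, subsequent in zip(history, history[1:]):
            if prior.result in ELIGIBLE_PRIOR_RESULTS:
                pairs.append((prior, subsequent))
    return pairs


exams = [
    Exam("A", 2010, "DM", "negative"),
    Exam("A", 2012, "DM+DBT", "negative"),
    Exam("A", 2014, "SM+DBT", "negative"),
    Exam("B", 2012, "DM+DBT", "negative"),     # screened once: no pair
    Exam("C", 2010, "DM", "screen_detected"),  # hypothetical ineligible prior result
    Exam("C", 2012, "DM+DBT", "negative"),
]
print(len(build_pairs(exams)))  # 2: woman A contributes two observations
```

A woman screened three times yields two overlapping pairs, which is why the number of observations exceeds the number of women.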

The screening technique varied during the study period. Standard DM was used in 2008, 2009, 2010, 2013, and 2016; DM+DBT in 2011 and 2012; and SM+DBT in 2014 and 2015. GE Senographe and GE Seno Advantage workstations were used from 2008 to August 2010 (DM), and Hologic Selenia Dimensions and Hologic SecurView workstations from 2010 to the end of the study period (for DM, DM+DBT, and SM+DBT).

Four study groups were established. We conditioned on the prior screening technique when creating the groups, while results from the subsequent screening examination were the measures of interest. The “DM after DM” group consisted of pairs of two consecutive screening examinations with DM. Pairs comprising a prior DM and a subsequent DBT formed the “DBT after DM” group. Pairs of two consecutive examinations with DBT formed the “DBT after DBT” group, while pairs of a prior DBT and a subsequent DM formed the “DM after DBT” group (Fig. 1).
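The mapping from screening year to technique, and from a pair of years to one of the four study groups, can be sketched as below. The year-to-technique table restates the list given above; the helper names are hypothetical, and the sketch ignores the August 2010 change of equipment because it did not change the technique.

```python
# Technique by screening year in Oslo, as listed in the text.
TECHNIQUE_BY_YEAR = {
    2008: "DM", 2009: "DM", 2010: "DM",
    2011: "DM+DBT", 2012: "DM+DBT",
    2013: "DM",
    2014: "SM+DBT", 2015: "SM+DBT",
    2016: "DM",
}


def modality(technique):
    """Collapse DM+DBT and SM+DBT into 'DBT'; plain DM stays 'DM'."""
    return "DBT" if technique.endswith("+DBT") else "DM"


def study_group(prior_year, subsequent_year):
    """Label a pair by its subsequent and prior modality, e.g. 'DBT after DM'."""
    prior = modality(TECHNIQUE_BY_YEAR[prior_year])
    subsequent = modality(TECHNIQUE_BY_YEAR[subsequent_year])
    return f"{subsequent} after {prior}"


for years in [(2008, 2010), (2009, 2011), (2012, 2014), (2014, 2016)]:
    print(years, study_group(*years))
```

Collapsing DM+DBT and SM+DBT into a single "DBT" label mirrors the combined groups used in the main analysis; the six-group breakdown in the appendices would keep the full technique string instead.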

Fig. 1 Four study groups showing pairs with different combinations of prior and subsequent screening techniques. The subsequent examination in each pair occurs 2 years after the prior examination.

All screening examinations included two views, craniocaudal (CC) and mediolateral oblique (MLO), of both breasts. For women with a prior DBT, the prior DBT images were available at screen reading and/or consensus in addition to prior DM images. We analyzed the data at the screening examination level and assumed independence between observations. To test this assumption, Pearson’s correlation coefficients for recall and for screen-detected and interval breast cancer were estimated.
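The text does not state exactly how the correlation coefficients were computed. One plausible reading, shown here as a hypothetical sketch, is to take the women who contribute at least two observations and correlate the binary outcome (for example, recalled or not) of their first observation with that of their second. For 0/1 data, Pearson's r equals the phi coefficient.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def within_woman_correlation(outcomes_by_woman):
    """Correlate the first and second 0/1 outcome of women with >= 2 observations.

    outcomes_by_woman: hypothetical mapping woman id -> outcomes ordered in time,
    e.g. 1 if the observation resulted in a recall, else 0.
    """
    first, second = [], []
    for outcomes in outcomes_by_woman.values():
        if len(outcomes) >= 2:
            first.append(outcomes[0])
            second.append(outcomes[1])
    return pearson(first, second)
```

A value near 0 indicates that a woman's outcome in one observation says little about her outcome in the next. The function is undefined (division by zero) if every first or every second outcome is identical, which can happen for rare events in small samples.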

Recall rate was defined as the percentage of screening examinations resulting in a recall for further assessment due to abnormal mammographic findings among all screening examinations in the study group. Rates of screen-detected and interval breast cancer were estimated as the number of screen-detected or interval breast cancers, invasive or ductal carcinoma in situ (DCIS), per 1000 screening examinations in each study group. Rates are given separately for DCIS and invasive breast cancer and in total. Interval breast cancer was defined as breast cancer diagnosed within 2 years after a negative screening examination, or more than 6 months but within 2 years after a false-positive screening result [20]. Because a 2-year follow-up was required, interval breast cancer rates are given for women whose subsequent screening took place in 2014 or earlier (n = 44,515 screening examinations). Accordingly, women screened with SM+DBT after DM and with DM after SM+DBT are not included in the analyses of interval breast cancer (Table 1).
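The rate definitions and the interval cancer rule can be expressed as small functions. This is a sketch under stated assumptions: 2 years and 6 months are approximated as 730 and 183 days, the result codes are hypothetical, and registry-specific details, such as how the date of diagnosis is set, are not modeled.

```python
from datetime import date, timedelta

# Assumptions: 2 years approximated as 730 days, 6 months as 183 days.
TWO_YEARS = timedelta(days=730)
SIX_MONTHS = timedelta(days=183)


def recall_rate_percent(n_recalls, n_exams):
    """Recalls as a percentage of all screening examinations in the group."""
    return 100 * n_recalls / n_exams


def rate_per_1000(n_cancers, n_exams):
    """Cancers (invasive and/or DCIS) per 1000 screening examinations."""
    return 1000 * n_cancers / n_exams


def is_interval_cancer(screen_date, screen_result, diagnosis_date):
    """Apply the interval cancer rule to one cancer and the last screening exam.

    screen_result uses hypothetical codes: "negative" or "false_positive".
    """
    elapsed = diagnosis_date - screen_date
    if elapsed <= timedelta(0) or elapsed > TWO_YEARS:
        return False  # diagnosed outside the 2-year window
    if screen_result == "negative":
        return True
    if screen_result == "false_positive":
        return elapsed > SIX_MONTHS  # cancers within 6 months are not counted
    return False


print(is_interval_cancer(date(2012, 1, 10), "false_positive", date(2012, 4, 10)))
```

The 6-month exclusion after a false-positive result keeps cancers that were plausibly missed at assessment from being counted as interval cancers.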

Table 1 Age (mean and median), recalls (%), screen-detected and interval breast cancer (per 1000 screened), positive predictive value of recalls (PPV-1, %), sensitivity (%), and specificity (%) among women subsequently screened with different combinations of screening techniques in Oslo as part of BreastScreen Norway, 2010–2016

Women diagnosed with screen-detected breast cancer were considered true-positives (TP), while women diagnosed with interval breast cancer were considered false-negatives (FN). Women who were recalled but not diagnosed with breast cancer were considered false-positives (FP), while women with a negative screening examination and no interval breast cancer were considered true-negatives (TN). Sensitivity was defined as TP/(TP+FN) and specificity as TN/(TN+FP). The positive predictive value of recall (PPV-1) was estimated as the number of examinations resulting in a diagnosis of screen-detected breast cancer divided by the total number of recalls within each study group.
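The classification into TP, FN, FP, and TN maps directly onto the three ratios. The sketch below uses illustrative counts, not study data, and the `Counts` class is hypothetical. Because every recall ends either in a screen-detected cancer (TP) or in a false-positive result (FP), TP + FP equals the number of recalls, which is the denominator of PPV-1.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # screen-detected cancers
    fn: int  # interval cancers
    fp: int  # recalls without a cancer diagnosis
    tn: int  # negative exams without interval cancer

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)

    def ppv1(self):
        # TP + FP is the total number of recalls.
        return self.tp / (self.tp + self.fp)


example = Counts(tp=8, fn=2, fp=12, tn=978)  # illustrative counts only
print(example.sensitivity(), example.specificity(), example.ppv1())
```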

We performed descriptive analyses and present prognostic (histological type and grade, tumor diameter, and lymph node involvement) and predictive (molecular subtype) tumor characteristics of screen-detected and interval breast cancers as rates per 1000 women screened.
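Reporting tumor characteristics as rates per 1000 women screened means dividing the count in each category by the number screened in the group, not by the number of cancers. A minimal sketch, with hypothetical category labels and counts:

```python
from collections import Counter


def rates_by_category(categories, n_screened):
    """Cases per 1000 screened for each category label.

    categories: one label per cancer (e.g. histologic grade), hypothetical values.
    """
    counts = Counter(categories)
    return {label: 1000 * n / n_screened for label, n in counts.items()}


print(rates_by_category(["grade 1", "grade 1", "grade 2"], 1000))
```

Because the denominator is the number screened, these category rates sum to the overall detection rate for the group, which makes groups with different detection rates directly comparable.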

All results are also given separately for SM and DM in combination with DBT (six groups) in Appendices 2 and 3.

Differences between groups were tested using t tests, chi-square tests, and tests of proportions. A p value below 0.05 was considered statistically significant, and the Bonferroni correction was applied where appropriate. Statistical analyses were performed using IBM SPSS Statistics version 25.
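The analyses were run in SPSS. The sketch below only illustrates one common form of a test of proportions, a pooled two-sided z-test, together with the Bonferroni rule. For a 2×2 table, the square of this z statistic equals the Pearson chi-square statistic without continuity correction. The function names and example counts are hypothetical, and the exact tests and number of comparisons behind each correction in the study are not specified here.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for H0: p1 == p2. Returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p value as significant at alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


z, p = two_proportion_z_test(60, 1000, 40, 1000)  # hypothetical counts
print(round(z, 2), round(p, 3))
print(bonferroni_significant([0.01, 0.02, 0.04]))
```

With three comparisons, the Bonferroni threshold becomes 0.05/3 ≈ 0.017, so a p value of 0.02 that would pass on its own no longer does.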

Results

The mean and median ages of the women at the subsequent screening examination were 60.5 years (range across groups 60.3–60.7) and 60.0 years (range across groups 60.0–61.0), respectively (Table 1).

Recall for women screened with DM after DM was 3.6%, higher than for all other study groups (p < 0.001) (Table 1). The lowest recall, 1.9%, was observed among women with two consecutive DBT examinations. The recall for DM after DBT of 2.2% was significantly lower than for DBT after DM (2.7%, p < 0.001).

The rate of screen-detected breast cancer was 4.6/1000 screens for DM after DM (Table 1). For DBT after DM, it was 9.9/1000 screens, and for DBT after DBT, 8.3/1000 screens (p < 0.001 for both, compared with DM after DM). For women screened with DM after DBT, the rate of screen-detected breast cancer was 4.3/1000 screens (p < 0.001 compared with both DBT after DM and DBT after DBT). The detection rate of DCIS was higher for both DBT after DM (1.6/1000) and DBT after DBT (1.8/1000) than for DM after DBT (0.5/1000 screens, p < 0.001 for both).

PPV-1 ranged from 12.9% (DM after DM) to 43.5% (DBT after DBT, p < 0.001) (Table 1). PPV-1 was higher for DM after DBT compared with DM after DM (p < 0.05).

The rates of interval breast cancer ranged from 1.9 to 3.0/1000, with no statistically significant differences between groups (Table 1). The sensitivity ranged from 63% (DM after DBT) to 84% (DBT after DM) and specificity ranged from 97% (DM after DM) to 99% (DBT after DBT).

The rate of invasive ductal carcinoma (IDC) was 6.4/1000 for women screened with DBT after DM, compared with 2.5/1000 for those screened with DM after DM (p < 0.001, statistically significant after the Bonferroni correction). The rate of IDC was 4.8/1000 in the DBT after DBT group (p < 0.01 compared with DM after DM) and 3.2/1000 in the DM after DBT group. For invasive tubular carcinoma, the highest rates were observed for DBT after DM (0.7/1000) and DBT after DBT (0.8/1000), compared with none for DM after DM (p < 0.01 for both).

No differences in mean and median tumor diameter were observed between groups. The rate of tumors < 10 mm was 3.3/1000 in the DBT after DM group compared with 1.5/1000 in the DM after DM group (p < 0.01), and the rates of tumors 10–19 mm were 3.7/1000 and 1.1/1000, respectively (p < 0.001). For DBT after DBT, the corresponding rates were 2.8/1000 and 2.7/1000 (p < 0.01 and p < 0.05, respectively, compared with DM after DM). No differences in rates of tumors > 20 mm were observed between groups (Table 2).

For DBT after DM, the rates of grade 1 and grade 2 invasive cancer were 3.2/1000 and 3.7/1000, respectively, compared with 1.0/1000 and 1.2/1000 for DM after DM (p < 0.001 for both). For DBT after DBT, the corresponding rates were 3.0/1000 (p < 0.01 compared with DM after DM) and 2.5/1000 (p < 0.05 compared with DM after DM). No differences in the rates of lymph node–positive disease, human epidermal growth factor receptor 2 (HER2)–positive tumors, or triple-negative tumors were observed between groups (Table 2).

Table 2 Screen-detected cancer. Histopathological characteristics of invasive tumors among women screened with different combinations of screening techniques in Oslo as part of BreastScreen Norway, 2010–2016. Rates per 1000 screened women

For interval breast cancers, no statistically significant differences in rates by histological type, tumor diameter, grade, or molecular subtype were observed when the Bonferroni correction was applied (Table 3).

Table 3 Interval breast cancer. Histopathological characteristics of invasive tumors among women screened with different combinations of screening techniques in Oslo as part of BreastScreen Norway, screened 2010–2014, with 2-year follow-up. Rates per 1000 screened women

Recall rates, rates of screen-detected breast cancer, and PPV-1 for DM+DBT after DM and SM+DBT after DM, analyzed separately, were in accordance with those of the combined DBT after DM group (Appendix 2). The same applied to DM after DM+DBT and DM after SM+DBT compared with the combined DM after DBT group. Tumors in the groups including SM+DBT tended to be of lower grade than those in groups including DM+DBT; otherwise, no statistically significant differences in histopathological characteristics were observed (Appendix 3).

The estimated Pearson correlation coefficients were at most 0.06 for recall and 0.01 for screen-detected and interval breast cancer (results not shown in table).

Discussion

We found a lower recall rate among women screened with DBT after DM than among those screened with DM after DM, and an even lower recall rate among those with a prior DBT (DM or DBT after DBT). The highest rates of screen-detected breast cancer were observed among women screened with DBT after DM or DBT after DBT. PPV-1 was highest for DBT after DBT and lowest for DM after DM. Further, higher rates of invasive tubular carcinomas, tumors < 20 mm, grade 1 and grade 2 tumors, and luminal A and luminal B HER2-negative subtypes were observed for women screened with DBT after DM or DBT after DBT. No statistically significant differences in the rate of interval breast cancer were observed between the study groups.

The significantly lower recall rate for women with a prior DBT than for those with a prior DM may indicate that the availability of prior DBT examinations at screen reading and/or consensus lowers the probability of recall, possibly because of the large amount of image information available. Additionally, apparently suspicious findings on the current screening images could, when compared with prior DBT images, be explained by superimposition of overlapping tissue, resulting in a negative screening interpretation. A learning effect, as radiologists became more experienced with DBT during the study period, may also have contributed.

The highest rates of screen-detected cancer were observed in the groups screened with DBT after a prior DM. This increase is in line with results from other studies [3, 5, 7] and might be partly attributed to increased lesion conspicuity and reduced masking by overlapping tissue with DBT [1]. Further, tumors may be diagnosed at an earlier stage than in the absence of DBT, thereby prolonging the lead time. This raises the question of whether the increased detection rate is only transient. In our study, the increased rate of screen-detected breast cancer was sustained over two screening rounds. However, the detection rate for DBT after DBT tended to be lower than for DBT after DM, though the difference was not statistically significant. Follow-up through several subsequent screening rounds is needed to evaluate whether the increased cancer detection rate observed for DBT is transient and due to extended lead time.

An interesting question when screening with DBT is whether the excess tumors represent “killing cancers,” or small, less aggressive tumors of lesser clinical significance. Studies have shown conflicting results on this topic [4, 7, 17, 21]. We found increased rates of invasive tubular carcinomas, tumors < 20 mm, histologic grade 1 and 2 tumors, and tumors with favorable lymph node status and molecular subtypes among women screened with DBT compared with DM, while no differences in rates of tumors > 20 mm, histologic grade 3 tumors, or lymph node–positive, HER2-positive, or triple-negative tumors were observed. The detection rate of DCIS was also statistically significantly higher among those screened with DBT than among those screened with DM after DBT, but not compared with DM after DM, though the numbers are small. This study was not designed to investigate overdiagnosis, but our findings suggest that the excess cancers diagnosed with DBT tend to be less aggressive tumors. We cannot draw definitive conclusions about the clinical implications of the increased rate of cancers diagnosed by DBT. However, with the debate on overdiagnosis in mind, that is, the detection of small, less aggressive tumors that might never have been diagnosed or become clinically apparent during the woman’s lifetime in the absence of screening, DBT may contribute to overdiagnosis. Further studies in this regard are needed. Additionally, studies with more finely stratified histopathological analyses and data on treatment would provide valuable information for this debate.

In our study, PPV-1 was significantly higher among those screened with DBT; thus, a smaller proportion of recalls were false-positive. Together with the generally lower recall rate among those screened with DBT, this benefits both the screened women and society by reducing the burden of false-positive screening results [1]. The highest PPV-1 was observed among those with two consecutive DBTs, mainly because of the very low recall rate in this study group.
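PPV-1 can be approximated from the published group rates, since recalls per 1000 examinations equal the recall percentage times 10. Using the rounded values reported in the Results for DM after DM (4.6/1000, 3.6%) and DBT after DBT (8.3/1000, 1.9%) gives about 12.8% and 43.7%. These are close to the reported 12.9% and 43.5%, and the small discrepancies are expected from rounding of the published rates. The function name is hypothetical, and the sketch is a consistency check, not the study's estimation method.

```python
def approx_ppv1(detection_per_1000, recall_percent):
    """Approximate PPV-1 (%) from a detection rate per 1000 and a recall rate in %."""
    recalls_per_1000 = recall_percent * 10
    return 100 * detection_per_1000 / recalls_per_1000


print(round(approx_ppv1(4.6, 3.6), 1))  # DM after DM
print(round(approx_ppv1(8.3, 1.9), 1))  # DBT after DBT
```

At a given detection rate, PPV-1 is inversely proportional to the recall rate. This is why the very low recall rate in the DBT after DBT group dominates its PPV-1, even though its detection rate is not the highest.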

The rate of interval breast cancer is essential when evaluating the effectiveness of a new screening technique [1, 15]. The results from the OTST (Oslo Tomosynthesis Screening Trial) and the STORM (Screening with Tomosynthesis OR standard Mammography) trial did not show any reduction in the rate of interval breast cancer among those screened with DBT compared with DM [8, 9]. Our results support these findings. However, an obvious limitation of our results is the small number of cancer cases. Further studies including analyses of interval breast cancer after screening with DBT are thus needed.

Combining DM after DM+DBT and DM after SM+DBT into one study group, and DM+DBT after DM and SM+DBT after DM into another, creates some heterogeneity within the groups. However, this combination had only a minor influence on recall rates, cancer detection rates, and PPV-1 (Appendix 2). This is in accordance with the results of other studies showing comparable early performance measures for DM and SM [5, 12]. Tumors detected with SM+DBT tended to be of lower histologic grade than those detected with DM+DBT; otherwise, no differences in histopathological characteristics were observed between SM+DBT and DM+DBT (Appendix 3). As SM+DBT is now preferred to DM+DBT in screening because of radiation concerns [5, 17, 22], these differences are not considered of major importance for the interpretation of the overall results.

Our study had some limitations. The study spanned 8 years (2008–2016), resulting in a large number of radiologists with varying experience in screen reading, particularly with DBT (Appendix 1). Further, the study groups were screened in different time periods, which might have influenced the results, for example through changes in mammographic equipment and reading environments. There was also some heterogeneity in the composition of the groups, as some women contributed more than one observation to the study population. However, the correlation coefficients were estimated to be at most 0.06 for recall and 0.01 for screen-detected and interval breast cancer; thus, the dependency between observations was considered minor. Furthermore, the number of prior screening examinations available at screen reading and consensus may have varied. Lastly, the study was performed in Oslo, a small geographical area with a relatively large population, with 45,000 women in the target group of the screening program. The women represent a heterogeneous population, including women who have moved to Oslo from all parts of Norway, as well as about 20% immigrants [23].

To conclude, our study showed that the recall rate remained lower and the rate of screen-detected breast cancer higher for consecutive screening examinations with DBT than with standard DM. We observed higher rates of tumors with smaller diameter and more favorable histological grade and molecular subtype among women screened with DBT than with DM, while no differences in rates of larger and histologic grade 3 tumors were observed. This raises questions about the clinical implications of the increased detection rate. Whether our results remain stable over further screening rounds is yet to be investigated. Both a sustained increase in the rate of screen-detected breast cancer and a reduction in the rate of interval breast cancer are crucial when considering DBT as a sustainable and more effective tool for mammographic screening than standard DM alone.