Evaluating the impact of new imaging tests: promises and pitfalls.
D. Pryma, A. DeMichele, D. Mankoff
DOI: https://doi.org/10.1093/jnci/djs488
2012-12-19
Journal of the National Cancer Institute
Abstract: The introduction of a new diagnostic imaging test into cancer clinical practice creates both anticipation and confusion. The use of 18F-fluorodeoxyglucose (18FDG) positron emission tomography (PET)/computed tomography (CT) in breast cancer is a good example. Early studies indicated high sensitivity for regional nodal metastases (1), whereas later studies in patient populations more representative of early breast cancer diagnosis showed much poorer performance (2). More recently, data have supported the potential utility of 18FDG PET/CT in locally advanced breast cancer (3,4); based on these data, current clinical guidelines recommend 18FDG PET/CT as an optional test in stage III breast cancer (5). Nevertheless, the data in support of the use of 18FDG PET/CT for locally advanced breast cancer are from relatively small and largely retrospective series, and controversy remains about the utility of 18FDG PET/CT in the initial evaluation of patients with newly diagnosed breast cancer at high risk for recurrence. In this issue of the Journal, Groheux et al. (6) shed some additional light on this topic. In one of the largest series to date, the authors describe their prospective evaluation of 18FDG PET/CT, over the course of 5 years, in consecutive patients with newly diagnosed stage II or III breast cancer. Restricting inclusion in their series to women without positive sentinel lymph nodes, they found that 18FDG PET/CT detected N3 disease in 16% of patients overall and was positive for suspected M1 disease in 17% to 47% of patients with stage IIB to stage IIIC breast cancer. Importantly, in the subset of patients with long-term follow-up, 18FDG PET/CT evidence of M1 disease (pathologically confirmed in 40% of patients) was predictive of disease-specific survival. The Groheux et al. study (6) was a long-term undertaking and supports the utility of 18FDG PET/CT in locally advanced breast cancer and possibly in stage IIB and IIIA disease.
Although the Groheux et al. study (6) adds considerably to the data supporting the use of 18FDG PET/CT in stage IIB to stage IIIC disease, the study also illustrates some of the difficulties in thoroughly investigating the utility and prognostic value of diagnostic tests. Clear guidance on optimal evaluation and reporting of diagnostic tests has been published (7–9). These guidelines form an appropriate and robust directive for diagnostic testing and have become the standard for diagnostic laboratory assays, but they are difficult, and in some cases impossible, to fully satisfy for diagnostic imaging studies. In the early evaluation of a new diagnostic imaging modality, it is common to collect data prospectively and then determine whether results correlate with endpoints of interest in a post hoc analysis. Although this data-mining approach is appropriate for hypothesis generation and for secondary endpoints, it is critical for primary evaluation of prognostic biomarkers that the putative diagnostic criteria be prospectively defined and that the study be powered to test these variables. Retrospective definition of diagnostic criteria that are supported by the data introduces a high likelihood of bias in favor of the experimental test, whereas an underpowered study can yield incomplete or inaccurate results. For laboratory-based diagnostic tests, follow-up studies with prospectively determined diagnostic criteria can be carried out ethically and at relatively modest cost. However, for an imaging study, costs, logistics, and potential medical risks make large-scale, prospective, phase III studies a challenge.
For example, whereas a study of a laboratory test often requires little direct subject participation other than perhaps the removal of an additional tube of blood during routine phlebotomy or consent for use of archival pathologic source material, imaging trials require a significant time commitment from subjects and may require an invasive biopsy to confirm results, adding risk and expense. Whereas enrollment in drug trials is frequently brisk when there is the potential for clinical benefit from the experimental therapeutic, blinded diagnostic imaging trials often struggle to meet accrual goals because the patient is being asked to participate with no possibility of direct benefit. Thus, because of a variety of factors, but to a great degree because of cost and feasibility, many evaluations of diagnostic imaging tests are undertaken with the experimental test being performed as part of standard of care, as was done in this case. Can a new diagnostic test be accurately assessed in the context of clinical care? Although such studies may be prospective, they are not randomized, and the reference standard is often incomplete. Bias is introduced because only those patients who choose to undergo the “experimental” test as part of their standard of care are eligible for the study. Most important, in the absence of blinding, the “experimental” imaging test results are provided to patients and providers, who are often compelled to use the results of the experimental test to direct decisions on whether to use tests that are part of the reference standard (eg, whether or not to get another imaging test or a biopsy to confirm the results of the modality under study) or to direct therapy. In the Groheux et al. article (6), for example, standard imaging was specifically chosen in many cases to clarify the 18FDG PET/CT findings rather than to act as a standard companion to them.
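The concern about an incomplete reference standard can be made concrete with a small expected-count calculation. The sketch below is purely illustrative: the function name, cohort size, prevalence, true accuracy, and verification rates are all hypothetical assumptions chosen for the example and are not taken from the Groheux et al. study or any other data set. It shows how apparent sensitivity and specificity can shift when the decision to obtain confirmatory testing depends on the result of the test under study.

```python
def apparent_accuracy(n, prevalence, sens, spec, verify_pos, verify_neg):
    """Expected sensitivity and specificity among verified patients only.

    All inputs are hypothetical illustration values:
      n           cohort size
      prevalence  true fraction of patients with disease
      sens, spec  true sensitivity and specificity of the index test
      verify_pos  fraction of test-positive patients who get the reference test
      verify_neg  fraction of test-negative patients who get the reference test
    """
    diseased = n * prevalence
    healthy = n - diseased

    # Expected counts if every patient received the reference standard.
    tp = diseased * sens
    fn = diseased - tp
    tn = healthy * spec
    fp = healthy - tn

    # Only verified patients can be classified, so each cell is thinned
    # by the verification rate that applies to its index-test result.
    v_tp, v_fp = tp * verify_pos, fp * verify_pos
    v_fn, v_tn = fn * verify_neg, tn * verify_neg

    apparent_sens = v_tp / (v_tp + v_fn)
    apparent_spec = v_tn / (v_tn + v_fp)
    return apparent_sens, apparent_spec


# Assumed scenario: true sensitivity 0.70, true specificity 0.90.
full = apparent_accuracy(1000, 0.2, 0.7, 0.9, verify_pos=1.0, verify_neg=1.0)
partial = apparent_accuracy(1000, 0.2, 0.7, 0.9, verify_pos=0.9, verify_neg=0.1)

print("complete verification:", full)
print("result-driven verification:", partial)
```

Under these assumed inputs, complete verification recovers the true values, whereas verifying 90% of positive results but only 10% of negative results yields an apparent sensitivity of about 0.95 and an apparent specificity of 0.50. The direction and size of the distortion depend entirely on the assumed rates; the point is only that result-driven verification changes the estimates.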
Furthermore, the experimental test could be used to direct therapy, whereas the ideal study protocol would prohibit disclosure of results of the experimental test to prevent alteration of subsequent management. However, blinding 12