Development of a SAS® Macro for Automated Data Cleaning of Major Outcomes of Interest in Hematopoietic Cell Transplantation

Peigang Li, Zhiwei Wang
2011-01-01
Abstract: We previously developed a set of SAS macros that run against clean outcomes data to automatically generate univariate summary statistics for major outcomes of interest, such as relapse, treatment-related mortality, progression/disease-free survival, and overall survival (Li, Zhu and Chen, MWSUG Paper 177-2010). Among these outcomes, relapse is the most time-consuming to clean because of (a) evolving definitions of relapse; (b) the monitoring methodologies applicable to acute or chronic hematological diseases (National Cancer Institute Relapse Workshop, November 2009); (c) multiple sources of relapse information, including comprehensive report forms (CRFs), transplant essential data (TED) forms, and legacy forms; and (d) insufficient reporting by the transplant centers. We have designed and developed a SAS macro to automate and standardize relapse cleaning for Acute Lymphoblastic Leukemia (ALL), Acute Myelogenous Leukemia (AML), Chronic Myelogenous Leukemia (CML), and Myelodysplastic Syndrome (MDS). We validated the relapse status assigned by the macro against clean outcomes data from earlier studies. Both sensitivity and specificity reached 99% in the most recent test, and the few misclassified non-relapse cases are likely due to hardcoding in those earlier studies. The initial design required the input data to include relapse-related key variables in addition to the unique patient identifiers; the final version requires only the unique patient identifiers. The macro will greatly speed up CIBMTR studies, from protocol development through creation of statistical analysis datasets.