Big Data Research in Chronic Kidney Disease
Xiao-Xi Zeng,Jing Liu,Liang Ma,Ping Fu
DOI: https://doi.org/10.4103/0366-6999.245275
IF: 6.133
2018-01-01
Chinese Medical Journal
Abstract: With a worldwide estimated prevalence of 8–16%, chronic kidney disease (CKD) is a major noncommunicable disease: it substantially contributes to premature mortality and loss of disability-adjusted life years.[1,2] The variety of causes, progression mechanisms, and histopathological manifestations creates challenges for early diagnosis and effective intervention in CKD.[3] In addition, CKD is a major drain on health resources: in 2015, CKD and end-stage renal disease (ESRD) cost Medicare (the United States) over $98 billion.[4] China also faces a great financial burden owing to the increasing prevalence of CKD. The definitions and boundaries of big data in health are still debated.[5] However, the US National Institute of Standards and Technology defines big data as consisting of extensive datasets (in terms of volume, variety, velocity, or variability) that require a scalable architecture for efficient storage, manipulation, and analysis.[6] In addition to conventional data resources (e.g., electronic medical records, observational cohorts, and medical claims), environmental, behavioral, image, wearable device, social media, and multiomics data have been used for data-driven research on CKD.
Beyond the data themselves, the term big data also refers to the techniques used for analyzing multidimensional data sets,[7] such as artificial intelligence (including machine learning for structured data and natural language processing (NLP) for unstructured data), to reveal clinically relevant information from massive amounts of data.[7,8] Progress in cross-disciplinary collaborations among medicine, mathematical modeling, machine learning, and bioinformatics has revealed novel mechanisms and helped target intervention strategies for CKD, which can facilitate precise risk predictions, early diagnosis, clinical decision analysis, and cost-effective interventions.[9,10]
GROWTH OF BIG DATA AND INNOVATIVE ANALYTIC METHODS IN CHRONIC KIDNEY DISEASE RESEARCH
One clear benchmark for big data is volume. In 2011, the data of the US health-care system alone amounted to 150 exabytes (1 exabyte = 10^18 bytes). Before long, the data will reach zettabyte and yottabyte levels worldwide.[11] Big data for medical research can be obtained from administrative and claims data, population statistics and disease surveillance data, real-world data, research data, registries, mobile medical devices, and patient-reported information. In addition, data that are not conventionally considered direct health-care information may also be collected and incorporated into medical research and applications, such as search engine queries,[9] social media data,[12] and environmental data.[13] Large-volume databases for CKD research in the United States include the following: the National Health and Nutrition Examination Survey; United States Renal Data System; Kaiser Permanente; and Veterans Affairs Healthcare System. Those databases are widely used and support investigations into the disease burden, risk factors, outcomes, and medical resource consumption associated with CKD.
In China, a national cross-sectional study investigated the prevalence of CKD.[14] The study covered 47,204 participants from 13 provinces; it reported a prevalence of 10.8% and demonstrated that CKD is a major public health concern in China. Subsequently, according to data from China's Hospital Quality Monitoring System, the pattern of CKD has changed: diabetes has become the leading cause of CKD.[15] In addition, supported by the China-WHO Biennial Collaborative Projects 2014–2015, the China Kidney Disease Network (CK-NET) was established under the leadership of Drs. Lu-Xia Zhang and Hai-Bo Wang, based on the efforts of Professor Hai-Yan Wang.[16] CK-NET covers over 19.5 million patients from China's class 3 hospitals. CK-NET summarizes patient-level data from standardized discharge summaries; it highlights information that has not previously been reported, such as that related to epidemiology, treatment, costs, and other aspects pertinent to CKD.[16] Large biobanks also serve as basic information sources for CKD research. The China Kadoorie Biobank, which was launched in 2004, has recruited 500,000 people from 10 regions of China (five urban and five rural) to assess the effects of risk factors for common chronic diseases. Its resources range from questionnaires, physical measurements at baseline, and long-term follow-up survey data to laboratory assays (including genotyping, metabolomic, and blood biochemistry data).[17] The Chinese Cohort Study of CKD has enrolled and followed up 3000 predialysis CKD patients in Mainland China; that cohort study has also been used to explore the underlying mechanisms of CKD and adverse outcomes.[18] All of these big data studies have characterized CKD epidemiology in China, which is essential for health policymaking and health resource allocation planning. Another feature of big data in CKD research is the variety of data types. One example is the wide use of environmental data.
In several studies, long-term exposure to air pollutants was evaluated by means of land-use regression and spatiotemporal models that utilized satellite remote-sensing aerosol optical depth data.[13,19,20] The association of air pollution with the incidence of CKD and with declining glomerular filtration rate was investigated using a generalized additive logistic model, a time-varying linear mixed-effects regression model, and Cox proportional hazard models. The results showed that air pollution could be a nonconventional risk factor in the incidence[13,20] and progression of CKD.[19,20] With the development of artificial intelligence techniques, clinical notes and images are also used in kidney research. Singh et al.[21] undertook a concept-wide association study of clinical notes to identify new predictors of ESRD. The concepts were extracted from existing clinical notes using NLP tools; they were evaluated as predictors using proportional subdistribution hazards regression. Novel predictors were identified, such as high-dose ascorbic acid and fast food. In another study on predicting outcomes in kidney transplant patients,[22] Banff lesion scores from the pathology reports and vital signs were extracted from unstructured text fields using proprietary NLP solutions in IBM Watson Content Analytics. Structured data were also obtained from electronic medical records, the United Network for Organ Sharing database, and hospitals’ own transplant databases. Predictive models for graft loss and mortality were developed from both structured and unstructured data formats. The results demonstrate that the big data approach significantly improves the prediction of adverse outcomes. By means of digital pathology applied to kidney tissue slides, Pedraza et al.[23] used convolutional neural network classification to identify glomerulus and nonglomerulus segments.
On average, the accuracy of this approach reached 99.95%, which underlines the promising application of machine vision in kidney histopathology. With regard to speed, practice and research have benefited from the real-time collection of patient-level data. The acute kidney injury (AKI) system is one example of such an application based on data collected in routine clinical practice: the use of real-time data can improve the early detection of AKI and permit timely therapeutic interventions.[24] For advanced CKD, some researchers have developed a smartphone-based self-management system as an adjunct to usual care. The system collects patients’ behavior elements in real time and generates personalized patient messages based on prebuilt algorithms. If predefined treatment thresholds are met or critical changes occur, alerts are sent to providers.[22] To identify CKD patients with uncontrolled blood pressure (BP), Greenberg et al.[25] proposed a measurement system that incorporates data from the billing system, structured fields in the electronic health records, and free-text physician notes using NLP. To prompt action on BP control and the completion of additional data, a point-of-care paper worksheet is given to the physician when such patients present. Using NLP in some systems has been found to produce benefits with regard to medication errors and control of BP.[22,25] Multiomics technology enriches the data sources and helps improve analytic techniques with respect to data variety in CKD research.
High-resolution analytic omics platforms (such as genomics, proteomics, peptidomics, transcriptomics, and metabolomics) and machine learning methods have been of tremendous help in elucidating the molecular map of diverse interactions, signaling, and regulation; in identifying CKD-related biomarkers; and in targeting different molecules with high precision.[3,9] For example, genome-wide association studies (GWASs) based on big data have gradually appeared and been refined. Gene analysis and subsequent single-nucleotide polymorphism (SNP) analysis, adjusted for clinical characteristics, of data from 1293 African Americans have been used to examine the causal association between racial disparities and CKD.[26] A strong association between CKD and apolipoprotein L1 renal-risk variants became evident. In a Chinese Han population of over 10,000 participants, GWAS identified TNFSF13 as a susceptibility gene for IgA nephropathy.[27] Subsequently, an advanced verification test of that association was conducted among 2000 participants using SNPs and the phenotype level of the TNFSF13 gene.[28] Studies have also focused on the association of renal function with the gut microbiome, amino acid metabolomic profiling, and renal microRNA and RNA profiles.
VALUE OF BIG DATA ANALYTICS IN CHRONIC KIDNEY DISEASE RESEARCH
The aforementioned studies demonstrate the value of big data in CKD research. Big data can provide essential information about disease burden, molecular mechanisms, novel risk factors, and therapeutic targets. In this way, big data can help provide more effective prevention, earlier diagnosis, and more precise interventions.
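To make the SNP-level association testing behind the GWAS findings discussed above more concrete, the following is a minimal sketch of a single-SNP allelic association test. It is not the method of the cited studies, which adjusted for clinical characteristics; this unadjusted version only compares allele counts between cases and controls. The function name `allelic_association` and the allele counts in the usage note are hypothetical and chosen purely for illustration.

```python
import math


def allelic_association(case_risk, case_other, control_risk, control_other):
    """Unadjusted allelic test for one SNP on a 2x2 allele-count table.

    Returns (odds_ratio, chi_square, p_value). The p-value uses the
    chi-square distribution with 1 degree of freedom, whose survival
    function equals erfc(sqrt(x / 2)).
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0 or b * c == 0:
        raise ValueError("table has an empty margin or an undefined odds ratio")

    # Odds of carrying the risk allele in cases relative to controls.
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square statistic for a 2x2 table (no continuity correction).
    chi_square = n * (a * d - b * c) ** 2 / margins
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return odds_ratio, chi_square, p_value
```

For example, `allelic_association(60, 40, 40, 60)` applies the test to hypothetical counts of 60 risk and 40 other alleles in cases versus 40 and 60 in controls. In a genome-wide setting the same test is repeated across a very large number of SNPs, so a far stricter significance threshold than 0.05 is needed (5 × 10^-8 is a commonly used convention), and covariate adjustment is usually handled with regression models rather than a raw count table.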
According to McKinsey's report, big data, if used creatively and effectively, may lead to annual reductions of over $300 billion in the US health-care sector; most of that would be in the form of decreased health-care expenditure.[29] Another field where the value of big data has been demonstrated with regard to health policymaking is modeling and health economic evaluation using real-world data. That approach has been found to be time-saving, and it has the potential to optimize clinical pathways and improve hospital management and the medical insurance system. Decision modeling combined with real-world data and medical knowledge can be used to predict the future prevalence of CKD in a given population.[30] That method can also be applied in health economic analysis. The American Diabetes Association and American College of Cardiology/American Heart Association Task Force recommend urinalysis and creatinine testing in patients with diabetes[31] or hypertension.[32] These recommendations are supported by modeling analysis, which has shown these tests to be cost-effective in high-risk populations, including patients with diabetes and hypertension.[33] Data science is widely used in medical insurance. One example of its application in nephrology is the ESRD prospective payment system (PPS) project. Following the report “End-stage Renal Disease Payment System: Results of Research on Case-Mix Adjustment for an Expanded Bundle” (submitted by the University of Michigan Kidney Epidemiology and Cost Center), the Centers for Medicare and Medicaid Services (CMS) finalized the case-mix and facility-level adjustments for the ESRD PPS in calendar year (CY) 2011. Further data were collected and analyzed to support later refinement of the CMS ESRD payment system.
OPPORTUNITIES AND CHALLENGES FOR BIG DATA IN CHRONIC KIDNEY DISEASE
We have now entered the era of big data.
Policies and initiatives have been announced to advance biomedical big data research and application in both developed and developing countries.[34] Examples include the Federal Big Data Research and Development Strategic Plan in the United States and the Guidelines for Promoting and Standardizing the Healthcare Big Data Application and Development in China. CKD in China represents a heavy disease burden in a large developing country, which makes China one of the most suitable settings for applying biomedical big data. However, fully utilizing the value of big data to support CKD research presents challenges. First, efforts have been made to encourage data sharing and accessibility for some national health databases, such as the National Scientific Data Sharing Platform for Population and Health; however, platforms where individual-level information is updated in a timely manner and can be freely accessed by scholars need to be constructed or improved. Second, health information is classified as sensitive personal information under China's “Information Security Technology – Personal Information Security Specification”. Thus, when collecting, transferring, analyzing, sharing, and reporting health-related data, it is necessary to carefully balance the expected benefits against the risks to security and privacy. In China, there are national-level regulations that provide detailed guidance about medical data disclosure. However, data sharing could be made more secure, and medical institutions need to be more willing to collaborate with outside partners in performing productive multidisciplinary research. The third challenge lies in the quality of data and the techniques of data analysis. For example, Cisek et al.[3] concluded that there is a lack of satisfactory algorithms for multidimensional data modeling in clinically relevant predictive models for accurate elucidation of kidney disease.
Fragmentary, diverse, and uncategorized data in mass information storage can result in difficulties when processing and analyzing information islands with complex and heterogeneous structures. Despite all the above challenges, big data for CKD is in an era of opportunity, and it needs mature technology and policy support. To provide better care and better health through cross-disciplinary efforts, building a database for CKD research is a top priority, in addition to collecting and analyzing health-care information from a multidimensional perspective.
Financial support and sponsorship
This study was supported by grants from the Science and Technology Department of Sichuan Province (No. 2016HH0069) and the Chengdu Science and Technology Bureau (No. 2015-RK00-00252-ZF).