Abstract:

Background: To reduce cancer mortality and improve cancer outcomes, it is critical to understand the various cancer risk factors (RFs) across different domains (e.g., genetic, environmental, and behavioral risk factors) and levels (e.g., individual, interpersonal, and community levels). However, prior research on RFs of cancer outcomes has primarily focused on individual-level RFs because integrated datasets that contain multi-level, multi-domain RFs are lacking. Further, the lack of consensus and proper guidance on how to systematically identify RFs also increases the difficulty of RF selection from heterogeneous data sources in a multi-level integrative data analysis (mIDA) study. More importantly, because mIDA studies require integrating heterogeneous data sources, the data integration processes in the limited number of existing mIDA studies are inconsistently performed and poorly documented, which threatens transparency and reproducibility.

Methods: Informed by the National Institute on Minority Health and Health Disparities (NIMHD) research framework, we (1) reviewed existing reporting guidelines from the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) network and (2) developed a theory-driven reporting guideline to guide the RF variable selection, data source selection, and data integration process. Then, we developed an ontology to standardize the documentation of the RF selection and data integration process in mIDA studies.

Results: We summarized the review results and created a reporting guideline, ATTEST, for reporting the variable selection, data source selection, and data integration process. We provided an ATTEST checklist to help researchers annotate and clearly document each step of their mIDA studies to ensure transparency and reproducibility.
We used ATTEST to report two mIDA case studies and further transformed the annotation results into semantic triples, so that the relationships among variables, data sources, and integration processes are explicitly standardized and modeled using the classes and properties from OD-ATTEST.

Conclusion: Our ontology-based reporting guideline solves some key challenges in current mIDA studies for cancer outcomes research by providing (1) theory-driven guidance for multi-level and multi-domain RF variable and data source selection; and (2) standardized documentation of the data selection and integration processes, powered by an ontology, which offers a way to share mIDA study reports among researchers.

### Competing Interest Statement

The authors have declared no competing interest.

### Funding Statement

This study was supported in part by the National Institutes of Health (NIH) awards UL1TR001427 and R01CA246418 and Patient-Centered Outcomes Research Institute (PCORI) award ME-2018C3-14754. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or PCORI.

### Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Our study doesn't require IRB approval.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes

All the data needed are included in the manuscript.

* ACS: American Cancer Society
* BRFSS: Behavioral Risk Factor Surveillance System
* EQUATOR: Enhancing the QUAlity and Transparency Of health Research
* FCDS: Florida Cancer Data System
* mIDA: Multi-level Integrative Data Analysis
* NIH: National Institutes of Health
* NIMHD: National Institute on Minority Health and Health Disparities
* OD-ATTEST: Ontology for the Documentation of Variable and Data Source Selection and Integration Process
* RF: Risk Factor
* US: United States
* RUCA: Rural-Urban Commuting Area
* NCHS: National Center for Health Statistics
* BFO: Basic Formal Ontology
* NCBO: National Center for Biomedical Ontology
* RDF: Resource Description Framework
* GRIPS: Genetic RIsk Prediction Studies
* COHERE: Checklist for One Health Epidemiological Reporting of Evidence
* EHR: Electronic Health Records
* OBI: Ontology for Biomedical Investigations
* IAO: Information Artifact Ontology
* NCIt: National Cancer Institute Thesaurus
* STATO: Statistics Ontology
* SIO: Semanticscience Integrated Ontology
* CDM: Common Data Model
* PCORnet: The national Patient-Centered Clinical Research Network
