Developing and Validating a Computable Phenotype for the Identification of Transgender and Gender Nonconforming Individuals and Subgroups

Yi Guo, Xing He, Tianchen Lyu, Hansi Zhang, Yonghui Wu, Xi Yang, Zhaoyi Chen, Merry J. Markham, François Modave, Mengjun Xie, William R. Hogan, Christopher A. Harle, Elizabeth Shenkman, Jiang Bian
DOI: https://doi.org/10.1101/2020.08.04.20168161
2020-01-01
Abstract: Transgender and gender nonconforming (TGNC) individuals face significant marginalization, stigma, and discrimination. TGNC individuals are commonly under-reported because many are unwilling to self-identify. Meanwhile, the rapid adoption of electronic health record (EHR) systems has made large-scale, longitudinal, real-world clinical data available for research. This provides a unique opportunity to identify TGNC individuals from their EHRs, which is a promising approach to routine health surveillance. Building on existing work, we developed and validated a computable phenotype (CP) algorithm that identifies TGNC individuals and their natal sex (i.e., male-to-female or female-to-male) using both structured EHR data and unstructured clinical notes. Our CP algorithm achieved an F1-score of 0.955 on the training data and a perfect F1-score on the independent testing data. Consistent with the literature, we observed an increasing percentage of TGNC individuals and a disproportionate burden of adverse health outcomes in this population, especially sexually transmitted infections and mental health distress.
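The abstract reports CP performance as F1-scores. F1 is the harmonic mean of precision P = TP/(TP+FP) and recall R = TP/(TP+FN), i.e. F1 = 2PR/(P+R). A "perfect" F1 of 1.0 therefore means the algorithm produced no false positives and no false negatives on the test set. The minimal Python sketch below assumes one binary gold-standard label (e.g. from manual chart review) and one binary CP prediction per patient. The function names are illustrative only and are not taken from the paper.

```python
from typing import Sequence


def confusion_counts(gold: Sequence[bool], predicted: Sequence[bool]) -> tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for paired binary labels."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    return tp, fp, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0  # also avoids dividing by zero when tp + fp == 0 or tp + fn == 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: every flagged patient is a true case and every true case is flagged.
print(f1_score(*confusion_counts([True, True, False], [True, True, False])))  # 1.0
```

Counting TP, FP, and FN directly, rather than computing accuracy, matters for phenotyping. TGNC patients are a small fraction of an EHR population, so an algorithm that flags nobody would still score high accuracy while its F1 would be 0.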