Supervised non-negative matrix factorization on cell-free DNA fragmentomic features enhances early cancer detection
Trung Hieu Tran, Ngoc Tan Pham, Van Thien Chi Nguyen, Dac Ho Vo, Thi Hue Hanh Nguyen, Thi Trang Tran, Thanh Truong Tran, Truong Dang Huy Vo, Thi Huyen Dao, Huu Tam Phuc Nguyen, Thi Van Phan, Thi Minh Thi Ha, Thi Dieu Huong Ngo, Nhat-Huy Tran, Nhat-Thang Tran, Thanh Quang Hoang, Viet Binh Nguyen, Van Cuong Le, Xuan Chung Nguyen, Thi Minh Phuong Nguyen, Van Hung Nguyen, Nu Thien Nhat Tran, Thi Ngoc Quynh Dang, Manh Hoang Tran, Phuc Nguyen Nguyen, Thi Anh Tuyet Pham, Duy Long Vo, Thuy Nguyen Doan, Viet Hai Nguyen, Quang Dat Tran, Quang Thong Dang, Le Minh Quoc Ho, Vu Tuan Anh Nguyen, Sao Trung Nguyen, Hoai-Nghia Nguyen, Le Son Tran, Hoa Giang, Minh Duy Phan, Trong Hieu Nguyen
DOI: https://doi.org/10.1101/2024.12.20.629316
2024-12-20
Abstract:
Background
Cell-free circulating DNA (cfDNA) fragments exhibit non-random patterns in their length (FLEN), end-motif (EM), and distance to nucleosome position (ND). While these cfDNA features have shown promise as inputs for machine learning and deep learning models in early cancer detection, most studies utilize them as raw inputs, overlooking the potential benefits of pre-processing to extract cancer-specific features. This study aims to enhance cancer detection accuracy by developing a novel approach to feature extraction from cfDNA fragmentomics.
Methods
We implemented a supervised non-negative matrix factorization (SNMF) algorithm to generate embedding vectors that capture cancer-specific signals within cfDNA fragmentomic features. These embeddings served as input to a machine learning model that distinguishes cancer patients from healthy individuals.
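As an illustration of the general idea only, the sketch below shows one simple way to make NMF label-aware: one-hot labels are appended to the feature matrix as extra non-negative columns, and the augmented matrix is factorized with standard multiplicative updates. This is an assumed variant chosen for brevity, not necessarily the SNMF formulation used in this study. The function names (`nmf_mu`, `fit_snmf`, `transform`), the weight `lam`, the synthetic data, and the nearest-centroid classifier are all hypothetical.

```python
import numpy as np

def nmf_mu(V, k, n_iter=300, seed=0, eps=1e-9):
    """Frobenius-norm NMF, V ~ W @ H, via multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def fit_snmf(X, y, k=2, lam=1.0, n_iter=300, seed=0):
    """Assumed supervised variant: factorize [X, sqrt(lam) * onehot(y)]."""
    Y = np.eye(int(y.max()) + 1)[y]            # one-hot labels (non-negative)
    V = np.hstack([X, np.sqrt(lam) * Y])       # labels steer the shared basis
    W, H_aug = nmf_mu(V, k, n_iter, seed)
    return W, H_aug[:, : X.shape[1]]           # keep only the feature part of H

def transform(X_new, H, n_iter=300, seed=0, eps=1e-9):
    """Embed samples without labels: update W only, with H held fixed."""
    rng = np.random.default_rng(seed)
    W = rng.random((X_new.shape[0], H.shape[0])) + eps
    XHt, HHt = X_new @ H.T, H @ H.T
    for _ in range(n_iter):
        W *= XHt / (W @ HHt + eps)
    return W

# Synthetic stand-in for per-sample fragmentomic frequency profiles.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
profiles = np.array([[5, 4, 3, 1, 1, 1], [1, 1, 1, 3, 4, 5]], dtype=float)
X = rng.poisson(profiles[y] * 10).astype(float)
X /= X.sum(axis=1, keepdims=True)              # rows become frequencies

idx = rng.permutation(len(y))
train, test = idx[:40], idx[40:]
W_train, H = fit_snmf(X[train], y[train], k=2)

# Re-embed both splits without labels so the classifier never sees them.
Z_train, Z_test = transform(X[train], H), transform(X[test], H)
centroids = np.stack([Z_train[y[train] == c].mean(axis=0) for c in (0, 1)])
dist = ((Z_test[:, None, :] - centroids[None]) ** 2).sum(axis=-1)
accuracy = float((dist.argmin(axis=1) == y[test]).mean())
```

Note that `W_train` is fitted with access to the labels, so this sketch re-embeds the training samples with `transform` before fitting the downstream classifier; any classifier could replace the nearest-centroid rule used here.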
Results
We validated our framework using two datasets: an in-house cohort of 431 cancer patients and 442 healthy individuals (dataset 1), and a published cohort comprising 90 hepatocellular carcinoma (HCC) patients and 103 individuals with cirrhosis or hepatitis B (dataset 2). In dataset 1, we achieved an AUC of 94% in pan-cancer detection. In dataset 2, our framework achieved an AUC of 100% for HCC vs healthy classification, 99% for HCC vs non-HCC patient classification, and 96% for identifying HCC patients among a mixed group of non-HCC patients and healthy donors.
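For readers less familiar with the metric, the AUC equals the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative sample, with ties counted as one half. The minimal sketch below computes it directly from that definition. The function name `auc_from_scores` and the scores are hypothetical and are not taken from either dataset.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores: 1 = cancer, 0 = healthy.
perfect = auc_from_scores([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])   # 1.0
partial = auc_from_scores([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])   # 0.75
```

In the second call, three of the four cancer/healthy pairs are ordered correctly, which gives 0.75, or 75% in the percentage form used in this abstract.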
Conclusion
This study demonstrates the effectiveness of SNMF-transformed features in improving both pan-cancer detection and HCC-specific detection. Our approach advances the use of cfDNA fragmentomics for early cancer detection and could enhance diagnostic accuracy in clinical settings.
Bioinformatics