A Semi-Supervised Framework for Detecting and Classifying Human Transposon LINE-1 Insertions.

Xinxing Yan, Zhongmeng Zhao, Xuanping Zhang, Jiayin Wang
DOI: https://doi.org/10.1109/aim.2019.8868714
2019-01-01
Abstract: Most of the repetitive elements in the human genome are associated with retrotransposons, which have wide-ranging impacts on complex traits and diseases. Detecting insertions of the active human transposon LINE-1 is a difficult computational problem because these elements are repetitive and highly similar to one another. Existing methods either perform poorly at identifying large-scale insertion events or rely on a small number of annotated samples, which often leads to high false-positive rates. In this paper, we propose a semi-supervised framework, named L1Detector, to improve both the detection and the classification of these insertions. The core of L1Detector is a shallow neural network. The framework first extracts multiple features around each candidate insertion site. It then takes advantage of an existing machine learning model to compute the interactions among these features. We further improve this model by introducing a semi-supervised learning framework, which makes it possible to handle large-scale unlabeled data. In addition, the framework enables more comprehensive and accurate detection of polymorphic insertion events and of insertion types. We conducted a series of simulation experiments to evaluate the performance of the proposed framework and compared it with a popular detection method. The experimental results demonstrated that the proposed framework often provided more comprehensive and effective results.
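The abstract does not name the feature-interaction model or the semi-supervised scheme, so the following is only an illustrative sketch of how such a pipeline could fit together, not L1Detector's actual implementation. It assumes a factorization-machine-style shallow model (linear terms plus factorized pairwise feature interactions with a logistic output) and a simple self-training (pseudo-labeling) loop. The names `PairwiseInteractionModel` and `self_train`, the 0.95 confidence threshold, and the synthetic per-site feature vectors are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Clip to avoid overflow warnings in np.exp for very large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


class PairwiseInteractionModel:
    """Hypothetical shallow model (an assumption, not L1Detector's model):
    linear terms plus factorized pairwise feature interactions, logistic output."""

    def __init__(self, n_features, k=4, lr=0.1, l2=1e-3, epochs=300, seed=0):
        rng = np.random.default_rng(seed)
        self.w0 = 0.0
        self.w = np.zeros(n_features)
        self.V = 0.01 * rng.standard_normal((n_features, k))  # latent factors
        self.lr, self.l2, self.epochs = lr, l2, epochs

    def _logit(self, X):
        XV = X @ self.V  # shape (n, k)
        # sum_{i<j} <v_i, v_j> x_i x_j, computed in O(n * d * k)
        inter = 0.5 * np.sum(XV ** 2 - (X ** 2) @ (self.V ** 2), axis=1)
        return self.w0 + X @ self.w + inter, XV

    def predict_proba(self, X):
        return sigmoid(self._logit(X)[0])

    def fit(self, X, t):
        n = len(t)
        for _ in range(self.epochs):  # full-batch gradient descent on log-loss
            logit, XV = self._logit(X)
            g = (sigmoid(logit) - t) / n  # d(mean log-loss) / d(logit)
            grad_w0 = g.sum()
            grad_w = X.T @ g + self.l2 * self.w
            grad_V = (X.T @ (g[:, None] * XV)
                      - self.V * ((X ** 2).T @ g)[:, None]
                      + self.l2 * self.V)
            self.w0 -= self.lr * grad_w0
            self.w -= self.lr * grad_w
            self.V -= self.lr * grad_V
        return self


def self_train(X_lab, t_lab, X_unlab, threshold=0.95, rounds=5, **kw):
    """Self-training (assumed scheme): repeatedly move confidently predicted
    unlabeled examples into the training set with their pseudo-labels."""
    X_train, t_train = X_lab.copy(), t_lab.astype(float).copy()
    pool = X_unlab.copy()
    n_pseudo = 0
    model = PairwiseInteractionModel(X_lab.shape[1], **kw).fit(X_train, t_train)
    for _ in range(rounds):
        if len(pool) == 0:
            break
        p = model.predict_proba(pool)
        confident = (p >= threshold) | (p <= 1.0 - threshold)
        if not confident.any():
            break
        X_train = np.vstack([X_train, pool[confident]])
        t_train = np.concatenate([t_train, (p[confident] >= 0.5).astype(float)])
        n_pseudo += int(confident.sum())
        pool = pool[~confident]  # remaining unlabeled candidates
        model = PairwiseInteractionModel(X_lab.shape[1], **kw).fit(X_train, t_train)
    return model, n_pseudo


# Toy demo on synthetic data (hypothetical stand-ins for per-site features).
rng = np.random.default_rng(1)
X = rng.standard_normal((640, 6))
t = (X[:, 0] + X[:, 1] > 0).astype(float)
model, n_pseudo = self_train(X[:40], t[:40], X[40:440])
probs = model.predict_proba(X[440:])
acc = np.mean((probs >= 0.5) == t[440:])
```

Here only 40 examples are labeled and 400 are treated as unlabeled; the loop retrains from scratch after each batch of pseudo-labels, which keeps the sketch simple at the cost of extra training time. A real detector would replace the synthetic matrix with features extracted around candidate insertion sites.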