Improved functions for nonlinear sequence comparison using SEEKR

Shuang Li, Quinn E. Eberhard, Luke Ni, J. Mauro Calabrese
DOI: https://doi.org/10.1261/rna.080188.124
2024-10-17
RNA
Abstract: SEquence Evaluation through k-mer Representation (SEEKR) is a method of sequence comparison that uses sequence substrings called k-mers to quantify the nonlinear similarity between nucleic acid species. We describe the development of new functions within SEEKR that enable end-users to estimate P-values that ascribe statistical significance to SEEKR-derived similarities, as well as visualize different aspects of k-mer similarity. We apply the new functions to identify chromatin-enriched lncRNAs that contain XIST-like sequence features, and we demonstrate the utility of applying SEEKR on lncRNA fragments to identify potential RNA-protein interaction domains. We also highlight ways in which SEEKR can be applied to augment studies of lncRNA conservation, and we outline the best practice of visualizing RNA-seq read density to evaluate support for lncRNA annotations before their in-depth study in cell types of interest.
biochemistry & molecular biology
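As a rough illustration of the k-mer idea summarized in the abstract, the sketch below counts k-mers in two sequences and compares the resulting profiles with a Pearson correlation. It is a simplified stand-in, not the SEEKR package's API: the function names `kmer_profile` and `pearson` and the example sequences are hypothetical, and the sketch leaves out any normalization against a background transcript set as well as the P-value estimation described above.

```python
from itertools import product
from math import sqrt


def kmer_profile(seq, k=2, alphabet="ACGT"):
    """Return length-normalized counts of every possible k-mer, in a fixed order."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA input alike
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        sub = seq[i:i + k]
        if sub in counts:  # skip windows containing ambiguous bases such as N
            counts[sub] += 1
    windows = max(len(seq) - k + 1, 1)  # avoid division by zero on short input
    return [counts[km] / windows for km in kmers]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # correlation is undefined for a constant profile
    return cov / (sd_x * sd_y)


# Hypothetical toy sequences; real lncRNAs are far longer.
profile_a = kmer_profile("ACGTACGTAC", k=2)
profile_b = kmer_profile("GTACGTACGT", k=2)
similarity = pearson(profile_a, profile_b)
```

Because the comparison works on k-mer abundance rather than on aligned positions, two sequences can score as similar even when their shared substrings occur in a different order.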