Tokenvizz: GraphRAG-Inspired Tokenization Tool for Genomic Data Discovery and Visualization

Cerag Oguztuzun, Zhenxiang Gao, Rong Xu
DOI: https://doi.org/10.1101/2024.12.03.626631
2024-12-06
Abstract: One of the primary challenges in biomedical research is the interpretation of complex genomic relationships and the prediction of functional interactions across the genome. Tokenvizz is a novel tool for genomic analysis that enhances data discovery and visualization by combining GraphRAG-inspired tokenization with graph-based modeling. In Tokenvizz, genomic sequences are represented as graphs, where sequence k-mers (tokens) serve as nodes and attention scores as edge weights, enabling researchers to visually interpret complex, non-linear relationships within DNA sequences. Through a web-based visualization interface, researchers can interactively explore these genomic relationships and extract biologically meaningful insights about regulatory patterns and functional elements. Applied to promoter-enhancer interaction prediction tasks, Tokenvizz outperformed traditional sequential models while providing interpretable insights into genomic features, demonstrating the advantage of graph-based representations for biological discovery.
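The graph representation described in the abstract (k-mer tokens as nodes, attention scores as edge weights) can be illustrated with a minimal sketch. This is not the Tokenvizz implementation: the function names `kmer_tokens` and `build_token_graph`, the use of non-overlapping k-mers, the pruning threshold, and the attention matrix are all assumptions made for illustration. In practice the attention scores would come from a trained sequence model rather than a hand-written matrix.

```python
from typing import Dict, List, Tuple


def kmer_tokens(sequence: str, k: int) -> List[str]:
    """Split a DNA sequence into non-overlapping k-mers.

    Assumption: non-overlapping tokens; any trailing remainder shorter
    than k is dropped. The abstract does not specify the tokenization.
    """
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, k)]


def build_token_graph(
    tokens: List[str],
    attention: List[List[float]],
    threshold: float,
) -> Dict[Tuple[int, int], float]:
    """Build a directed, weighted edge map between token positions.

    Nodes are token indices; an edge (i, j) is kept when the attention
    score from token i to token j meets the (hypothetical) threshold.
    Self-attention (i == j) is skipped.
    """
    edges: Dict[Tuple[int, int], float] = {}
    for i in range(len(tokens)):
        for j in range(len(tokens)):
            if i != j and attention[i][j] >= threshold:
                edges[(i, j)] = attention[i][j]
    return edges


# Toy input: a 12-base sequence split into four 3-mers.
tokens = kmer_tokens("ACGTGGCTAAGC", 3)

# Made-up attention matrix (rows: source token, columns: target token).
attention = [
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.60, 0.25, 0.05],
    [0.05, 0.15, 0.50, 0.30],
    [0.40, 0.05, 0.05, 0.50],
]

edges = build_token_graph(tokens, attention, threshold=0.2)

for (src, dst), weight in edges.items():
    print(f"{tokens[src]} -> {tokens[dst]}  weight={weight:.2f}")
```

Keeping only edges above a threshold is one simple way to make the resulting graph sparse enough to visualize; the cutoff value here is arbitrary.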
Bioinformatics