The theoretical analysis of sequencing bioinformatics algorithms and beyond

Paul Medvedev
DOI: https://doi.org/10.48550/arXiv.2205.01785
2022-11-15
Abstract: The theoretical analysis of performance has been an important tool in the engineering of algorithms in many application domains. Its goals are to predict the empirical performance of an algorithm and to be a yardstick that drives the design of novel algorithms that perform well in practice. While these goals have been achieved in many instances, they have not been achieved ubiquitously across crucial application domains. I provide a case study in the area of sequencing bioinformatics, an interdisciplinary field that uses algorithms to extract biological meaning from genome sequencing data. In particular, I give three concrete examples: two showing how theoretical analysis has failed to achieve its goals and one showing how it has been successful. I will then catalog some of the challenges of applying theoretical analysis to sequencing bioinformatics, argue why empirical analysis is not enough, and give a vision for improving the relevance of theoretical analysis to sequencing bioinformatics. By recognizing the problem, understanding its roots, and providing potential solutions, this work can hopefully be a crucial first step towards making theoretical analysis more relevant in sequencing bioinformatics and potentially other fast-paced application domains.
Data Structures and Algorithms