Improvements on a Privacy-Protection Algorithm for DNA Sequences with Generalization Lattices

Guang Li, Yadong Wang, Xiaohong Su
DOI: https://doi.org/10.1016/j.cmpb.2011.02.013
IF: 6.1
2012-01-01
Computer Methods and Programs in Biomedicine
Abstract: When developing personal DNA databases, there must be an appropriate guarantee of anonymity, which means that the data cannot be related back to individuals. DNA lattice anonymization (DNALA) is a successful method for making personal DNA sequences anonymous. However, it relies on time-consuming multiple sequence alignment and a low-accuracy greedy clustering algorithm. Furthermore, DNALA is not an online algorithm, so it cannot return results quickly when the database is updated. This study improves the DNALA method. Specifically, we replaced the multiple sequence alignment in DNALA with global pairwise sequence alignment to save time, and we designed a hybrid clustering algorithm composed of a maximum weight matching (MWM)-based algorithm and an online algorithm. The MWM-based algorithm is more accurate than the greedy algorithm in DNALA and has the same time complexity. The online algorithm can process data quickly when the database is updated.
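To illustrate why a matching-based clustering can be more accurate than a greedy one, the following is a minimal Python sketch, not the authors' implementation. It assumes that clusters are pairs of sequences and that the pair weight is a global pairwise alignment score; the scoring parameters and the function names `global_alignment_score`, `greedy_pairing`, and `max_weight_pairing` are hypothetical. The exact matching here is a brute-force dynamic program that is only practical for a handful of sequences, so it stands in for a real MWM algorithm and does not reflect the time complexity stated in the abstract. The online algorithm is not sketched.

```python
from functools import lru_cache


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score (scoring values are assumed)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def pairwise_weights(seqs):
    """Weight of pair (i, j), i < j, is the alignment score of the two sequences."""
    n = len(seqs)
    return {(i, j): global_alignment_score(seqs[i], seqs[j])
            for i in range(n) for j in range(i + 1, n)}


def greedy_pairing(weights):
    """Greedy baseline: repeatedly take the heaviest pair whose members are unused."""
    used, pairs = set(), []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return pairs


def max_weight_pairing(weights, n):
    """Exact maximum weight perfect matching via bitmask DP (small, even n only)."""
    @lru_cache(maxsize=None)
    def best(mask):
        if mask == 0:
            return 0, ()
        i = (mask & -mask).bit_length() - 1  # lowest unmatched index
        rest = mask & ~(1 << i)
        result = None
        for j in range(i + 1, n):
            if rest >> j & 1:
                sub_weight, sub_pairs = best(rest & ~(1 << j))
                total = weights[(i, j)] + sub_weight
                if result is None or total > result[0]:
                    result = (total, ((i, j),) + sub_pairs)
        return result

    return best((1 << n) - 1)


def total_weight(weights, pairs):
    """Sum of the weights of the chosen pairs."""
    return sum(weights[p] for p in pairs)
```

On a toy weight table where one very heavy pair forces a poor leftover pair, the greedy baseline locks in the heavy pair first, while the matching considers all pairs jointly and can reach a higher total weight.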