PASS: De novo assembler for short peptide sequences

René L. Warren
DOI: https://doi.org/10.48550/arXiv.2208.05598
2022-08-11
Abstract: The ability to characterize proteins at sequence-level resolution is vital to biological research. Currently, the leading method for protein sequencing is liquid chromatography mass spectrometry (LC-MS), whereby proteins are reduced to their constituent peptides by enzymatic digestion and subsequently analyzed on an LC-MS instrument. The short peptide sequences that result from this analysis are used to characterize the original protein content of the sample. Here we present PASS, a de novo assembler for short peptide sequences that can be used to reconstruct large portions of protein targets, a step that can facilitate downstream sample characterization efforts. We show how, with adequate peptide sequence coverage and little-to-no additional sequence processing, PASS reconstructs protein sequences into relatively large (100 amino acids or longer) contigs having high (93.1-99.1%) sequence identity to reference antibody light and heavy chain proteins. Availability: PASS is released under the GNU General Public License Version 3 (GPLv3) and is publicly available from https://github.com/warrenlr/PASS
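To make the idea of de novo peptide assembly concrete, the sketch below merges overlapping short peptides into longer contigs using a generic greedy overlap-merge strategy. The abstract does not describe PASS's actual algorithm, so this should be read as an illustration of the general technique only, not as PASS's method. The function names `overlap` and `greedy_assemble`, the `min_overlap` parameter, and the toy sequence are all hypothetical and chosen for this example.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` residues exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(peptides, min_overlap=3):
    """Greedy overlap-merge of peptide strings into contigs.

    Illustrative only: a generic greedy strategy, not PASS's algorithm.
    """
    # Drop duplicates and peptides fully contained in a longer peptide.
    contigs = []
    for s in sorted(set(peptides), key=len, reverse=True):
        if not any(s in c for c in contigs):
            contigs.append(s)

    while True:
        # Find the pair (i, j) with the longest suffix/prefix overlap.
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: assembly is finished
        merged = contigs[best_i] + contigs[best_j][best_k:]
        # Remove the two merged contigs and anything the merge now contains.
        contigs = [c for c in contigs if c not in merged] + [merged]
    return contigs


# Toy target sequence, made up for illustration (not data from the paper).
target = "MKTAYIAKQRQISFVKSHFSRQ"
peptides = ["MKTAYIAKQ", "IAKQRQISF", "QISFVKSHF", "KSHFSRQ", "AYIAK", "IAKQRQISF"]
print(greedy_assemble(peptides))
```

In this toy case each adjacent pair of peptides shares a four-residue overlap, so the peptides chain into a single contig. Real LC-MS peptide sets contain sequencing errors and ambiguous residues, which a practical assembler has to handle and this sketch does not.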
Genomics, Biomolecules