Full-length Protein Sequencing Based on Continuous Digestion Using Non-specific Proteases

Chao Yang, Yi-Chu Shan, Wei-Jie Zhang, Zhong-Peng Dai, Li-Hua Zhang, Yu-Kui Zhang
DOI: https://doi.org/10.6023/a21010025
2021-01-01
Acta Chimica Sinica
Abstract: Determining the complete sequence of a protein helps in analyzing its structure and revealing its biological function. In the traditional "bottom-up" proteomic strategy, database searching is used to identify the sequences of peptides and proteins analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Proteins with unknown sequences cannot be identified through database searching, so de novo sequencing is essential for their characterization. To increase the accuracy and coverage of protein sequencing, a de novo protein sequencing method based on continuous digestion using various non-specific proteases was developed. A continuous digestion device was constructed, and a variety of non-specific proteases were used to continuously digest the protein. By exploiting the non-specific cleavage sites of these proteases and the complementarity of the peptides produced at different times and by different proteases, the diversity and degree of overlap of the digested peptides were improved. The sequence coverage of the peptides after continuous digestion by each protease can reach 100%. A sequence assembly algorithm was then developed to assemble the peptides obtained by de novo sequencing. First, the candidate peptide sequences were split into sequence tags of 7 amino acids, and the most frequently occurring sequence tag was chosen as the seed sequence. The seed sequence was then extended, automatically or manually, toward the N-terminus and the C-terminus according to the scores of the sequence tags. Finally, the complete protein sequence was assembled. The method was applied to the de novo sequencing of bovine serum albumin (BSA) and the monoclonal antibody Herceptin. Excluding leucine and isoleucine, full-length de novo sequencing was achieved with 100% accuracy for BSA and the Herceptin light chain. The accuracy for the sequenced Herceptin heavy chain was 99.7%. The de novo sequencing strategy based on continuous digestion of proteins using non-specific proteases can be applied to de novo sequencing of proteins with unknown sequences or to quality control of monoclonal antibody drugs.
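The assembly steps named in the abstract (split candidate peptides into 7-residue tags, take the most frequent tag as the seed, extend toward both termini by tag score) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not define how tags are scored, so the sketch assumes the score is simply the tag's occurrence count. All function names (`normalize`, `count_tags`, `extend`, `assemble`) are hypothetical, isoleucine is folded into leucine as an assumption, and the input is a toy sequence cut into overlapping windows rather than real de novo sequencing output.

```python
from collections import Counter

TAG_LEN = 7  # tag length stated in the abstract
# 19 letters: isoleucine (I) is folded into leucine (L), an assumption of this sketch
ALPHABET = "ACDEFGHKLMNPQRSTVWY"


def normalize(peptide: str) -> str:
    """Upper-case a peptide and merge I into L (the two residues have the same mass)."""
    return peptide.upper().replace("I", "L")


def count_tags(peptides, k: int = TAG_LEN) -> Counter:
    """Split every peptide into overlapping k-residue tags and count them."""
    counts = Counter()
    for peptide in peptides:
        seq = normalize(peptide)
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def extend(seq: str, counts: Counter, used: set, k: int, toward_c: bool) -> str:
    """Grow seq one residue at a time in one direction, following the highest-count tag."""
    while True:
        if toward_c:
            overlap = seq[-(k - 1):]
            candidates = [overlap + aa for aa in ALPHABET]
        else:
            overlap = seq[:k - 1]
            candidates = [aa + overlap for aa in ALPHABET]
        best = max(candidates, key=lambda tag: counts.get(tag, 0))
        # Stop when no observed tag continues the sequence, or when a tag
        # would be reused (guards against looping on repeated regions).
        if counts.get(best, 0) == 0 or best in used:
            return seq
        used.add(best)
        seq = seq + best[-1] if toward_c else best[0] + seq


def assemble(peptides, k: int = TAG_LEN) -> str:
    """Seed with the most frequent tag, then extend toward the C- and N-terminus."""
    counts = count_tags(peptides, k)
    if not counts:
        return ""
    seed = counts.most_common(1)[0][0]
    used = {seed}
    seq = extend(seed, counts, used, k, toward_c=True)
    return extend(seq, counts, used, k, toward_c=False)


# Toy input: overlapping 12-residue windows over an arbitrary sequence,
# standing in for de novo peptide candidates.
protein = "DTHKSEIAHRFKDLGEEHFKGLVLIAFSQYLQQCPFDEHVKLVNELTEFAK"
peptides = [protein[i:i + 12] for i in range(0, len(protein) - 11, 3)]
print(assemble(peptides))
```

The greedy extension works on this toy input because every 6-residue overlap in it is unique. Real data contain sequencing errors and repeated regions, where the highest-count tag can be ambiguous or wrong, which is presumably why the abstract allows the extension to be done manually as well as automatically.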