Abstract: The function or dysfunction of a protein depends on its primary structure, and protein sequencing, which provides key information on protein-related biological processes and diseases, plays an important role in biological, biomedical, and clinical research and applications. To obtain precise protein sequences, researchers have developed a range of methods over the past few decades, which can be divided into conventional and newly developed methods. The former include Edman degradation and mass spectrometry (MS); the latter include single-molecule detection, nanopores, and other recently developed techniques. In the 1960s, classic Edman degradation was first developed for sequencing protein molecules from the N-terminus using a cyclic chemical reaction. Solid-phase and gas-phase Edman degradation were developed afterwards, and the method still plays a significant role in modern technologies. This review discusses the principle and limitations of Edman degradation. We also discuss the advantages and shortcomings of MS-based approaches, which are the current standard for protein sequencing applications. Single-molecule approaches could revolutionize proteomics by providing the sensitivity needed for low-abundance protein detection and single-cell proteomics. With the development of single-molecule nucleic acid sequencing, the four bases of DNA/RNA can be detected effectively using label-free or fluorescence-labelling strategies. However, labelling and analyzing all twenty kinds of amino acid residues remains a challenge. Sensitive optical detection with fluorescence labelling has been used for high-throughput protein sequencing. In this approach, selected residues of the peptides are labelled, the C-terminus is anchored onto a glass substrate, and the N-terminus is degraded through Edman cycles. The sequence is then analyzed from the wide-field fluorescence signals.
This method has the potential for large-scale, sensitive, and parallel detection, and we discuss its principle and characteristic features in detail. Nanopores, including biological and solid-state nanopores, have emerged as powerful tools for protein sequencing. A nanopore provides a single-molecule sensing interface and a controlled nano-confined space, enabling ultimate sensitivity and high spatiotemporal resolution. Nanopore-based technologies rely on the interaction between functional groups and the nanopore, which modulates the ionic current; information about the peptide is obtained by monitoring these current responses. Arrayed nanopores have the potential for high-throughput detection at low abundance. The technology is still at an early stage of development, however, and several challenges need to be addressed. Because it provides a "fingerprint" signal, Raman spectroscopy is an ideal candidate for protein sequencing. However, its very weak signals significantly restrict its application, especially at low concentrations of the target molecule. Surface-enhanced Raman spectroscopy (SERS) can enhance the Raman signal enough to achieve detection at the single-molecule scale. The combination of SERS and nanopores has demonstrated a powerful capability for label-free detection of ten kinds of amino acids, offering a new strategy for protein sequencing. Compared with the weak Raman signal, fluorescence signals are more accessible, even at the single-molecule level. Several molecular dynamics (MD) simulations are discussed that show the possibility of sequencing fluorescence-labelled proteins within a nanopore. Nevertheless, some drawbacks need to be addressed, especially the high cost of nanopore fabrication and the translocation of proteins through a pore.
Finally, this review discusses future challenges and summarizes recent efforts to break the bottlenecks of current protein sequencing, with the aim of promoting the development of medical treatment, disease diagnosis, and related fields.
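
The fluorescence-based approach summarized in the abstract (label selected residues, remove one N-terminal residue per Edman cycle, watch for drops in fluorescence) can be illustrated with a minimal, idealized sketch. The function names, the choice of lysine (K) and cysteine (C) as the labelled residues, and the example peptides are all assumptions made for illustration; the sketch ignores failed cleavages, dye loss, and the fact that the anchored C-terminal residue may not be cleaved in practice.

```python
def fluorosequence(peptide, labelled=("K", "C")):
    """Idealized per-cycle dye counts for each labelled residue type.

    Index 0 is the intact peptide; each later index is the count after
    one more N-terminal residue has been removed by an Edman cycle.
    """
    traces = {aa: [] for aa in labelled}
    remaining = peptide
    for _ in range(len(peptide) + 1):
        for aa in labelled:
            traces[aa].append(remaining.count(aa))
        remaining = remaining[1:]  # one Edman cycle removes the N-terminal residue
    return traces


def drops_to_pattern(traces, length):
    """Turn fluorescence drops into a partial sequence; 'x' marks unlabelled positions."""
    pattern = ["x"] * length
    for aa, counts in traces.items():
        for cycle in range(1, len(counts)):
            if counts[cycle] < counts[cycle - 1]:
                # A drop at this cycle means the residue just removed carried the dye.
                pattern[cycle - 1] = aa
    return "".join(pattern)


def mask(peptide, labelled=("K", "C")):
    """The partial pattern a candidate peptide would be expected to produce."""
    return "".join(aa if aa in labelled else "x" for aa in peptide)


def match(pattern, candidates, labelled=("K", "C")):
    """Return the candidates whose expected pattern equals the observed one."""
    return [p for p in candidates if mask(p, labelled) == pattern]


observed = drops_to_pattern(fluorosequence("GKAGCAK"), 7)
hits = match(observed, ["GKAGCAK", "AKGGCGK", "GAKGCAK"])
print(observed, hits)
```

In this toy example two of the three hypothetical candidates share the same partial pattern, which reflects the point made in the abstract: labelling only a few residue types yields a fingerprint rather than a full sequence, so labelling and resolving all twenty amino acids remains the harder problem.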
