TermineR: Extracting information on endogenous proteolytic processing from shotgun proteomics data
Miguel Cosenza-Contreras, Adrianna Seredynska, Daniel Vogele, Niko Pinter, Eva Brombacher, Ruth Fiestas Cueto, Thien-Ly Julia Dinh, Patrick Bernhard, Manuel Rogg, Junwei Liu, Patrick Willems, Simon Stael, Pitter F Huesgen, E Wolfgang Kuehn, Clemens Kreutz, Christoph Schell, Oliver Schilling
DOI: https://doi.org/10.1002/pmic.202300491
PROTEOMICS
Abstract: State-of-the-art mass spectrometers, combined with modern bioinformatics algorithms for peptide-to-spectrum matching (PSM) with robust statistical scoring, allow more variable features (i.e., post-translational modifications) to be reliably identified from (tandem) mass spectrometry data, often without the need for biochemical enrichment. Semi-specific proteome searches, which enforce theoretical enzymatic digestion at only the N- or C-terminal end of a peptide, allow the identification of native protein termini and of termini arising from endogenous proteolytic activity (also referred to as "neo-N-termini" analysis or "N-terminomics"). Nevertheless, deriving biological meaning from these search outputs can be challenging in terms of data mining and analysis. Thus, we introduce TermineR, a data analysis approach for (1) annotation of peptides according to their enzymatic cleavage specificity and known protein processing features, (2) differential abundance and enrichment analysis of N-terminal sequence patterns, and (3) visualization of neo-N-termini location. We illustrate the use of TermineR by applying it to tandem mass tag (TMT)-based proteomics data of a mouse model of polycystic kidney disease, and assess the semi-specific searches for biological interpretation of cleavage events and the variable contribution of proteolytic products to general protein abundance. The TermineR approach and example data are available as an R package at https://github.com/MiguelCos/TermineR.
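
The idea of annotating peptides by enzymatic cleavage specificity can be illustrated with a minimal sketch. It is not the TermineR API (TermineR is an R package); the function names `is_tryptic_site` and `classify_peptide`, the label strings, and the toy protein sequence are all hypothetical. The sketch assumes trypsin as the digestion enzyme, with the common rule of cleavage after K or R unless followed by P, and treats the protein's own ends as valid termini.

```python
def is_tryptic_site(protein: str, pos: int) -> bool:
    """Return True if a cut between protein[pos-1] and protein[pos] is
    consistent with trypsin (after K/R, not before P), or if pos is a
    protein terminus. Simplified, assumed rule for illustration only."""
    if pos == 0 or pos == len(protein):
        return True  # native protein N- or C-terminus
    return protein[pos - 1] in "KR" and protein[pos] != "P"


def classify_peptide(protein: str, peptide: str) -> str:
    """Label a peptide by the specificity of its two termini.
    Uses the first occurrence of the peptide in the protein."""
    start = protein.find(peptide)
    if start == -1:
        raise ValueError("peptide not found in protein sequence")
    end = start + len(peptide)
    n_ok = is_tryptic_site(protein, start)
    c_ok = is_tryptic_site(protein, end)
    if n_ok and c_ok:
        return "specific"
    if c_ok:
        # Non-tryptic N-terminus: candidate neo-N-terminus
        return "semi_Nterm"
    if n_ok:
        # Non-tryptic C-terminus
        return "semi_Cterm"
    return "unspecific"


# Toy sequence, invented for this example
protein = "MAKDEFGRHILKPQR"
for pep in ["DEFGR", "EFGR", "DEFG", "HILK"]:
    print(pep, classify_peptide(protein, pep))
```

In this toy case, a peptide such as `EFGR` has a tryptic C-terminus but a non-tryptic N-terminus, so it is the kind of peptide a semi-specific search could report as a candidate product of endogenous proteolysis. A real annotation would also need to account for features such as initiator methionine removal and signal peptide cleavage, which this sketch ignores.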