DeepNovo A+: Improving the Deep Learning Model for De Novo Peptide Sequencing with Additional Ion Types and Validation Set

Page: [949 - 954] Pages: 6

Abstract

Background: De novo peptide sequencing is a key technology in proteomics that extracts peptide sequences directly from tandem mass spectrometry (MS/MS) spectra without relying on a protein database. Because the accuracy and efficiency of de novo peptide sequencing depend on the quality of the MS/MS data, DeepNovo, a deep learning method for de novo peptide sequencing, was introduced, and it outperforms other state-of-the-art de novo sequencing methods.

Objective: To achieve better performance and generalization, additional ion types in the spectra should be considered, and the DeepNovo model should be made adaptive.

Methods: DeepNovo A+ introduces two improvements: a-ions are added to the spectral analysis, and a validation set is used to automatically determine the number of training epochs.
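The two ideas in the Methods can be illustrated with a minimal sketch. It is not the DeepNovo A+ implementation. The first function relies on the standard mass spectrometry relation that an a-ion is the corresponding b-ion minus CO (about 27.9949 Da). The second assumes that the validation set is used in an early-stopping fashion, which the abstract does not specify. The names `prefix_ion_mz`, `select_epoch`, and the `patience` parameter are hypothetical.

```python
# Illustrative sketch only; names and the early-stopping rule are assumptions.
PROTON = 1.007276   # mass of a proton (Da)
CO = 27.994915      # monoisotopic mass of carbon monoxide (Da)

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def prefix_ion_mz(peptide):
    """Return (b_ions, a_ions): singly charged m/z values for each
    N-terminal prefix of length 1 .. len(peptide) - 1."""
    b_ions, a_ions = [], []
    running = 0.0
    for aa in peptide[:-1]:
        running += RESIDUE_MASS[aa]
        b = running + PROTON      # b-ion: prefix residues plus one proton
        b_ions.append(b)
        a_ions.append(b - CO)     # a-ion: b-ion after loss of CO
    return b_ions, a_ions


def select_epoch(val_losses, patience=2):
    """Return the 1-based epoch with the lowest validation loss seen
    before the loss fails to improve for `patience` consecutive epochs.
    (Assumed stopping rule; the source does not describe its criterion.)"""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


if __name__ == "__main__":
    b, a = prefix_ion_mz("GAS")
    print("b-ions:", b)
    print("a-ions:", a)
    print("chosen epoch:", select_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.6]))
```

In this sketch, each a-ion peak lies a fixed 27.9949 Da below its b-ion, so considering a-ions gives a model a second candidate peak location per prefix. The epoch selection replaces a fixed, hand-chosen number of training epochs with one derived from held-out data.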

Results: Experiments show that DeepNovo A+ consistently improves de novo sequencing accuracy over DeepNovo under different conditions.

Conclusion: Adding a-ions and using a validation set effectively improves the performance of de novo sequencing.

Keywords: MS/MS spectra, de novo peptide sequencing, DeepNovo, deep learning, validation set, fragment ions.
