The identification of protein-coding genes currently relies on merging evidence and predictions from a variety of databases, which may themselves contain inaccurate or incomplete information. We have developed a method for mapping accurate interpretations of protein tandem mass spectrometry (MS/MS) data to the genome. This approach enables the verification of genes, exons, transcripts and variant transcripts, as well as the de novo discovery of protein-coding genes. Here we describe improvements in spectral interpretation algorithms, together with the use of multiple separation techniques, sub-cellular fractionation and novel bioinformatics approaches, to characterise more than 14,000 naturally occurring human genes.
Keywords: proteomics, genome, human, genes, protein isoforms, bioinformatics