Sequence-similarity-based function prediction assigns a function to only ∼50% of the proteins deduced from newly sequenced genomes. We have developed an approach to annotate the ‘leftover proteins’, i.e., those that cannot be assigned a function using sequence similarity. Our method (MOPS) is pan-taxonomic and predicts fine-grained molecular function (rather than a broad functional category) with high performance. In addition, we developed a validation scheme that assesses predictions using domain-specific knowledge.
Keywords: Ab initio protein function prediction, sequence-similarity-free, machine learning, decision trees, mitochondrion-encoded hypothetical proteins, domain-specific validation