The biological role of insertion sequences in proteins has remained largely unknown in many organisms. Here we study the proteomes of 12 organisms with diverse genomes for insertion length and amino acid preferences. A total of 871 proteins common to the 12 organisms were catalogued for structure-based sequence alignment. The analysis yielded the following key observations: (i) AT-richness appears to have no bearing on the average protein length in an organism, as only Dictyostelium discoideum and Plasmodium falciparum encode proteins of high average length; (ii) all studied organisms possess insertions in their proteins, but insertions longer than 40 residues and unique insertions were abundant in the pathogen proteomes, most notably Plasmodium falciparum, followed by Toxoplasma gondii and Leishmania major, suggesting accessory structural and functional features that may favour evolutionary fitness; (iii) Glu and Asp residues are over-represented in most proteomes irrespective of AT/GC composition or pathogenicity, with the exception of Plasmodium falciparum, where Asn dominates; (iv) the abundance of Asn residues in Plasmodium falciparum is exceptional, given that this feature is not shared by other AT-rich genomes. In conclusion, this bioinformatics-based study provides comprehensive knowledge of insertions and residue preferences among pathogen proteins, which can be exploited in further inhibitor studies.
Keywords: Insertions, Leishmania major, malaria, pathogens, Plasmodium falciparum, proteome comparison, Toxoplasma gondii.
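The two measurements summarised above, insertion length and residue preference, can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that a gapped pairwise alignment (query protein against a reference homologue, for example extracted from a structure-based alignment) is already available as two equal-length strings with "-" for gaps. It treats an insertion as a contiguous run of query residues aligned to reference gaps, and it reuses the abstract's 40-residue cutoff. The function names `insertion_lengths` and `residue_frequencies` are hypothetical.

```python
from collections import Counter


def insertion_lengths(query: str, reference: str) -> list[int]:
    """Return lengths of insertions in `query` relative to `reference`.

    Assumption (hypothetical definition): an insertion is a contiguous run of
    columns where the query has a residue and the reference has a gap.
    Columns that are gaps in both sequences (possible when the pair is taken
    from a multiple alignment) are skipped and do not break a run.
    """
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    lengths: list[int] = []
    run = 0
    for q, r in zip(query, reference):
        if q == "-" and r == "-":
            continue  # all-gap column: ignore
        if r == "-":
            run += 1  # query residue opposite a reference gap extends the insertion
        else:
            if run:
                lengths.append(run)
            run = 0
    if run:
        lengths.append(run)  # insertion running to the end of the alignment
    return lengths


def residue_frequencies(seq: str) -> dict[str, float]:
    """Fraction of each amino acid letter in `seq`, ignoring gaps and other symbols."""
    residues = [c for c in seq.upper() if c.isalpha()]
    if not residues:
        return {}
    counts = Counter(residues)
    total = len(residues)
    return {aa: n / total for aa, n in counts.items()}


if __name__ == "__main__":
    # Toy alignment: a 45-residue Asn-rich insertion in the query.
    query = "MKT" + "N" * 45 + "LV-E"
    reference = "MKT" + "-" * 45 + "LVAE"
    lengths = insertion_lengths(query, reference)
    long_insertions = [n for n in lengths if n > 40]  # cutoff taken from the abstract
    print(lengths, long_insertions)
    print(residue_frequencies(query)["N"])
```

Skipping all-gap columns keeps an insertion from being split artificially when pairs are projected out of a multiple alignment. A real analysis would also need the alignment step itself and a rule for calling an insertion "unique", both of which are outside this sketch.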