Given the huge number of otherwise uncharacterized protein sequences, computer-aided prediction of posttranslational modifications (PTMs) and translocation signals from the amino acid sequence has become a necessity. We have contributed to this multi-faceted, worldwide effort by developing predictors for GPI lipid anchor sites, N-terminal N-myristoylation sites, farnesyl and geranylgeranyl anchor attachment, and the PTS1 peroxisomal targeting signal. Although the substrate sequence signals for the various PTMs and translocation systems vary dramatically, we found that their principal architecture is similar in all the cases studied. Typically, a short stretch of amino acid residues is buried in the catalytic cleft of the protein-modifying enzyme (or in the binding site of the transporter). This segment interacts most intensely with the enzyme, and its sequence variability is the most restricted. It is flanked by linker segments that connect the enzyme-bound part with the rest of the substrate protein; linker residues tend to be small and polar, with a flexible backbone. Because of the mechanistic requirements of binding to the enzyme, we suggest that most PTM sites are necessarily embedded in intrinsically disordered regions (except for autocatalytic PTMs, PTMs executed in the unfolded state, and non-enzymatic PTMs), and this must be taken into account in structural studies of proteins with complex architecture. Surprisingly, some proteins carry sequence signals for posttranslational modification or translocation that remain hidden in their normal biological context but can become fully functional under certain conditions.
Keywords: Protein posttranslational modification, intrinsically disordered region, GPI lipid anchor, myristoylation, prenylation, phosphorylation, protease cleavage site, subcellular translocation signal
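The core-and-linker architecture described in the abstract can be made concrete with a toy sketch: a short core stretch that sits in the enzyme's cleft, flanked on both sides by linker segments, and a simple measure of how "linker-like" (small, polar, flexible) each segment is. This is not any of the predictors mentioned above and not the authors' method. The residue class `LINKER_LIKE`, the helpers `split_site` and `linker_fraction`, the fixed window lengths and the example sequence are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical residue class for illustration only: glycine (most flexible
# backbone) plus the small polar residues S, T, N and D. This is NOT the
# classification used by any published PTM predictor.
LINKER_LIKE = frozenset("GSTND")


def linker_fraction(segment: str) -> float:
    """Return the fraction of residues in `segment` that fall in LINKER_LIKE."""
    if not segment:
        return 0.0
    return sum(res in LINKER_LIKE for res in segment.upper()) / len(segment)


@dataclass(frozen=True)
class SiteArchitecture:
    core: str      # stretch assumed to be buried in the enzyme's catalytic cleft
    n_linker: str  # flanking segment on the N-terminal side of the core
    c_linker: str  # flanking segment on the C-terminal side of the core


def split_site(seq: str, core_start: int, core_len: int, linker_len: int) -> SiteArchitecture:
    """Cut a hypothetical site into a core and two flanking linkers.

    `core_start` is a 0-based index; linker windows are truncated at the
    sequence ends rather than padded.
    """
    core_end = core_start + core_len
    return SiteArchitecture(
        core=seq[core_start:core_end],
        n_linker=seq[max(0, core_start - linker_len):core_start],
        c_linker=seq[core_end:core_end + linker_len],
    )


if __name__ == "__main__":
    # Made-up sequence: hydrophobic tails, linker-like flanks, bulky core.
    seq = "LIVFW" + "GSTNG" + "FWYKL" + "NSGDT" + "LIVFW"
    site = split_site(seq, core_start=10, core_len=5, linker_len=5)
    for name in ("n_linker", "core", "c_linker"):
        segment = getattr(site, name)
        print(name, segment, linker_fraction(segment))
```

In this made-up example both flanks score 1.0 and the core scores 0.0, mirroring the qualitative picture in the abstract; a real predictor would instead learn position-specific preferences from annotated substrate sequences rather than rely on a fixed residue set.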