Ubiquitination, a reversible protein post-translational modification (PTM), occurs when an amide bond is formed between ubiquitin (a small protein) and the targeted protein. It is involved in a wide variety of cellular processes and is associated with various diseases, such as Alzheimer’s disease. To understand ubiquitination at the molecular level, it is important to identify the sites at which ubiquitin attaches to the target protein. Since experimental methods for determining ubiquitination sites are both expensive and time-consuming, it is necessary to develop in-silico methods that predict ubiquitination sites from the sequence information of the target protein alone. In this paper, we apply a new classifier, the weighted passive nearest neighbor algorithm (WPNNA), to predict ubiquitination sites. WPNNA was demonstrated to be insensitive to differences in data density between classes. A hybrid set of features, including PSSM conservation scores, amino acid factors and disorder scores, is employed to encode the protein fragments centered on candidate ubiquitination sites. The Matthews correlation coefficient (MCC) of our predictor is 0.169 on a training dataset, with a sensitivity of 31.6% and a specificity of 82.9%, and 0.403 on an independent test dataset, with a sensitivity of 64.3% and a specificity of 75.7%. We compare our predictor with that of a recently published paper that made predictions on the same datasets. Our predictor achieves much better sensitivity on both datasets, and a much better MCC on the independent test dataset, than that paper, indicating that the WPNNA-based predictor is at least a good complement to the current state of the art in ubiquitination site prediction.
Keywords: Amino acid factors, disorder scores, PSSM conservation scores, post-translational modifications, weighted passive nearest neighbor algorithm, ubiquitination, ubiquitination site prediction, targeted protein, Alzheimer's disease, in-silico method
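
For readers unfamiliar with the evaluation measures quoted in the abstract, the following is a minimal sketch of how sensitivity, specificity and MCC are conventionally computed from the confusion matrix of a binary classifier. The function name `binary_metrics` and the counts in the usage lines are hypothetical and chosen only for illustration; they are not taken from the datasets used in this paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts.

    tp/fn: true sites predicted as sites / as non-sites.
    tn/fp: non-sites predicted as non-sites / as sites.
    """
    # Sensitivity: fraction of true sites that are recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-sites that are correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC uses all four counts, which makes it informative when the
    # two classes differ greatly in size.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts, for illustration only.
sn, sp, mcc = binary_metrics(tp=40, fp=20, tn=80, fn=10)
print(f"sensitivity={sn:.3f} specificity={sp:.3f} MCC={mcc:.3f}")
```

Because MCC depends on the ratio of positive to negative samples, the same sensitivity and specificity can correspond to different MCC values on datasets with different class balance.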