Biological data about genes, proteins and other biologically relevant molecules stored in databases may be associated with biological knowledge such as experiments, properties, functions and response to drugs. Such knowledge is formally structured into ontologies, which provide a formalism for organizing and storing it. In the biological field, the Gene Ontology (GO) provides both a categorization of annotating terms and a source of annotations for genes and proteins. Consequently, it is possible to introduce novel analysis methodologies based on the use of ontologies. Recently, semantic similarity measures, i.e. measures that compute the similarity of two or more proteins starting from their annotations, have attracted growing interest. For instance, semantic measures have been used for the prediction of protein complexes. Despite the importance of this research, some problems remain unsolved: the assessment of semantic measures with respect to biological features, and an in-depth study of the impact of the chosen measure on the obtained results. This paper focuses on the use of semantic similarity measures within the protein complex prediction pipeline. To this end, we investigated whether there is a bias among different measures and whether proteins that participate in the same complex have higher semantic similarity. The results confirm that proteins belonging to the same complex have a higher average semantic similarity than the average over the whole proteome. This supports the possible use of semantic similarity measures within protein complex prediction algorithms and suggests a way to choose the best one among them.
Keywords: Ontologies, protein interaction networks, semantic similarity measures.