Background: The sequence Arginine-Glycine-Aspartic acid (RGD tripeptide) has been identified in most proteins implicated in cell adhesion and signal transduction. Moreover, the RGD paradigm extends to the plant and microbial kingdoms. Investigating this field can be facilitated by combining data from multiple databases into a single one. The RGD tripeptide database is a comprehensive resource with records including general annotation, ontology, database cross-references, sequence and structure data.
Objective: In this work, we present the integration of a novel visualization tool into RGDtrip version 1.0, a data collection and retrieval environment for proteins containing the RGD tripeptide. This approach combines state-of-the-art data querying with an advanced, user-friendly visualization environment.
Method: The overall system architecture is based on a three-tier client-server model comprising three main components: the client application, the application server and the database server. The underlying structure of RGDtrip is a relational database developed with Microsoft SQL Server. All the data compiled in RGDtrip were originally scattered across other databases, such as UniProt and PDBdb, and have been incorporated into a visualization tool based on Microsoft’s PivotViewer software. The tool enables users to view the data from many different perspectives and thus to gain a better understanding of them.
Results: The RGDtrip database may be used to investigate proteins containing the RGD tripeptide and to draw meaningful conclusions regarding, among other things, evolution, phylogenesis and pharmacological interactions with disease-implicated entities and possible loci of side-effects. The RGDtrip database offers the following main advantages: (i) a collection of about 32,000 proteins containing the RGD tripeptide in one database, accessible through a single user interface; (ii) the use of state-of-the-art technologies to deliver new data querying and visualization tools for scientists, thus enabling Visual Data Mining for both basic and applied research on the above-mentioned proteins.
Conclusion: This paper describes the integration of existing information with advanced visualization and querying tools in a dedicated database, implementing Visual Data Mining for basic and applied research on RGD-containing proteins.
Keywords: RGD tripeptide, querying tools, visualization tools, data mining, cell adhesion, signal transduction.