Protein & Peptide Letters

Author(s): Khader Shameer, Ganesan Pugalenthi, Krishna Kumar Kandaswamy and Ramanathan Sowdhamini

DOI: 10.2174/092986611796378729

3dswap-pred: Prediction of 3D Domain Swapping from Protein Sequence Using Random Forest Approach

Pages: 1010-1020 (11 pages)


Abstract

3D domain swapping is a protein structural phenomenon that mediates the formation of higher-order oligomers in a variety of proteins with different structural and functional properties. It is associated with biological functions ranging from oligomerization to pathological conformational diseases. 3D domain swapping is recognised only after structure determination, when the protein is observed in the swapped conformation in its oligomeric state. This is a limiting step in studying this important structural phenomenon on a large scale using the growing volume of sequence data. A new machine learning approach, 3dswap-pred, has been developed to predict 3D domain swapping in proteins from sequence data alone using the Random Forest approach. 3dswap-pred is implemented using a positive sequence dataset derived from literature-based structural curation of 297 structures, a negative sequence dataset obtained from 462 SCOP domains using a new sequence data mining approach, and a set of 126 sequence-derived features. Statistical validation using an independent dataset of 68 positive sequences and 313 negative sequences showed that 3dswap-pred achieved an accuracy of 63.8%. A webserver has also been implemented using the 3dswap-pred Random Forest model. The server is available from the URL: http://caps.ncbs.res.in/3dswap-pred
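The abstract does not enumerate the 126 sequence-derived features, so the sketch below is only an illustration under stated assumptions. It uses amino acid composition (20 features) as one hypothetical example of a sequence-derived feature vector of the kind a Random Forest could be trained on, and shows how accuracy on an independent test set is computed. The function names `aa_composition` and `accuracy` are hypothetical, and the Random Forest training step itself is not shown.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> list[float]:
    """Hypothetical example feature: fraction of each standard amino acid.

    Non-standard residues (e.g. X, B, Z) are skipped when counting and
    normalising. Returns 20 values, one per residue in AMINO_ACIDS.
    """
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No standard residues: return an all-zero feature vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the labels.

    Labels are assumed to be 1 (domain-swapped) or 0 (not swapped).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

As a worked check on the reported figure: the independent dataset contains 68 + 313 = 381 sequences, so an accuracy of 63.8% corresponds to roughly 243 correctly classified sequences (243 / 381 ≈ 0.638).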

Keywords: 3D domain swapping, hinge region, swapped region, machine learning, prediction algorithm, protein oligomer, random forest, Random Forest approach, SCOP domains, NMR, GPCR, DIAL, CD-HIT, AAINDEX, PSIPRED