Comparative mapping of sequence-based and structure-based protein domains

Background: Protein domains have long been an ill-defined concept in biology. They are generally described as autonomous folding units with evolutionary and functional independence. Both structure-based and sequence-based domain definitions have been widely used, but whether either type of model alone can capture all the essential features of domains is still an open question.

Methods: Here we provide insight into domain definitions through a comparative mapping of two domain classification databases, one sequence-based (Pfam) and the other structure-based (SCOP). A mapping score is defined to indicate the significance of each mapping, and the properties of the resulting mapping matrices are studied.

Results: The mapping results show general agreement between the two databases, as well as many interesting areas of disagreement. In cases of disagreement, the functional and evolutionary characteristics of the domains are examined to determine which domain definition is biologically more informative.
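The abstract does not give the formula for the mapping score. As a minimal sketch, assuming each database supplies residue-range assignments per protein as (protein_id, family_id, start, end) tuples, one way to fill a Pfam-by-SCOP mapping matrix is to count, for each family pair, the proteins in which the two assignments overlap, and then score each cell with a hypergeometric over-representation test. The overlap threshold `min_frac`, the function names, and the choice of a hypergeometric score are illustrative assumptions, not the score defined in the paper.

```python
import math
from collections import defaultdict

# Hypothetical input format: (protein_id, family_id, start, end),
# with 1-based inclusive residue ranges.
Hit = tuple[str, str, int, int]


def overlap_len(s1: int, e1: int, s2: int, e2: int) -> int:
    """Number of residues shared by two inclusive ranges."""
    return max(0, min(e1, e2) - max(s1, s2) + 1)


def log10_hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """log10 P(X >= k) for X ~ Hypergeometric(N population, K marked, n drawn).

    Exact integer arithmetic keeps very small tail probabilities from underflowing.
    """
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return math.log10(tail) - math.log10(math.comb(N, n))


def mapping_scores(pfam_hits: list[Hit], scop_hits: list[Hit],
                   min_frac: float = 0.5) -> dict[tuple[str, str], tuple[int, float]]:
    """Return {(pfam_id, scop_id): (k, score)} for every pair that co-occurs.

    k counts proteins where a Pfam hit and a SCOP hit overlap by at least
    min_frac of the shorter segment (assumed threshold, must be > 0);
    score = -log10 P(X >= k) under a hypergeometric null.
    """
    # Universe: proteins annotated by both databases.
    proteins = {h[0] for h in pfam_hits} & {h[0] for h in scop_hits}
    N = len(proteins)

    pfam_by_prot, scop_by_prot = defaultdict(list), defaultdict(list)
    pfam_prots, scop_prots = defaultdict(set), defaultdict(set)
    for pid, fam, s, e in pfam_hits:
        if pid in proteins:
            pfam_by_prot[pid].append((fam, s, e))
            pfam_prots[fam].add(pid)
    for pid, fam, s, e in scop_hits:
        if pid in proteins:
            scop_by_prot[pid].append((fam, s, e))
            scop_prots[fam].add(pid)

    # Proteins in which each (Pfam, SCOP) pair overlaps sufficiently.
    co = defaultdict(set)
    for pid in proteins:
        for a, s1, e1 in pfam_by_prot[pid]:
            for b, s2, e2 in scop_by_prot[pid]:
                shorter = min(e1 - s1, e2 - s2) + 1
                if overlap_len(s1, e1, s2, e2) >= min_frac * shorter:
                    co[(a, b)].add(pid)

    # Score each cell of the mapping matrix.
    scores = {}
    for (a, b), pids in co.items():
        k = len(pids)
        log_p = log10_hypergeom_sf(k, N, len(pfam_prots[a]), len(scop_prots[b]))
        scores[(a, b)] = (k, -log_p)
    return scores
```

Requiring overlapping positions, not mere co-occurrence in the same protein, keeps a multi-domain protein from linking every one of its Pfam families to every one of its SCOP domains. A higher score marks a Pfam/SCOP pair that overlaps more often than the families' frequencies alone would suggest.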
