Quantifying the accessibility of the metagenome by random expression cloning techniques.

The exploitation of the metagenome for novel biocatalysts by functional screening depends on the ability to express the respective genes in a surrogate host. The probability of recovering a given gene is determined by its abundance in the environmental DNA used for library construction, the chosen insert size, the length of the target gene, and the presence of expression signals that are functional in the host organism. In this paper, we present a set of formulas that describe the chance of isolating a gene by random expression cloning, taking into account three modes of heterologous gene expression: independent expression, expression as a transcriptional fusion, and expression as a translational fusion. Genes of the last category are shown to be virtually inaccessible by shotgun cloning because functional constructs arise at very low frequency. To estimate which part of the metagenome might evade exploitation for this reason, 32 complete genome sequences of prokaryotic organisms were analysed with bioinformatics tools for the presence of expression signals functional in E. coli hosts. Our study reveals significant differences in the predicted expression modes between taxonomic groups of organisms and suggests that about 40% of the enzymatic activities may be readily recovered by random cloning in E. coli.
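
As a rough illustration of how such formulas can be built, the following sketch uses a simplified model that is our own assumption, not the formulas published in the paper. Inserts have a fixed size and start at uniformly random positions in the source genome. A clone allowing independent expression must carry the complete gene; a transcriptional fusion additionally needs the correct orientation relative to a vector promoter (an assumed factor of 1/2); a translational fusion needs the correct orientation, the correct reading frame and a junction within a short, hypothetical `fusion_window` upstream of the start codon. The number of clones N needed to find the gene with probability P then follows the Clarke-Carbon relation N = ln(1 - P) / ln(1 - p), where p is the per-clone probability. All class, function and parameter names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LibrarySetup:
    """Illustrative library parameters (hypothetical names, not from the paper)."""
    abundance: float          # fraction of library DNA derived from the source genome
    genome_size: int          # source genome size in bp
    insert_size: int          # fixed insert size in bp (real libraries have a size distribution)
    gene_length: int          # target gene length in bp
    fusion_window: int = 30   # assumed bp upstream of the start codon usable as a fusion junction


def p_independent(s: LibrarySetup) -> float:
    """Per-clone probability that an insert contains the complete gene (either orientation)."""
    if s.insert_size < s.gene_length:
        return 0.0
    start_positions = s.insert_size - s.gene_length + 1  # insert starts that still cover the gene
    return s.abundance * start_positions / s.genome_size


def p_transcriptional(s: LibrarySetup) -> float:
    """As above, but the gene must also face the vector promoter (assumed factor 1/2)."""
    return p_independent(s) / 2


def p_translational(s: LibrarySetup) -> float:
    """Junction must lie in frame within fusion_window bp upstream, in the right orientation."""
    if s.insert_size < s.gene_length + s.fusion_window:
        return 0.0
    in_frame_junctions = s.fusion_window // 3  # only one junction position in three keeps the frame
    return s.abundance * in_frame_junctions / s.genome_size / 2


def clones_needed(p: float, confidence: float) -> int:
    """Clarke-Carbon: smallest N with 1 - (1 - p)**N >= confidence."""
    if not (0.0 < p < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p and confidence must lie strictly between 0 and 1")
    # log1p keeps the denominator accurate for the very small p typical of metagenomic libraries
    return math.ceil(math.log1p(-confidence) / math.log1p(-p))


if __name__ == "__main__":
    setup = LibrarySetup(abundance=0.01, genome_size=4_000_000,
                         insert_size=40_000, gene_length=1_000)
    for name, fn in [("independent", p_independent),
                     ("transcriptional fusion", p_transcriptional),
                     ("translational fusion", p_translational)]:
        p = fn(setup)
        print(f"{name:24s} p = {p:.3e}  clones for 95%: {clones_needed(p, 0.95):,}")
```

With these illustrative numbers the translational-fusion case needs thousands of times more clones than independent expression; the exact factors depend entirely on the assumed parameters and should not be read as values from the study.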

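The genome analysis can be pictured as classifying each gene by which E. coli-compatible signals precede it: a gene with a recognisable sigma-70 promoter and Shine-Dalgarno sequence could be expressed independently, one with only a functional ribosome-binding site would need a vector promoter (transcriptional fusion), and one with neither would need both from the vector (translational fusion). The sketch below is a deliberately crude stand-in: it matches the consensus boxes TTGACA (-35) and TATAAT (-10), separated by a 15-19 bp spacer, and the Shine-Dalgarno core GGAGG, each with at most one mismatch. The published analysis relied on dedicated bioinformatics tools whose details are not reproduced here; the function names, the 20-nt RBS window and the mismatch threshold are assumptions.

```python
from enum import Enum


class ExpressionMode(Enum):
    INDEPENDENT = "independent expression"             # own promoter and RBS work in E. coli
    TRANSCRIPTIONAL_FUSION = "transcriptional fusion"  # own RBS works; needs a vector promoter
    TRANSLATIONAL_FUSION = "translational fusion"      # needs vector promoter and vector RBS


MINUS_35, MINUS_10, SD_CORE = "TTGACA", "TATAAT", "GGAGG"  # E. coli consensus motifs


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def has_rbs(upstream: str, max_mm: int = 1, window: int = 20) -> bool:
    """Look for a Shine-Dalgarno-like core in the last `window` nt before the start codon."""
    region = upstream[-window:]
    k = len(SD_CORE)
    return any(mismatches(region[i:i + k], SD_CORE) <= max_mm
               for i in range(len(region) - k + 1))


def has_sigma70_promoter(upstream: str, max_mm: int = 1) -> bool:
    """Look for -35 and -10 boxes separated by a 15-19 bp spacer (sigma-70 type)."""
    for i in range(len(upstream) - 5):
        if mismatches(upstream[i:i + 6], MINUS_35) > max_mm:
            continue
        for spacer in range(15, 20):
            j = i + 6 + spacer
            box = upstream[j:j + 6]
            if len(box) == 6 and mismatches(box, MINUS_10) <= max_mm:
                return True
    return False


def classify(upstream: str) -> ExpressionMode:
    """Assign the expression mode a gene would need in E. coli, based on its upstream region."""
    seq = upstream.upper()
    if has_rbs(seq):
        return (ExpressionMode.INDEPENDENT if has_sigma70_promoter(seq)
                else ExpressionMode.TRANSCRIPTIONAL_FUSION)
    return ExpressionMode.TRANSLATIONAL_FUSION
```

Consensus matching of this kind both misses real promoters and flags spurious ones, so counts obtained with this sketch are not comparable to the figures reported in the study.
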