Rchemcpp: a web service for structural analoging in ChEMBL, Drugbank and the Connectivity Map

We have developed Rchemcpp, a web service that identifies structurally similar compounds (structural analogs) in large-scale molecule databases. The service allows compounds to be queried in the widely used ChEMBL, DrugBank and Connectivity Map databases. Rchemcpp uses the best-performing similarity functions, i.e. molecule kernels, as measures of structural similarity. Molecule kernels have demonstrated superior performance over other similarity measures and currently excel in machine learning challenges. To reduce computation time considerably, and thereby make the method feasible as a web service, we developed a novel, efficient prefiltering strategy that maintains the sensitivity of the method. By exploiting information contained in public databases, the web service facilitates many applications crucial to the drug development process, such as prioritizing compounds after screening or reducing adverse side effects during late phases. Rchemcpp was used in the DeepTox pipeline, which won the Tox21 Data Challenge, and is frequently used by researchers in pharmaceutical companies.

AVAILABILITY AND IMPLEMENTATION: The web service and the R package are freely available via http://shiny.bioinf.jku.at/Analoging/ and via Bioconductor.

CONTACT: hochreit@bioinf.jku.at

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
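The combination of a kernel-based similarity with a sensitivity-preserving prefilter can be illustrated with a small sketch. The abstract does not describe Rchemcpp's actual kernels or prefilter, so everything below is an assumption made for illustration: molecules are reduced to counts of character bigrams of their SMILES strings (a toy stand-in for substructure patterns), the kernel is a simple min-count kernel, and the prefilter is a size-based upper bound on the normalized kernel value. All function names are hypothetical and are not part of the Rchemcpp package.

```python
from collections import Counter
from math import sqrt


def features(smiles, n=2):
    """Count overlapping character n-grams of a SMILES string.

    Toy stand-in for substructure patterns (an assumption, not Rchemcpp's features).
    """
    return Counter(smiles[i:i + n] for i in range(len(smiles) - n + 1))


def min_kernel(a, b):
    """Sum over shared features of the smaller of the two counts."""
    return sum(min(count, b[f]) for f, count in a.items())


def similarity(a, b):
    """Normalized kernel: k(a, b) / sqrt(k(a, a) * k(b, b)), a value in [0, 1]."""
    denom = sqrt(min_kernel(a, a) * min_kernel(b, b))
    return min_kernel(a, b) / denom if denom else 0.0


def upper_bound(a, b):
    """Cheap bound on similarity(a, b) that needs only the total feature counts.

    Since k(a, b) <= min(|a|, |b|) and k(a, a) = |a|, the normalized value
    cannot exceed sqrt(min(|a|, |b|) / max(|a|, |b|)).
    """
    na, nb = sum(a.values()), sum(b.values())
    if na == 0 or nb == 0:
        return 0.0
    return sqrt(min(na, nb) / max(na, nb))


def find_analogs(query, database, threshold=0.5):
    """Return (name, similarity) pairs at or above threshold, best first."""
    q = features(query)
    hits = []
    for name, smiles in database.items():
        c = features(smiles)
        # Prefilter: if even the upper bound is below the threshold,
        # the full kernel evaluation cannot produce a hit, so skip it.
        if upper_bound(q, c) < threshold:
            continue
        s = similarity(q, c)
        if s >= threshold:
            hits.append((name, s))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

Because the bound never underestimates the normalized kernel value in this toy setting, the prefilter only discards candidates that could not have reached the threshold, which is one way a filter can save computation without losing hits. Real molecule kernels operate on molecular graphs and are far more expensive to evaluate, which is where such a filter would matter most.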
