research-article

PrePeP: A Tool for the Identification and Characterization of Pan Assay Interference Compounds

Authors:
Maksim Koptelov, Normandie Université, Caen, France
Albrecht Zimmermann, Normandie Université, Caen, France
Pascal Bonnet, ICOA/University of Orléans, Orléans, France
Ronan Bureau, CERMN/University of Caen Normandy, Caen, France
Bruno Crémilleux, Normandie Université, Caen, France

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, July 2018, Pages 462–471
https://doi.org/10.1145/3219819.3219849

Published: 19 July 2018

ABSTRACT

Pan Assay Interference Compounds (PAINS) are a significant problem in modern drug discovery: compounds that show non-target-specific activity in high-throughput screening can mislead medicinal chemists during hit identification, wasting time and resources. Recent work has shown that existing structural alerts are not up to the task of identifying PAINS. To address this shortcoming, we are developing a tool, PrePeP, that predicts PAINS and allows experts to visually explore the reasons for a prediction. In this paper, we discuss the aspects involved in developing a functional tool: systematically deriving structural descriptors, addressing the extreme imbalance of the data, and offering visual information that pharmacological chemists are familiar with. We evaluate the quality of the approach using benchmark data sets from the literature and show that we correct several shortcomings of existing PAINS alerts that have recently been pointed out.
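The abstract does not say how the class imbalance is handled, so the following is only an illustrative sketch of one common strategy, not a description of PrePeP itself: train an ensemble in which each member sees a bootstrap of the rare class plus an equally sized random sample of the frequent class, then classify by majority vote. The binary feature vectors stand in for the presence or absence of structural descriptors; the toy data, the single-feature "stump" learner, and all function names are assumptions made for this example.

```python
import random


def train_stump(X, y):
    """Pick the one binary feature (and polarity) that classifies the sample best."""
    best = None
    for j in range(len(X[0])):
        # Rule with polarity 1: predict class 1 when feature j is present.
        correct = sum(1 for row, label in zip(X, y) if row[j] == label)
        # Polarity 0 is the inverted rule: predict class 1 when feature j is absent.
        for polarity, score in ((1, correct), (0, len(y) - correct)):
            if best is None or score > best[0]:
                best = (score, j, polarity)
    return best[1], best[2]


def predict_stump(stump, row):
    j, polarity = stump
    return 1 if row[j] == polarity else 0


def balanced_bagging(X, y, n_models=25, seed=0):
    """Train stumps on class-balanced samples drawn from an imbalanced data set."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    models = []
    for _ in range(n_models):
        # Bootstrap the rare class and undersample the frequent class to the same size.
        sample = [rng.choice(minority) for _ in minority]
        sample += rng.sample(majority, len(minority))
        models.append(train_stump([X[i] for i in sample], [y[i] for i in sample]))
    return models


def predict(models, row):
    """Majority vote over all ensemble members."""
    votes = sum(predict_stump(m, row) for m in models)
    return 1 if 2 * votes > len(models) else 0


# Toy data (hypothetical): feature 0 marks a substructure present only in the rare class.
positives = [[1, 0, 1], [1, 1, 0], [1, 0, 0]]
negatives = [[0, i % 2, (i // 2) % 2] for i in range(30)]
X = positives + negatives
y = [1] * len(positives) + [0] * len(negatives)

models = balanced_bagging(X, y)
print(predict(models, [1, 0, 0]), predict(models, [0, 1, 1]))
```

Undersampling inside each ensemble member keeps every individual learner balanced while the ensemble as a whole still sees most of the frequent class, which is why this pattern is often preferred over discarding data once up front.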

Supplemental Material

koptelov_prepep_tool.mp4 (MP4, 406.2 MB)

Index Terms

PrePeP: A Tool for the Identification and Characterization of Pan Assay Interference Compounds
1. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Structure and multilingual text search
        Chemical and biochemical retrieval
  2. Information systems applications
    1. Decision support systems
      1. Data analytics
2. Mathematics of computing
  1. Probability and statistics
    1. Statistical paradigms
      1. Exploratory data analysis

Published in
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN:9781450355520
DOI:10.1145/3219819
General Chairs: Yike Guo (Imperial College London), Faisal Farooq (IBM)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
chemoinformatics
discriminative graph mining
structure activity relationships
Acceptance Rates
KDD '18 Paper Acceptance Rate: 107 of 983 submissions, 11%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%