DOI: 10.1145/3219819.3219849
research-article

PrePeP: A Tool for the Identification and Characterization of Pan Assay Interference Compounds

Published: 19 July 2018

ABSTRACT

Pan Assay Interference Compounds (PAINS) are a significant problem in modern drug discovery: compounds showing non-target-specific activity in high-throughput screening can mislead medicinal chemists during hit identification, wasting time and resources. Recent work has shown that existing structural alerts are not up to the task of identifying PAINS. To address this shortcoming, we are developing a tool, PrePeP, that predicts PAINS and allows experts to visually explore the reasons for the prediction. In the paper, we discuss the different aspects involved in developing a functional tool: systematically deriving structural descriptors, addressing the extreme imbalance of the data, and offering visual information that pharmacological chemists are familiar with. We evaluate the quality of the approach using benchmark data sets from the literature and show that we correct several shortcomings of existing PAINS alerts that have recently been pointed out.

Supplemental Material

koptelov_prepep_tool.mp4 (MP4, 406.2 MB)

Published in

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN: 9781450355520
DOI: 10.1145/3219819

Copyright © 2018 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

KDD '18 paper acceptance rate: 107 of 983 submissions, 11%
Overall acceptance rate: 1,133 of 8,635 submissions, 13%
