ABSTRACT
Pan Assay Interference Compounds (PAINS) are a significant problem in modern drug discovery: compounds that show non-target-specific activity in high-throughput screening can mislead medicinal chemists during hit identification, wasting time and resources. Recent work has shown that existing structural alerts are not up to the task of identifying PAINS. To address this shortcoming, we are developing a tool, PrePeP, that predicts PAINS and allows experts to visually explore the reasons for each prediction. In this paper, we discuss the different aspects involved in developing a functional tool: systematically deriving structural descriptors, addressing the extreme imbalance of the data, and offering visual information that pharmacological chemists are familiar with. We evaluate the quality of the approach using benchmark data sets from the literature and show that it corrects several shortcomings of existing PAINS alerts that have recently been pointed out.
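The abstract does not say how PrePeP addresses the class imbalance. As a purely illustrative sketch of one generic option, balanced undersampling builds several training subsets that each keep every rare-class example and draw an equally sized random sample from the common class. The function name `balanced_bags`, its parameters, and the toy labels are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def balanced_bags(labels, n_bags=5, seed=0):
    """Build index subsets that pair all minority examples with an
    equally sized random sample of majority examples.

    Illustrative only: this is a generic undersampling scheme, not the
    method described by the PrePeP authors.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    # The minority class is the label with the fewest examples.
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    k = min(len(minority_idx), len(majority_idx))
    bags = []
    for _ in range(n_bags):
        # Sample majority examples without replacement within one bag.
        sampled = rng.sample(majority_idx, k)
        bags.append(sorted(minority_idx + sampled))
    return bags


# Toy labels (assumed): 1 = interfering compound (rare), 0 = other (common).
labels = [1] * 3 + [0] * 97
bags = balanced_bags(labels, n_bags=4, seed=42)
```

Each bag could then be used to train one member of an ensemble, so that no single model is dominated by the common class while the ensemble as a whole still sees many different majority examples.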
Index Terms
- PrePeP: A Tool for the Identification and Characterization of Pan Assay Interference Compounds