DOI: 10.1145/2623330.2623667
research-article

Automated hypothesis generation based on mining scientific literature

Published: 24 August 2014

ABSTRACT

Keeping up with the ever-expanding flow of data and publications is untenable and poses a fundamental bottleneck to scientific progress. Current search technologies typically find many relevant documents, but they do not extract and organize the information content of these documents or suggest new scientific hypotheses based on this organized content. We present an initial case study on KnIT, a prototype system that mines the information contained in the scientific literature, represents it explicitly in a queriable network, and then further reasons upon these data to generate novel and experimentally testable hypotheses. KnIT combines entity detection with neighbor-text feature analysis and with graph-based diffusion of information to identify potential new properties of entities that are strongly implied by existing relationships. We discuss a successful application of our approach that mines the published literature to identify new protein kinases that phosphorylate the protein tumor suppressor p53. Retrospective analysis demonstrates the accuracy of this approach and ongoing laboratory experiments suggest that kinases identified by our system may indeed phosphorylate p53. These results establish proof of principle for automated hypothesis generation and discovery based on text mining of the scientific literature.
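The abstract does not specify how the graph-based diffusion step is implemented. As a rough illustration only, the sketch below uses a generic label-spreading scheme (the "local and global consistency" iteration, F ← αSF + (1 − α)Y with S the symmetrically normalized adjacency matrix) on a small similarity graph. The function name `diffuse_scores`, the node names, the edge weights, and the parameter values are all hypothetical and are not taken from KnIT.

```python
from typing import Dict, List, Tuple


def diffuse_scores(
    edges: List[Tuple[str, str, float]],
    seeds: Dict[str, float],
    alpha: float = 0.5,
    iterations: int = 50,
) -> Dict[str, float]:
    """Spread seed scores over a weighted undirected graph.

    Iterates F <- alpha * S F + (1 - alpha) * Y, where S is the
    symmetrically normalized adjacency matrix D^-1/2 W D^-1/2 and
    Y holds the initial seed scores. Edge weights must be positive.
    """
    # Build a symmetric adjacency map from the edge list.
    neighbors: Dict[str, Dict[str, float]] = {}
    for u, v, w in edges:
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w
    for node in seeds:
        neighbors.setdefault(node, {})

    # Weighted degree of every node, used for normalization.
    degree = {n: sum(nbrs.values()) for n, nbrs in neighbors.items()}

    initial = {n: seeds.get(n, 0.0) for n in neighbors}
    scores = dict(initial)
    for _ in range(iterations):
        updated = {}
        for n, nbrs in neighbors.items():
            # Score flowing in from neighbors, normalized by both degrees.
            spread = sum(
                w * scores[m] / (degree[n] * degree[m]) ** 0.5
                for m, w in nbrs.items()
            )
            updated[n] = alpha * spread + (1 - alpha) * initial[n]
        scores = updated
    return scores


# Hypothetical toy graph: edge weights stand in for text-feature similarity.
edges = [
    ("known_kinase_1", "candidate_a", 0.9),
    ("known_kinase_2", "candidate_a", 0.8),
    ("known_kinase_2", "candidate_b", 0.1),
    ("candidate_b", "candidate_c", 0.7),
]
# Entities already known to have the property of interest.
seeds = {"known_kinase_1": 1.0, "known_kinase_2": 1.0}

scores = diffuse_scores(edges, seeds)
# Rank the unlabeled entities by how much score diffused to them.
ranked = sorted(
    (n for n in scores if n not in seeds), key=scores.get, reverse=True
)
```

In this toy graph the candidate that is strongly linked to both seed entities ranks first, which is the intuition behind using diffusion to surface properties "strongly implied by existing relationships". A real system would derive the edge weights from the mined text features rather than set them by hand.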


          • Published in

            KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
            August 2014
            2028 pages
            ISBN: 9781450329569
            DOI: 10.1145/2623330

            Copyright © 2014 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Acceptance Rates

            KDD '14 Paper Acceptance Rate: 151 of 1,036 submissions, 15%
            Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
