research-article
Open Access

Never-ending learning

Published: 24 April 2018

Abstract

Whereas people learn many different types of knowledge from diverse experiences over many years, and become better learners over time, most current machine learning systems are much narrower, learning just a single function or data model from statistical analysis of a single data set. We suggest that people learn better than computers precisely because of this difference, and that a key direction for machine learning research is to develop software architectures that enable intelligent agents to also learn many types of knowledge, continuously over many years, and to become better learners over time. In this paper we define this never-ending learning paradigm for machine learning more precisely, and we present one case study: the Never-Ending Language Learner (NELL), which achieves a number of the desired properties of a never-ending learner. NELL has been learning to read the Web 24 hours a day since January 2010, and so far has acquired a knowledge base with 120 million diverse, confidence-weighted beliefs (e.g., servedWith(tea,biscuits)), while learning thousands of interrelated functions that continually improve its reading competence over time. NELL has also learned to reason over its knowledge base to infer new beliefs it has not yet read from those it has, and NELL is inventing new relational predicates to extend the ontology it uses to represent beliefs. We describe the design of NELL, present experimental results illustrating its behavior, and discuss both its successes and shortcomings as a case study in never-ending learning. NELL can be tracked online at http://rtw.ml.cmu.edu, and followed on Twitter at @CMUNELL.
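
To make the notion of a confidence-weighted belief concrete, the following is a minimal Python sketch of how a belief such as servedWith(tea,biscuits) might be stored and queried. It is an illustration only, not NELL's implementation: the names `Belief` and `KnowledgeBase`, the confidence values, and the rule of keeping the highest confidence seen for a repeated belief are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    """A relational belief with a confidence weight (hypothetical structure)."""
    relation: str
    subject: str
    obj: str
    confidence: float  # assumed to lie in [0, 1]

    def __str__(self):
        return f"{self.relation}({self.subject},{self.obj}) @ {self.confidence:.2f}"


class KnowledgeBase:
    """Toy store of beliefs keyed by (relation, subject, object)."""

    def __init__(self):
        self._beliefs = {}

    def add(self, belief):
        # Assumption for this sketch: when the same belief is seen again,
        # keep whichever copy has the higher confidence.
        key = (belief.relation, belief.subject, belief.obj)
        current = self._beliefs.get(key)
        if current is None or belief.confidence > current.confidence:
            self._beliefs[key] = belief

    def query(self, relation, min_confidence=0.0):
        # Return beliefs for one relation, most confident first.
        matches = [
            b for b in self._beliefs.values()
            if b.relation == relation and b.confidence >= min_confidence
        ]
        return sorted(matches, key=lambda b: b.confidence, reverse=True)


# Example usage with made-up confidence values.
example_kb = KnowledgeBase()
example_kb.add(Belief("servedWith", "tea", "biscuits", 0.6))
example_kb.add(Belief("servedWith", "tea", "biscuits", 0.9))
example_kb.add(Belief("servedWith", "coffee", "cake", 0.4))
for belief in example_kb.query("servedWith", min_confidence=0.5):
    print(belief)
```

Filtering by `min_confidence` reflects the idea that downstream reasoning would rely only on beliefs held with sufficient confidence; the threshold used here is arbitrary.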

Published in

Communications of the ACM, Volume 61, Issue 5
May 2018
104 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3210350

Copyright © 2018 Owner/Author

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed
