DOI: 10.1145/3197026.3197059
research-article

Citation Recommendation Using Distributed Representation of Discourse Facets in Scientific Articles

Published: 23 May 2018

ABSTRACT

Scientific articles usually follow a common pattern of discourse, and their contents can be divided into several facets, such as objective, method, and result. We examine the efficacy of using these discourse facets for citation recommendation. A method for learning multi-vector representations of scientific articles is proposed, in which each vector encodes a discourse facet present in an article. With each facet represented as a separate vector, the similarity of articles can be measured not in their entirety, but facet by facet. The proposed representation method is tested on a new citation recommendation task called context-based co-citation recommendation. This task calls for the evaluation of article similarity in terms of citation contexts, wherein facets help to abstract and generalize the diversity of contexts. The experimental results show that the facet-based representation outperforms the standard monolithic representation of articles.
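
The core idea of the abstract, measuring article similarity facet by facet instead of over one monolithic vector, can be illustrated with a minimal Python sketch. It assumes each article has already been encoded as a mapping from facet label to a dense vector (the representation-learning step is not shown), uses cosine similarity as the comparison function, and sums an article's facet vectors as a toy stand-in for a monolithic representation. The facet labels, function names, and vectors are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Dict, List

Vector = List[float]
Article = Dict[str, Vector]  # facet label -> vector for that facet (assumed encoding)

# Hypothetical facet labels, taken from the examples named in the abstract.
FACETS = ("objective", "method", "result")


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def facet_similarity(a: Article, b: Article) -> Dict[str, float]:
    """Compare two multi-vector articles facet by facet.

    An article need not contain every facet, so only facets present in
    both articles are scored.
    """
    return {f: cosine(a[f], b[f]) for f in FACETS if f in a and f in b}


def pooled_vector(article: Article) -> Vector:
    """Toy stand-in for a monolithic representation: element-wise sum of facet vectors.

    Assumes the article has at least one facet and all vectors share a dimension.
    """
    vectors = list(article.values())
    total = [0.0] * len(vectors[0])
    for vec in vectors:
        for i, x in enumerate(vec):
            total[i] += x
    return total


def monolithic_similarity(a: Article, b: Article) -> float:
    """Baseline: each article collapsed into one vector and compared in its entirety."""
    return cosine(pooled_vector(a), pooled_vector(b))


if __name__ == "__main__":
    # Toy 2-d vectors: the two articles share a method but state opposite objectives.
    paper_x = {"objective": [1.0, 0.0], "method": [0.0, 1.0], "result": [1.0, 1.0]}
    paper_y = {"objective": [-1.0, 0.0], "method": [0.0, 2.0]}
    print(facet_similarity(paper_x, paper_y))
    print(round(monolithic_similarity(paper_x, paper_y), 3))
```

Running the script prints `{'objective': -1.0, 'method': 1.0}` followed by `0.316`. The facet-wise scores keep the shared method separate from the conflicting objectives, while the single pooled vector blends them into one moderate score. How the paper itself aggregates facet scores when ranking recommendations is not described in the abstract and is not modeled here.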

Published in
JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries
May 2018
453 pages
ISBN: 9781450351782
DOI: 10.1145/3197026

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher
Association for Computing Machinery
New York, NY, United States


Acceptance Rates
JCDL '18 Paper Acceptance Rate: 26 of 71 submissions, 37%
Overall Acceptance Rate: 415 of 1,482 submissions, 28%
