research-article

Citation Recommendation Using Distributed Representation of Discourse Facets in Scientific Articles

Authors:
Yuta Kobayashi, Nara Institute of Science and Technology, Ikoma, Nara, Japan
Masashi Shimbo, Nara Institute of Science and Technology, Ikoma, Nara, Japan
Yuji Matsumoto, Nara Institute of Science and Technology, Ikoma, Nara, Japan

JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, May 2018, Pages 243–251
https://doi.org/10.1145/3197026.3197059

Published: 23 May 2018

ABSTRACT

Scientific articles usually follow a common pattern of discourse, and their contents can be divided into several facets, such as objective, method, and result. We examine the efficacy of using these discourse facets for citation recommendation. A method for learning multi-vector representations of scientific articles is proposed, in which each vector encodes a discourse facet present in an article. With each facet represented as a separate vector, the similarity of articles can be measured not in their entirety, but facet by facet. The proposed representation method is tested on a new citation recommendation task called context-based co-citation recommendation. This task calls for the evaluation of article similarity in terms of citation contexts, wherein facets help to abstract and generalize the diversity of contexts. The experimental results show that the facet-based representation outperforms the standard monolithic representation of articles.
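The facet-by-facet comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the vectors below are hand-made toy values rather than learned representations, the use of cosine similarity is an assumption, and the element-wise average used as the "monolithic" baseline is a hypothetical stand-in. Only the facet names (objective, method, result) come from the abstract.

```python
import math

# Facet names taken from the abstract; everything else here is illustrative.
FACETS = ("objective", "method", "result")


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def facet_similarities(article_a, article_b):
    """Return {facet: cosine similarity} for every facet in FACETS."""
    return {f: cosine(article_a[f], article_b[f]) for f in FACETS}


def monolithic_vector(article):
    """Collapse the facet vectors into one by element-wise averaging.

    Hypothetical baseline for illustration, not the baseline used in the paper.
    """
    vectors = [article[f] for f in FACETS]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy, hand-made vectors (assumption: not learned, not from the paper).
paper_a = {"objective": [1.0, 0.0], "method": [0.0, 1.0], "result": [1.0, 1.0]}
paper_b = {"objective": [1.0, 0.0], "method": [1.0, 0.0], "result": [1.0, 1.0]}

per_facet = facet_similarities(paper_a, paper_b)
overall = cosine(monolithic_vector(paper_a), monolithic_vector(paper_b))

print(per_facet)
print(overall)
```

In this toy setup the two papers agree on objective and result but differ on method. The per-facet scores expose that difference, whereas the single averaged vector blurs it into one number.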

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval

Published in
JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries
May 2018, 453 pages
ISBN: 9781450351782
DOI: 10.1145/3197026
General Chairs: Jiangping Chen (College of Information, UNT, USA), Marcos André Gonçalves (Brazil), Jeff M. Allen (College of Information, UNT, USA)
Program Chairs: Edward A. Fox (Virginia Tech, USA), Min-Yen Kan (National University of Singapore, Singapore), Vivien Petras (Humboldt-Universität zu Berlin, Germany)
Copyright © 2018 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
co-citation analysis
discourse facet
natural language processing
representation learning
scientific article

Acceptance Rates
JCDL '18 Paper Acceptance Rate: 26 of 71 submissions, 37%
Overall Acceptance Rate: 415 of 1,482 submissions, 28%