ABSTRACT
Scientific articles usually follow a common pattern of discourse, and their contents can be divided into several facets, such as objective, method, and result. We examine the efficacy of using these discourse facets for citation recommendation. A method for learning multi-vector representations of scientific articles is proposed, in which each vector encodes a discourse facet present in an article. With each facet represented as a separate vector, the similarity of articles can be measured not in their entirety, but facet by facet. The proposed representation method is tested on a new citation recommendation task called context-based co-citation recommendation. This task calls for the evaluation of article similarity in terms of citation contexts, wherein facets help to abstract and generalize the diversity of contexts. The experimental results show that the facet-based representation outperforms the standard monolithic representation of articles.
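As a minimal sketch of what measuring similarity "facet by facet" could look like, the snippet below compares two articles that each carry one vector per discourse facet. It is an illustration only, not the method proposed in the article: the toy two-dimensional vectors, the choice of cosine similarity, and the names `FACETS`, `cosine`, and `facet_similarities` are all assumptions. Only the facet labels come from the abstract.

```python
import math

# Facet labels taken from the abstract; everything else here is hypothetical.
FACETS = ("objective", "method", "result")


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def facet_similarities(article_a, article_b):
    """Return {facet: similarity} for every facet present in both articles."""
    return {
        facet: cosine(article_a[facet], article_b[facet])
        for facet in FACETS
        if facet in article_a and facet in article_b
    }


# Toy facet vectors (assumed values, not learned representations).
paper_a = {"objective": [1.0, 0.0], "method": [1.0, 0.0], "result": [0.0, 1.0]}
paper_b = {"objective": [1.0, 0.0], "method": [0.0, 1.0], "result": [0.0, 2.0]}

sims = facet_similarities(paper_a, paper_b)
for facet, score in sims.items():
    print(f"{facet}: {score:.2f}")
```

In this toy case the two papers agree on objective and result but differ on method, a distinction that a single whole-article similarity score would not expose.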
Index Terms
- Citation Recommendation Using Distributed Representation of Discourse Facets in Scientific Articles