ABSTRACT
We develop a people-centered computational history of science that tracks authors over topics, and we apply it to the history of computational linguistics. We present four findings in this paper. First, we identify the topical subfields authors work on by assigning automatically generated topics to each paper in the ACL Anthology from 1980 to 2008. Next, we identify four distinct research epochs in which the pattern of topical overlap is stable and differs from that of other eras: an early NLP period from 1980 to 1988, the period of US government-sponsored MUC and ATIS evaluations from 1989 to 1994, a transitional period until 2001, and a modern integration period from 2002 onwards. Third, we analyze the flow of authors across topics to discern how some subfields flow into the next, forming different stages of ACL research. We find that the government-sponsored bakeoffs brought new researchers to the field and bridged early topics to modern probabilistic approaches. Last, we identify steep increases in author retention during the bakeoff era and the modern era, suggesting two points at which the field became more integrated.
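To make the notions of author flow and author retention concrete, the following is a minimal sketch under simplifying assumptions; it is not the paper's implementation. It assumes each paper carries a single topic label (the paper assigns automatically generated topics, which may be distributions rather than one label), and the record fields `year`, `authors`, and `topic`, the function names, and the toy data are all hypothetical.

```python
from collections import defaultdict


def authors_in(papers, years):
    """Return the set of authors with at least one paper in the given years."""
    return {a for p in papers if p["year"] in years for a in p["authors"]}


def author_retention(papers, prev_years, next_years):
    """Fraction of authors active in prev_years who publish again in next_years."""
    prev = authors_in(papers, prev_years)
    nxt = authors_in(papers, next_years)
    return len(prev & nxt) / len(prev) if prev else 0.0


def author_flow(papers, prev_years, next_years):
    """Count authors who publish on topic s in prev_years and topic t in next_years."""

    def topics_by_author(years):
        out = defaultdict(set)
        for p in papers:
            if p["year"] in years:
                for a in p["authors"]:
                    out[a].add(p["topic"])
        return out

    before = topics_by_author(prev_years)
    after = topics_by_author(next_years)
    flow = defaultdict(int)
    # Only authors present in both periods contribute to the flow counts.
    for author in before.keys() & after.keys():
        for s in before[author]:
            for t in after[author]:
                flow[(s, t)] += 1
    return dict(flow)


# Toy records (hypothetical, for illustration only).
toy_papers = [
    {"year": 1987, "authors": ["A", "B"], "topic": "parsing"},
    {"year": 1990, "authors": ["A"], "topic": "speech"},
    {"year": 1991, "authors": ["C"], "topic": "speech"},
]
early = range(1980, 1989)    # 1980 to 1988
bakeoff = range(1989, 1995)  # 1989 to 1994

print(author_retention(toy_papers, early, bakeoff))
print(author_flow(toy_papers, early, bakeoff))
```

In this toy setting, a high count for a topic pair means many authors moved from the first topic to the second between the two periods, and a rise in the retention value from one pair of periods to the next would correspond to the kind of increase the abstract describes.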