ABSTRACT
This paper describes the design principles and implementation of Gavagai Explorer, a new application that builds on interactive text clustering to extract themes from topically coherent text sets such as open text answers to surveys or questionnaires. An automated system is quick, consistent, and has full coverage over the study material. Such a system allows an analyst to analyze more answers in a given time period; provides the same initial results regardless of who does the analysis, reducing the risk of inter-rater discrepancy; and does not risk missing responses due to fatigue or boredom. These factors reduce the cost and increase the reliability of the service. The most important feature, however, is that it relieves the human analyst of the frustrating aspects of the coding task, freeing effort for the central challenge of understanding themes. Gavagai Explorer is available online.
Index Terms
- Analysis of Open Answers to Survey Questions through Interactive Clustering and Theme Extraction