DOI: 10.1145/2858036.2858411
research-article
Public Access
Honorable Mention

Alloy: Clustering with Crowds and Computation

Published: 07 May 2016

ABSTRACT

Crowdsourced clustering approaches present a promising way to harness deep semantic knowledge for clustering complex information. However, existing approaches have difficulties supporting the global context needed for workers to generate meaningful categories, and are costly because all items require human judgments. We introduce Alloy, a hybrid approach that combines the richness of human judgments with the power of machine algorithms. Alloy supports greater global context through a new "sample and search" crowd pattern that changes the crowd's task from classifying a fixed subset of items to actively sampling and querying the entire dataset. It also improves efficiency through a two-phase process in which crowds provide examples to help a machine cluster the head of the distribution, then classify low-confidence examples in the tail. To accomplish this, Alloy introduces a modular "cast and gather" approach which leverages a machine learning backbone to stitch together different types of judgment tasks.
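
The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not Alloy's implementation: it swaps in a simple bag-of-words cosine similarity against crowd-provided seed examples, and the function names, the `threshold` value, and the sample data are all hypothetical. Items that match a seed category confidently are clustered by the machine; the rest are set aside for human judgment.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_with_fallback(items, seeds, threshold=0.3):
    """Assign items to seed categories, deferring low-confidence items.

    seeds: dict mapping a category label to example texts (assumed to come
    from crowd workers). threshold is an arbitrary illustrative cutoff.
    Assumes seeds is non-empty.
    """
    # Phase 1: summarize each category by the summed term counts of its examples.
    centroids = {}
    for label, examples in seeds.items():
        centroid = Counter()
        for example in examples:
            centroid.update(vectorize(example))
        centroids[label] = centroid

    assigned, needs_human = {}, []
    for item in items:
        vec = vectorize(item)
        scores = {label: cosine(vec, c) for label, c in centroids.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            assigned[item] = best     # confident: the machine clusters it
        else:
            needs_human.append(item)  # Phase 2: route to human judgment
    return assigned, needs_human


seeds = {
    "cooking": ["bake bread oven", "roast chicken oven"],
    "gardening": ["plant tomato seeds", "water garden plants"],
}
items = ["bake chicken in oven", "plant seeds in garden", "fix bicycle tire"]
assigned, needs_human = cluster_with_fallback(items, seeds)
print(assigned)     # {'bake chicken in oven': 'cooking', 'plant seeds in garden': 'gardening'}
print(needs_human)  # ['fix bicycle tire']
```

The threshold trades cost against quality: raising it sends more items to people, lowering it leaves more to the machine.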


Published in

CHI '16: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems
May 2016
6108 pages
ISBN: 9781450333627
DOI: 10.1145/2858036

Copyright © 2016 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CHI '16 Paper Acceptance Rate: 565 of 2,435 submissions, 23%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
