Alloy: Clustering with Crowds and Computation

Authors:
Joseph Chee Chang, Carnegie Mellon University, Pittsburgh, PA, USA
Aniket Kittur, Carnegie Mellon University, Pittsburgh, PA, USA
Nathan Hahn, Carnegie Mellon University, Pittsburgh, PA, USA

CHI '16: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, May 2016, Pages 3180–3191
https://doi.org/10.1145/2858036.2858411

Published: 07 May 2016

ABSTRACT

Crowdsourced clustering approaches present a promising way to harness deep semantic knowledge for clustering complex information. However, existing approaches struggle to support the global context workers need to generate meaningful categories, and they are costly because every item requires human judgment. We introduce Alloy, a hybrid approach that combines the richness of human judgments with the power of machine algorithms. Alloy supports greater global context through a new "sample and search" crowd pattern, which changes the crowd's task from classifying a fixed subset of items to actively sampling and querying the entire dataset. It also improves efficiency through a two-phase process in which crowds provide examples to help a machine cluster the head of the distribution, then classify low-confidence examples in the tail. To accomplish this, Alloy introduces a modular "cast and gather" approach that leverages a machine learning backbone to stitch together different types of judgment tasks.
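The head/tail split in the two-phase process can be illustrated with a minimal sketch. The abstract does not say which classifier or confidence measure Alloy uses, so this code assumes a nearest-centroid scorer (cosine similarity to the mean of each category's crowd-provided examples), a softmax that turns similarities into confidences, and a fixed confidence cutoff. The names `split_head_tail`, `threshold`, and `temperature` are hypothetical and are not taken from the paper.

```python
import math


def centroid(vectors):
    """Mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(scores, temperature):
    """Turn similarity scores into a distribution (assumed confidence measure)."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_head_tail(items, seeds, threshold=0.6, temperature=0.1):
    """Label confident items by machine (head); return the rest (tail) for the crowd.

    items: dict of item id -> feature vector
    seeds: dict of category label -> list of crowd-provided example vectors
    threshold, temperature: hypothetical tuning knobs, not from the paper
    """
    labels = sorted(seeds)
    centroids = [centroid(seeds[label]) for label in labels]
    head, tail = {}, []
    for item_id, vec in items.items():
        probs = softmax([cosine(vec, c) for c in centroids], temperature)
        best = max(range(len(labels)), key=probs.__getitem__)
        if probs[best] >= threshold:
            head[item_id] = labels[best]  # machine assigns the category
        else:
            tail.append(item_id)  # low confidence: route to crowd workers


    return head, tail


# Toy usage: two seed categories, one ambiguous item.
seeds = {"a": [[1, 0], [0.9, 0.1]], "b": [[0, 1], [0.1, 0.9]]}
items = {"x": [1, 0.05], "y": [0.05, 1], "z": [1, 1]}
head, tail = split_head_tail(items, seeds)
print(head, tail)  # {'x': 'a', 'y': 'b'} ['z']
```

Items left in `tail` correspond to the low-confidence examples the abstract says crowds classify in the second phase. Raising `threshold` in this sketch would move more items from the machine to the crowd, likely increasing cost in exchange for fewer machine errors.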

Index Terms

1. Human-centered computing

Published in
CHI '16: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems
May 2016
6108 pages
ISBN: 9781450333627
DOI: 10.1145/2858036
General Chairs: Jofish Kaye (Yahoo), Allison Druin (University of Maryland / National Park Service)
Program Chairs: Cliff Lampe (University of Michigan), Dan Morris (Microsoft), Juan Pablo Hourcade (University of Iowa)
Copyright © 2016 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Badges
- Honorable Mention
Author Tags
computer supported cooperative work (CSCW)
database access / information retrieval
empirical methods
quantitative
world wide web and hypermedia
Qualifiers
- research-article

Acceptance Rates
CHI '16 Paper Acceptance Rate: 565 of 2,435 submissions, 23%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%

Article Metrics
- Total Citations: 32
- Total Downloads: 1,247
- Downloads (last 12 months): 184
- Downloads (last 6 weeks): 17