ABSTRACT
Crowdsourced clustering approaches present a promising way to harness deep semantic knowledge for clustering complex information. However, existing approaches have difficulty supporting the global context that workers need to generate meaningful categories, and they are costly because every item requires human judgment. We introduce Alloy, a hybrid approach that combines the richness of human judgments with the power of machine algorithms. Alloy supports greater global context through a new "sample and search" crowd pattern, which changes the crowd's task from classifying a fixed subset of items to actively sampling and querying the entire dataset. It also improves efficiency through a two-phase process in which crowds first provide examples to help a machine cluster the head of the distribution, and then classify the low-confidence examples in the tail. To accomplish this, Alloy introduces a modular "cast and gather" approach, which uses a machine learning backbone to stitch together different types of judgment tasks.
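The two-phase idea can be illustrated with a minimal sketch. It is not Alloy's actual pipeline, which the abstract does not specify: a simple nearest-centroid matcher over bag-of-words vectors stands in for the machine learning backbone, and the function names, the `threshold` parameter, and the toy data are all hypothetical. The sketch only shows the routing logic: crowd-provided seed examples define categories, the machine labels items it matches confidently, and the remaining low-confidence items are queued for human classification.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text):
    """Lowercased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_head_and_tail(items, seeds, threshold=0.2):
    """Route items to the machine or to the crowd (hypothetical helper).

    seeds maps a category label to crowd-provided example texts.
    Returns (machine_labels, crowd_queue).
    """
    # Build one summed term-count vector per seeded category.
    centroids = {}
    for label, examples in seeds.items():
        total = Counter()
        for example in examples:
            total.update(bag_of_words(example))
        centroids[label] = total

    machine_labels, crowd_queue = {}, []
    for item in items:
        vec = bag_of_words(item)
        scores = {label: cosine(vec, c) for label, c in centroids.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            machine_labels[item] = best   # confident: machine assigns the label
        else:
            crowd_queue.append(item)      # low confidence: left for human judgment
    return machine_labels, crowd_queue


seeds = {
    "cooking": ["boil pasta in salted water", "bake bread in the oven"],
    "gardening": ["water tomato plants daily", "prune rose bushes in spring"],
}
items = [
    "how long to boil pasta",
    "when to prune rose bushes",
    "fix a flat bicycle tire",
]
machine_labels, crowd_queue = split_head_and_tail(items, seeds)
```

In this toy run the first two items share enough vocabulary with a seeded category to be labeled automatically, while the bicycle question matches nothing and ends up in the crowd queue. A real system would likely use a stronger classifier with calibrated confidence estimates in place of the raw similarity threshold.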
Index Terms
- Alloy: Clustering with Crowds and Computation