Research Article | Open Access

Crowdsourcing Human Annotation on Web Page Structure: Infrastructure Design and Behavior-Based Quality Control

Published: 25 April 2016

Abstract

Parsing the semantic structure of a web page is a key component of web information extraction. Successful extraction algorithms usually require large-scale training and evaluation datasets, which are difficult to acquire. Recently, crowdsourcing has proven to be an effective method of collecting large-scale training data in domains that do not require much domain knowledge. For more complex domains, researchers have proposed sophisticated quality control mechanisms that replicate tasks in parallel or sequential ways and then aggregate responses from multiple workers. Conventional annotation integration methods often place more trust in workers with high historical performance; they are therefore called performance-based methods. Recently, Rzeszotarski and Kittur demonstrated that behavioral features are also highly correlated with annotation quality in several crowdsourcing applications. In this article, we present a new crowdsourcing system, called Wernicke, that provides annotations for web information extraction. Wernicke collects a wide set of behavioral features and, based on these features, predicts annotation quality for a challenging task domain: annotating web page structure. We evaluate the effectiveness of behavior-based quality control through a case study in which 32 workers annotate 200 Q&A web pages from five popular websites. In doing so, we find the following: (1) Many behavioral features are significant predictors of crowdsourcing quality. (2) The behavior-based method outperforms performance-based methods in predicting recall, while performing comparably in predicting precision. In addition, using behavioral features is less vulnerable to the cold-start problem, and the corresponding prediction model generalizes better for predicting recall than precision in cross-website quality analysis. (3) Workers’ behavioral information and historical performance information can be effectively combined to further reduce prediction errors.
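As a rough illustration of the third finding, the sketch below combines behavioral features (e.g., time on page, mouse activity) with a worker's historical performance in a single regression model that predicts annotation recall. The feature names, the synthetic data, and the choice of ridge regression are illustrative assumptions made here for exposition; they are not the Wernicke system's actual features or model.

# Minimal, hypothetical sketch: predict annotation recall from a mix of
# behavioral and historical-performance features. All values are synthetic.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n_sessions = 200  # one row per worker-page annotation session

# Behavioral features logged during annotation (hypothetical choices):
# time on page (s), mouse-move events, scroll distance (px),
# number of text selections, idle time (s).
behavioral = np.column_stack([
    rng.normal(120, 30, n_sessions),    # time_on_page
    rng.poisson(400, n_sessions),       # mouse_moves
    rng.normal(3000, 800, n_sessions),  # scroll_px
    rng.poisson(12, n_sessions),        # selections
    rng.normal(20, 8, n_sessions),      # idle_time
])

# Historical-performance feature: the worker's mean recall on past tasks.
history = rng.uniform(0.5, 1.0, (n_sessions, 1))

# Synthetic target: recall of the current annotation against a gold standard.
recall = np.clip(
    0.4 * history[:, 0]
    + 0.001 * behavioral[:, 0]
    - 0.002 * behavioral[:, 4]
    + rng.normal(0, 0.05, n_sessions),
    0, 1,
)

# Combine both feature groups into one design matrix and fit a regressor.
X = np.hstack([behavioral, history])
X_train, X_test, y_train, y_test = train_test_split(X, recall, random_state=0)
model = Ridge(alpha=1.0).fit(X_train, y_train)
print("MAE on held-out sessions:", mean_absolute_error(y_test, model.predict(X_test)))

In this toy setup, dropping either feature group and refitting would show whether the combination reduces the prediction error relative to behavioral or performance features alone, which is the kind of comparison the abstract's third finding refers to.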

References

  1. Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. 2007. Open information extraction from the web. In International Joint Conference on Artificial Intelligence, Vol. 7. 2670--2676.
  2. Michael S. Bernstein, Greg Little, Robert C. Miller, Björn Hartmann, Mark S. Ackerman, David R. Karger, David Crowell, and Katrina Panovich. 2010. Soylent: A word processor with a crowd inside. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology. ACM, 313--322.
  3. Chia Hui Chang, Mohammed Kayed, Moheb R. Girgis, and Khaled F. Shaalan. 2006. A survey of web information extraction systems. IEEE Transactions on Knowledge and Data Engineering 18, 10 (2006), 1411--1428.
  4. Peng Dai, Christopher H. Lin, and Daniel S. Weld. 2013. POMDP-based control of workflows for crowdsourcing. Artificial Intelligence 202 (2013), 52--85.
  5. Peng Dai, Jeffrey Rzeszotarski, Praveen Paritosh, and Ed Chi. 2015. And now for something completely different: Improving crowdsourcing workflows with micro-diversions. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW’15). ACM.
  6. Alexander Philip Dawid and Allan M. Skene. 1979. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics 28, 1 (1979), 20--28.
  7. Ofer Dekel and Ohad Shamir. 2009. Vox populi: Collecting high-quality labels from a crowd. In Proceedings of the 22nd Annual Conference on Learning Theory.
  8. Julie S. Downs, Mandy B. Holbrook, Steve Sheng, and Lorrie Faith Cranor. 2010. Are your participants gaming the system? Screening mechanical turk workers. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2399--2402.
  9. Qi Guo, Haojian Jin, Dmitry Lagun, Shuai Yuan, and Eugene Agichtein. 2013. Mining touch interaction data on mobile devices to predict web search result relevance. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 153--162.
  10. Shuguang Han, Zhen Yue, and Daqing He. 2015. Understanding and supporting cross-device web search for exploratory tasks with mobile touch interactions. ACM Transactions on Information Systems (TOIS) 33, 4 (2015), 16.
  11. Andrew Hogue and David Karger. 2005. Thresher: Automating the unwrapping of semantic content from the world wide web. In Proceedings of the 14th International Conference on World Wide Web. ACM, 86--95.
  12. Jeff Howe. 2006. The rise of crowdsourcing. Wired Magazine 14, 6 (2006), 1--4.
  13. Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM’08). IEEE, 263--272.
  14. Jeff Huang, Ryen W. White, and Susan Dumais. 2011. No clicks, no problem: Using cursor movements to understand and improve search. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1225--1234.
  15. David Huynh, Stefano Mazzocchi, and David Karger. 2005. Piggy bank: Experience the semantic web inside your web browser. In The Semantic Web (ISWC’05). Springer, 413--430.
  16. David F. Huynh, Robert C. Miller, and David R. Karger. 2006. Enabling web browsers to augment web sites’ filtering and sorting functionalities. In Proceedings of the 19th Annual ACM Symposium on User Interface Software and Technology. ACM, 125--134.
  17. Rob J. Hyndman and Anne B. Koehler. 2006. Another look at measures of forecast accuracy. International Journal of Forecasting 22, 4 (2006), 679--688.
  18. Panagiotis G. Ipeirotis, Foster Provost, and Jing Wang. 2010. Quality management on Amazon mechanical turk. In Proceedings of the ACM SIGKDD Workshop on Human Computation. ACM, 64--67.
  19. D. N. Joanes and C. A. Gill. 1998. Comparing measures of sample skewness and kurtosis. Journal of the Royal Statistical Society: Series D (The Statistician) 47, 1 (1998), 183--189.
  20. Aniket Kittur, Ed H. Chi, and Bongwon Suh. 2008. Crowdsourcing user studies with mechanical turk. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 453--456.
  21. Christian Kohlschütter, Peter Fankhauser, and Wolfgang Nejdl. 2010. Boilerplate detection using shallow text features. In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining. ACM, 441--450.
  22. Hongwei Li and Bin Yu. 2014. Error rate bounds and iterative weighted majority voting for crowdsourcing. arXiv preprint arXiv:1411.4086 (2014).
  23. Greg Little, Lydia B. Chilton, Max Goldman, and Robert C. Miller. 2009. TurKit: Tools for iterative tasks on mechanical turk. In Proceedings of the ACM SIGKDD Workshop on Human Computation. ACM, 29--30.
  24. Bryan C. Russell, Antonio Torralba, Kevin P. Murphy, and William T. Freeman. 2008. LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision 77, 1--3 (2008), 157--173.
  25. Jeffrey Rzeszotarski and Aniket Kittur. 2012. CrowdScape: Interactively visualizing user behavior and output. In Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology. ACM, 55--62.
  26. Jeffrey M. Rzeszotarski and Aniket Kittur. 2011. Instrumenting the crowd: Using implicit behavioral measures to predict task performance. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology. ACM, 13--22.
  27. Xinying Song, Jing Liu, Yunbo Cao, Chin-Yew Lin, and Hsiao-Wuen Hon. 2010. Automatic extraction of web data records containing user-generated content. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management. ACM, 39--48.
  28. Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou, and Jun’ichi Tsujii. 2012. BRAT: A web-based tool for NLP-assisted text annotation. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 102--107.
  29. Fei Wu, Raphael Hoffmann, and Daniel S. Weld. 2008. Information extraction from Wikipedia: Moving down the long tail. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 731--739.
  30. Fei Wu and Daniel S. Weld. 2010. Open information extraction using Wikipedia. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 118--127.
  31. Xing Yi, Liangjie Hong, Erheng Zhong, Nanthan Nan Liu, and Suju Rajan. 2014. Beyond clicks: Dwell time for personalization. In Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 113--120.


      Published in

      ACM Transactions on Intelligent Systems and Technology, Volume 7, Issue 4
      Special Issue on Crowd in Intelligent Systems, Research Note/Short Paper and Regular Papers
      July 2016, 498 pages
      ISSN: 2157-6904
      EISSN: 2157-6912
      DOI: 10.1145/2906145
      Editor: Yu Zheng

      Copyright © 2016 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 25 April 2016
      • Revised: 1 December 2015
      • Accepted: 1 December 2015
      • Received: 1 February 2015


      Qualifiers

      • research-article
      • Research
      • Refereed
