research-article

Metrics and Algorithms for Routing Questions to User Communities

Published: 09 March 2015

Abstract

An online community consists of a group of users who share a common interest, background, or experience, and whose collective goal is to contribute toward the welfare of the community's members. Several websites, such as Yahoo! Groups, Facebook Groups, Google+ Circles, and WebMD Forums, allow their users to create and manage niche communities. Such community services also exist within enterprises, for example, IBM Connections. Question answering within these communities enables members to exchange knowledge and information with one another. However, the onus of finding the right community in which to ask a question lies with the individual user. The overwhelming number of communities calls for a good question routing strategy, so that each new question is routed to an appropriately focused community and is resolved within a reasonable time frame.

In this article, we consider the novel problem of routing a question to the right community and propose a framework for selecting and ranking the communities relevant to a question. We propose several novel features for modeling the three main entities of the system: questions, users, and communities. These features include language attributes, inclination to respond, user familiarity, and question difficulty; based on them, we propose similarity metrics between the routed question and the system entities. We introduce a Cutoff-Aggregation (CA) algorithm that aggregates the entity similarities within a community to compute that community's relevance. We introduce two k-nearest-neighbor (kNN) algorithms that are natural, computationally efficient instantiations of the CA algorithm, and we evaluate several ranking algorithms over the aggregate similarity scores that these two kNN algorithms compute. We propose clustering techniques to speed up our recommendation framework and show how pipelining can improve model performance. We demonstrate the effectiveness of our framework on two large real-world datasets.
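The abstract does not spell out the aggregation function or how the cutoff is chosen, so the following Python sketch assumes the simplest reading of Cutoff-Aggregation: every entity (for example, a past question or a user) already carries a similarity score to the new question and belongs to exactly one community, and a community's relevance is the plain sum of the similarities that clear the cutoff. The kNN variant treats the cutoff as "the k most similar entities overall". The names `cutoff_aggregate`, `knn_aggregate`, and `rank_communities`, the input tuple format, and the toy scores are all hypothetical illustrations, not the paper's method or data.

```python
import heapq
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical input: (entity_id, community_id, similarity to the routed question).
# Assumption: each entity (e.g., a past question or a user) belongs to one community.
Scored = Tuple[Hashable, Hashable, float]


def cutoff_aggregate(scored: Iterable[Scored], cutoff: float) -> Dict[Hashable, float]:
    """Sum, per community, the similarities of entities at or above `cutoff`.

    Assumption: aggregation is a plain sum; entities below the cutoff are ignored.
    """
    relevance: Dict[Hashable, float] = defaultdict(float)
    for _entity, community, sim in scored:
        if sim >= cutoff:
            relevance[community] += sim
    return dict(relevance)


def knn_aggregate(scored: Iterable[Scored], k: int) -> Dict[Hashable, float]:
    """kNN-style variant: only the k most similar entities overall contribute.

    This behaves like a cutoff placed at the k-th largest similarity, except that
    exactly k entities are kept even when several share the k-th score.
    """
    top_k = heapq.nlargest(k, scored, key=lambda item: item[2])
    relevance: Dict[Hashable, float] = defaultdict(float)
    for _entity, community, sim in top_k:
        relevance[community] += sim
    return dict(relevance)


def rank_communities(relevance: Dict[Hashable, float]) -> List[Hashable]:
    """Order communities by aggregate relevance, highest first.

    Ties are broken by the community id's string form so the output is deterministic.
    """
    return sorted(relevance, key=lambda c: (-relevance[c], str(c)))


if __name__ == "__main__":
    # Toy similarity scores, invented purely for illustration.
    scored = [
        ("q1", "python", 0.90),
        ("q2", "python", 0.80),
        ("q3", "java", 0.85),
        ("q4", "java", 0.10),
        ("q5", "rust", 0.60),
    ]
    print(rank_communities(cutoff_aggregate(scored, cutoff=0.5)))  # ['python', 'java', 'rust']
    print(rank_communities(knn_aggregate(scored, k=2)))  # ['python', 'java']
```

With `k=2`, the "rust" community drops out entirely because none of its entities is among the two nearest neighbors, which is the kind of pruning that makes a kNN instantiation cheaper than scoring every community. The plain descending sort in `rank_communities` is only a placeholder for the several ranking algorithms the abstract says were evaluated over the aggregate scores.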

        Published in

          ACM Transactions on Information Systems, Volume 33, Issue 3
          March 2015
          184 pages
          ISSN: 1046-8188
          EISSN: 1558-2868
          DOI: 10.1145/2737814

          Copyright © 2015 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 9 March 2015
          • Revised: 1 January 2015
          • Accepted: 1 January 2015
          • Received: 1 March 2014

          Qualifiers

          • Research
          • Refereed
