Abstract
An online community is a group of users who share a common interest, background, or experience and whose collective goal is to contribute to the welfare of its members. Several websites, such as Yahoo! Groups, Facebook Groups, Google+ Circles, and WebMD Forums, allow their users to create and manage niche communities. Such community services also exist within enterprises, for example IBM Connections. Question answering within these communities enables members to exchange knowledge and information with one another. However, the onus of finding the right community in which to ask a question lies with the individual user. The overwhelming number of communities calls for a good question routing strategy, so that new questions are routed to an appropriately focused community and thus are resolved in a reasonable time frame.
In this article, we consider the novel problem of routing a question to the right community and propose a framework for selecting and ranking the relevant communities for a question. We propose several novel features for modeling the three main entities of the system: questions, users, and communities. These features include language attributes, inclination to respond, user familiarity, and question difficulty; based on them, we define similarity metrics between the routed question and the system entities. We introduce a Cutoff-Aggregation (CA) algorithm that aggregates the entity similarity within a community to compute that community's relevance. We introduce two computationally efficient k-nearest-neighbor (knn) algorithms that are natural instantiations of the CA algorithm, and we evaluate several ranking algorithms over the aggregate similarity scores that the two knn algorithms compute. We propose clustering techniques to speed up our recommendation framework and show how pipelining can improve model performance. We demonstrate the effectiveness of our framework on two large real-world datasets.
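The idea of cutting off and then aggregating entity similarities can be illustrated with a minimal sketch. It is not the article's algorithm: the abstract does not specify the cutoff rule or the aggregation function, so the sketch assumes that the cutoff keeps the k entities most similar to the routed question and that aggregation is a plain sum per community. The function name `rank_communities`, the input layout, and the sample data are all hypothetical.

```python
from collections import defaultdict
import heapq


def rank_communities(similarities, k):
    """Rank communities for a routed question (illustrative sketch only).

    similarities: list of (community_id, similarity) pairs, one per entity
    (for example a past question or a user) compared with the routed question.
    k: number of most similar entities kept by the cutoff (an assumption).
    """
    # Cutoff: keep only the k entities most similar to the routed question.
    top_k = heapq.nlargest(k, similarities, key=lambda pair: pair[1])

    # Aggregation: sum the surviving similarities per community (an assumption;
    # other aggregates such as the mean or the maximum are equally plausible).
    scores = defaultdict(float)
    for community, similarity in top_k:
        scores[community] += similarity

    # Highest aggregate first; ties are broken by community id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical similarity scores, chosen only for illustration.
    sample = [
        ("cooking", 0.75),
        ("travel", 0.25),
        ("cooking", 0.5),
        ("health", 0.625),
        ("travel", 0.125),
    ]
    print(rank_communities(sample, k=3))
```

With k = 3 on the sample data, only the two "cooking" entities and the single "health" entity survive the cutoff, so "travel" receives no score at all. This shows how a cutoff can keep large but weakly related communities from accumulating relevance through sheer size.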
Index Terms
- Metrics and Algorithms for Routing Questions to User Communities