skip to main content
article
Free Access

Learning bilinear model for matching queries and documents

Published:01 January 2013Publication History
Skip Abstract Section

Abstract

The task of matching data from two heterogeneous domains naturally arises in various areas such as web search, collaborative filtering, and drug design. In web search, existing work has designed relevance models to match queries and documents by exploiting either user clicks or content of queries and documents. To the best of our knowledge, however, there has been little work on principled approaches to leveraging both clicks and content to learn a matching model for search. In this paper, we propose a framework for learning to match heterogeneous objects. The framework learns two linear mappings for two objects respectively, and matches them via the dot product of their images after mapping. Moreover, when different regularizations are enforced, the framework renders a rich family of matching models. With orthonormal constraints on mapping functions, the framework subsumes Partial Least Squares (PLS) as a special case. Alternatively, with a l1+l2 regularization, we obtain a new model called Regularized Mapping to Latent Structures (RMLS). RMLS enjoys many advantages over PLS, including lower time complexity and easy parallelization. To further understand the matching framework, we conduct generalization analysis and apply the result to both PLS and RMLS. We apply the framework to web search and implement both PLS and RMLS using a click-through bipartite with metadata representing features of queries and documents. We test the efficacy and scalability of RMLS and PLS on large scale web search problems. The results show that both PLS and RMLS can significantly outperform baseline methods, while RMLS substantially speeds up the learning process.

References

  1. J. Abernethy, F. Bach, T. Evgeniou, and J.P. Vert. A new approach to collaborative filtering: Operator estimation with spectral regularization. JMLR '09, 10:803-826, 2009. Google ScholarGoogle Scholar
  2. S. Agarwal and P. Niyogi. Stability and generalization of bipartite ranking algorithms. In COLT'05, pages 32-47, 2005. Google ScholarGoogle Scholar
  3. F. Bach. Exploring large feature spaces with hierarchical multiple kernel learning. In NIPS'08, pages 105-112, 2008.Google ScholarGoogle Scholar
  4. F. Bach, G. Lanckriet, and M. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In ICML'04, 2004. Google ScholarGoogle Scholar
  5. R. Baeza-Yates and A. Tiberi. Extracting semantic relations from query logs. In SIGKDD'07, pages 76-85, 2007. Google ScholarGoogle Scholar
  6. B. Bai, J. Weston, D. Grangier, R. Collobert, K. Sadamasa, Y. Qi, O. Chapelle, and K. Weinberger. Supervised semantic indexing. In CIKM'09, pages 187-196, 2009. Google ScholarGoogle Scholar
  7. P. L. Bartlett and S. Mendelson. Rademacher and gaussian complexities: Risk bounds and structural results. JMLR'02, 3:463-482, 2002. Google ScholarGoogle Scholar
  8. C. Burges, R. Ragno, and Q. Le. Learning to rank with nonsmooth cost functions. In NIPS'06, pages 395-402. 2006.Google ScholarGoogle Scholar
  9. Y. Cao, J. Xu, T.Y. Liu, H. Li, Y. Huang, and H.W. Hon. Adapting ranking svm to document retrieval. In SIGIR '06, pages 186-193, 2006. Google ScholarGoogle Scholar
  10. Z. Cao, T. Qin, T.Y. Liu, M.F. Tsai, and H. Li. Learning to rank: from pairwise approach to listwise approach. In ICML '07, pages 129-136, 2007. Google ScholarGoogle Scholar
  11. T.Q. Chen, W.N. Zhang, Q.X. Lu, K.L. Chen, Z. Zhao, and Y. Yu. Svdfeature: A toolkit for feature-based collaborative filtering. JMLR'12, 13, 2012. Google ScholarGoogle Scholar
  12. W. Chen, T.Y. Liu, and Z. Ma. Two-layer generalization analysis for ranking using rademacher average. NIPS'10, 23:370-378, 2010.Google ScholarGoogle Scholar
  13. C. Cortes. Invited talk: Can learning kernels help performance? In ICML'09, page 161, 2009. Google ScholarGoogle Scholar
  14. K. Crammer and Y. Singer. Pranking with ranking. In NIPS'01, pages 641-647, 2001.Google ScholarGoogle Scholar
  15. N. Craswell and M. Szummer. Random walks on the click graph. In SIGIR'07, pages 239-246, 2007. Google ScholarGoogle Scholar
  16. N. Cristianini, J. Shawe-taylor, A. Elisseeff, and J. Kandola. On kernel-target alignment. In NIPS'01, 2001.Google ScholarGoogle Scholar
  17. J.V. Davis, B. Kulis, P. Jain, S. Sra, and I.S. Dhillon. Information-theoretic metric learning. In ICML'07, page 216, 2007. Google ScholarGoogle Scholar
  18. S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. JASIS'90, 41:391-407, 1990.Google ScholarGoogle Scholar
  19. J.H. Friedman. Greedy function approximation: a gradient boosting machine. Ann. Statist, 29(5): 1189-1232, 2001.Google ScholarGoogle Scholar
  20. J. Gao, K. Toutanova, and W. Yih. Clickthrough-based latent semantic models for web search. In SIGIR'11, pages 675-684, 2011. Google ScholarGoogle Scholar
  21. D. Grangier and S. Bengio. A discriminative kernel-based model to rank images from text queries. IEEE Transactions on PAMI, 30(8):1371-1384, 2008. Google ScholarGoogle Scholar
  22. D.R. Hardoon, S. Szedmak, and J. Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12):2639-2664, 2004. Google ScholarGoogle Scholar
  23. R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. NIPS'99, pages 115-132, 1999.Google ScholarGoogle Scholar
  24. T. Hertz, A. Bar-Hillel, and D. Weinshall. Boosting margin based distance functions for clustering. In ICML'04, page 50, 2004. Google ScholarGoogle Scholar
  25. T. Hofmann. Latent semantic models for collaborative filtering. ACM Trans. Inf. Syst., 22:89-115, 2004. Google ScholarGoogle Scholar
  26. S. CH. Hoi, W. Liu, M.R. Lyu, and W.Y. Ma. Learning distance metrics with contextual constraints for image retrieval. In CVPR'06, volume 2, pages 2072-2078, 2006. Google ScholarGoogle Scholar
  27. S. JacobGoldberger and R. GeoffHinton. Neighbourhood components analysis. NIPS'04, 2004.Google ScholarGoogle Scholar
  28. K. Jarvelin and J. Kekalainen. Ir evaluation methods for retrieving highly relevant documents. In SIGIR'00, pages 41-48, 2000. Google ScholarGoogle Scholar
  29. T. Joachims. Optimizing search engines using clickthrough data. In KDD '02, pages 133-142, 2002. Google ScholarGoogle Scholar
  30. I.T. Jolliffe. Principal Component Analysis. Springer verlag, 2002.Google ScholarGoogle Scholar
  31. G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. E. Ghaoui, and M. I. Jordan. Learning the kernel matrix with semi-definite programming. In ICML'02, pages 323-330, 2002. Google ScholarGoogle Scholar
  32. H. Li. Learning to rank for information retrieval and natural language processing. Synthesis Lectures on Human Language Technologies, 4(1):1-113, 2011.Google ScholarGoogle Scholar
  33. T.Y. Liu. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval , 3(3):225-331, 2009. Google ScholarGoogle Scholar
  34. H. Ma, H.X. Yang, I. King, and M. R. Lyu. Learning latent semantic relations from clickthrough data for query suggestion. In CIKM'08, pages 709-718, 2008. Google ScholarGoogle Scholar
  35. C. Micchelli and M. Pontil. Learning the kernel function via regularization. JMLR'05, 6:1099- 1125, 2005. Google ScholarGoogle Scholar
  36. C.S. Ong, A.J. Smola, and R.C. Williamson. Learning the kernel with hyperkernels. JMLR '05, 6: 1043-1071, 2005. Google ScholarGoogle Scholar
  37. J.M. Ponte and W.B. Croft. A language modeling approach to information retrieval. In SIGIR'98, pages 275-281, 1998. Google ScholarGoogle Scholar
  38. S. E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at trec-3. In TREC, 1994.Google ScholarGoogle Scholar
  39. R. Rosipal and N. Krämer. Overview and recent advances in partial least squares. Subspace, Latent Structure and Feature Selection, pages 34-51, 2006. Google ScholarGoogle Scholar
  40. C. Rudin, C. Cortes, M. Mohri, and R. E. Schapire. Margin-based ranking meets boosting in the middle. In COLT'05, pages 63-78, 2005. Google ScholarGoogle Scholar
  41. G. Salton and M. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York, NY, USA, 1986. Google ScholarGoogle Scholar
  42. P.J. Schreier. A unifying discussion of correlation analysis for complex random vectors. Signal Processing, IEEE Transactions on, 56(4):1327-1336, 2008. Google ScholarGoogle Scholar
  43. M. Schultz and T. Joachims. Learning a distance metric from relative comparisons. NIPS'03, 2003.Google ScholarGoogle Scholar
  44. N. Srebro, J.D.M. Rennie, and T. Jaakkola. Maximum-margin matrix factorization. NIPS'05, pages 1329-1336, 2005.Google ScholarGoogle Scholar
  45. M. Sugiyama. Dimensionality reduction of multimodal labeled data by local fisher discriminant analysis. JMLR'07, 8:1027-1061, 2007. Google ScholarGoogle Scholar
  46. M. Varma and B. R. Babu. More generality in efficient multiple kernel learning. In ICML'09, page 134, 2009. Google ScholarGoogle Scholar
  47. J.A. Wegelin. A survey of partial least squares (pls) methods, with emphasis on the two-block case. Technical Report, No.371, Seattle: Department of Statistics, Univ. of Wash., 2000.Google ScholarGoogle Scholar
  48. K.Q. Weinberger and L.K. Saul. Distance metric learning for large margin nearest neighbor classification. JMLR'09, 10:207-244, 2009. Google ScholarGoogle Scholar
  49. W. Wu, J. Xu, H. Li, and O. Satoshi. Learning a robust relevance model for search using kernel methods. JMLR'11, 12:1429-1458, 2011. Google ScholarGoogle Scholar
  50. E.P. Xing, A.Y. Ng, M.I. Jordan, and S. Russell. Distance metric learning with application to clustering with side-information. NIPS'03, pages 521-528, 2003.Google ScholarGoogle Scholar
  51. J. Xu and H. Li. Adarank: a boosting algorithm for information retrieval. In SIGIR' 07, pages 391-398, 2007. Google ScholarGoogle Scholar
  52. J. Xu, H. Li, and Z.L. Zhong. Relevance ranking using kernels. In AIRS '10, 2010.Google ScholarGoogle Scholar
  53. L. Yang, R. Jin, R. Sukthankar, and Y. Liu. An efficient algorithm for local distance metric learning. In AAAI'06, page 543, 2006. Google ScholarGoogle Scholar
  54. Y. Ying, K. Huang, and C. Campbell. Sparse Metric Learning via Smooth Optimization. NIPS'09, pages 521-528, 2009.Google ScholarGoogle Scholar
  55. C.X. Zhai and J. Lafferty. A study of smoothing methods for language models applied to information retrieval. ACM Trans. Inf. Syst., 22(2):179-214, 2004. Google ScholarGoogle Scholar

Index Terms

  1. Learning bilinear model for matching queries and documents
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader