skip to main content
10.1145/1183579.1183582acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
Article

Efficient peer-to-peer semantic overlay networks based on statistical language models

Published:11 November 2006Publication History

ABSTRACT

In this paper we address the query routing problem in peer-to-peer (P2P) information retrieval. Our system builds up on the idea of a Semantic Overlay Network (SON), in which each peer becomes neighbor of a small number of peers, chosen among those that are most similar to it. Peers in the network are represented by a statistical Language Model derived from their local data collections but, instead of using the non-metric Kullback-Leibler divergence to compute the similarity between them, we use a symmetrized and "metricized" related measure, the square root of the Jensen-Shannon divergence, which let us map the problem to a metric search problem. The search strategy exploits the triangular inequality to efficiently prune the search space and relies on a priority queue to visit the most promising peers first. To keep communications costs low and to perform an efficient comparison between Language Models, we devise a compression technique that builds on Bloom-filters and histograms and we provide error bounds for the approximation and a cost analysis for the algorithms used to build and maintain the SON.

References

  1. K. Aberer and P. Cudré-Mauroux. Semantic overlay networks. In VLDB Tutorial, page 1367, Aug. 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. W.-T. Balke, W. Nejdl, W. Siberski, and U. Thaden. DL meets P2P - distributed document retrieval based on classification and content. In ECDL, pages 379--390, Sep. 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M. Batko, C. Gennaro, and P. Zezula. A scalable nearest neighbor search in p2p systems. In DBISP2P, pages 79--92, Aug. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. Batko, D. Novk, F. Falchi, and P. Zezula. On scalability of the similarity search in the world of peers. In INFOSCALE, May 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer. Improving collection selection with overlap awareness in P2P search engines. In SIGIR, pages 67--74, Aug. 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422--426, July 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Broder and M. Mitzenmacher. Network applications of bloom filters: a survey. Internet Mathematics, 1(4):485--509, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  8. J. P. Callan, Z. Lu, and W. B. Croft. Searching distributed collections with inference networks. In SIGIR, pages 21--28, July 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & sons, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Crespo and H. Garcia-Molina. Semantic overlay networks for P2P systems. In AP2PC, pages 1--13, July 2004. Google ScholarGoogle Scholar
  11. C. Doulkeridis, K. Noervaag, and M. Vazirgiannis. Scalable semantic overlay generation for P2P-based digital libraries. In ECDL, Sep. 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. C. Doulkeridis, K. Noervaag, and M. Vazirgiannis. The SOWES approach to P2P web search using semantic overlays. In WWW, pages 1027--1028, May 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. M. Endres and J. E. Schindelin. A new metric for probability distributions. IEEE Trans. Inf. Theory, 49(7):1858--1860, July 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. F. Falchi, C. Gennaro, and P. Zezula. A content-addressable network for similarity search in metric spaces. In DBISP2P, pages 126--137, Aug. 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. P. B. Gibbons, Y. Matias, and V. Poosala. Fast incremental maintenance of approximate histograms. ACM Trans. Database Syst., 27(3):261--298, Sep. 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. L. Gravano, H. Garcia-Molina, and A.Tomasic. GlOSS: Text-source discovery over the internet. ACM Trans. Database Syst., 24(2):229--264, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. G. R. Hjaltason and H. Samet. Index-driven similarity search in metric spaces. ACM Trans. Database Syst., 28(4):517--580, Dec. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. P. Kalnis, W. S. Ng, B. C. Ooi, and K.-L. Tan. Answering similarity queries in peer-to-peer networks. Inf. Syst., 31(1):57--72, Mar. 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. A. C. König and G. Weikum. Automatic tuning of data synopses. Inf. Syst., 28(1-2):85--109, Mar. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. X. Liu and W. B. Croft. Statistical language modeling for information retrieval. Annual Review of Information Science and Technology, 39:3--31, 2005.Google ScholarGoogle Scholar
  21. A. Löser, C. Tempich, B. Quilitz, W.-T. Balke, S. Staab, and W. Nejdl. Searching dynamic communities with personal indexes. In ISWC, pages 491--505, Nov. 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Lu and J. Callan. Federated search of text-based digital libraries in hierarchical peer-to-peer networks. In ECIR, pages 52--66, Mar. 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. J. Lu and J. Callan. User modeling for full-text federated search in peer-to-peer networks. In SIGIR, aug 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. Lu and J. P. Callan. Content-based retrieval in hybrid peer-to-peer networks. In CIKM, pages 199--206, Nov. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. P. Mahlmann and C. Schindelhauer. Peer-to-peer networks based on random transformations of connected regular undirected graphs. In SPAA, pages 155--164, July 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. P. Mahlmann and C. Schindelhauer. Distributed random digraph transformations for peer-to-peer networks. In SPAA, Aug. 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. W. Meng, C. T. Yu, and K.-L. Liu. Building efficient and effective metasearch engines. ACM Comput. Surv., 34(1):48--89, Mar. 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. S. Michel, M. Bender, P. Triantafillou, and G. Weikum. Iqn routing: Integrating quality and novelty in P2P querying and ranking. In EDBT, pages 149--166, Mar. 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. H. Nottelmann and N. Fuhr. Combining CORI and the decision-theoretic approach for advanced resource selection. In ECIR, pages 138--153, Apr. 2004.Google ScholarGoogle ScholarCross RefCross Ref
  30. H. Nottelmann and N. Fuhr. Comparing different architectures for query routing in peer-to-peer networks. In ECIR, pages 253--264, Apr. 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. J. X. Parreira, S. Michel, and G. Weikum. p2pDating: Real life inspired semantic overlay networks for web search. In SIGIR workshop on Heterogeneous and Distributed Information Retrieval, aug 2005.Google ScholarGoogle Scholar
  32. I. Podnar, M. Rajman, T. Luu, F. Klemm, and K. Aberer. Beyond term indexing: A P2P framework for web information retrieval. Informatica, 30(2):153--161, June 2006.Google ScholarGoogle Scholar
  33. R. Steinmetz and K. Wehrle. Peer-to-Peer Systems and Applications. Springer, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. P. Zezula, G. Amato, V. Dohnal, and M. Batko. Similarity Search: The Metric Space Approach (Advances in Database Systems). Springer-Verlag New York, Inc., 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. C. Zhai and J. D. Lafferty. A study of smoothing methods for language models applied to information retrieval ACM Trans. Inf. Syst., 22(2):179--214, Apr. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Efficient peer-to-peer semantic overlay networks based on statistical language models

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in
              • Published in

                cover image ACM Conferences
                P2PIR '06: Proceedings of the international workshop on Information retrieval in peer-to-peer networks
                November 2006
                66 pages
                ISBN:1595935274
                DOI:10.1145/1183579

                Copyright © 2006 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 11 November 2006

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • Article

                Upcoming Conference

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader