ABSTRACT
In this paper we address the query routing problem in peer-to-peer (P2P) information retrieval. Our system builds up on the idea of a Semantic Overlay Network (SON), in which each peer becomes neighbor of a small number of peers, chosen among those that are most similar to it. Peers in the network are represented by a statistical Language Model derived from their local data collections but, instead of using the non-metric Kullback-Leibler divergence to compute the similarity between them, we use a symmetrized and "metricized" related measure, the square root of the Jensen-Shannon divergence, which let us map the problem to a metric search problem. The search strategy exploits the triangular inequality to efficiently prune the search space and relies on a priority queue to visit the most promising peers first. To keep communications costs low and to perform an efficient comparison between Language Models, we devise a compression technique that builds on Bloom-filters and histograms and we provide error bounds for the approximation and a cost analysis for the algorithms used to build and maintain the SON.
- K. Aberer and P. Cudré-Mauroux. Semantic overlay networks. In VLDB Tutorial, page 1367, Aug. 2005. Google ScholarDigital Library
- W.-T. Balke, W. Nejdl, W. Siberski, and U. Thaden. DL meets P2P - distributed document retrieval based on classification and content. In ECDL, pages 379--390, Sep. 2005. Google ScholarDigital Library
- M. Batko, C. Gennaro, and P. Zezula. A scalable nearest neighbor search in p2p systems. In DBISP2P, pages 79--92, Aug. 2004. Google ScholarDigital Library
- M. Batko, D. Novk, F. Falchi, and P. Zezula. On scalability of the similarity search in the world of peers. In INFOSCALE, May 2006. Google ScholarDigital Library
- M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer. Improving collection selection with overlap awareness in P2P search engines. In SIGIR, pages 67--74, Aug. 2005. Google ScholarDigital Library
- B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422--426, July 1970. Google ScholarDigital Library
- A. Broder and M. Mitzenmacher. Network applications of bloom filters: a survey. Internet Mathematics, 1(4):485--509, 2003.Google ScholarCross Ref
- J. P. Callan, Z. Lu, and W. B. Croft. Searching distributed collections with inference networks. In SIGIR, pages 21--28, July 1995. Google ScholarDigital Library
- T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & sons, 1991. Google ScholarDigital Library
- A. Crespo and H. Garcia-Molina. Semantic overlay networks for P2P systems. In AP2PC, pages 1--13, July 2004. Google Scholar
- C. Doulkeridis, K. Noervaag, and M. Vazirgiannis. Scalable semantic overlay generation for P2P-based digital libraries. In ECDL, Sep. 2006. Google ScholarDigital Library
- C. Doulkeridis, K. Noervaag, and M. Vazirgiannis. The SOWES approach to P2P web search using semantic overlays. In WWW, pages 1027--1028, May 2006. Google ScholarDigital Library
- D. M. Endres and J. E. Schindelin. A new metric for probability distributions. IEEE Trans. Inf. Theory, 49(7):1858--1860, July 2003. Google ScholarDigital Library
- F. Falchi, C. Gennaro, and P. Zezula. A content-addressable network for similarity search in metric spaces. In DBISP2P, pages 126--137, Aug. 2005. Google ScholarDigital Library
- P. B. Gibbons, Y. Matias, and V. Poosala. Fast incremental maintenance of approximate histograms. ACM Trans. Database Syst., 27(3):261--298, Sep. 2002. Google ScholarDigital Library
- L. Gravano, H. Garcia-Molina, and A.Tomasic. GlOSS: Text-source discovery over the internet. ACM Trans. Database Syst., 24(2):229--264, June 1999. Google ScholarDigital Library
- G. R. Hjaltason and H. Samet. Index-driven similarity search in metric spaces. ACM Trans. Database Syst., 28(4):517--580, Dec. 2003. Google ScholarDigital Library
- P. Kalnis, W. S. Ng, B. C. Ooi, and K.-L. Tan. Answering similarity queries in peer-to-peer networks. Inf. Syst., 31(1):57--72, Mar. 2006. Google ScholarDigital Library
- A. C. König and G. Weikum. Automatic tuning of data synopses. Inf. Syst., 28(1-2):85--109, Mar. 2003. Google ScholarDigital Library
- X. Liu and W. B. Croft. Statistical language modeling for information retrieval. Annual Review of Information Science and Technology, 39:3--31, 2005.Google Scholar
- A. Löser, C. Tempich, B. Quilitz, W.-T. Balke, S. Staab, and W. Nejdl. Searching dynamic communities with personal indexes. In ISWC, pages 491--505, Nov. 2005. Google ScholarDigital Library
- J. Lu and J. Callan. Federated search of text-based digital libraries in hierarchical peer-to-peer networks. In ECIR, pages 52--66, Mar. 2005. Google ScholarDigital Library
- J. Lu and J. Callan. User modeling for full-text federated search in peer-to-peer networks. In SIGIR, aug 2006. Google ScholarDigital Library
- J. Lu and J. P. Callan. Content-based retrieval in hybrid peer-to-peer networks. In CIKM, pages 199--206, Nov. 2003. Google ScholarDigital Library
- P. Mahlmann and C. Schindelhauer. Peer-to-peer networks based on random transformations of connected regular undirected graphs. In SPAA, pages 155--164, July 2005. Google ScholarDigital Library
- P. Mahlmann and C. Schindelhauer. Distributed random digraph transformations for peer-to-peer networks. In SPAA, Aug. 2006. Google ScholarDigital Library
- W. Meng, C. T. Yu, and K.-L. Liu. Building efficient and effective metasearch engines. ACM Comput. Surv., 34(1):48--89, Mar. 2002. Google ScholarDigital Library
- S. Michel, M. Bender, P. Triantafillou, and G. Weikum. Iqn routing: Integrating quality and novelty in P2P querying and ranking. In EDBT, pages 149--166, Mar. 2006. Google ScholarDigital Library
- H. Nottelmann and N. Fuhr. Combining CORI and the decision-theoretic approach for advanced resource selection. In ECIR, pages 138--153, Apr. 2004.Google ScholarCross Ref
- H. Nottelmann and N. Fuhr. Comparing different architectures for query routing in peer-to-peer networks. In ECIR, pages 253--264, Apr. 2006. Google ScholarDigital Library
- J. X. Parreira, S. Michel, and G. Weikum. p2pDating: Real life inspired semantic overlay networks for web search. In SIGIR workshop on Heterogeneous and Distributed Information Retrieval, aug 2005.Google Scholar
- I. Podnar, M. Rajman, T. Luu, F. Klemm, and K. Aberer. Beyond term indexing: A P2P framework for web information retrieval. Informatica, 30(2):153--161, June 2006.Google Scholar
- R. Steinmetz and K. Wehrle. Peer-to-Peer Systems and Applications. Springer, 2005. Google ScholarDigital Library
- P. Zezula, G. Amato, V. Dohnal, and M. Batko. Similarity Search: The Metric Space Approach (Advances in Database Systems). Springer-Verlag New York, Inc., 2005. Google ScholarDigital Library
- C. Zhai and J. D. Lafferty. A study of smoothing methods for language models applied to information retrieval ACM Trans. Inf. Syst., 22(2):179--214, Apr. 2004. Google ScholarDigital Library
Index Terms
- Efficient peer-to-peer semantic overlay networks based on statistical language models
Recommendations
Ranking factors in peer-to-peer overlay networks
A large number of peer processes are distributed in a peer-to-peer (P2P) overlay network. It is difficult, maybe impossible for a peer to perceive the membership and location of every resource object due to the scalability and openness of a P2P network. ...
Understanding overlay characteristics of a large-scale peer-to-peer IPTV system
This article presents results from our measurement and modeling efforts on the large-scale peer-to-peer (p2p) overlay graphs spanned by the PPLive system, the most popular and largest p2p IPTV (Internet Protocol Television) system today. Unlike other ...
Comments