ABSTRACT
Peer-to-Peer (P2P) search requires intelligent decisions for query routing: selecting the best peers to which a given query, initiated at some peer, should be forwarded for retrieving additional search results. These decisions are based on statistical summaries for each peer, which are usually organized on a per-keyword basis and managed in a distributed directory of routing indices. Such architectures disregard the possible correlations among keywords. Together with the coarse granularity of per-peer summaries, which are mandated for scalability, this limitation may lead to poor search result quality.This paper develops and evaluates two solutions to this problem, sk-STAT based on single-key statistics only, and mk-STAT based on additional multi-key statistics. For both cases, hash sketch synopses are used to compactly represent a peer's data items and are efficiently disseminated in the P2P network to form a decentralized directory. Experimental studies with Gnutella and Web data demonstrate the viability and the trade-offs of the approaches.
- K. Aberer. P-Grid: A self-organizing access structure for P2P information systems. In CoopIS, 2001.]] Google ScholarDigital Library
- K. Aberer and P. Cudré-Mauroux. Semantic overlay networks. In VLDB, 2005.]] Google ScholarDigital Library
- K. Aberer, F. Klemm, T. Luu, I. Podnar, and M. Rajman. Building a peer-to-peer full-text Web search engine with highly discriminative keys. Technical report, EPFL (LSIR), 2005.]]Google Scholar
- D. Agrawal, A. E. Abbadi, and S. Suri. Attribute-based access to distributed data over p2p networks. In DNIS, 2005.]] Google ScholarDigital Library
- R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD, 1993.]] Google ScholarDigital Library
- M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer. Improving collection selection with overlap awareness in p2p search engines. In SIGIR, 2005.]] Google ScholarDigital Library
- M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer. Minerva: collaborative p2p search. In VLDB, 2005.]] Google ScholarDigital Library
- M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer. P2p content search: Give the web back to the people. In IPTPS, 2006.]]Google Scholar
- B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7), 1970.]] Google ScholarDigital Library
- A. Broder. On the resemblance and containment of documents. In SEQS, 1997.]] Google ScholarDigital Library
- A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. Journal of Computer and System Sciences, 60(3), 2000.]] Google ScholarDigital Library
- J. Callan. Distributed information retrieval. Advances in information retrieval, Kluwer Academic Publishers, 2000.]]Google Scholar
- J. P. Callan, Z. Lu, and W. B. Croft. Searching distributed collections with inference networks. In SIGIR, 1995.]] Google ScholarDigital Library
- E. Cohen, A. Fiat, and H. Kaplan. Associative search in peer to peer networks: Harnessing latent semantics. In INFOCOM, 2003.]]Google ScholarCross Ref
- G. Cormode and M. N. Garofalakis. Sketching streams through the net: Distributed approximate query tracking. In VLDB, 2005.]] Google ScholarDigital Library
- A. Crespo and H. Garcia-Molina. Semantic overlay networks for p2p systems. In AP2PC, 2004.]]Google Scholar
- P. Druschel and A. Rowstron. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Middleware, 2001.]] Google ScholarDigital Library
- M. Durand and P. Flajolet. Loglog counting of large cardinalities. In ESA, 2003.]]Google ScholarCross Ref
- M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J. D. Ullman. Computing iceberg queries efficiently. In VLDB, 1998.]] Google ScholarDigital Library
- P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31(2), 1985.]] Google ScholarDigital Library
- N. Fuhr. A decision-theoretic approach to database selection in networked IR. ACM Transactions on Information Systems, 17(3), 1999.]] Google ScholarDigital Library
- J. Han and M. Kamber. Data Mining Concepts and Techniques. Morgan Kaufman, 2001.]] Google ScholarDigital Library
- T. Hernandez and S. Kambhampati. Improving text collection selection with coverage and overlap statistics. poster at WWW 2005.]] Google ScholarDigital Library
- R. Huebsch, B. N. Chun, J. M. Hellerstein, B. T. Loo, P. Maniatis, T. Roscoe, S. Shenker, I. Stoica, and A. R. Yumerefendi. The architecture of pier: an internet-scale query processor. In CIDR, 2005.]]Google Scholar
- M. Jelasity, W. Kowalczyk, and M. van Steen. An approach to massively distributed aggregate computing on peer-to-peer networks. In PDP, 2004.]]Google ScholarCross Ref
- M. Jelasity, A. Montresor, and O. Babaoglu. Gossip-based aggregation in large dynamic networks. ACM Trans. Comput. Syst., 23(3):219--252, 2005.]] Google ScholarDigital Library
- P. Kalnis, W. S. Ng, B. C. Ooi, and K.-L. Tan. Answering similarity queries in peer-to-peer networks. Inf. Syst., 31(1), 2006.]]Google Scholar
- R. M. Karp, C. Schindelhauer, S. Shenker, and B. Voecking. Randomized rumor spreading. In FOCS, 2000]] Google ScholarDigital Library
- W. Litwin and M.-A. Neimat. k-rp*s: A scalable distributed data structure for high-performance multi-attribute access. In PDIS, 1996.]] Google ScholarDigital Library
- J. Lu and J. P. Callan. Content-based retrieval in hybrid peer-to-peer networks. In CIKM, 2003.]] Google ScholarDigital Library
- V. Markl, N. Megiddo, M. Kutsch, T. M. Tran, P. J. Haas, and U. Srivastava. Consistently estimating the selectivity of conjuncts of predicates. In VLDB, 2005.]] Google ScholarDigital Library
- W. Meng, C. T. Yu, and K.-L. Liu. Building efficient and effective metasearch engines. ACM Computing Surveys, 34(1):48--89, 2002.]] Google ScholarDigital Library
- S. Michel, M. Bender, P. Triantafillou, and G. Weikum. IQN Routing: Integrating quality and novelty for web search. In EDBT, 2006.]]Google Scholar
- S. Ratnasamy, et al. A scalable content-addressable network. In SIGCOMM, 2001.]] Google ScholarDigital Library
- L. Si, R. Jin, J. P. Callan, and P. Ogilvie. A language modeling framework for resource selection and results merging. In CIKM, 2002.]] Google ScholarDigital Library
- K. Sripanidkulchai, B. M. Maggs, and H. Zhang. Efficient content location using interest-based locality in peer-to-peer systems. In INFOCOM, 2003.]]Google ScholarCross Ref
- I. Stoica, et al. Chord: A scalable Peer-To-Peer lookup service for internet applications. In SIGCOMM, 2001.]] Google ScholarDigital Library
- Text REtrieval Conference (TREC). http://trec.nist.gov/.]]Google Scholar
- P. Triantafillou, C. Xiruhaki, M. Koubarakis, and N. Ntarmos. Towards high performance peer-to-peer content and resource sharing systems. In CIDR, 2003.]]Google Scholar
- S. Voulgaris, A.-M. Kermarrec, L. Massoulié, and M. van Steen. Exploiting semantic proximity in peer-to-peer content searching. In FTDCS, 2004.]]Google ScholarCross Ref
- L. Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer, Feb. 2004.]]Google ScholarDigital Library
- B. Yang, P. Vinograd, and H. Garcia-Molina. Evaluating guess and non-forwarding peer-to-peer search. In ICDCS, 2004.]] Google ScholarDigital Library
- J. Zhang and T. Suel. Efficient query evaluation on large textual collections in a peer-to-peer environment. In Peer-to-Peer Computing, 2005.]] Google ScholarDigital Library
Index Terms
- Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices
Recommendations
Scalable summary based retrieval in P2P networks
CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge managementMuch of the present P2P-IR literature is focused on distributed indexing structures. Within this paper, we present an approach based on the replication of peer data summaries via rumor spreading and multicast in a structured overlay.We will describe ...
Data allocation scheme based on term weight for P2P information retrieval
WIDM '07: Proceedings of the 9th annual ACM international workshop on Web information and data managementMany Peer-to-Peer information retrieval systems that use a global index have already been proposed that can retrieve documents relevant to a query. Since documents are allocated to peers regardless of the query, the system needs to connect many peers to ...
Improving collection selection with overlap awareness in P2P search engines
SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrievalCollection selection has been a research issue for years. Typically, in related work, precomputed statistics are employed in order to estimate the expected result quality of each collection, and subsequently the collections are ranked accordingly. Our ...
Comments