skip to main content
10.1145/1183614.1183643acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
Article

Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices

Published:06 November 2006Publication History

ABSTRACT

Peer-to-Peer (P2P) search requires intelligent decisions for query routing: selecting the best peers to which a given query, initiated at some peer, should be forwarded for retrieving additional search results. These decisions are based on statistical summaries for each peer, which are usually organized on a per-keyword basis and managed in a distributed directory of routing indices. Such architectures disregard the possible correlations among keywords. Together with the coarse granularity of per-peer summaries, which are mandated for scalability, this limitation may lead to poor search result quality.This paper develops and evaluates two solutions to this problem, sk-STAT based on single-key statistics only, and mk-STAT based on additional multi-key statistics. For both cases, hash sketch synopses are used to compactly represent a peer's data items and are efficiently disseminated in the P2P network to form a decentralized directory. Experimental studies with Gnutella and Web data demonstrate the viability and the trade-offs of the approaches.

References

  1. K. Aberer. P-Grid: A self-organizing access structure for P2P information systems. In CoopIS, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. K. Aberer and P. Cudré-Mauroux. Semantic overlay networks. In VLDB, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. K. Aberer, F. Klemm, T. Luu, I. Podnar, and M. Rajman. Building a peer-to-peer full-text Web search engine with highly discriminative keys. Technical report, EPFL (LSIR), 2005.]]Google ScholarGoogle Scholar
  4. D. Agrawal, A. E. Abbadi, and S. Suri. Attribute-based access to distributed data over p2p networks. In DNIS, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD, 1993.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer. Improving collection selection with overlap awareness in p2p search engines. In SIGIR, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer. Minerva: collaborative p2p search. In VLDB, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer. P2p content search: Give the web back to the people. In IPTPS, 2006.]]Google ScholarGoogle Scholar
  9. B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7), 1970.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Broder. On the resemblance and containment of documents. In SEQS, 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. Journal of Computer and System Sciences, 60(3), 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Callan. Distributed information retrieval. Advances in information retrieval, Kluwer Academic Publishers, 2000.]]Google ScholarGoogle Scholar
  13. J. P. Callan, Z. Lu, and W. B. Croft. Searching distributed collections with inference networks. In SIGIR, 1995.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. E. Cohen, A. Fiat, and H. Kaplan. Associative search in peer to peer networks: Harnessing latent semantics. In INFOCOM, 2003.]]Google ScholarGoogle ScholarCross RefCross Ref
  15. G. Cormode and M. N. Garofalakis. Sketching streams through the net: Distributed approximate query tracking. In VLDB, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Crespo and H. Garcia-Molina. Semantic overlay networks for p2p systems. In AP2PC, 2004.]]Google ScholarGoogle Scholar
  17. P. Druschel and A. Rowstron. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Middleware, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. Durand and P. Flajolet. Loglog counting of large cardinalities. In ESA, 2003.]]Google ScholarGoogle ScholarCross RefCross Ref
  19. M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J. D. Ullman. Computing iceberg queries efficiently. In VLDB, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31(2), 1985.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. N. Fuhr. A decision-theoretic approach to database selection in networked IR. ACM Transactions on Information Systems, 17(3), 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Han and M. Kamber. Data Mining Concepts and Techniques. Morgan Kaufman, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. T. Hernandez and S. Kambhampati. Improving text collection selection with coverage and overlap statistics. poster at WWW 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. R. Huebsch, B. N. Chun, J. M. Hellerstein, B. T. Loo, P. Maniatis, T. Roscoe, S. Shenker, I. Stoica, and A. R. Yumerefendi. The architecture of pier: an internet-scale query processor. In CIDR, 2005.]]Google ScholarGoogle Scholar
  25. M. Jelasity, W. Kowalczyk, and M. van Steen. An approach to massively distributed aggregate computing on peer-to-peer networks. In PDP, 2004.]]Google ScholarGoogle ScholarCross RefCross Ref
  26. M. Jelasity, A. Montresor, and O. Babaoglu. Gossip-based aggregation in large dynamic networks. ACM Trans. Comput. Syst., 23(3):219--252, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. P. Kalnis, W. S. Ng, B. C. Ooi, and K.-L. Tan. Answering similarity queries in peer-to-peer networks. Inf. Syst., 31(1), 2006.]]Google ScholarGoogle Scholar
  28. R. M. Karp, C. Schindelhauer, S. Shenker, and B. Voecking. Randomized rumor spreading. In FOCS, 2000]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. W. Litwin and M.-A. Neimat. k-rp*s: A scalable distributed data structure for high-performance multi-attribute access. In PDIS, 1996.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. J. Lu and J. P. Callan. Content-based retrieval in hybrid peer-to-peer networks. In CIKM, 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. V. Markl, N. Megiddo, M. Kutsch, T. M. Tran, P. J. Haas, and U. Srivastava. Consistently estimating the selectivity of conjuncts of predicates. In VLDB, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. W. Meng, C. T. Yu, and K.-L. Liu. Building efficient and effective metasearch engines. ACM Computing Surveys, 34(1):48--89, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. S. Michel, M. Bender, P. Triantafillou, and G. Weikum. IQN Routing: Integrating quality and novelty for web search. In EDBT, 2006.]]Google ScholarGoogle Scholar
  34. S. Ratnasamy, et al. A scalable content-addressable network. In SIGCOMM, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. L. Si, R. Jin, J. P. Callan, and P. Ogilvie. A language modeling framework for resource selection and results merging. In CIKM, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. K. Sripanidkulchai, B. M. Maggs, and H. Zhang. Efficient content location using interest-based locality in peer-to-peer systems. In INFOCOM, 2003.]]Google ScholarGoogle ScholarCross RefCross Ref
  37. I. Stoica, et al. Chord: A scalable Peer-To-Peer lookup service for internet applications. In SIGCOMM, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Text REtrieval Conference (TREC). http://trec.nist.gov/.]]Google ScholarGoogle Scholar
  39. P. Triantafillou, C. Xiruhaki, M. Koubarakis, and N. Ntarmos. Towards high performance peer-to-peer content and resource sharing systems. In CIDR, 2003.]]Google ScholarGoogle Scholar
  40. S. Voulgaris, A.-M. Kermarrec, L. Massoulié, and M. van Steen. Exploiting semantic proximity in peer-to-peer content searching. In FTDCS, 2004.]]Google ScholarGoogle ScholarCross RefCross Ref
  41. L. Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer, Feb. 2004.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. B. Yang, P. Vinograd, and H. Garcia-Molina. Evaluating guess and non-forwarding peer-to-peer search. In ICDCS, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. J. Zhang and T. Suel. Efficient query evaluation on large textual collections in a peer-to-peer environment. In Peer-to-Peer Computing, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices

                Recommendations

                Comments

                Login options

                Check if you have access through your login credentials or your institution to get full access on this article.

                Sign in
                • Published in

                  cover image ACM Conferences
                  CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management
                  November 2006
                  916 pages
                  ISBN:1595934332
                  DOI:10.1145/1183614

                  Copyright © 2006 ACM

                  Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States

                  Publication History

                  • Published: 6 November 2006

                  Permissions

                  Request permissions about this article.

                  Request Permissions

                  Check for updates

                  Qualifiers

                  • Article

                  Acceptance Rates

                  Overall Acceptance Rate1,861of8,427submissions,22%

                  Upcoming Conference

                PDF Format

                View or Download as a PDF file.

                PDF

                eReader

                View online with eReader.

                eReader