Article

Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices

Authors:
Sebastian Michel

Max-Planck-Institut für Informatik, Saarbrücken, Germany

Max-Planck-Institut für Informatik, Saarbrücken, Germany
View Profile

,
Matthias Bender

Max-Planck-Institut für Informatik, Saarbrücken, Germany

Max-Planck-Institut für Informatik, Saarbrücken, Germany
View Profile

,
Nikos Ntarmos

RACTI and University of Patras, Rio, Greece

RACTI and University of Patras, Rio, Greece
View Profile

,
Peter Triantafillou

RACTI and University of Patras, Rio, Greece

RACTI and University of Patras, Rio, Greece
View Profile

,
Gerhard Weikum

Max-Planck-Institut für Informatik, Saarbrücken, Germany

Max-Planck-Institut für Informatik, Saarbrücken, Germany
View Profile

,
Christian Zimmer

Max-Planck-Institut für Informatik, Saarbrücken, Germany

Max-Planck-Institut für Informatik, Saarbrücken, Germany
View Profile

CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge managementNovember 2006Pages 172–181https://doi.org/10.1145/1183614.1183643

Published:06 November 2006Publication History

CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management

Pages 172–181

ABSTRACT

Peer-to-Peer (P2P) search requires intelligent decisions for query routing: selecting the best peers to which a given query, initiated at some peer, should be forwarded for retrieving additional search results. These decisions are based on statistical summaries for each peer, which are usually organized on a per-keyword basis and managed in a distributed directory of routing indices. Such architectures disregard the possible correlations among keywords. Together with the coarse granularity of per-peer summaries, which are mandated for scalability, this limitation may lead to poor search result quality.This paper develops and evaluates two solutions to this problem, sk-STAT based on single-key statistics only, and mk-STAT based on additional multi-key statistics. For both cases, hash sketch synopses are used to compactly represent a peer's data items and are efficiently disseminated in the P2P network to form a decentralized directory. Experimental studies with Gnutella and Web data demonstrate the viability and the trade-offs of the approaches.

References

K. Aberer. P-Grid: A self-organizing access structure for P2P information systems. In CoopIS, 2001.]] Google ScholarDigital Library
K. Aberer and P. Cudré-Mauroux. Semantic overlay networks. In VLDB, 2005.]] Google ScholarDigital Library
K. Aberer, F. Klemm, T. Luu, I. Podnar, and M. Rajman. Building a peer-to-peer full-text Web search engine with highly discriminative keys. Technical report, EPFL (LSIR), 2005.]]Google Scholar
D. Agrawal, A. E. Abbadi, and S. Suri. Attribute-based access to distributed data over p2p networks. In DNIS, 2005.]] Google ScholarDigital Library
R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD, 1993.]] Google ScholarDigital Library
M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer. Improving collection selection with overlap awareness in p2p search engines. In SIGIR, 2005.]] Google ScholarDigital Library
M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer. Minerva: collaborative p2p search. In VLDB, 2005.]] Google ScholarDigital Library
M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer. P2p content search: Give the web back to the people. In IPTPS, 2006.]]Google Scholar
B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7), 1970.]] Google ScholarDigital Library
A. Broder. On the resemblance and containment of documents. In SEQS, 1997.]] Google ScholarDigital Library
A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. Journal of Computer and System Sciences, 60(3), 2000.]] Google ScholarDigital Library
J. Callan. Distributed information retrieval. Advances in information retrieval, Kluwer Academic Publishers, 2000.]]Google Scholar
J. P. Callan, Z. Lu, and W. B. Croft. Searching distributed collections with inference networks. In SIGIR, 1995.]] Google ScholarDigital Library
E. Cohen, A. Fiat, and H. Kaplan. Associative search in peer to peer networks: Harnessing latent semantics. In INFOCOM, 2003.]]Google ScholarCross Ref
G. Cormode and M. N. Garofalakis. Sketching streams through the net: Distributed approximate query tracking. In VLDB, 2005.]] Google ScholarDigital Library
A. Crespo and H. Garcia-Molina. Semantic overlay networks for p2p systems. In AP2PC, 2004.]]Google Scholar
P. Druschel and A. Rowstron. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Middleware, 2001.]] Google ScholarDigital Library
M. Durand and P. Flajolet. Loglog counting of large cardinalities. In ESA, 2003.]]Google ScholarCross Ref
M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J. D. Ullman. Computing iceberg queries efficiently. In VLDB, 1998.]] Google ScholarDigital Library
P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31(2), 1985.]] Google ScholarDigital Library
N. Fuhr. A decision-theoretic approach to database selection in networked IR. ACM Transactions on Information Systems, 17(3), 1999.]] Google ScholarDigital Library
J. Han and M. Kamber. Data Mining Concepts and Techniques. Morgan Kaufman, 2001.]] Google ScholarDigital Library
T. Hernandez and S. Kambhampati. Improving text collection selection with coverage and overlap statistics. poster at WWW 2005.]] Google ScholarDigital Library
R. Huebsch, B. N. Chun, J. M. Hellerstein, B. T. Loo, P. Maniatis, T. Roscoe, S. Shenker, I. Stoica, and A. R. Yumerefendi. The architecture of pier: an internet-scale query processor. In CIDR, 2005.]]Google Scholar
M. Jelasity, W. Kowalczyk, and M. van Steen. An approach to massively distributed aggregate computing on peer-to-peer networks. In PDP, 2004.]]Google ScholarCross Ref
M. Jelasity, A. Montresor, and O. Babaoglu. Gossip-based aggregation in large dynamic networks. ACM Trans. Comput. Syst., 23(3):219--252, 2005.]] Google ScholarDigital Library
P. Kalnis, W. S. Ng, B. C. Ooi, and K.-L. Tan. Answering similarity queries in peer-to-peer networks. Inf. Syst., 31(1), 2006.]]Google Scholar
R. M. Karp, C. Schindelhauer, S. Shenker, and B. Voecking. Randomized rumor spreading. In FOCS, 2000]] Google ScholarDigital Library
W. Litwin and M.-A. Neimat. k-rp*s: A scalable distributed data structure for high-performance multi-attribute access. In PDIS, 1996.]] Google ScholarDigital Library
J. Lu and J. P. Callan. Content-based retrieval in hybrid peer-to-peer networks. In CIKM, 2003.]] Google ScholarDigital Library
V. Markl, N. Megiddo, M. Kutsch, T. M. Tran, P. J. Haas, and U. Srivastava. Consistently estimating the selectivity of conjuncts of predicates. In VLDB, 2005.]] Google ScholarDigital Library
W. Meng, C. T. Yu, and K.-L. Liu. Building efficient and effective metasearch engines. ACM Computing Surveys, 34(1):48--89, 2002.]] Google ScholarDigital Library
S. Michel, M. Bender, P. Triantafillou, and G. Weikum. IQN Routing: Integrating quality and novelty for web search. In EDBT, 2006.]]Google Scholar
S. Ratnasamy, et al. A scalable content-addressable network. In SIGCOMM, 2001.]] Google ScholarDigital Library
L. Si, R. Jin, J. P. Callan, and P. Ogilvie. A language modeling framework for resource selection and results merging. In CIKM, 2002.]] Google ScholarDigital Library
K. Sripanidkulchai, B. M. Maggs, and H. Zhang. Efficient content location using interest-based locality in peer-to-peer systems. In INFOCOM, 2003.]]Google ScholarCross Ref
I. Stoica, et al. Chord: A scalable Peer-To-Peer lookup service for internet applications. In SIGCOMM, 2001.]] Google ScholarDigital Library
Text REtrieval Conference (TREC). http://trec.nist.gov/.]]Google Scholar
P. Triantafillou, C. Xiruhaki, M. Koubarakis, and N. Ntarmos. Towards high performance peer-to-peer content and resource sharing systems. In CIDR, 2003.]]Google Scholar
S. Voulgaris, A.-M. Kermarrec, L. Massoulié, and M. van Steen. Exploiting semantic proximity in peer-to-peer content searching. In FTDCS, 2004.]]Google ScholarCross Ref
L. Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer, Feb. 2004.]]Google ScholarDigital Library
B. Yang, P. Vinograd, and H. Garcia-Molina. Evaluating guess and non-forwarding peer-to-peer search. In ICDCS, 2004.]] Google ScholarDigital Library
J. Zhang and T. Suel. Efficient query evaluation on large textual collections in a peer-to-peer environment. In Peer-to-Peer Computing, 2005.]] Google ScholarDigital Library

Index Terms

Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices
1. Information systems

Recommendations

Scalable summary based retrieval in P2P networks
CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management

Much of the present P2P-IR literature is focused on distributed indexing structures. Within this paper, we present an approach based on the replication of peer data summaries via rumor spreading and multicast in a structured overlay.We will describe ...
Read More
Data allocation scheme based on term weight for P2P information retrieval
WIDM '07: Proceedings of the 9th annual ACM international workshop on Web information and data management

Many Peer-to-Peer information retrieval systems that use a global index have already been proposed that can retrieve documents relevant to a query. Since documents are allocated to peers regardless of the query, the system needs to connect many peers to ...
Read More
Improving collection selection with overlap awareness in P2P search engines
SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

Collection selection has been a research issue for years. Typically, in related work, precomputed statistics are employed in order to estimate the expected result quality of each collection, and subsequently the collections are ranked accordingly. Our ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management
November 2006
916 pages
ISBN:1595934332
DOI:10.1145/1183614
General Chair:
Philip S. Yu
IBM T.J. Watson Research Center (USA)
,
Program Chairs:
Vassilis Tsotras
University of California-Riverside (USA)
,
Edward Fox
Virginia Tech (USA)
,
Bing Liu
University of Illinois at Chicago (USA)
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 6 November 2006
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
distributed IR
key co-occurrences
peer-to-peer information systems
query routing
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate1,861of8,427submissions,22%
Upcoming Conference
CIKM '24

Sponsor:

sigir

sigir

The 33rd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2024

Boise , ID , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 31
  Total Citations
  View Citations
- 568
  Total Downloads
- Downloads (Last 12 months)1
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices

CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management

ABSTRACT

References

Cited By

Index Terms

Recommendations

Scalable summary based retrieval in P2P networks

Data allocation scheme based on term weight for P2P information retrieval

Improving collection selection with overlap awareness in P2P search engines