research-article

Scalability of findability: effective and efficient IR operations in large information networks

Authors:
Weimao Ke

University of North Carolina, Chapel Hill, NC, USA

University of North Carolina, Chapel Hill, NC, USA
View Profile

,
Javed Mostafa

University of North Carolina, Chapel Hill, NC, USA

University of North Carolina, Chapel Hill, NC, USA
View Profile

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrievalJuly 2010Pages 74–81https://doi.org/10.1145/1835449.1835465

Published:19 July 2010Publication History

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval

Pages 74–81

ABSTRACT

It is crucial to study basic principles that support adaptive and scalable retrieval functions in large networked environments such as the Web, where information is distributed among dynamic systems. We conducted experiments on decentralized IR operations on various scales of information networks and analyzed effectiveness, efficiency, and scalability of various search methods. Results showed network structure, i.e., how distributed systems connect to one another, is crucial for retrieval performance. Relying on partial indexes of distributed systems, some level of network clustering enabled very efficient and effective discovery of relevant information in large scale networks. For a given network clustering level, search time was well explained by a poly-logarithmic relation to network size (i.e., the number of distributed systems), indicating a high scalability potential for searching in a growing information space. In addition, network clustering only involved local self-organization and required no global control - clustering time remained roughly constant across the various scales of networks.

References

L. Adamic and E. Adar. How to search a social network. Social Networks, 27(3):187 -- 203, 2005.Google ScholarCross Ref
R. Albert and A.-L. Barabasi. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47--97, 2002.Google ScholarCross Ref
R. Baeza-Yates, C. Castillo, F. Junqueira, V. Plachouras, and F. Silvestri. Challenges on distributed web retrieval. ICDE 2007: Data Engineering 2007., pages 6--20, April 2007Google ScholarCross Ref
R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley Longman Publishing, 2004. Google ScholarDigital Library
M. Bawa, G. S. Manku, and P. Raghavan. Sets: search enhanced by topic segmentation. In SIGIR '03, pages 306--313, 2003. Google ScholarDigital Library
F. L. Bellifemine, G. Caire, and D. Greenwood. Developing Multi-Agent Systems with JADE (Wiley Series in Agent Technology). John Wiley & Sons, 2007. Google ScholarCross Ref
M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer. Improving collection selection with overlap awareness in p2p search engines. In SIGIR '05, pages 67--74, 2005. Google ScholarDigital Library
M. Boguna, D. Krioukov, and K. C. Claffy. Navigability of complex networks. Nature Physics, 5(1):74--80, 2009.Google ScholarCross Ref
J. Callan, F. Crestani, and M. Sanderson. SIGIR 2003 workshop on distributed information retrieval. SIGIR Forum, 37(2):33--37, 2003. Google ScholarDigital Library
A. Crespo and H. Garcia-Molina. Semantic overlay networks for p2p systems. In Agents and Peer-to-Peer Computing, pages 1--13, 2005. Google ScholarDigital Library
C. Doulkeridis, K. Norvag, and M. Vazirgiannis. Peer-to-peer similarity search over widely distributed document collections. In LSDS-IR '08, pages 35--42, 2008. Google ScholarDigital Library
M. S. Granovetter. The strength of weak ties. American Journal of Sociology, 78(6):1360--1380, May 1973.Google ScholarCross Ref
E. Hatcher, O. Gospodnetic,, and M. McCandless. Lucene in Action. Manning Publications, second edition edition, March 2010.Google Scholar
W. Ke and J. Mostafa. Strong ties vs. weak ties: Studying the clustering paradox for decentralized search. In LSDS-IR'08, pages 49--56, Boston, USA, July 23 2009.Google Scholar
J. M. Kleinberg. Navigation in a small world. Nature, 406(6798), August 2000.Google Scholar
J. Lu and J. Callan. User modeling for full-text federated search in peer-to-peer networks. In SIGIR'06, pages 332--339, 2006. Google ScholarDigital Library
E. K. Lua, J. Crowcroft, M. Pias, R. Sharma, and S. Lim. A survey and comparison of peer-to-peer overlay network schemes. IEEE Communications Surveys and Tutorials, 7:72--93, 2005. Google ScholarDigital Library
T. Luu, F. Klemm, I. Podnar, M. Rajman, and K. Aberer. Alvis peers: a scalable full-text peer-to-peer retrieval engine. In P2PIR '06, pages 41--48, 2006. Google ScholarDigital Library
A. L. Powell and J. C. French. Comparing the performance of collection selection algorithms. ACM Transactions on Information Systems (TOIS), 21(4):412--456, October 2003. Google ScholarDigital Library
O. Simsek and D. Jensen. Navigating networks by using homophily and degree. Proceedings of the National Academy of Sciences, 105(35):12758--12762, 2008.Google ScholarCross Ref
G. Skobeltsyn, T. Luu, I. P. Zarko, M. Rajman, and K. Aberer. Web text retrieval with a p2p query-driven index. In SIGIR '07, pages 679--686, 2007. Google ScholarDigital Library
D. J. Watts, P. S. Dodds, and M. E. J. Newman. Identity and Search in Social Networks. Science, 296(5571):1302--1305, 2002.Google ScholarCross Ref
I. P. Zarko and F. Silvestri. The CIKM 2006 workshop on information retrieval in peer-to-peer networks. SIGIR Forum, 41(1):101--103, 2007. Google ScholarDigital Library

Index Terms

Scalability of findability: effective and efficient IR operations in large information networks
1. Information systems
  1. Information retrieval
    1. Search engine architectures and scalability
      1. Distributed retrieval
      2. Peer-to-peer retrieval
  2. Information storage systems
    1. Storage architectures
      1. Distributed storage
      2. Storage network architectures

Recommendations

Studying the clustering paradox and scalability of search in highly distributed environments

With the ubiquitous production, distribution and consumption of information, today's digital environments such as the Web are increasingly large and decentralized. It is hardly possible to obtain central control over information collections and systems ...
Read More
Scalability of findability: decentralized search and retrieval in large information networks

Amid the rapid growth of information today is the increasing challenge for people to survive and navigate its magnitude. Dynamics and heterogeneity of large information spaces such as the Web challenge information retrieval in these environments. ...
Read More
Scalability of findability: decentralized search and retrieval in large information networks
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
July 2010
944 pages
ISBN:9781450301534
DOI:10.1145/1835449
General Chairs:
Fabio Crestani
University of Lugano, CH
,
Stéphane Marchand-Maillet
University of Geneva, CH
,
Program Chairs:
Hsin-Hsi Chen
National Taiwan University, TW
,
Efthimis N. Efthimiadis
University of Washington, USA
,
Jacques Savoy
University of Neuchatel, CH
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 19 July 2010
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
clustering paradox
connectivity
decentralized search
distributed IR
network clustering
scalability
strong tie
weak tie
Qualifiers
- research-article
Conference

Acceptance Rates
SIGIR '10 Paper Acceptance Rate87of520submissions,17%Overall Acceptance Rate792of3,983submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 10
  Total Citations
  View Citations
- 641
  Total Downloads
- Downloads (Last 12 months)8
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.