ABSTRACT
It is crucial to study basic principles that support adaptive and scalable retrieval functions in large networked environments such as the Web, where information is distributed among dynamic systems. We conducted experiments on decentralized IR operations on various scales of information networks and analyzed effectiveness, efficiency, and scalability of various search methods. Results showed network structure, i.e., how distributed systems connect to one another, is crucial for retrieval performance. Relying on partial indexes of distributed systems, some level of network clustering enabled very efficient and effective discovery of relevant information in large scale networks. For a given network clustering level, search time was well explained by a poly-logarithmic relation to network size (i.e., the number of distributed systems), indicating a high scalability potential for searching in a growing information space. In addition, network clustering only involved local self-organization and required no global control - clustering time remained roughly constant across the various scales of networks.
- L. Adamic and E. Adar. How to search a social network. Social Networks, 27(3):187 -- 203, 2005.Google ScholarCross Ref
- R. Albert and A.-L. Barabasi. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47--97, 2002.Google ScholarCross Ref
- R. Baeza-Yates, C. Castillo, F. Junqueira, V. Plachouras, and F. Silvestri. Challenges on distributed web retrieval. ICDE 2007: Data Engineering 2007., pages 6--20, April 2007Google ScholarCross Ref
- R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley Longman Publishing, 2004. Google ScholarDigital Library
- M. Bawa, G. S. Manku, and P. Raghavan. Sets: search enhanced by topic segmentation. In SIGIR '03, pages 306--313, 2003. Google ScholarDigital Library
- F. L. Bellifemine, G. Caire, and D. Greenwood. Developing Multi-Agent Systems with JADE (Wiley Series in Agent Technology). John Wiley & Sons, 2007. Google ScholarCross Ref
- M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer. Improving collection selection with overlap awareness in p2p search engines. In SIGIR '05, pages 67--74, 2005. Google ScholarDigital Library
- M. Boguna, D. Krioukov, and K. C. Claffy. Navigability of complex networks. Nature Physics, 5(1):74--80, 2009.Google ScholarCross Ref
- J. Callan, F. Crestani, and M. Sanderson. SIGIR 2003 workshop on distributed information retrieval. SIGIR Forum, 37(2):33--37, 2003. Google ScholarDigital Library
- A. Crespo and H. Garcia-Molina. Semantic overlay networks for p2p systems. In Agents and Peer-to-Peer Computing, pages 1--13, 2005. Google ScholarDigital Library
- C. Doulkeridis, K. Norvag, and M. Vazirgiannis. Peer-to-peer similarity search over widely distributed document collections. In LSDS-IR '08, pages 35--42, 2008. Google ScholarDigital Library
- M. S. Granovetter. The strength of weak ties. American Journal of Sociology, 78(6):1360--1380, May 1973.Google ScholarCross Ref
- E. Hatcher, O. Gospodnetic,, and M. McCandless. Lucene in Action. Manning Publications, second edition edition, March 2010.Google Scholar
- W. Ke and J. Mostafa. Strong ties vs. weak ties: Studying the clustering paradox for decentralized search. In LSDS-IR'08, pages 49--56, Boston, USA, July 23 2009.Google Scholar
- J. M. Kleinberg. Navigation in a small world. Nature, 406(6798), August 2000.Google Scholar
- J. Lu and J. Callan. User modeling for full-text federated search in peer-to-peer networks. In SIGIR'06, pages 332--339, 2006. Google ScholarDigital Library
- E. K. Lua, J. Crowcroft, M. Pias, R. Sharma, and S. Lim. A survey and comparison of peer-to-peer overlay network schemes. IEEE Communications Surveys and Tutorials, 7:72--93, 2005. Google ScholarDigital Library
- T. Luu, F. Klemm, I. Podnar, M. Rajman, and K. Aberer. Alvis peers: a scalable full-text peer-to-peer retrieval engine. In P2PIR '06, pages 41--48, 2006. Google ScholarDigital Library
- A. L. Powell and J. C. French. Comparing the performance of collection selection algorithms. ACM Transactions on Information Systems (TOIS), 21(4):412--456, October 2003. Google ScholarDigital Library
- O. Simsek and D. Jensen. Navigating networks by using homophily and degree. Proceedings of the National Academy of Sciences, 105(35):12758--12762, 2008.Google ScholarCross Ref
- G. Skobeltsyn, T. Luu, I. P. Zarko, M. Rajman, and K. Aberer. Web text retrieval with a p2p query-driven index. In SIGIR '07, pages 679--686, 2007. Google ScholarDigital Library
- D. J. Watts, P. S. Dodds, and M. E. J. Newman. Identity and Search in Social Networks. Science, 296(5571):1302--1305, 2002.Google ScholarCross Ref
- I. P. Zarko and F. Silvestri. The CIKM 2006 workshop on information retrieval in peer-to-peer networks. SIGIR Forum, 41(1):101--103, 2007. Google ScholarDigital Library
Index Terms
- Scalability of findability: effective and efficient IR operations in large information networks
Recommendations
Studying the clustering paradox and scalability of search in highly distributed environments
With the ubiquitous production, distribution and consumption of information, today's digital environments such as the Web are increasingly large and decentralized. It is hardly possible to obtain central control over information collections and systems ...
Scalability of findability: decentralized search and retrieval in large information networks
Amid the rapid growth of information today is the increasing challenge for people to survive and navigate its magnitude. Dynamics and heterogeneity of large information spaces such as the Web challenge information retrieval in these environments. ...
Comments