skip to main content
10.1145/1835449.1835465acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
research-article

Scalability of findability: effective and efficient IR operations in large information networks

Published:19 July 2010Publication History

ABSTRACT

It is crucial to study basic principles that support adaptive and scalable retrieval functions in large networked environments such as the Web, where information is distributed among dynamic systems. We conducted experiments on decentralized IR operations on various scales of information networks and analyzed effectiveness, efficiency, and scalability of various search methods. Results showed network structure, i.e., how distributed systems connect to one another, is crucial for retrieval performance. Relying on partial indexes of distributed systems, some level of network clustering enabled very efficient and effective discovery of relevant information in large scale networks. For a given network clustering level, search time was well explained by a poly-logarithmic relation to network size (i.e., the number of distributed systems), indicating a high scalability potential for searching in a growing information space. In addition, network clustering only involved local self-organization and required no global control - clustering time remained roughly constant across the various scales of networks.

References

  1. L. Adamic and E. Adar. How to search a social network. Social Networks, 27(3):187 -- 203, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  2. R. Albert and A.-L. Barabasi. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47--97, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  3. R. Baeza-Yates, C. Castillo, F. Junqueira, V. Plachouras, and F. Silvestri. Challenges on distributed web retrieval. ICDE 2007: Data Engineering 2007., pages 6--20, April 2007Google ScholarGoogle ScholarCross RefCross Ref
  4. R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley Longman Publishing, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Bawa, G. S. Manku, and P. Raghavan. Sets: search enhanced by topic segmentation. In SIGIR '03, pages 306--313, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. F. L. Bellifemine, G. Caire, and D. Greenwood. Developing Multi-Agent Systems with JADE (Wiley Series in Agent Technology). John Wiley & Sons, 2007. Google ScholarGoogle ScholarCross RefCross Ref
  7. M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer. Improving collection selection with overlap awareness in p2p search engines. In SIGIR '05, pages 67--74, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Boguna, D. Krioukov, and K. C. Claffy. Navigability of complex networks. Nature Physics, 5(1):74--80, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  9. J. Callan, F. Crestani, and M. Sanderson. SIGIR 2003 workshop on distributed information retrieval. SIGIR Forum, 37(2):33--37, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Crespo and H. Garcia-Molina. Semantic overlay networks for p2p systems. In Agents and Peer-to-Peer Computing, pages 1--13, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. C. Doulkeridis, K. Norvag, and M. Vazirgiannis. Peer-to-peer similarity search over widely distributed document collections. In LSDS-IR '08, pages 35--42, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. S. Granovetter. The strength of weak ties. American Journal of Sociology, 78(6):1360--1380, May 1973.Google ScholarGoogle ScholarCross RefCross Ref
  13. E. Hatcher, O. Gospodnetic,, and M. McCandless. Lucene in Action. Manning Publications, second edition edition, March 2010.Google ScholarGoogle Scholar
  14. W. Ke and J. Mostafa. Strong ties vs. weak ties: Studying the clustering paradox for decentralized search. In LSDS-IR'08, pages 49--56, Boston, USA, July 23 2009.Google ScholarGoogle Scholar
  15. J. M. Kleinberg. Navigation in a small world. Nature, 406(6798), August 2000.Google ScholarGoogle Scholar
  16. J. Lu and J. Callan. User modeling for full-text federated search in peer-to-peer networks. In SIGIR'06, pages 332--339, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. E. K. Lua, J. Crowcroft, M. Pias, R. Sharma, and S. Lim. A survey and comparison of peer-to-peer overlay network schemes. IEEE Communications Surveys and Tutorials, 7:72--93, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. T. Luu, F. Klemm, I. Podnar, M. Rajman, and K. Aberer. Alvis peers: a scalable full-text peer-to-peer retrieval engine. In P2PIR '06, pages 41--48, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. A. L. Powell and J. C. French. Comparing the performance of collection selection algorithms. ACM Transactions on Information Systems (TOIS), 21(4):412--456, October 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. O. Simsek and D. Jensen. Navigating networks by using homophily and degree. Proceedings of the National Academy of Sciences, 105(35):12758--12762, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  21. G. Skobeltsyn, T. Luu, I. P. Zarko, M. Rajman, and K. Aberer. Web text retrieval with a p2p query-driven index. In SIGIR '07, pages 679--686, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. D. J. Watts, P. S. Dodds, and M. E. J. Newman. Identity and Search in Social Networks. Science, 296(5571):1302--1305, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  23. I. P. Zarko and F. Silvestri. The CIKM 2006 workshop on information retrieval in peer-to-peer networks. SIGIR Forum, 41(1):101--103, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Scalability of findability: effective and efficient IR operations in large information networks

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
            July 2010
            944 pages
            ISBN:9781450301534
            DOI:10.1145/1835449

            Copyright © 2010 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 19 July 2010

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            SIGIR '10 Paper Acceptance Rate87of520submissions,17%Overall Acceptance Rate792of3,983submissions,20%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader