DOI: 10.1145/1141753.1141771
Article

Building a research library for the history of the web

Published: 11 June 2006

ABSTRACT

This paper describes the building of a research library for studying the Web, especially research on how the structure and content of the Web change over time. The library is particularly aimed at supporting social scientists for whom the Web is both a fascinating social phenomenon and a mirror on society. The library is built on the collections of the Internet Archive, which has been preserving a crawl of the Web every two months since 1996. The technical challenges in organizing this data for research fall into two categories: high-performance computing to transfer and manage the very large amounts of data, and human-computer interfaces that empower research by non-computer specialists.


        • Published in

          JCDL '06: Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
          June 2006
          402 pages
          ISBN:1595933549
          DOI:10.1145/1141753

          Copyright © 2006 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States




          Acceptance Rates

          Overall Acceptance Rate: 415 of 1,482 submissions, 28%
