ABSTRACT
In this paper, we propose the use of records management principles to identify and manage Web site resources with enduring value as records. Current Web archiving activities, collaborative or organisational, whilst extremely valuable in their own right, often do not and cannot incorporate requirements for proper records management. Material collected under such initiatives therefore may not be reliable or authentic from a legal or archival perspective, with insufficient metadata collected about the object during its active life, and valuable materials destroyed whilst ephemeral items are maintained. Education, training, and collaboration between stakeholders are integral to avoiding these risks and successfully preserving valuable Web-based materials.