Keeping Bits Safe: How Hard Can It Be?: As storage systems grow larger and larger, protecting their data for long-term storage is becoming more and more challenging.

Published: 01 October 2010

Abstract

These days, we are all data pack rats. Storage is cheap, so if there’s a chance the data could possibly be useful, we keep it. We know that storage isn’t completely reliable, so we keep backup copies as well. But the more data we keep, and the longer we keep it, the greater the chance that some of it will be unrecoverable when we need it.

References

  1. Adams, D. 1978. The Hitchhiker's Guide to the Galaxy. British Broadcasting Corp.
  2. Amazon. 2006. Amazon S3 API Reference (March); http://docs.amazonwebservices.com/AmazonS3/latest/API/.
  3. Andersen, D. G., Franklin, J., Kaminsky, M., Phanishayee, A., Tan, L., Vasudevan, V. 2009. FAWN: a fast array of wimpy nodes. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles: 1-14.
  4. Anderson, D. 2009. Hard drive directions (September); http://www.digitalpreservation.gov/news/events/other_meetings/storage09/docs/2-4_Anderson-seagate-v3_HDtrends.pdf.
  5. Bairavasundaram, L., Goodson, G., Schroeder, B., Arpaci-Dusseau, A. C., Arpaci-Dusseau, R. H. 2008. An analysis of data corruption in the storage stack. In Proceedings of 6th Usenix Conference on File and Storage Technologies.
  6. Baker, M., Shah, M., Rosenthal, D. S. H., Roussopoulos, M., Maniatis, P., Giuli, TJ, Bungale, P. 2006. A fresh look at the reliability of long-term digital storage. In Proceedings of EuroSys2006 (April).
  7. Cappello, F., Geist, A., Gropp, B., Kale, S., Kramer, B., Snir, M. 2009. Toward exascale resilience. Technical Report TR-JLPC-09-01. INRIA-Illinois Joint Laboratory on Petascale Computing (July).
  8. CERN. 2008. Worldwide LHC Computing Grid; http://lcg.web.cern.ch/LCG/.
  9. Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., Grube, R. E. 2006. Bigtable: a distributed storage system for structured data. In Proceedings of the 7th Usenix Symposium on Operating System Design and Implementation: 205-218.
  10. Christensen, C. M. 1997. The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail. Cambridge, MA: Harvard Business School Press (June).
  11. Corbett, P., English, B., Goel, A., Grcanac, T., Kleiman, S., Leong, J., Sankar, S. 2004. Row-diagonal parity for double disk failure correction. In 3rd Usenix Conference on File and Storage Technologies (March).
  12. Elerath, J. 2009. Hard-disk drives: the good, the bad, and the ugly. Communications of the ACM 52(6).
  13. Elerath, J. G., Pecht, M. 2007. Enhanced reliability modeling of RAID storage systems. In Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks: 175-184.
  14. Engler, D. 2007. A system's hackers crash course: techniques that find lots of bugs in real (storage) system code. In Proceedings of 5th Usenix Conference on File and Storage Technologies (February).
  15. Haber, S., Stornetta, W. S. 1991. How to timestamp a digital document. Journal of Cryptology 3(2): 99-111.
  16. Hafner, J. L., Deenadhayalan, V., Belluomini, W., Rao, K. 2008. Undetected disk errors in RAID arrays. IBM Journal of Research & Development 52(4/5).
  17. Jiang, W., Hu, C., Zhou, Y., Kanevsky, A. 2008. Are disks the dominant contributor for storage failures? A comprehensive study of storage subsystem failure characteristics. In Proceedings of 6th Usenix Conference on File and Storage Technologies.
  18. Kelemen, P. 2007. Silent corruptions. In 8th Annual Workshop on Linux Clusters for Super Computing.
  19. Klima, V. 2005. Finding MD5 collisions – a toy for a notebook. Cryptology ePrint Archive, Report 2005/075; http://eprint.iacr.org/2005/075.
  20. Krioukov, A., Bairavasundaram, L. N., Goodson, G. R., Srinivasan, K., Thelen, R., Arpaci-Dusseau, A. C., Arpaci-Dusseau, R. H. 2008. Parity lost and parity regained. In Proceedings of 6th Usenix Conference on File and Storage Technologies.
  21. Maniatis, P., Roussopoulos, M., Giuli, TJ, Rosenthal, D. S. H., Baker, M., Muliadi, Y. 2003. Preserving peer replicas by rate-limited sampled voting. In Proceedings of the 19th ACM Symposium on Operating Systems Principles: 44-59 (October).
  22. Marshall, C. 2008. "It's like a fire. You just have to move on": Rethinking personal digital archiving. In 6th Usenix Conference on File and Storage Technologies.
  23. Mearian, L. 2009. Start-up claims its DVDs last 1,000 years. Computerworld (November).
  24. Mellor, C. 2010. Drive suppliers hit capacity increase difficulties. The Register (July).
  25. Michail, H. E., Kakarountas, A. P., Theodoridis, G., Goutis, C. E. 2005. A low-power and high-throughput implementation of the SHA-1 hash function. In Proceedings of the 9th WSEAS International Conference on Computers.
  26. Mielke, N., Marquart, T., Wu, N., Kessenich, J., Belgal, H., Schares, E., Trivedi, F., Goodness, E., Nevill, L. R. 2008. Bit error rate in NAND flash memories. In 46th Annual International Reliability Physics Symposium: 9-19.
  27. Moore, R. L., D'Aoust, J., McDonald, R. H., Minor, D. 2007. Disk and tape storage cost models. In Archiving 2007.
  28. National Institute of Standards and Technology (NIST). 1995. Federal Information Processing Standard Publication 180-1: Secure Hash Standard (SHA-1) (April).
  29. Patterson, D. A., Gibson, G., Katz, R. H. 1988. A case for redundant arrays of inexpensive disks (RAID). In Proceedings of the ACM SIGMOD International Conference on Management of Data: 109-116 (June).
  30. Pinheiro, E., Weber, W.-D., Barroso, L. A. 2007. Failure trends in a large disk drive population. In Proceedings of 5th Usenix Conference on File and Storage Technologies (February).
  31. Prabhakaran, V., Agrawal, N., Bairavasundaram, L., Gunawi, H., Arpaci-Dusseau, A. C., Arpaci-Dusseau, R. H. 2005. IRON file systems. In Proceedings of the 20th Symposium on Operating Systems Principles.
  32. Rosenthal, D. S. H. 2010. Bit preservation: a solved problem? International Journal of Digital Curation 1(5).
  33. Rosenthal, D. S. H. 2010. LOCKSS: Lots of copies keep stuff safe. In NIST Digital Preservation Interoperability Framework Workshop (March).
  34. Rosenthal, D. S. H., Robertson, T. S., Lipkis, T., Reich, V., Morabito, S. 2005. Requirements for digital preservation systems: a bottom-up approach. D-Lib Magazine 11(11).
  35. Schroeder, B., Gibson, G. 2007. Disk failures in the real world: what does an MTTF of 1,000,000 hours mean to you? In Proceedings of 5th Usenix Conference on File and Storage Technologies (February).
  36. Schwarz, T., Baker, M., Bassi, S., Baumgart, B., Flagg, W., van Ingen, C., Joste, K., Manasse, M., Shah, M. 2006. Disk failure investigations at the Internet Archive. In Work-in-Progress Session, NASA/IEEE Conference on Mass Storage Systems and Technologies.
  37. SDSS (Sloan Digital Sky Survey). 2008; http://www.sdss.org/.
  38. Shah, M. A., Baker, M., Mogul, J. C., Swaminathan, R. 2007. Auditing to keep online storage services honest. In 11th Workshop on Hot Topics in Operating Systems (May).
  39. Storer, M. W., Greenan, K. M., Miller, E. L., Voruganti, K. 2008. Pergamum: replacing tape with energy-efficient, reliable, disk-based archival storage. In Proceedings of 6th Usenix Conference on File and Storage Technologies.
  40. Sun Microsystems. 2006. Sales Terms and Conditions, Section 11.2 (December); http://store.sun.com/CMTemplate/docs/legal_terms/TnC.jsp#11.
  41. Sun Microsystems. 2008. ST5800 presentation. Sun PASIG Meeting (June).
  42. Talagala, N. 1999. Characterizing large storage systems: error behavior and performance benchmarks. Ph.D. thesis, Computer Science Division, University of California at Berkeley (October).
  43. Williams, P., Rosenthal, D. S. H., Roussopoulos, M., Georgis, S. 2008. Predicting the archival life of removable hard disk drives. In Archiving 2008 (June).
  44. Zhang, Y., Rajimwale, A., Arpaci-Dusseau, A. C., Arpaci-Dusseau, R. H. 2010. End-to-end data integrity for file systems: a ZFS case study. In 8th Usenix Conference on File and Storage Technologies.
  a. Numbers are expressed in powers-of-10 notation to help readers focus on the scale of the problems and the extraordinary level of reliability required.
  b. Purveyors of chatty doors, existential elevators, and paranoid androids to the nobility and gentry of this galaxy. [1]
  c. Figures for 2007 are in Moore et al. [27]
  d. Assuming the digest algorithm hasn't been broken, not a safe assumption for MD5. [19]

Published in

Queue, Volume 8, Issue 10 (Storage)
October 2010
26 pages
ISSN: 1542-7730
EISSN: 1542-7749
DOI: 10.1145/1866296

Copyright © 2010. Copyright is held by the owner/author(s).

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States
