Abstract
These days, we are all data pack rats. Storage is cheap, so if there's any chance the data might be useful, we keep it. We know that storage isn't completely reliable, so we keep backup copies as well. But the more data we keep, and the longer we keep it, the greater the chance that some of it will be unrecoverable when we need it.
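The claim that loss becomes more likely the more data we keep and the longer we keep it can be made concrete with a back-of-the-envelope model. The sketch below assumes, purely for illustration, that every bit decays independently with a fixed half-life H, so n bits all survive t years with probability 2^(-nt/H). Real storage failures are correlated, so this is an optimistic simplification, and the function names are hypothetical. Under that assumption, keeping a petabyte for a century with a 50% chance of losing nothing needs a bit half-life on the order of 10^18 years. Doubling either the data or the time raises the chance of loss from 50% to 75%.

```python
import math

def p_any_loss(n_bits: float, years: float, half_life: float) -> float:
    """Probability that at least one of n_bits is lost within `years`.

    Illustrative model only (an assumption): each bit decays independently
    and exponentially with the given half-life, measured in years.
    """
    # One bit survives t years with probability 2**(-t/H),
    # so all n bits survive with probability 2**(-n*t/H).
    exponent = n_bits * years / half_life
    # 1 - 2**(-x), computed with expm1 to stay accurate when x is tiny
    return -math.expm1(-exponent * math.log(2))

def required_half_life(n_bits: float, years: float, p_no_loss: float) -> float:
    """Bit half-life needed so all n_bits survive `years` with probability p_no_loss."""
    # Solve 2**(-n*t/H) = p for H:  H = n * t * ln 2 / (-ln p)
    return n_bits * years * math.log(2) / -math.log(p_no_loss)

PETABYTE_BITS = 8e15  # 10**15 bytes * 8 bits per byte
CENTURY = 1e2         # years

h = required_half_life(PETABYTE_BITS, CENTURY, 0.5)
print(f"required bit half-life: {h:.1e} years")                                     # 8.0e+17 years
print(f"P(loss), 1 PB for 100 y: {p_any_loss(PETABYTE_BITS, CENTURY, h):.2f}")      # 0.50
print(f"P(loss), 2 PB for 100 y: {p_any_loss(2 * PETABYTE_BITS, CENTURY, h):.2f}")  # 0.75
print(f"P(loss), 1 PB for 200 y: {p_any_loss(PETABYTE_BITS, 2 * CENTURY, h):.2f}")  # 0.75
```

Because the model multiplies data size by time, the two knobs are interchangeable: twice the data for a century is as risky as the same data for two centuries.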
1. Adams, D. 1978. The Hitchhiker's Guide to the Galaxy. British Broadcasting Corp.
2. Amazon. 2006. Amazon S3 API Reference (March); http://docs.amazonwebservices.com/AmazonS3/latest/API/.
3. Andersen, D. G., Franklin, J., Kaminsky, M., Phanishayee, A., Tan, L., Vasudevan, V. 2009. FAWN: a fast array of wimpy nodes. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles: 1-14.
4. Anderson, D. 2009. Hard drive directions (September); http://www.digitalpreservation.gov/news/events/other_meetings/storage09/docs/2-4_Anderson-seagate-v3_HDtrends.pdf.
5. Bairavasundaram, L., Goodson, G., Schroeder, B., Arpaci-Dusseau, A. C., Arpaci-Dusseau, R. H. 2008. An analysis of data corruption in the storage stack. In Proceedings of the 6th Usenix Conference on File and Storage Technologies.
6. Baker, M., Shah, M., Rosenthal, D. S. H., Roussopoulos, M., Maniatis, P., Giuli, TJ, Bungale, P. 2006. A fresh look at the reliability of long-term digital storage. In Proceedings of EuroSys 2006 (April).
7. Cappello, F., Geist, A., Gropp, B., Kale, S., Kramer, B., Snir, M. 2009. Toward exascale resilience. Technical Report TR-JLPC-09-01. INRIA-Illinois Joint Laboratory on Petascale Computing (July).
8. CERN. 2008. Worldwide LHC Computing Grid; http://lcg.web.cern.ch/LCG/.
9. Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., Grube, R. E. 2006. Bigtable: a distributed storage system for structured data. In Proceedings of the 7th Usenix Symposium on Operating Systems Design and Implementation: 205-218.
10. Christensen, C. M. 1997. The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail. Cambridge, MA: Harvard Business School Press (June).
11. Corbett, P., English, B., Goel, A., Grcanac, T., Kleiman, S., Leong, J., Sankar, S. 2004. Row-diagonal parity for double disk failure correction. In 3rd Usenix Conference on File and Storage Technologies (March).
12. Elerath, J. 2009. Hard-disk drives: the good, the bad, and the ugly. Communications of the ACM 52(6).
13. Elerath, J. G., Pecht, M. 2007. Enhanced reliability modeling of RAID storage systems. In Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks: 175-184.
14. Engler, D. 2007. A system's hackers crash course: techniques that find lots of bugs in real (storage) system code. In Proceedings of the 5th Usenix Conference on File and Storage Technologies (February).
15. Haber, S., Stornetta, W. S. 1991. How to timestamp a digital document. Journal of Cryptology 3(2): 99-111.
16. Hafner, J. L., Deenadhayalan, V., Belluomini, W., Rao, K. 2008. Undetected disk errors in RAID arrays. IBM Journal of Research & Development 52(4/5).
17. Jiang, W., Hu, C., Zhou, Y., Kanevsky, A. 2008. Are disks the dominant contributor for storage failures? A comprehensive study of storage subsystem failure characteristics. In Proceedings of the 6th Usenix Conference on File and Storage Technologies.
18. Kelemen, P. 2007. Silent corruptions. In 8th Annual Workshop on Linux Clusters for Super Computing.
19. Klima, V. 2005. Finding MD5 collisions—a toy for a notebook. Cryptology ePrint Archive, Report 2005/075; http://eprint.iacr.org/2005/075.
20. Krioukov, A., Bairavasundaram, L. N., Goodson, G. R., Srinivasan, K., Thelen, R., Arpaci-Dusseau, A. C., Arpaci-Dusseau, R. H. 2008. Parity lost and parity regained. In Proceedings of the 6th Usenix Conference on File and Storage Technologies.
21. Maniatis, P., Roussopoulos, M., Giuli, TJ, Rosenthal, D. S. H., Baker, M., Muliadi, Y. 2003. Preserving peer replicas by rate-limited sampled voting. In Proceedings of the 19th ACM Symposium on Operating Systems Principles: 44-59 (October).
22. Marshall, C. 2008. "It's like a fire. You just have to move on": Rethinking personal digital archiving. In 6th Usenix Conference on File and Storage Technologies.
23. Mearian, L. 2009. Start-up claims its DVDs last 1,000 years. Computerworld (November).
24. Mellor, C. 2010. Drive suppliers hit capacity increase difficulties. The Register (July).
25. Michail, H. E., Kakarountas, A. P., Theodoridis, G., Goutis, C. E. 2005. A low-power and high-throughput implementation of the SHA-1 hash function. In Proceedings of the 9th WSEAS International Conference on Computers.
26. Mielke, N., Marquart, T., Wu, N., Kessenich, J., Belgal, H., Schares, E., Trivedi, F., Goodness, E., Nevill, L. R. 2008. Bit error rate in NAND flash memories. In 46th Annual International Reliability Physics Symposium: 9-19.
27. Moore, R. L., D'Aoust, J., McDonald, R. H., Minor, D. 2007. Disk and tape storage cost models. In Archiving 2007.
28. National Institute of Standards and Technology (NIST). 1995. Federal Information Processing Standard Publication 180-1: Secure Hash Standard (SHA-1) (April).
29. Patterson, D. A., Gibson, G., Katz, R. H. 1988. A case for redundant arrays of inexpensive disks (RAID). In Proceedings of the ACM SIGMOD International Conference on Management of Data: 109-116 (June).
30. Pinheiro, E., Weber, W.-D., Barroso, L. A. 2007. Failure trends in a large disk drive population. In Proceedings of the 5th Usenix Conference on File and Storage Technologies (February).
31. Prabhakaran, V., Agrawal, N., Bairavasundaram, L., Gunawi, H., Arpaci-Dusseau, A. C., Arpaci-Dusseau, R. H. 2005. IRON file systems. In Proceedings of the 20th Symposium on Operating Systems Principles.
32. Rosenthal, D. S. H. 2010. Bit preservation: a solved problem? International Journal of Digital Curation 5(1).
33. Rosenthal, D. S. H. 2010. LOCKSS: Lots of copies keep stuff safe. In NIST Digital Preservation Interoperability Framework Workshop (March).
34. Rosenthal, D. S. H., Robertson, T. S., Lipkis, T., Reich, V., Morabito, S. 2005. Requirements for digital preservation systems: a bottom-up approach. D-Lib Magazine 11(11).
35. Schroeder, B., Gibson, G. 2007. Disk failures in the real world: what does an MTTF of 1,000,000 hours mean to you? In Proceedings of the 5th Usenix Conference on File and Storage Technologies (February).
36. Schwarz, T., Baker, M., Bassi, S., Baumgart, B., Flagg, W., van Ingen, C., Joste, K., Manasse, M., Shah, M. 2006. Disk failure investigations at the Internet Archive. In Work-in-Progress Session, NASA/IEEE Conference on Mass Storage Systems and Technologies.
37. SDSS (Sloan Digital Sky Survey). 2008; http://www.sdss.org/.
38. Shah, M. A., Baker, M., Mogul, J. C., Swaminathan, R. 2007. Auditing to keep online storage services honest. In 11th Workshop on Hot Topics in Operating Systems (May).
39. Storer, M. W., Greenan, K. M., Miller, E. L., Voruganti, K. 2008. Pergamum: replacing tape with energy-efficient, reliable, disk-based archival storage. In Proceedings of the 6th Usenix Conference on File and Storage Technologies.
40. Sun Microsystems. 2006. Sales Terms and Conditions, Section 11.2 (December); http://store.sun.com/CMTemplate/docs/legal_terms/TnC.jsp#11.
41. Sun Microsystems. 2008. ST5800 presentation. Sun PASIG Meeting (June).
42. Talagala, N. 1999. Characterizing large storage systems: error behavior and performance benchmarks. Ph.D. thesis, Computer Science Division, University of California at Berkeley (October).
43. Williams, P., Rosenthal, D. S. H., Roussopoulos, M., Georgis, S. 2008. Predicting the archival life of removable hard disk drives. In Archiving 2008 (June).
44. Zhang, Y., Rajimwale, A., Arpaci-Dusseau, A. C., Arpaci-Dusseau, R. H. 2010. End-to-end data integrity for file systems: a ZFS case study. In 8th Usenix Conference on File and Storage Technologies.
a. Numbers are expressed in powers-of-10 notation to help readers focus on the scale of the problems and the extraordinary level of reliability required.
b. Purveyors of chatty doors, existential elevators, and paranoid androids to the nobility and gentry of this galaxy. [1]
c. Figures for 2007 are in Moore et al. [27]
d. Assuming the digest algorithm hasn't been broken, which is not a safe assumption for MD5. [19]
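Footnote d concerns using a cryptographic digest to confirm that a stored copy has not changed. The minimal Python sketch below shows such a fixity check, assuming the digest recorded at ingest is itself kept safe. It uses SHA-256 rather than MD5 because practical MD5 collisions have been published (Klima 2005). The helper names are illustrative, not taken from any system cited here.

```python
import hashlib

def record_digest(data: bytes) -> str:
    """Digest to store at ingest time (hypothetical helper)."""
    # SHA-256 instead of MD5, for which practical collisions are known
    return hashlib.sha256(data).hexdigest()

def is_intact(data: bytes, recorded: str) -> bool:
    """Audit step: re-hash the stored copy and compare with the recorded digest."""
    # Meaningful only if the recorded digest is undamaged and SHA-256 is unbroken
    return record_digest(data) == recorded

original = b"These days, we are all data pack rats."
recorded = record_digest(original)

# Simulate silent corruption: flip the lowest bit of the first byte
damaged = bytearray(original)
damaged[0] ^= 0x01

print(is_intact(original, recorded))        # True
print(is_intact(bytes(damaged), recorded))  # False
```

A mismatch only detects damage; repairing it still requires another good copy of the data.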
Keeping Bits Safe: How Hard Can It Be? As storage systems grow larger and larger, protecting their data for long-term storage is becoming more and more challenging.