This dissertation characterizes two causes of variability in a large storage system: soft error behavior and disk drive heterogeneity. The first half of the dissertation focuses on understanding the error behavior and component failure characteristics of a storage prototype. The prototype is a loosely coupled collection of Pentium machines; each machine acts as a storage node, hosting disk drives via the SCSI interface. Examination of long-term system log data from this prototype yields several insights. In particular, the study reveals that data disk drives are among the most reliable components in the storage system and that soft errors tend to fall into a small number of well-defined categories. An in-depth study of hard failures provides data supporting the notion that failing devices exhibit warning signs, and investigates the effectiveness of failure prediction. The second half of the dissertation, dealing with disk drive heterogeneity, focuses on a new measurement technique to characterize disk drives. The technique, linearly increasing strides, counteracts the rotational effect that makes disk drives difficult to measure. The linearly increasing stride pattern interacts with the drive mechanism to create a latency vs. stride size graph that exposes many low-level disk details. This micro-benchmark extracts a drive's minimum time to access media, rotation time, sectors/track, head switch time, cylinder switch time, and number of platters, as well as several other pieces of information. The dissertation describes the read and write versions of this micro-benchmark, named Skippy, as well as analytical models explaining its behavior, results on modern SCSI and IDE disk drives, techniques for automatically extracting parameter values from the graphical output, and extensions.
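The abstract does not reproduce the benchmark itself, so the following is only a minimal sketch of what a linearly increasing stride access pattern could look like. It assumes that each measurement touches a single sector and that the gap skipped before measurement i is i sectors. The function name `stride_offsets` and the 512-byte sector size are illustrative assumptions, not details taken from the dissertation.

```python
from typing import List, Tuple

SECTOR_SIZE = 512  # assumed sector size in bytes; hypothetical, not from the source


def stride_offsets(measurements: int,
                   sector_size: int = SECTOR_SIZE) -> List[Tuple[int, int]]:
    """Return (stride_in_sectors, byte_offset) pairs for a linearly
    increasing stride pattern.

    Assumed pattern: before the i-th single-sector access, skip forward
    i sectors from wherever the previous access ended.
    """
    pattern = []
    position = 0  # current byte position on the (hypothetical) raw device
    for stride in range(measurements):
        position += stride * sector_size   # skip `stride` sectors
        pattern.append((stride, position))
        position += sector_size            # the access itself covers one sector
    return pattern


if __name__ == "__main__":
    for stride, offset in stride_offsets(5):
        print(f"stride={stride:2d} sectors  offset={offset} bytes")
```

In an actual measurement, each offset would be accessed on a raw device with caching bypassed, and the elapsed time of each access would be plotted against its stride to obtain the latency vs. stride size graph the abstract describes. This sketch only generates the offsets and performs no I/O.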