Abstract
In this work, we develop an intelligent storage system framework for soft real-time applications. Modern software systems consist of a collection of layers, and information is exchanged across the layers via well-defined interfaces. Because these interface definitions are strict and inflexible, information specific to one layer cannot be passed to other layers. In practice, exploiting this information across layers can greatly enhance the performance, reliability, and manageability of the system. We address this limitation of legacy interface definitions by enabling intelligence in the storage system. The objective is to enable a lower-layer entity, for example, a physical or block device, to infer semantic and contextual information about application behavior that cannot be passed via the legacy interface. Based upon the knowledge obtained by the intelligence module, the system can perform a number of actions to improve its performance, reliability, security, and manageability. Our intelligent storage system focuses on optimizing I/O subsystem performance for soft real-time applications. Our intelligence framework consists of three components: the workload monitor, the workload analyzer, and the system optimizer. The workload monitor maintains a window of recent I/O requests and extracts feature vectors at regular intervals. The workload analyzer is trained to determine the class of the incoming workload from the feature vector. The system optimizer performs various actions to tune the storage system for a given workload. We use confidence-rated boosting to train the workload analyzer. This learner achieves higher than 97% accuracy in workload class prediction. We develop a prototype intelligent storage system on a legacy operating system platform. The system optimizer performs three actions: (1) dynamic adjustment of the file-system-level read-ahead size; (2) dynamic adjustment of the I/O request size; and (3) filtering of I/O requests. We examine the effect of this autonomic optimization via experimentation and find that storage-level proactive optimization greatly enhances the efficiency of the underlying storage system. The intelligence module developed in this work is not restricted to performance optimization. It can also be used as a classification engine for generic autonomic computing environments, e.g., management, diagnosis, and security.
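As a rough illustration of the workload monitor described in the abstract, the following Python sketch keeps a fixed-size window of recent I/O requests and derives a feature vector from it on demand. The abstract does not list the actual features or the window size, so the record layout (`IORequest`), the four features (read ratio, mean request length, sequential ratio, mean inter-arrival gap), and the default window size are assumptions chosen for illustration only, not the authors' design.

```python
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IORequest:
    """One block-level request (hypothetical record layout)."""
    timestamp: float  # arrival time in seconds
    sector: int       # starting sector
    length: int       # request length in sectors
    is_read: bool     # True for reads, False for writes


class WorkloadMonitor:
    """Keeps the most recent `window_size` requests and summarizes them."""

    def __init__(self, window_size: int = 64) -> None:
        # deque with maxlen drops the oldest request automatically
        self.window = deque(maxlen=window_size)

    def observe(self, request: IORequest) -> None:
        self.window.append(request)

    def feature_vector(self) -> List[float]:
        """Return [read_ratio, mean_length, sequential_ratio, mean_gap]."""
        requests = list(self.window)
        if not requests:
            return [0.0, 0.0, 0.0, 0.0]
        n = len(requests)
        read_ratio = sum(r.is_read for r in requests) / n
        mean_length = sum(r.length for r in requests) / n
        # Compare each request with its predecessor in the window
        pairs = list(zip(requests, requests[1:]))
        if pairs:
            sequential = sum(b.sector == a.sector + a.length for a, b in pairs)
            sequential_ratio = sequential / len(pairs)
            mean_gap = sum(b.timestamp - a.timestamp for a, b in pairs) / len(pairs)
        else:
            sequential_ratio = 0.0
            mean_gap = 0.0
        return [read_ratio, mean_length, sequential_ratio, mean_gap]


# Usage: a steady sequential read stream, as a media player might issue
monitor = WorkloadMonitor(window_size=4)
for i in range(6):
    monitor.observe(IORequest(timestamp=i * 0.5, sector=i * 8, length=8, is_read=True))
features = monitor.feature_vector()
```

In a setup like the one the abstract outlines, a vector of this kind would be handed to the trained workload analyzer at regular intervals, and the predicted class would then drive the system optimizer.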
Index Terms
- Intelligent storage: Cross-layer optimization for soft real-time workload