Abstract
Distributed RAM storage aggregates the RAM of servers in data center networks (DCN) to provide extremely high I/O performance for large-scale cloud systems. For fast recovery from storage server failures, MemCube [53] exploits the proximity of the BCube network to limit recovery traffic to the recovery servers' 1-hop neighborhood. However, this design applies only to the symmetric BCube(n, k) network with n^(k+1) server nodes, and its recovery performance is suboptimal due to congestion and contention.
To address these problems, in this article we propose CubeX, which (i) generalizes the "1-hop" principle of MemCube to arbitrary cube-based networks and (ii) improves the throughput and recovery performance of RAM-based key-value (KV) storage via cross-layer optimizations. At the core of CubeX is leveraging the glocality (= globality + locality) of cube-based networks: CubeX scatters backup data across a large number of disks globally distributed throughout the cube, yet restricts all recovery traffic to the small local range of each server node. Our evaluation shows that CubeX not only efficiently supports RAM-based KV storage on cube-based networks but also significantly outperforms both MemCube and RAMCloud in throughput and recovery time.
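To make the "1-hop" principle concrete, the following is a minimal illustrative sketch (not CubeX's actual implementation): in BCube(n, k), each server carries a (k+1)-digit base-n address, and two servers are 1-hop neighbors exactly when their addresses differ in a single digit. The hypothetical `scatter_backups` helper below spreads a server's backup segments round-robin across that neighborhood, so backups are dispersed widely while recovery traffic stays within one hop.

```python
def one_hop_neighbors(addr, n):
    """All servers one switch hop away from `addr` in BCube(n, k).

    `addr` is a (k+1)-tuple of base-n digits; a 1-hop neighbor differs
    from `addr` in exactly one digit (reached via that level's switch).
    """
    neighbors = []
    for level in range(len(addr)):
        for digit in range(n):
            if digit != addr[level]:
                nb = list(addr)
                nb[level] = digit
                neighbors.append(tuple(nb))
    return neighbors


def scatter_backups(addr, n, num_segments):
    """Assign backup segments round-robin over the 1-hop neighborhood,
    so every neighbor receives a near-equal share of recovery work."""
    nbs = one_hop_neighbors(addr, n)
    return {seg: nbs[seg % len(nbs)] for seg in range(num_segments)}


# Example: in BCube(4, 1) there are 4^2 = 16 servers and each server
# has (4 - 1) * 2 = 6 one-hop neighbors.
placement = scatter_backups((0, 0), n=4, num_segments=12)
```

With 12 segments and 6 neighbors, each neighbor holds 2 segments, so after a failure all 6 can stream their shares in parallel within the failed server's 1-hop range, which is the intuition behind glocality.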
- AWS Team. Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region. Retrieved from http://aws.amazon.com/message/65648/.
- NiceX Lab. Ursa Block Store. Retrieved from http://nicexlab.com/ursa/.
- RedisLabs. Redis Official Website. Retrieved from http://redis.io/.
- Dhruba Borthakur. HDFS Architecture Guide. Retrieved from https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html.
- SOSP 2011 PC meeting. SOSP 2011 Reviews and Comments on RAMCloud. Retrieved from https://ramcloud.stanford.edu/wiki/pages/viewpage.action?pageId=8355860.
- Josh Norem. Samsung SSD 960 EVO (500GB). Retrieved from https://www.pcmag.com/review/358847/samsung-ssd-960-evo-500gb.
- Rich Miller. Failure Rates in Google Data Centers. Retrieved from http://www.datacenterknowledge.com/archives/2008/05/30/failure-rates-in-google-data-centers/.
- Dormando. Memcached Official Website. Retrieved from http://www.memcached.org/.
- Stephen Aiken, Dirk Grunwald, Andrew R. Pleszkun, and Jesse Willeke. 2003. A performance analysis of the iSCSI protocol. In Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST'03). IEEE, 123--134.
- Ashok Anand, Chitra Muthukrishnan, Steven Kappes, Aditya Akella, and Suman Nath. 2010. Cheap and large CAMs for high performance data-intensive networked systems. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI'10). USENIX Association, 433--448. Retrieved from http://www.usenix.org/events/nsdi10/tech/full_papers/anand.pdf.
- David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan. 2009. FAWN: A fast array of wimpy nodes. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP'09), Jeanna Neefe Matthews and Thomas E. Anderson (Eds.). ACM, 1--14.
- Antirez. [n.d.]. An update on the memcached/redis benchmark. Retrieved from http://antirez.com/post/update-on-memcached-redis-benchmark.html.
- Ed L. Cashin. 2005. Kernel korner: ATA over Ethernet: Putting hard drives on the LAN. Linux J. 2005, 134 (2005), 10.
- Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert Gruber. 2006. Bigtable: A distributed storage system for structured data. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI'06). 205--218.
- Vijay Chidambaram, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2013. Optimistic crash consistency. In Proceedings of the 24th ACM Symposium on Operating Systems Principles. ACM, 228--243.
- Mosharaf Chowdhury, Srikanth Kandula, and Ion Stoica. 2013. Leveraging endpoint flexibility in data-intensive clusters. In Proceedings of the ACM SIGCOMM Conference (SIGCOMM'13), Dah Ming Chiu, Jia Wang, Paul Barford, and Srinivasan Seshan (Eds.). ACM, 231--242.
- Mosharaf Chowdhury, Matei Zaharia, Justin Ma, Michael I. Jordan, and Ion Stoica. 2011. Managing data transfers in computer clusters with orchestra. In ACM SIGCOMM Computer Communication Review, Vol. 41. ACM, 98--109.
- Biplob K. Debnath, Sudipta Sengupta, and Jin Li. 2010. FlashStore: High throughput persistent key-value store. Proc. VLDB Endow. 3, 2 (2010), 1414--1425.
- Biplob K. Debnath, Sudipta Sengupta, and Jin Li. 2011. SkimpyStash: RAM space skimpy key-value store on flash-based storage. In Proceedings of the SIGMOD Conference, Timos K. Sellis, Renée J. Miller, Anastasios Kementsietsidis, and Yannis Velegrakis (Eds.). ACM, 25--36.
- Aleksandar Dragojević, Dushyanth Narayanan, Miguel Castro, and Orion Hodson. 2014. FaRM: Fast remote memory. In Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI'14). 401--414.
- Bin Fan, David G. Andersen, and Michael Kaminsky. 2013. MemC3: Compact and concurrent memcache with dumber caching and smarter hashing. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI'13). 371--384.
- Clayton S. Ferner and Kyungsook Y. Lee. 1992. Hyperbanyan networks: A new class of networks for distributed memory multiprocessors. IEEE Trans. Comput. 41, 3 (1992), 254--261.
- Armando Fox. 2002. Toward recovery-oriented computing. In Proceedings of the Conference on Very Large Data Bases (VLDB'02). 873--876.
- Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 2003. The Google file system. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP'03). 29--43.
- Phillipa Gill, Navendu Jain, and Nachiappan Nagappan. 2011. Understanding network failures in data centers: Measurement, analysis, and implications. In Proceedings of the ACM SIGCOMM Conference (SIGCOMM'11), Srinivasan Keshav, Jörg Liebeherr, John W. Byers, and Jeffrey C. Mogul (Eds.). ACM, 350--361.
- Jim Gray and Gianfranco R. Putzolu. 1987. The 5 minute rule for trading memory for disk accesses and the 10 byte rule for trading memory for CPU time. In Proceedings of the ACM SIGMOD Conference, Umeshwar Dayal and Irving L. Traiger (Eds.). ACM Press, 395--398.
- Albert G. Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta. 2011. VL2: A scalable and flexible data center network. Commun. ACM 54, 3 (2011), 95--104.
- Chuanxiong Guo, Guohan Lu, Dan Li, Haitao Wu, Xuan Zhang, Yunfeng Shi, Chen Tian, Yongguang Zhang, and Songwu Lu. 2009. BCube: A high performance, server-centric network architecture for modular data centers. In Proceedings of the ACM SIGCOMM Conference (SIGCOMM'09). 63--74.
- Chuanxiong Guo, Lihua Yuan, Dong Xiang, Yingnong Dang, Ray Huang, Dave Maltz, Zhaoyi Liu, Vin Wang, Bin Pang, Hua Chen et al. 2015. Pingmesh: A large-scale system for data center network latency measurement and analysis. In ACM SIGCOMM Computer Communication Review, Vol. 45. ACM, 139--152.
- Tyler Harter, Dhruba Borthakur, Siying Dong, Amitanand Aiyer, Liyin Tang, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2014. Analysis of HDFS under HBase: A Facebook Messages case study. In Proceedings of the 12th USENIX Conference on File and Storage Technologies (FAST'14). 199--212.
- John H. Hartman and John K. Ousterhout. 1995. The Zebra striped network file system. ACM Trans. Comput. Syst. 13, 3 (1995), 274--310.
- Dean Hildebrand and Peter Honeyman. 2005. Exporting storage systems in a scalable manner with pNFS. In Proceedings of the 22nd IEEE/13th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST'05). IEEE, 18--27.
- Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed. 2010. ZooKeeper: Wait-free coordination for Internet-scale systems. In Proceedings of the USENIX Annual Technical Conference (ATC'10). 1--14.
- Edward K. Lee and Chandramohan A. Thekkath. 1996. Petal: Distributed virtual disks. In ACM SIGPLAN Notices, Vol. 31. ACM, 84--92.
- HuiBa Li, ShengYun Liu, YuXing Peng, DongSheng Li, HangJun Zhou, and XiCheng Lu. 2010. Superscalar communication: A runtime optimization for distributed applications. Sci. China Info. Sci. 53, 10 (2010), 1931--1946.
- Hyeontaek Lim, Dongsu Han, David G. Andersen, and Michael Kaminsky. 2014. MICA: A holistic approach to fast in-memory key-value storage. In Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI'14). 429--444.
- Guohan Lu, Chuanxiong Guo, Yulong Li, Zhiqiang Zhou, Tong Yuan, Haitao Wu, Yongqiang Xiong, Rui Gao, and Yongguang Zhang. 2011. ServerSwitch: A programmable and high performance platform for data center networks. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI'11).
- Xicheng Lu, Huaimin Wang, and Ji Wang. 2006. Internet-based virtual computing environment (iVCE): Concepts and architecture. Sci. China Ser. F: Info. Sci. 49, 6 (2006), 681--701.
- Xicheng Lu, Huaimin Wang, Ji Wang, and Jie Xu. 2013. Internet-based virtual computing environment: Beyond the data center as a computer. Future Gen. Comput. Syst. 29, 1 (2013), 309--322.
- Jeanna Neefe Matthews, Drew Roselli, Adam M. Costello, Randolph Y. Wang, and Thomas E. Anderson. 1997. Improving the performance of log-structured file systems with adaptive methods. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP'97). ACM.
- James Mickens, Edmund B. Nightingale, Jeremy Elson, Darren Gehring, Bin Fan, Asim Kadav, Vijay Chidambaram, Osama Khan, and Krishna Nareddy. 2014. Blizzard: Fast, cloud-scale block storage for cloud-oblivious applications. In Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI'14). 257--273.
- Subramanian Muralidhar, Wyatt Lloyd, Sabyasachi Roy, Cory Hill, Ernest Lin, Weiwen Liu, Satadru Pan, Shiva Shankar, Viswanath Sivakumar, Linpeng Tang et al. 2014. f4: Facebook's warm BLOB storage system. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI'14). 383--398.
- Edmund B. Nightingale, Jeremy Elson, Jinliang Fan, Owen Hofmann, Jon Howell, and Yutaka Suzue. 2012. Flat datacenter storage. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI'12).
- Diego Ongaro, Stephen M. Rumble, Ryan Stutsman, John K. Ousterhout, and Mendel Rosenblum. 2011. Fast crash recovery in RAMCloud. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP'11). 29--41.
- John K. Ousterhout, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazières, Subhasish Mitra, Aravind Narayanan, Guru M. Parulkar, Mendel Rosenblum, Stephen M. Rumble, Eric Stratmann, and Ryan Stutsman. 2009. The case for RAMClouds: Scalable high-performance storage entirely in DRAM. Operat. Syst. Rev. 43, 4 (2009), 92--105.
- Mendel Rosenblum and John K. Ousterhout. 1992. The design and implementation of a log-structured file system. ACM Trans. Comput. Syst. 10, 1 (1992), 26--52.
- Stephen M. Rumble, Diego Ongaro, Ryan Stutsman, Mendel Rosenblum, and John K. Ousterhout. 2011. It's time for low latency. In Proceedings of the Workshop on Hot Topics in Operating Systems (HotOS'11).
- Bin Shao, Haixun Wang, and Yatao Li. 2013. Trinity: A distributed graph engine on a memory cloud. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. ACM, 505--516.
- Ji-Yong Shin, Mahesh Balakrishnan, Tudor Marian, and Hakim Weatherspoon. 2013. Gecko: Contention-oblivious disk arrays for cloud storage. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST'13). 285--298.
- Michael Vrable, Stefan Savage, and Geoffrey M. Voelker. 2012. BlueSky: A cloud-backed file system for the enterprise. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST'12). USENIX Association.
- Yang Wang, Manos Kapritsos, Zuocheng Ren, Prince Mahajan, Jeevitha Kirubanandam, Lorenzo Alvisi, and Mike Dahlin. 2013. Robustness in the Salus scalable block store. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI'13). 357--370.
- Haitao Wu, Guohan Lu, Dan Li, Chuanxiong Guo, and Yongguang Zhang. 2009. MDCube: A high performance network structure for modular data center interconnection. In Proceedings of the International Conference on Emerging Networking Experiments and Technologies (CoNEXT'09), Jörg Liebeherr, Giorgio Ventre, Ernst W. Biersack, and Srinivasan Keshav (Eds.). ACM, 25--36.
- Yiming Zhang, Chuanxiong Guo, Dongsheng Li, Rui Chu, Haitao Wu, and Yongqiang Xiong. 2015. CubicRing: Enabling one-hop failure detection and recovery for distributed in-memory storage systems. In Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI'15). 529--542.
Index Terms
- Leveraging Glocality for Fast Failure Recovery in Distributed RAM Storage