research-article

Leveraging Glocality for Fast Failure Recovery in Distributed RAM Storage

Published: 18 February 2019

Abstract

Distributed RAM storage aggregates the RAM of servers in data center networks (DCN) to provide extremely high I/O performance for large-scale cloud systems. For quick recovery from storage server failures, MemCube [53] exploits the proximity of the BCube network to limit the recovery traffic to the recovery servers’ 1-hop neighborhood. However, that design applies only to the symmetric BCube(n,k) network with n^{k+1} nodes, and its recovery performance is suboptimal because of congestion and contention.

To address these problems, in this article, we propose CubeX, which (i) generalizes the “1-hop” principle of MemCube to arbitrary cube-based networks and (ii) improves the throughput and recovery performance of RAM-based key-value (KV) stores via cross-layer optimizations. The core idea of CubeX is to leverage the glocality (= globality + locality) of cube-based networks: It scatters backup data across a large number of disks globally distributed throughout the cube and restricts all recovery traffic to the small local range of each server node. Our evaluation shows that CubeX not only efficiently supports RAM-based KV stores on cube-based networks but also significantly outperforms MemCube and RAMCloud in both throughput and recovery time.
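
To make the “1-hop” neighborhood concrete, the following minimal Python sketch enumerates the servers of a symmetric BCube(n,k) and lists the one-hop neighbors of a server, using the standard BCube addressing in which two servers share a switch exactly when their (k+1)-digit addresses differ in one digit. The function names `bcube_servers` and `one_hop_neighbors` are hypothetical and chosen for illustration; this is not the placement or recovery algorithm of CubeX or MemCube, only a sketch of the topology those systems build on.

```python
from itertools import product


def bcube_servers(n, k):
    """Return all server addresses of BCube(n, k).

    Each address is a tuple of k+1 digits, every digit in [0, n),
    so there are n**(k+1) servers in total.
    """
    return list(product(range(n), repeat=k + 1))


def one_hop_neighbors(addr, n):
    """Return the servers that share a switch with `addr`.

    Assumption (standard BCube addressing): two servers are attached to
    the same switch exactly when their addresses differ in one digit.
    """
    neighbors = []
    for level, digit in enumerate(addr):
        for other in range(n):
            if other != digit:
                # Replace only the digit at this level; keep the rest.
                neighbors.append(addr[:level] + (other,) + addr[level + 1:])
    return neighbors


if __name__ == "__main__":
    n, k = 4, 1
    servers = bcube_servers(n, k)
    print(len(servers))                       # 16 servers, i.e. 4**(1+1)
    print(len(one_hop_neighbors((0, 0), n)))  # 6 neighbors, i.e. (k+1)*(n-1)
```

Each server has (k+1)(n-1) one-hop neighbors, which is small compared with the n^{k+1} servers of the whole cube; confining recovery traffic to such a neighborhood while backups are spread across the cube is the contrast that the term glocality describes.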

References

  1. AWS Team. Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region. Retrieved from http://aws.amazon.com/message/65648/.
  2. NiceX Lab. Ursa Block Store. Retrieved from http://nicexlab.com/ursa/.
  3. RedisLabs. Redis Official Website. Retrieved from http://redis.io/.
  4. Dhruba Borthakur. HDFS Architecture Guide. Retrieved from https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html.
  5. SOSP 2011 PC meeting. SOSP 2011 Reviews and Comments on RAMCloud. Retrieved from https://ramcloud.stanford.edu/wiki/pages/viewpage.action?pageId=8355860SOSP-2011-Reviews-and-comments-on-RAMCloud.
  6. Josh Norem. Samsung SSD 960 EVO (500GB). Retrieved from https://www.pcmag.com/review/358847/samsung-ssd-960-evo-500gb.
  7. Rich Miller. Failure Rates in Google Data Centers. Retrieved from http://www.datacenterknowledge.com/archives/2008/05/30/failure-rates-in-google-data-centers/.
  8. Dormando. Memcached Official Website. Retrieved from http://www.memcached.org/.
  9. Stephen Aiken, Dirk Grunwald, Andrew R. Pleszkun, and Jesse Willeke. 2003. A performance analysis of the iSCSI protocol. In Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST’03). IEEE, 123--134.
  10. Ashok Anand, Chitra Muthukrishnan, Steven Kappes, Aditya Akella, and Suman Nath. 2010. Cheap and large CAMs for high performance data-intensive networked systems. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI’10). USENIX Association, 433--448. Retrieved from http://www.usenix.org/events/nsdi10/tech/full_papers/anand.pdf.
  11. David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan. 2009. FAWN: A fast array of wimpy nodes. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP’09), Jeanna Neefe Matthews and Thomas E. Anderson (Eds.). ACM, 1--14. Retrieved from http://dblp.uni-trier.de/db/conf/sosp/sosp2009.html#AndersenFKPTV09.
  12. Antirez. n.d. An update on the memcached/redis benchmark. Retrieved from http://antirez.com/post/update-on-memcached-redis-benchmark.html.
  13. Ed L. Cashin. 2005. Kernel korner: ATA over Ethernet: Putting hard drives on the LAN. Linux J. 2005, 134 (2005), 10.
  14. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert Gruber. 2006. Bigtable: A distributed storage system for structured data. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI’06). 205--218.
  15. Vijay Chidambaram, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2013. Optimistic crash consistency. In Proceedings of the 24th ACM Symposium on Operating Systems Principles. ACM, 228--243.
  16. Mosharaf Chowdhury, Srikanth Kandula, and Ion Stoica. 2013. Leveraging endpoint flexibility in data-intensive clusters. In Proceedings of the Association for Computing Machinery’s Special Interest Group on Data Communications (SIGCOMM’13), Dah Ming Chiu, Jia Wang, Paul Barford, and Srinivasan Seshan (Eds.). ACM, 231--242.
  17. Mosharaf Chowdhury, Matei Zaharia, Justin Ma, Michael I. Jordan, and Ion Stoica. 2011. Managing data transfers in computer clusters with Orchestra. In ACM SIGCOMM Computer Communication Review, Vol. 41. ACM, 98--109.
  18. Biplob K. Debnath, Sudipta Sengupta, and Jin Li. 2010. FlashStore: High throughput persistent key-value store. Proc. VLDB Endow. 3, 2 (2010), 1414--1425. Retrieved from http://dblp.uni-trier.de/db/journals/pvldb/pvldb3.html#DebnathSL10.
  19. Biplob K. Debnath, Sudipta Sengupta, and Jin Li. 2011. SkimpyStash: RAM space skimpy key-value store on flash-based storage. In Proceedings of the SIGMOD Conference, Timos K. Sellis, Rene J. Miller, Anastasios Kementsietsidis, and Yannis Velegrakis (Eds.). ACM, 25--36. Retrieved from http://dblp.uni-trier.de/db/conf/sigmod/sigmod2011.html#DebnathSL11.
  20. Aleksandar Dragojević, Dushyanth Narayanan, Miguel Castro, and Orion Hodson. 2014. FaRM: Fast remote memory. In Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI’14). 401--414.
  21. Bin Fan, David G. Andersen, and Michael Kaminsky. 2013. MemC3: Compact and concurrent memcache with dumber caching and smarter hashing. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI’13). 371--384.
  22. Clayton S. Ferner and Kyungsook Y. Lee. 1992. Hyperbanyan networks: A new class of networks for distributed memory multiprocessors. IEEE Trans. Comput. 41, 3 (1992), 254--261.
  23. Armando Fox. 2002. Toward recovery-oriented computing. In Proceedings of the Conference on Very Large Data Bases (VLDB’02). 873--876.
  24. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 2003. The Google file system. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP’03). 29--43.
  25. Phillipa Gill, Navendu Jain, and Nachiappan Nagappan. 2011. Understanding network failures in data centers: Measurement, analysis, and implications. In Proceedings of the Association for Computing Machinery’s Special Interest Group on Data Communications (SIGCOMM’11), Srinivasan Keshav, Jörg Liebeherr, John W. Byers, and Jeffrey C. Mogul (Eds.). ACM, 350--361.
  26. Jim Gray and Gianfranco R. Putzolu. 1987. The 5 minute rule for trading memory for disk accesses and the 10 byte rule for trading memory for CPU time. In Proceedings of the Association for Computing Machinery Special Interest Group on Management of Data, Umeshwar Dayal and Irving L. Traiger (Eds.). ACM Press, 395--398.
  27. Albert G. Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta. 2011. VL2: A scalable and flexible data center network. Commun. ACM 54, 3 (2011), 95--104.
  28. Chuanxiong Guo, Guohan Lu, Dan Li, Haitao Wu, Xuan Zhang, Yunfeng Shi, Chen Tian, Yongguang Zhang, and Songwu Lu. 2009. BCube: A high performance, server-centric network architecture for modular data centers. In Proceedings of the Association for Computing Machinery’s Special Interest Group on Data Communications (SIGCOMM’09). 63--74.
  29. Chuanxiong Guo, Lihua Yuan, Dong Xiang, Yingnong Dang, Ray Huang, Dave Maltz, Zhaoyi Liu, Vin Wang, Bin Pang, Hua Chen et al. 2015. Pingmesh: A large-scale system for data center network latency measurement and analysis. In ACM SIGCOMM Computer Communication Review, Vol. 45. ACM, 139--152.
  30. Tyler Harter, Dhruba Borthakur, Siying Dong, Amitanand Aiyer, Liyin Tang, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2014. Analysis of HDFS under HBase: A Facebook Messages case study. In Proceedings of the 12th USENIX Conference on File and Storage Technologies (FAST’14). 199--212.
  31. John H. Hartman and John K. Ousterhout. 1995. The Zebra striped network file system. ACM Trans. Comput. Syst. 13, 3 (1995), 274--310.
  32. Dean Hildebrand and Peter Honeyman. 2005. Exporting storage systems in a scalable manner with pNFS. In Proceedings of the 22nd IEEE/13th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST’05). IEEE, 18--27.
  33. Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed. 2010. ZooKeeper: Wait-free coordination for Internet-scale systems. In Proceedings of the USENIX Annual Technical Conference (ATC’10). 1--14.
  34. Edward K. Lee and Chandramohan A. Thekkath. 1996. Petal: Distributed virtual disks. In ACM SIGPLAN Notices, Vol. 31. ACM, 84--92.
  35. HuiBa Li, ShengYun Liu, YuXing Peng, DongSheng Li, HangJun Zhou, and XiCheng Lu. 2010. Superscalar communication: A runtime optimization for distributed applications. Sci. China Info. Sci. 53, 10 (2010), 1931--1946.
  36. Hyeontaek Lim, Dongsu Han, David G. Andersen, and Michael Kaminsky. 2014. MICA: A holistic approach to fast in-memory key-value storage. In Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI’14). 429--444.
  37. Guohan Lu, Chuanxiong Guo, Yulong Li, Zhiqiang Zhou, Tong Yuan, Haitao Wu, Yongqiang Xiong, Rui Gao, and Yongguang Zhang. 2011. ServerSwitch: A programmable and high performance platform for data center networks. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI’11).
  38. Xicheng Lu, Huaimin Wang, and Ji Wang. 2006. Internet-based virtual computing environment (iVCE): Concepts and architecture. Sci. China Ser. F: Info. Sci. 49, 6 (2006), 681--701.
  39. Xicheng Lu, Huaimin Wang, Ji Wang, and Jie Xu. 2013. Internet-based virtual computing environment: Beyond the data center as a computer. Future Gen. Comput. Syst. 29, 1 (2013), 309--322.
  40. Jeanna Neefe Matthews, Drew Roselli, Adam M. Costello, Randolph Y. Wang, and Thomas E. Anderson. 1997. Improving the performance of log-structured file systems with adaptive methods. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP’97). ACM.
  41. James Mickens, Edmund B. Nightingale, Jeremy Elson, Darren Gehring, Bin Fan, Asim Kadav, Vijay Chidambaram, Osama Khan, and Krishna Nareddy. 2014. Blizzard: Fast, cloud-scale block storage for cloud-oblivious applications. In Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI’14). 257--273.
  42. Subramanian Muralidhar, Wyatt Lloyd, Sabyasachi Roy, Cory Hill, Ernest Lin, Weiwen Liu, Satadru Pan, Shiva Shankar, Viswanath Sivakumar, Linpeng Tang et al. 2014. f4: Facebook’s warm BLOB storage system. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI’14). 383--398.
  43. Edmund B. Nightingale, Jeremy Elson, Jinliang Fan, Owen Hofmann, Jon Howell, and Yutaka Suzue. 2012. Flat datacenter storage. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI’12).
  44. Diego Ongaro, Stephen M. Rumble, Ryan Stutsman, John K. Ousterhout, and Mendel Rosenblum. 2011. Fast crash recovery in RAMCloud. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP’11). 29--41.
  45. John K. Ousterhout, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazières, Subhasish Mitra, Aravind Narayanan, Guru M. Parulkar, Mendel Rosenblum, Stephen M. Rumble, Eric Stratmann, and Ryan Stutsman. 2009. The case for RAMClouds: Scalable high-performance storage entirely in DRAM. Operat. Syst. Rev. 43, 4 (2009), 92--105.
  46. Mendel Rosenblum and John K. Ousterhout. 1992. The design and implementation of a log-structured file system. ACM Trans. Comput. Syst. 10, 1 (1992), 26--52.
  47. Stephen M. Rumble, Diego Ongaro, Ryan Stutsman, Mendel Rosenblum, and John K. Ousterhout. 2011. It’s time for low latency. In Proceedings of the Workshop on Hot Topics in Operating Systems (HotOS’11).
  48. Bin Shao, Haixun Wang, and Yatao Li. 2013. Trinity: A distributed graph engine on a memory cloud. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. ACM, 505--516.
  49. Ji-Yong Shin, Mahesh Balakrishnan, Tudor Marian, and Hakim Weatherspoon. 2013. Gecko: Contention-oblivious disk arrays for cloud storage. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST’13). 285--298.
  50. Michael Vrable, Stefan Savage, and Geoffrey M. Voelker. 2012. BlueSky: A cloud-backed file system for the enterprise. In Proceedings of the 10th USENIX Conference on File and Storage Technologies. USENIX Association, 19--19.
  51. Yang Wang, Manos Kapritsos, Zuocheng Ren, Prince Mahajan, Jeevitha Kirubanandam, Lorenzo Alvisi, and Mike Dahlin. 2013. Robustness in the Salus scalable block store. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI’13). 357--370.
  52. Haitao Wu, Guohan Lu, Dan Li, Chuanxiong Guo, and Yongguang Zhang. 2009. MDCube: A high performance network structure for modular data center interconnection. In Proceedings of the International Conference on Emerging Networking Experiments and Technologies (CoNEXT’09), Jörg Liebeherr, Giorgio Ventre, Ernst W. Biersack, and Srinivasan Keshav (Eds.). ACM, 25--36. Retrieved from http://dblp.uni-trier.de/db/conf/conext/conext2009.html#WuLLGZ09.
  53. Yiming Zhang, Chuanxiong Guo, Dongsheng Li, Rui Chu, Haitao Wu, and Yongqiang Xiong. 2015. CubicRing: Enabling one-hop failure detection and recovery for distributed in-memory storage systems. In Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI’15). 529--542.
  54. Yibo Zhu, Nanxi Kang, Jiaxin Cao, Albert Greenberg, Guohan Lu, Ratul Mahajan, Dave Maltz, Lihua Yuan, Ming Zhang, Ben Y. Zhao et al. 2015. Packet-level telemetry in large datacenter networks. In ACM SIGCOMM Computer Communication Review, Vol. 45. ACM, 479--491.


        • Published in

          ACM Transactions on Storage, Volume 15, Issue 1
          Special Issue on ACM International Systems and Storage Conference (SYSTOR) 2018
          February 2019
          194 pages
          ISSN: 1553-3077
          EISSN: 1553-3093
          DOI: 10.1145/3311821
          • Editor: Sam H. Noh

          Copyright © 2019 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 18 February 2019
          • Accepted: 1 October 2018
          • Revised: 1 July 2018
          • Received: 1 November 2017


          Qualifiers

          • research-article
          • Research
          • Refereed
