
Tiny-Tail Flash: Near-Perfect Elimination of Garbage Collection Tail Latencies in NAND SSDs

Published: 27 October 2017

Abstract

Flash storage has become the mainstream destination for storage users. However, SSDs do not always deliver the performance that users expect. The core culprit of flash performance instability is the well-known garbage collection (GC) process: while GC runs, the SSD cannot serve incoming I/Os (they are blocked), which causes long delays and in turn the long tail latency problem. We present ttFlash as a solution to this problem. ttFlash is a “tiny-tail” flash drive (SSD) that eliminates GC-induced tail latencies by circumventing GC-blocked I/Os with four novel strategies: plane-blocking GC, rotating GC, GC-tolerant read, and GC-tolerant flush. These four strategies leverage the timely combination of modern SSD internal technologies such as powerful controllers, parity-based redundancies, and capacitor-backed RAM. The strategies depend on intra-plane copyback operations. Through an extensive evaluation, we show that ttFlash comes very close to a “no-GC” scenario. Specifically, between the 99th and 99.99th percentiles, ttFlash is only 1.0 to 2.6× slower than the no-GC case, while a base approach suffers from 5–138× GC-induced slowdowns.
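To make the idea of circumventing a GC-blocked read with parity-based redundancy concrete, the following is a minimal sketch, not the ttFlash implementation. It assumes a simple XOR parity page per stripe and assumes that at most one page of a stripe is blocked by GC at any time. The names `xor_pages`, `gc_tolerant_read`, and the `busy` set are hypothetical and chosen only for illustration.

```python
from functools import reduce


def xor_pages(pages):
    """XOR a list of equal-length byte strings, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def gc_tolerant_read(stripe, parity, target, busy):
    """Return page `target` of `stripe` without waiting for GC.

    stripe: list of data pages (bytes), one per plane (assumed layout)
    parity: XOR of all pages in `stripe`
    busy:   set of page indexes whose plane is currently doing GC
    """
    if target not in busy:
        # Plane is idle: serve the read directly.
        return stripe[target]
    others = [i for i in range(len(stripe)) if i != target]
    if any(i in busy for i in others):
        # Single parity can only mask one blocked page per stripe.
        raise RuntimeError("more than one page in the stripe is blocked")
    # Rebuild the blocked page from the remaining pages plus parity.
    return xor_pages([stripe[i] for i in others] + [parity])


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_pages(stripe)
rebuilt = gc_tolerant_read(stripe, parity, target=1, busy={1})
```

Here the read of page 1 is served from pages 0 and 2 plus the parity page, so it does not wait for the plane that is busy with GC. The sketch only works if some other mechanism keeps concurrent GC to one plane per stripe, which is why it raises an error otherwise.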


ACM Transactions on Storage, Volume 13, Issue 3
Special Issue on FAST 2017 and Regular Papers
August 2017, 265 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3141876
Editor: Sam H. Noh

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 27 October 2017
• Received: 1 June 2017
• Accepted: 1 June 2017

        Qualifiers

        • research-article
        • Research
        • Refereed
