
Streaming Data Reorganization at Scale with DeltaFS Indexed Massive Directories

Published: 24 September 2020

Abstract

Complex storage stacks providing data compression, indexing, and analytics help leverage the massive amounts of data generated today to derive insights. It is challenging to perform this computation, however, while fully utilizing the underlying storage media. This is because, while storage servers with large core counts are widely available, single-core performance and memory bandwidth per core grow more slowly than the core count per die. Computational storage offers a promising solution to this problem by utilizing dedicated compute resources along the storage processing path. We present DeltaFS Indexed Massive Directories (IMDs), a new approach to computational storage. DeltaFS IMDs harvest available (i.e., not dedicated) compute, memory, and network resources on the compute nodes of an application to perform computation on data. We demonstrate the efficiency of DeltaFS IMDs by using them to dynamically reorganize the output of a real-world simulation application across 131,072 CPU cores. DeltaFS IMDs speed up reads by 1,740× while only slightly slowing down the writing of data during simulation I/O for in situ data processing.

  105. Qing Zheng, Kai Ren, Garth Gibson, Bradley W. Settlemyer, and Gary Grider. 2015. DeltaFS: Exascale file systems scale better without dedicated servers. In Proceedings of the 10th Parallel Data Storage Workshop (PDSW’15). 1--6. DOI:https://doi.org/10.1145/2834976.2834984Google ScholarGoogle ScholarDigital LibraryDigital Library
  106. Aviad Zuck, Sivan Toledo, Dmitry Sotnikov, and Danny Harnik. 2014. Compression and SSDs: Where and how? In Proceedings of the 2nd Workshop on Interactions of NVM/Flash with Operating Systems and Workloads (INFLOW’14).Google ScholarGoogle Scholar

          • Published in

            ACM Transactions on Storage, Volume 16, Issue 4
            Special Section on Computational Storage and Regular Papers
            November 2020
            185 pages
            ISSN:1553-3077
            EISSN:1553-3093
            DOI:10.1145/3426401
            • Editor: Sam H. Noh

            Copyright © 2020 ACM

            Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 24 September 2020
            • Accepted: 1 August 2020
            • Revised: 1 July 2020
            • Received: 1 February 2020

            Qualifiers

            • research-article
            • Research
            • Refereed
