DOI: 10.1145/2834976.2834984
research-article

DeltaFS: exascale file systems scale better without dedicated servers

Published: 15 November 2015

ABSTRACT

High-performance computing fault tolerance depends on scalable parallel file system performance. For more than a decade, scalable bandwidth has been available from the object storage systems that underlie modern parallel file systems, and recently we have seen demonstrations of scalable parallel metadata using dynamic partitioning of the namespace over multiple metadata servers. But even these scalable parallel file systems require significant numbers of dedicated servers, and some workloads still experience bottlenecks. We envision exascale parallel file systems that do not have any dedicated server machines. Instead, a parallel job instantiates a file system namespace service in client middleware that operates only on scalable object storage and communicates with other jobs by sharing or publishing namespace snapshots. Experiments show that our serverless file system design, DeltaFS, performs metadata operations orders of magnitude faster than traditional file system architectures.
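The abstract describes the serverless architecture only at a high level. As a rough illustration of the idea, and not DeltaFS's actual implementation or API, the Python sketch below assumes a hypothetical in-memory `ObjectStore` standing in for scalable object storage and a hypothetical `JobNamespace` class standing in for the namespace service that a job runs in client middleware. Metadata operations touch only job-local state, and a job shares its namespace by publishing an immutable snapshot object that another job can load as its starting point.

```python
import json
from typing import Dict, List, Optional


class ObjectStore:
    """Hypothetical in-memory stand-in for scalable object storage.

    Assumption: the store offers only put/get of immutable blobs.
    """

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        if key in self._objects:
            raise KeyError(f"object {key!r} already exists; snapshots are immutable")
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class JobNamespace:
    """Hypothetical per-job namespace service living in client middleware.

    Metadata operations update only this process-local table, so no
    dedicated metadata server is contacted on the critical path.
    """

    def __init__(self, store: ObjectStore, base_snapshot: Optional[str] = None) -> None:
        self.store = store
        self.entries: Dict[str, dict] = {}
        if base_snapshot is not None:
            # Start from a namespace snapshot that another job published.
            self.entries.update(json.loads(store.get(base_snapshot)))

    def mkdir(self, path: str) -> None:
        self._insert(path, {"type": "dir"})

    def create(self, path: str, size: int = 0) -> None:
        self._insert(path, {"type": "file", "size": size})

    def _insert(self, path: str, attrs: dict) -> None:
        # The root "/" is implicit; any other parent must be a known directory.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent != "/" and self.entries.get(parent, {}).get("type") != "dir":
            raise FileNotFoundError(parent)
        if path in self.entries:
            raise FileExistsError(path)
        self.entries[path] = attrs

    def listdir(self, path: str) -> List[str]:
        # Return the names of direct children of `path`, sorted.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.entries
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def publish(self, name: str) -> str:
        """Write the whole namespace as one immutable object other jobs can read."""
        key = f"snapshots/{name}"
        self.store.put(key, json.dumps(self.entries, sort_keys=True).encode())
        return key


# Usage: one job writes and publishes; a later job imports and extends.
store = ObjectStore()
producer = JobNamespace(store)
producer.mkdir("/run1")
producer.create("/run1/ckpt.0", size=4096)
key = producer.publish("run1-final")

consumer = JobNamespace(store, base_snapshot=key)
consumer.create("/run1/analysis.out")
print(consumer.listdir("/run1"))  # ['analysis.out', 'ckpt.0']
print(producer.listdir("/run1"))  # ['ckpt.0']
```

In this sketch the only state shared between jobs is the object store, and the consumer's later changes do not affect the producer's view. A real system would also have to decide how snapshots are encoded and how conflicting updates from different jobs are merged; here the consumer simply copies the published snapshot.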


  • Published in

    ACM Conferences
    PDSW '15: Proceedings of the 10th Parallel Data Storage Workshop
    November 2015
    59 pages
    ISBN: 9781450340083
    DOI: 10.1145/2834976

    Copyright © 2015 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    PDSW '15 Paper Acceptance Rate: 9 of 25 submissions, 36%. Overall Acceptance Rate: 17 of 41 submissions, 41%.
