skip to main content
article

Types from data: making structured data first-class citizens in F#

Published:02 June 2016Publication History
Skip Abstract Section

Abstract

Most modern applications interact with external services and access data in structured formats such as XML, JSON and CSV. Static type systems do not understand such formats, often making data access more cumbersome. Should we give up and leave the messy world of external data to dynamic typing and runtime checks? Of course, not! We present F# Data, a library that integrates external structured data into F#. As most real-world data does not come with an explicit schema, we develop a shape inference algorithm that infers a shape from representative sample documents. We then integrate the inferred shape into the F# type system using type providers. We formalize the process and prove a relative type soundness theorem. Our library significantly reduces the amount of data access code and it provides additional safety guarantees when contrasted with the widely used weakly typed techniques.

References

  1. L. Cardelli and J. C. Mitchell. Operations on Records. In Mathematical Foundations of Programming Semantics, pages 22–52. Springer, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. Chlipala. Ur: Statically-typed Metaprogramming with Type-level Record Computation. In ACM SIGPLAN Notices, volume 45, pages 122–133. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. D. R. Christiansen. Dependent Type Providers. In Proceedings of Workshop on Generic Programming, WGP ’13, pages 25–34, 2013. ISBN 978-1-4503-2389-5. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. D. Colazzo, G. Ghelli, and C. Sartiani. Typing Massive JSON Datasets. In International Workshop on Cross-model Language Design and Implementation, XLDI ’12, 2012.Google ScholarGoogle Scholar
  5. E. Cooper, S. Lindley, P. Wadler, and J. Yallop. Links: Web Programming without Tiers. In Formal Methods for Components and Objects, pages 266–296. Springer, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J. Donham and N. Pouillard. Camlp4 and Template Haskell. In Commercial Users of Functional Programming, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. K. Fisher and R. Gruber. PADS: A Domain-specific Language for Processing Ad Hoc Data. ACM SIGPLAN Notices, 40(6): 295–304, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. K. Fisher, D. Walker, K. Q. Zhu, and P. White. From Dirt to Shovels: Fully Automatic Tool Generation from Ad Hoc Data. In Proceedings of ACM Symposium on Principles of Programming Languages, POPL ’08, pages 421–434, 2008. ISBN 978-1-59593-689-9. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. H. Hosoya and B. C. Pierce. XDuce: A Statically Typed XML Processing Language. Transactions on Internet Technology, 3 (2):117–148, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Igarashi, B. Pierce, and P. Wadler. Featherweight Java: A Minimal Core Calculus for Java and GJ. In ACM SIGPLAN Notices, volume 34, pages 132–146. ACM, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Y. Mandelbaum, K. Fisher, D. Walker, M. Fernandez, and A. Gleyzer. PADS/ML: A Functional Data Description Language. In ACM SIGPLAN Notices, volume 42, pages 77–83. ACM, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. H. Mehnert and D. Christiansen. Tool Demonstration: An IDE for Programming and Proving in Idris. In Proceedings of Vienna Summer of Logic, VSL’14, 2014.Google ScholarGoogle Scholar
  13. E. Meijer, W. Schulte, and G. Bierman. Unifying Tables, Objects, and Documents. In Workshop on Declarative Programming in the Context of Object-Oriented Languages, pages 145–166, 2003.Google ScholarGoogle Scholar
  14. E. Meijer, B. Beckman, and G. Bierman. LINQ: Reconciling Object, Relations and XML in the .NET Framework. In Proceedings of the International Conference on Management of Data, SIGMOD ’06, pages 706–706, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. R. Milner. The Definition of Standard ML: Revised. MIT press, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. T. Petricek and D. Syme. In the Age of Web: Typed Functional-first Programming Revisited. Post-Proceedings of ML Workshop, 2015.Google ScholarGoogle Scholar
  17. D. Rémy. Type Inference for Records in a Natural Extension of ML. Theoretical Aspects Of Object-Oriented Programming. Types, Semantics and Language Design. MIT Press, 1993.Google ScholarGoogle Scholar
  18. S. Scheglmann, R. Lämmel, M. Leinberger, S. Staab, M. Thimm, and E. Viegas. IDE Integrated RDF Exploration, Access and RDF–Based Code Typing with LITEQ. In The Semantic Web: ESWC 2014 Satellite Events, pages 505–510. Springer, 2014.Google ScholarGoogle Scholar
  19. T. Sheard and S. P. Jones. Template Meta-programming for Haskell. In Proceedings of the ACM Workshop on Haskell, pages 1–16. ACM, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. G. Siek and W. Taha. Gradual Typing for Functional Languages. In Scheme and Functional Programming Workshop, pages 81–92, 2006.Google ScholarGoogle Scholar
  21. M. Sulzmann and K. Z. M. Lu. A Type-safe Embedding of XDuce into ML. Electr. Notes in Theoretical Comp. Sci., 148 (2):239–264, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. N. Swamy, C. Fournet, A. Rastogi, K. Bhargavan, J. Chen, P.-Y. Strub, and G. Bierman. Gradual Typing Embedded Securely in JavaScript. In ACM SIGPLAN Notices, volume 49, pages 425–437. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. D. Syme, K. Battocchi, K. Takeda, D. Malayeri, J. Fisher, J. Hu, T. Liu, B. McNamara, D. Quirk, M. Taveggia, W. Chae, U. Matsveyeu, and T. Petricek. Strongly-typed Language Support for Internet-scale Information Sources. Technical Report MSR-TR-2012-101, Microsoft Research, September 2012.Google ScholarGoogle Scholar
  24. D. Syme, K. Battocchi, K. Takeda, D. Malayeri, and T. Petricek. Themes in Information-rich Functional Programming for Internet-scale Data Sources. In Proceedings of the Workshop on Data Driven Functional Programming, DDFP’13, pages 1–4, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. W. Taha and T. Sheard. Multi-stage Programming with Explicit Annotations. ACM SIGPLAN Notices, 32(12):203–217, 1997. ISSN 0362-1340. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. A. K. Wright and M. Felleisen. A Syntactic Approach to Type Soundness. Information and computation, 115(1):38– 94, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Types from data: making structured data first-class citizens in F#

    Recommendations

    Reviews

    Scott Arthur Moody

    Imagine learning the data schema formats of an application just by analyzing JavaScript Object Notation (JSON), Extensible Markup Language (XML), or comma-separated values (CSV) examples. By using shape inference algorithms, the "shape" of these examples can be inferred, and then described with the F# data and type system. The authors introduce the F# data language and their approach to analyze sample documents and create first-class structured data objects, "citizens," in F#. Using F# data type providers, processing sample data, and inferring common preferred shapes, F# types can be generated and used by programmers. The papers shows typical real-world JSON services exhibiting some of the more complicated shapes, such as numerical versus string literals, nested collections, and even optional fields. Their approach flips the typical known type processing into a new "types from data" approach. By exploring structured data, such as JSON nodes, the authors' shape inference algorithm can recursively find common shapes of all the child nodes of the sample document. These are then defined in the F# language while being related to known preferred shape relations of JSON, XML, or CVS languages. The paper describes their common preferred shape function rules and maps them into an advanced type system. Numerous worked examples are shown that map back to their common shapes. A formal type model based on the flyweight object-oriented (FOO) calculus forms the basis for their F# data integration approach. With great detail, the paper shows various challenges with example data. With this approach, the type providers create a strong typing model helping a program work with unknown real-world data. Without strong typing, or known schemas, applications usually substitute with lots of dynamic typing and exception handling. Also as schemas or examples change, the inference engine can be rerun to incorporate new knowledge. The authors have connected two lines of research: (1) extending programming language type systems to accommodate external data services, and (2) inferring types for these same real-world data sources. This paper is also unique in that it describes the programming language theory behind concrete type providers. The paper also provides examples showing how their approach works, and correctness proofs that match their definition of what they can process with desired results. A formal model helps prove relative type safety. With a rich F# user community, many examples have been processed, which increases information on the shape inference approaches. Information on F# is available online, and libraries to incorporate this process into popular languages are available. Online Computing Reviews Service

    Access critical reviews of Computing literature here

    Become a reviewer for Computing Reviews.

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM SIGPLAN Notices
      ACM SIGPLAN Notices  Volume 51, Issue 6
      PLDI '16
      June 2016
      726 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/2980983
      • Editor:
      • Andy Gill
      Issue’s Table of Contents
      • cover image ACM Conferences
        PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation
        June 2016
        726 pages
        ISBN:9781450342612
        DOI:10.1145/2908080
        • General Chair:
        • Chandra Krintz,
        • Program Chair:
        • Emery Berger

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 2 June 2016

      Check for updates

      Qualifiers

      • article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader