Abstract
Most modern applications interact with external services and access data in structured formats such as XML, JSON and CSV. Static type systems do not understand such formats, often making data access more cumbersome. Should we give up and leave the messy world of external data to dynamic typing and runtime checks? Of course, not! We present F# Data, a library that integrates external structured data into F#. As most real-world data does not come with an explicit schema, we develop a shape inference algorithm that infers a shape from representative sample documents. We then integrate the inferred shape into the F# type system using type providers. We formalize the process and prove a relative type soundness theorem. Our library significantly reduces the amount of data access code and it provides additional safety guarantees when contrasted with the widely used weakly typed techniques.
- L. Cardelli and J. C. Mitchell. Operations on Records. In Mathematical Foundations of Programming Semantics, pages 22–52. Springer, 1990. Google ScholarDigital Library
- A. Chlipala. Ur: Statically-typed Metaprogramming with Type-level Record Computation. In ACM SIGPLAN Notices, volume 45, pages 122–133. ACM, 2010. Google ScholarDigital Library
- D. R. Christiansen. Dependent Type Providers. In Proceedings of Workshop on Generic Programming, WGP ’13, pages 25–34, 2013. ISBN 978-1-4503-2389-5. Google ScholarDigital Library
- D. Colazzo, G. Ghelli, and C. Sartiani. Typing Massive JSON Datasets. In International Workshop on Cross-model Language Design and Implementation, XLDI ’12, 2012.Google Scholar
- E. Cooper, S. Lindley, P. Wadler, and J. Yallop. Links: Web Programming without Tiers. In Formal Methods for Components and Objects, pages 266–296. Springer, 2007. Google ScholarDigital Library
- J. Donham and N. Pouillard. Camlp4 and Template Haskell. In Commercial Users of Functional Programming, 2010. Google ScholarDigital Library
- K. Fisher and R. Gruber. PADS: A Domain-specific Language for Processing Ad Hoc Data. ACM SIGPLAN Notices, 40(6): 295–304, 2005. Google ScholarDigital Library
- K. Fisher, D. Walker, K. Q. Zhu, and P. White. From Dirt to Shovels: Fully Automatic Tool Generation from Ad Hoc Data. In Proceedings of ACM Symposium on Principles of Programming Languages, POPL ’08, pages 421–434, 2008. ISBN 978-1-59593-689-9. Google ScholarDigital Library
- H. Hosoya and B. C. Pierce. XDuce: A Statically Typed XML Processing Language. Transactions on Internet Technology, 3 (2):117–148, 2003. Google ScholarDigital Library
- A. Igarashi, B. Pierce, and P. Wadler. Featherweight Java: A Minimal Core Calculus for Java and GJ. In ACM SIGPLAN Notices, volume 34, pages 132–146. ACM, 1999. Google ScholarDigital Library
- Y. Mandelbaum, K. Fisher, D. Walker, M. Fernandez, and A. Gleyzer. PADS/ML: A Functional Data Description Language. In ACM SIGPLAN Notices, volume 42, pages 77–83. ACM, 2007. Google ScholarDigital Library
- H. Mehnert and D. Christiansen. Tool Demonstration: An IDE for Programming and Proving in Idris. In Proceedings of Vienna Summer of Logic, VSL’14, 2014.Google Scholar
- E. Meijer, W. Schulte, and G. Bierman. Unifying Tables, Objects, and Documents. In Workshop on Declarative Programming in the Context of Object-Oriented Languages, pages 145–166, 2003.Google Scholar
- E. Meijer, B. Beckman, and G. Bierman. LINQ: Reconciling Object, Relations and XML in the .NET Framework. In Proceedings of the International Conference on Management of Data, SIGMOD ’06, pages 706–706, 2006. Google ScholarDigital Library
- R. Milner. The Definition of Standard ML: Revised. MIT press, 1997. Google ScholarDigital Library
- T. Petricek and D. Syme. In the Age of Web: Typed Functional-first Programming Revisited. Post-Proceedings of ML Workshop, 2015.Google Scholar
- D. Rémy. Type Inference for Records in a Natural Extension of ML. Theoretical Aspects Of Object-Oriented Programming. Types, Semantics and Language Design. MIT Press, 1993.Google Scholar
- S. Scheglmann, R. Lämmel, M. Leinberger, S. Staab, M. Thimm, and E. Viegas. IDE Integrated RDF Exploration, Access and RDF–Based Code Typing with LITEQ. In The Semantic Web: ESWC 2014 Satellite Events, pages 505–510. Springer, 2014.Google Scholar
- T. Sheard and S. P. Jones. Template Meta-programming for Haskell. In Proceedings of the ACM Workshop on Haskell, pages 1–16. ACM, 2002. Google ScholarDigital Library
- J. G. Siek and W. Taha. Gradual Typing for Functional Languages. In Scheme and Functional Programming Workshop, pages 81–92, 2006.Google Scholar
- M. Sulzmann and K. Z. M. Lu. A Type-safe Embedding of XDuce into ML. Electr. Notes in Theoretical Comp. Sci., 148 (2):239–264, 2006. Google ScholarDigital Library
- N. Swamy, C. Fournet, A. Rastogi, K. Bhargavan, J. Chen, P.-Y. Strub, and G. Bierman. Gradual Typing Embedded Securely in JavaScript. In ACM SIGPLAN Notices, volume 49, pages 425–437. ACM, 2014. Google ScholarDigital Library
- D. Syme, K. Battocchi, K. Takeda, D. Malayeri, J. Fisher, J. Hu, T. Liu, B. McNamara, D. Quirk, M. Taveggia, W. Chae, U. Matsveyeu, and T. Petricek. Strongly-typed Language Support for Internet-scale Information Sources. Technical Report MSR-TR-2012-101, Microsoft Research, September 2012.Google Scholar
- D. Syme, K. Battocchi, K. Takeda, D. Malayeri, and T. Petricek. Themes in Information-rich Functional Programming for Internet-scale Data Sources. In Proceedings of the Workshop on Data Driven Functional Programming, DDFP’13, pages 1–4, 2013. Google ScholarDigital Library
- W. Taha and T. Sheard. Multi-stage Programming with Explicit Annotations. ACM SIGPLAN Notices, 32(12):203–217, 1997. ISSN 0362-1340. Google ScholarDigital Library
- A. K. Wright and M. Felleisen. A Syntactic Approach to Type Soundness. Information and computation, 115(1):38– 94, 1994. Google ScholarDigital Library
Index Terms
- Types from data: making structured data first-class citizens in F#
Recommendations
Types from data: making structured data first-class citizens in F#
PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and ImplementationMost modern applications interact with external services and access data in structured formats such as XML, JSON and CSV. Static type systems do not understand such formats, often making data access more cumbersome. Should we give up and leave the ...
Refactoring legacy AJAX applications to improve the efficiency of the data exchange component
The AJAX paradigm encodes data exchange XML formats. Recently, JSON has also become a popular data exchange format. XML has numerous benefits including human-readable structures and self-describing data. However, JSON provides significant performance ...
Schemas and Types for JSON Data: From Theory to Practice
SIGMOD '19: Proceedings of the 2019 International Conference on Management of DataThe last few years have seen the fast and ubiquitous diffusion of JSON as one of the most widely used formats for publishing and interchanging data, as it combines the flexibility of semistructured data models with well-known data structures like ...
Comments