
PADS: a domain-specific language for processing ad hoc data

Published: 12 June 2005

Abstract

PADS is a declarative data description language that allows data analysts to describe both the physical layout of ad hoc data sources and the semantic properties of that data. From such descriptions, the PADS compiler generates libraries and tools for manipulating the data, including parsing routines, statistical profiling tools, translation programs that produce well-behaved formats such as XML or those required for loading relational databases, and tools for running XQueries over raw PADS data sources. The descriptions are concise enough to serve as "living" documentation, yet flexible enough to describe most of the ASCII, binary, and Cobol formats that we have seen in practice. The generated parsing library provides for robust, application-specific error handling.
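To make the idea in the abstract concrete, the following is a minimal Python sketch of a declarative description driving a generic parser. It is not PADS syntax and not the PADS-generated library: the record format (pipe-delimited `host|status|bytes`), the `DESCRIPTION` table, and the `parse_record` function are all hypothetical names chosen for illustration. Each entry pairs a physical conversion with a semantic constraint, and the parser reports errors instead of aborting on bad input.

```python
# Hypothetical description of one record type (illustration only, not PADS syntax).
# Each entry: (field name, conversion from raw text, semantic constraint on the value).
DESCRIPTION = [
    ("host", str, lambda v: len(v) > 0),
    ("status", int, lambda v: 100 <= v <= 599),
    ("bytes", int, lambda v: v >= 0),
]


def parse_record(line, description=DESCRIPTION, sep="|"):
    """Parse one line against the description.

    Returns (record, errors). Malformed input never raises; problems are
    collected in the error list so the caller decides how to react.
    """
    parts = line.rstrip("\n").split(sep)
    record, errors = {}, []
    if len(parts) != len(description):
        errors.append(f"expected {len(description)} fields, got {len(parts)}")
    for (name, convert, check), raw in zip(description, parts):
        try:
            value = convert(raw)
        except ValueError:
            # Physical-layout error: the raw text is not of the expected form.
            record[name] = None
            errors.append(f"{name}: cannot parse {raw!r}")
            continue
        record[name] = value
        if not check(value):
            # Semantic error: the text parsed, but the value breaks a constraint.
            errors.append(f"{name}: constraint violated by {value!r}")
    return record, errors


if __name__ == "__main__":
    print(parse_record("example.org|200|512"))
    print(parse_record("example.org|abc|-5"))
```

Keeping the description as data, separate from the parsing loop, is what lets a single description feed several tools in this style of design; the sketch shows only the parsing side.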



  • Published in

    ACM SIGPLAN Notices, Volume 40, Issue 6
    Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
    June 2005
    325 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/1064978

    PLDI '05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
    June 2005
    338 pages
    ISBN: 1595930566
    DOI: 10.1145/1065010

    Copyright © 2005 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

