research-article

Symbolic finite state transducers: algorithms and applications

Authors:
Margus Veanes

Microsoft Research, Redmond, WA, USA

Microsoft Research, Redmond, WA, USA
View Profile

,
Pieter Hooimeijer

University of Virginia, Charlottesville, VA, USA

University of Virginia, Charlottesville, VA, USA
View Profile

,
Benjamin Livshits

Microsoft Research, Redmond, WA, USA

Microsoft Research, Redmond, WA, USA
View Profile

,
David Molnar

Microsoft Research, Redmond, WA, USA

Microsoft Research, Redmond, WA, USA
View Profile

,
Nikolaj Bjorner

Microsoft Research, Redmond, WA, USA

Microsoft Research, Redmond, WA, USA
View Profile

POPL '12: Proceedings of the 39th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languagesJanuary 2012Pages 137–150https://doi.org/10.1145/2103656.2103674

Published:25 January 2012Publication History

POPL '12: Proceedings of the 39th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages

Pages 137–150

ABSTRACT

Finite automata and finite transducers are used in a wide range of applications in software engineering, from regular expressions to specification languages. We extend these classic objects with symbolic alphabets represented as parametric theories. Admitting potentially infinite alphabets makes this representation strictly more general and succinct than classical finite transducers and automata over strings. Despite this, the main operations, including composition, checking that a transducer is single-valued, and equivalence checking for single-valued symbolic finite transducers are effective given a decision procedure for the background theory. We provide novel algorithms for these operations and extend composition to symbolic transducers augmented with registers. Our base algorithms are unusual in that they are nonconstructive, therefore, we also supply a separate model generation algorithm that can quickly find counterexamples in the case two symbolic finite transducers are not equivalent. The algorithms give rise to a complete decidable algebra of symbolic transducers. Unlike previous work, we do not need any syntactic restriction of the formulas on the transitions, only a decision procedure. In practice we leverage recent advances in satisfiability modulo theory (SMT) solvers. We demonstrate our techniques on four case studies, covering a wide range of applications. Our techniques can synthesize string pre-images in excess of 8,000 bytes in roughly a minute, and we find that our new encodings significantly outperform previous techniques in succinctness and speed of analysis.

Supplemental Material

popl_2b_2.mp4

mp4

205 MB

Download

References

R. Alur and P. Cerný. Streaming transducers for algorithmic verification of single-pass list-processing programs. In POPL'11, pages 599--610. ACM, 2011. Google ScholarDigital Library
D. E. Bakke, R. Parameswaran, D. M. Blough, A. A. Franz, and T. J. Palmer. Data obfuscation: Anonymity and desensitization of usable data sets. IEEE Security and Privacy, Apr. 2004. Google ScholarDigital Library
D. Balzarotti, M. Cova, V. Felmetsger, N. Jovanovic, E. Kirda, C. Kruegel, and G. Vigna. Saner: Composing static and dynamic analysis to validate sanitization in web applications. In IEEE Oakland Security and Privacy, 2008. Google ScholarDigital Library
Bek. http://research.microsoft.com/bek.Google Scholar
M. Benedikt, C. Ley, and G. Puppis. Automata vs. logics on data words. In CSL, volume 6247 of LNCS, pages 110--124. Springer, 2010. Google ScholarDigital Library
er and Veanes(2011)}BV11N. Bjørner and M. Veanes. Symbolic transducers. Technical Report MSR-TR-2011--3, Microsoft Research, January 2011.Google Scholar
M. Boja'nczyk, A. Muscholl, T. Schwentick, L. Segoufin, and C. David. Two-variable logic on words with data. In phLICS, pages 7--16. IEEE, 06. Google ScholarDigital Library
A. Brush, J. Krumm, and J. Scott. Exploring end user preferences for location obfuscation, location-based services, and the value of location. In UbiComp, September 2010. Google ScholarDigital Library
K. Z. Chen, G. Gu, J. Nazario, X. Han, and J. Zhuge. WebPatrol: Automated collection and replay of web-based malware scenarios. In ASIACCS, March 2011. Google ScholarDigital Library
B. Courcelle and P. Franchi-Zannettacchi. Attribute grammars and recursive program schemes. Theoretical Computer Science, 17: 163--191, 1982.Google ScholarCross Ref
M. Cova, C. Kruegel, and G. Vigna. Detection and analysis of drive-by-download attacks and malicious JavaScript code. In WWW Conference, Raleigh, NC, April 2010. Google ScholarDigital Library
K. Culic and J. Karhumaki. The equivalence of finite-valued transducers (on HDTOL languages) is decidable. Theoretical Computer Science, 47: 71--84, 1986. Google ScholarDigital Library
C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert. Zozzle: Low-overhead mostly static javascript malware detection. In Proceedings of the Usenix Security Symposium, Aug. 2011. Google ScholarDigital Library
G. B. Dantzig and B. C. Eaves. Fourier-Motzkin elimination and its dual. Journal of Combinatorial Theory (A), 14: 288--297, 1973.Google ScholarCross Ref
L. de Moura and N. Bjørner. Z3: An Efficient SMT Solver. In TACAS'08, LNCS, 2008. Google ScholarDigital Library
A. J. Demers, C. Keleman, and B. Reusch. On some decidable properties of finite state translations. Acta Informatica, 17: 349--364, 1982.Google ScholarDigital Library
P. Eckersley. How unique is your web browser? In Privacy Enhancing Technologies, pages 1--18, 2010. Google ScholarDigital Library
J. Engelfriet. Some open questions and recent results on tree transducers and tree languages. In R. V. Book, editor, Formal Language Theory, pages 241--286. Academic Press, 1980.Google ScholarCross Ref
J. Engelfriet and S. Maneth. A comparison of pebble tree transducers with macro tree transducers. Acta Informatica, 39: 2003, 2003.Google ScholarDigital Library
Z. Esik. Decidability results concerning tree transducers. Acta Cybernetica, 5: 1--20, 1980.Google Scholar
Z. Fülöp and H. Vogler. Syntax-Directed Semantics: Formal Models Based on Tree Transducers. EATCS. Springer, 1998. Google ScholarDigital Library
T. L. Gall and B. Jeannet. Lattice automata: A representation for languages on infinite alphabets, and some applications to verification. In SAS 2007, volume 4634 of LNCS, pages 52--68, 2007. Google ScholarDigital Library
T. Griffiths. The unsolvability of the equivalence problem for lambda-free nondeterministic generalized machines. J. ACM, 15: 409--413, 1968. Google ScholarDigital Library
P. Hooimeijer and M. Veanes. An evaluation of automata algorithms for string analysis. In VMCAI'11, LNCS. Springer, 2011. Google ScholarDigital Library
P. Hooimeijer and W. Weimer. A decision procedure for subset constraints over regular languages. In Proceedings of the Conference on Programming Language Design and Implementation, pages 188--198, New York, NY, USA, 2009. ACM. ISBN 978--1--60558--392--1. Google ScholarDigital Library
P. Hooimeijer and W. Weimer. Solving string constraints lazily. In ASE, 2010. Google ScholarDigital Library
P. Hooimeijer, B. Livshits, D. Molnar, P. Saxena, and M. Veanes. Fast and precise sanitizer analysis with bek. In Proceedings of the USENIX Security Symposium, August 2011. Google ScholarDigital Library
O. Ibarra. The unsolvability of the equivalence problem for Efree NGSM's with unary input (output) alphabet and applications. SIAM Journal on Computing, 4: 524--532, 1978.Google ScholarCross Ref
M. Kaminski and N. Francez. Finite-memory automata. In 31st Annual Symposium on Foundations of Computer Science (FOCS 1990), volume 2, pages 683--688. IEEE, 1990. Google ScholarDigital Library
A. Kiezun, V. Ganesh, P. J. Guo, P. Hooimeijer, and M. D. Ernst. HAMPI: a solver for string constraints. In ISSTA, 2009. Google ScholarDigital Library
N. Kobayashi, N. Tabuchi, and H. Unno. Higher-order multi-parameter tree transducers and recursion schemes for program verification. In POPL, pages 495--508. ACM, 2010. Google ScholarDigital Library
J. Krumm. A survey of computational location privacy. Personal Ubiquitous Comput., 13: 391--399, August 2009. Google ScholarDigital Library
A. Maletti, J. Graehl, M. Hopkins, and K. Knight. The power of extended top-down tree transducers. SIAM J. Comput., 39: 410--430, June 2009. Google ScholarDigital Library
T. Milo, D. Suciu, and V. Vianu. Typechecking for XML transformers. In Proc. 19th ACM Symposium on Principles of Database Systems (PODS'2000), pages 11--22. ACM, 2000. Google ScholarDigital Library
Y. Minamide. Static approximation of dynamically generated web pages. In WWW '05: Proceedings of the 14th International Conference on the World Wide Web, pages 432--441, 2005. ISBN 1--59593-046--9. Google ScholarDigital Library
K. Mowery, D. Bogenreif, S. Yilek, and H. Shacham. Fingerprinting information in javascript implementations. In Proceedings of Web 2.0 Security and Privacy 2011 (W2SP), May 2011.Google Scholar
F. Neven, T. Schwentick, and V. Vianu. Finite state machines for strings over infinite alphabets. ACM Trans. CL, 5: 403--435, 2004. Google ScholarDigital Library
G. V. Noord and D. Gerdemann. Finite state transducers with predicates and identities. Grammars, 4: 263--286, 2001.Google ScholarCross Ref
C.-H. L. Ong and S. J. Ramsay. Verifying higher-order functional programs with pattern-matching algebraic data types. In POPL'11, pages 587--598. ACM, 2011. Google ScholarDigital Library
J. R. Parker. Algorithms for Image Processing and Computer Vision. Wiley and Sons, 2006.Google Scholar
P. Ratanaworabhan, B. Livshits, and B. Zorn. Nozzle: A defense against heap-spraying code injection attacks. In Proceedings of the Usenix Security Symposium, Aug. 2009. Google ScholarDigital Library
P. Saxena, D. Akhawe, S. Hanna, S. McCamant, F. Mao, and D. Song. A symbolic execution framework for javascript. In IEEE Security and Privacy, 2010. Google ScholarDigital Library
M. P. Schützenberger. Sur les relations rationnelles. In GI Conference on Automata Theory and Formal Languages, volume 33 of LNCS, pages 209--213, 1975. Google ScholarDigital Library
L. Segoufin. Automata and logics for words and trees over an infinite alphabet. In Z. Ésik, editor, CSL, volume 4207 of LNCS, pages 41--57, 2006. Google ScholarDigital Library
H. Seidl. Equivalence of finite-valued tree transducers is decidable. Math. Systems Theory, 27: 285--346, 1994. Google ScholarDigital Library
M. Veanes and N. Bjørner. Symbolic tree transducers. In Perspectives of System Informatics (PSI'11), 2011. Google ScholarDigital Library
M. Veanes, N. Bjørner, and L. de Moura. Symbolic automata constraint solving. In C. Fermüller and A. Voronkov, editors, LPAR-17, volume 6397 of LNCS, pages 640--654, 2010. Google ScholarDigital Library
M. Veanes, P. de Halleux, and N. Tillmann. Rex: Symbolic Regular Expression Explorer. In ICST'10. IEEE, 2010. Google ScholarDigital Library
G. Wassermann, D. Yu, A. Chander, D. Dhurjati, H. Inamura, and Z. Su. Dynamic test input generation for web applications. In ISSTA, 2008. Google ScholarDigital Library
A. Weber. Decomposing finite-valued transducers and deciding their equivalence. SIAM Journal on Computing, 22 (1): 175--202, February 1993. Google ScholarDigital Library
F. Yu, T. Bultan, and O. H. Ibarra. Relational string verification using multi-track automata. In Proceedings of the 15th international conference on Implementation and application of automata, CIAA'10, pages 290--299, 2011. Google ScholarDigital Library

Index Terms

Symbolic finite state transducers: algorithms and applications

Recommendations

Symbolic finite state transducers: algorithms and applications
POPL '12

Finite automata and finite transducers are used in a wide range of applications in software engineering, from regular expressions to specification languages. We extend these classic objects with symbolic alphabets represented as parametric theories. ...
Read More
Extended symbolic finite automata and transducers

Symbolic finite automata and transducers augment classic automata and transducers with symbolic alphabets represented as parametric theories. This extension enables to succinctly represent large and potentially infinite alphabets while preserving ...
Read More
From Two-Way to One-Way Finite State Transducers
LICS '13: Proceedings of the 2013 28th Annual ACM/IEEE Symposium on Logic in Computer Science

Any two-way finite state automaton is equivalent to some one-way finite state automaton. This well-known result, shown by Rabin and Scott and independently by Shepherd son, states that two-way finite state automata (even non-deterministic) characterize ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
POPL '12: Proceedings of the 39th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
January 2012
602 pages
ISBN:9781450310833
DOI:10.1145/2103656
General Chair:
John Field
Google, USA
,
Program Chair:
Michael Hicks
University of Maryland, College Park, USA
ACM SIGPLAN Notices Volume 47, Issue 1
POPL '12
January 2012
569 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2103621
Issue’s Table of Contents
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 January 2012
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
SMT
automata
composition
equivalence
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate824of4,130submissions,20%
Upcoming Conference
POPL '25

Sponsor:

sigplan

The 52nd Annual ACM SIGPLAN Symposium on Principles of Programming Languages

January 19 - 25, 2025

Denver , CO , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 122
  Total Citations
  View Citations
- 933
  Total Downloads
- Downloads (Last 12 months)47
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Symbolic finite state transducers: algorithms and applications

POPL '12: Proceedings of the 39th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Symbolic finite state transducers: algorithms and applications

Extended symbolic finite automata and transducers

From Two-Way to One-Way Finite State Transducers