Abstract
Developers nowadays have to contend with a growing number of APIs. While in the long-term they are very useful to developers, many modern APIs have an incredibly steep learning curve, due to their hundreds of functions handling many arguments, obscure documentation, and frequently changing semantics. For APIs that perform data transformations, novices can often provide an I/O example demonstrating the desired transformation, but may be stuck on how to translate it to the API. A programming-by-example synthesis engine that takes such I/O examples and directly produces programs in the target API could help such novices. Such an engine presents unique challenges due to the breadth of real-world APIs, and the often-complex constraints over function arguments. We present a generator-based synthesis approach to contend with these problems. This approach uses a program candidate generator, which encodes basic constraints on the space of programs. We introduce neural-backed operators which can be seamlessly integrated into the program generator. To improve the efficiency of the search, we simply use these operators at non-deterministic decision points, instead of relying on domain-specific heuristics. We implement this technique for the Python pandas library in AutoPandas. AutoPandas supports 119 pandas dataframe transformation functions. We evaluate AutoPandas on 26 real-world benchmarks and find it solves 17 of them.
Supplemental Material
- 2014. The pandas project. https://pandas.pydata.org . Accessed October 11th, 2018.Google Scholar
- Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. Learning to Represent Programs with Graphs. In International Conference on Learning Representations. https://openreview.net/forum?id=BJOFETxR-Google Scholar
- R. Alur, R. Bodik, G. Juniwal, M. M. K. Martin, M. Raghothaman, S. A. Seshia, R. Singh, A. Solar-Lezama, E. Torlak, and A. Udupa. 2013. Syntax-guided synthesis. In 2013 Formal Methods in Computer-Aided Design. 1–8. Google ScholarCross Ref
- Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. 2016. DeepCoder: Learning to Write Programs. CoRR abs/1611.01989 (2016). arXiv: 1611.01989 http://arxiv.org/abs/1611.01989Google Scholar
- Konstantin Böttinger, Patrice Godefroid, and Rishabh Singh. 2018. Deep Reinforcement Fuzzing. CoRR abs/1801.04589 (2018). arXiv: 1801.04589 http://arxiv.org/abs/1801.04589Google Scholar
- Rudy Bunel, Matthew J. Hausknecht, Jacob Devlin, Rishabh Singh, and Pushmeet Kohli. 2018. Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis. CoRR abs/1805.04276 (2018). arXiv: 1805.04276 http: //arxiv.org/abs/1805.04276Google Scholar
- Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 1724–1734. Google ScholarCross Ref
- Koen Claessen and John Hughes. 2000. QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. In Proceedings of the 5th ACM SIGPLAN International Conference on Functional Programming (ICFP).Google ScholarDigital Library
- Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, and Le Song. 2017. Learning Combinatorial Optimization Algorithms over Graphs. CoRR abs/1704.01665 (2017). arXiv: 1704.01665 http://arxiv.org/abs/1704.01665Google Scholar
- Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdelrahman Mohamed, and Pushmeet Kohli. 2017. RobustFill: Neural Program Learning under Noisy I/O. In ICML 2017. https://www.microsoft.com/en-us/research/ publication/robustfill-neural-program-learning-noisy-io/Google Scholar
- Yu Feng, Ruben Martins, Osbert Bastani, and Isil Dillig. 2018. Program Synthesis Using Conflict-driven Learning. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018). ACM, New York, NY, USA, 420–435. Google ScholarDigital Library
- Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. 2017. Component-based Synthesis of Table Consolidation and Transformation Tasks from Examples. SIGPLAN Not. 52, 6 (June 2017), 422–436. Google ScholarDigital Library
- Andrew D. Gordon, Thomas A. Henzinger, Aditya V. Nori, and Sriram K. Rajamani. 2014. Probabilistic Programming. In Proceedings of the on Future of Software Engineering (FOSE 2014). ACM, New York, NY, USA, 167–181. Google ScholarDigital Library
- Sumit Gulwani. 2011. Automating String Processing in Spreadsheets Using Input-output Examples. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’11). ACM, New York, NY, USA, 317–330. Google ScholarDigital Library
- Yeye He, Xu Chu, Kris Ganjam, Yudian Zheng, Vivek Narasayya, and Surajit Chaudhuri. 2018. Transform-data-by-example (TDE): An Extensible Search Engine for Data Transformations. Proc. VLDB Endow. 11, 10 (June 2018), 1165–1177. Google ScholarDigital Library
- Susmit Jha, Sumit Gulwani, Sanjit A. Seshia, and Ashish Tiwari. 2010. Oracle-guided Component-based Program Synthesis. In Proceedings of the 32Nd ACM/IEEE International Conference on Software Engineering - Volume 1 (ICSE ’10). ACM, New York, NY, USA, 215–224. Google ScholarDigital Library
- A. Kalyan, A. Mohta, O. Polozov, D. Batra, P. Jain, and S. Gulwani. 2018. Neural-Guided Deductive Search for Real-Time Program Synthesis from Examples. ArXiv e-prints (April 2018). arXiv: cs.AI/1804.01186Google Scholar
- D. P. Kingma and J. Ba. 2014. Adam: A Method for Stochastic Optimization. ArXiv e-prints (Dec. 2014). arXiv: 1412.6980Google Scholar
- Wouter Kool, Herke van Hoof, and Max Welling. 2019. Attention, Learn to Solve Routing Problems!. In International Conference on Learning Representations. https://openreview.net/forum?id=ByxBFsRqYmGoogle Scholar
- Vu Le and Sumit Gulwani. 2014. FlashExtract: A Framework for Data Extraction by Examples. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’14). ACM, New York, NY, USA, 542–553. Google ScholarDigital Library
- Woosuk Lee, Kihong Heo, Rajeev Alur, and Mayur Naik. 2018. Accelerating Search-based Program Synthesis Using Learned Probabilistic Models. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018). ACM, New York, NY, USA, 436–449. Google ScholarDigital Library
- Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard S. Zemel. 2015. Gated Graph Sequence Neural Networks. CoRR abs/1511.05493 (2015). arXiv: 1511.05493 http://arxiv.org/abs/1511.05493Google Scholar
- Andreas Löscher and Konstantinos Sagonas. 2017. Targeted Property-based Testing. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2017). ACM, New York, NY, USA, 46–56. Google ScholarDigital Library
- Microsoft. 2017. Gated Graph Neural Network Samples. https://github.com/Microsoft/gated-graph-neural-network-samples. Accessed October 17th, 2018.Google Scholar
- Rohan Padhye, Caroline Lemieux, Koushik Sen, Mike Papadakis, and Yves Le Traon. 2019. Semantic Fuzzing with Zest. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’19). Google ScholarDigital Library
- Emilio Parisotto, Abdelrahman Mohamed, Rishabh Singh, Lihong Li, Denny Zhou, and Pushmeet Kohli. 2017. NeuroSymbolic Program Synthesis. In ICLR 2017. https://www.microsoft.com/en-us/research/publication/neuro-symbolicprogram-synthesis-2/Google Scholar
- Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Program Synthesis from Polymorphic Refinement Types. SIGPLAN Not. 51, 6 (June 2016), 522–538. Google ScholarDigital Library
- Oleksandr Polozov and Sumit Gulwani. 2015. FlashMeta: A Framework for Inductive Program Synthesis. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2015). ACM, New York, NY, USA, 107–126. Google ScholarDigital Library
- Veselin Raychev, Martin Vechev, and Eran Yahav. 2014. Code Completion with Statistical Language Models. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’14). ACM, New York, NY, USA, 419–428. Google ScholarDigital Library
- Reudismam Rolim, Gustavo Soares, Loris D’Antoni, Oleksandr Polozov, Sumit Gulwani, Rohit Gheyi, Ryo Suzuki, and Björn Hartmann. 2017. Learning Syntactic Program Transformations from Examples. In Proceedings of the 39th International Conference on Software Engineering (ICSE ’17). IEEE Press, Piscataway, NJ, USA, 404–415. Google ScholarDigital Library
- Calvin Smith and Aws Albarghouthi. 2016. MapReduce Program Synthesis. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’16). ACM, New York, NY, USA, 326–340. Google ScholarDigital Library
- Armando Solar-Lezama. 2008. Program Synthesis by Sketching. Ph.D. Dissertation. University of California at Berkeley, Berkeley, CA, USA. Advisor(s) Bodik, Rastislav. AAI3353225.Google ScholarDigital Library
- Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and Vijay Saraswat. 2006. Combinatorial Sketching for Finite Programs. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XII). ACM, New York, NY, USA, 404–415. Google ScholarDigital Library
- Xinyu Wang, Isil Dillig, and Rishabh Singh. 2017. Program Synthesis Using Abstraction Refinement. Proc. ACM Program. Lang. 2, POPL, Article 63 (Dec. 2017), 30 pages. Google ScholarDigital Library
- Navid Yaghmazadeh, Xinyu Wang, and Isil Dillig. 2018. Automated Migration of Hierarchical Data to Relational Tables Using Programming-by-example. Proc. VLDB Endow. 11, 5 (Jan. 2018), 580–593. Google ScholarDigital Library
- Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig. 2017. SQLizer: Query Synthesis from Natural Language. Proc. ACM Program. Lang. 1, OOPSLA, Article 63 (Oct. 2017), 26 pages. Google ScholarDigital Library
Index Terms
- AutoPandas: neural-backed generators for program synthesis
Recommendations
Type-directed program synthesis for RESTful APIs
PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and ImplementationWith the rise of software-as-a-service and microservice architectures, RESTful APIs are now ubiquitous in mobile and web applications. A service can have tens or hundreds of API methods, making it a challenge for programmers to find the right ...
History-driven test program synthesis for JVM testing
ICSE '22: Proceedings of the 44th International Conference on Software EngineeringJava Virtual Machine (JVM) provides the runtime environment for Java programs, which allows Java to be "write once, run anywhere". JVM plays a decisive role in the correctness of all Java programs running on it. Therefore, ensuring the correctness and ...
Algorithmic program synthesis: introduction
Program synthesis is a process of producing an executable program from a specification. Algorithmic synthesis produces the program automatically, without an intervention from an expert. While classical compilation falls under the definition of ...
Comments