Abstract
We present new techniques for automatically constructing probabilistic programs for data analysis, interpretation, and prediction. These techniques work with probabilistic domain-specific data modeling languages that capture key properties of a broad class of data generating processes, using Bayesian inference to synthesize probabilistic programs in these modeling languages given observed data. We provide a precise formulation of Bayesian synthesis for automatic data modeling that identifies sufficient conditions for the resulting synthesis procedure to be sound. We also derive a general class of synthesis algorithms for domain-specific languages specified by probabilistic context-free grammars and establish the soundness of our approach for these languages. We apply the techniques to automatically synthesize probabilistic programs for time series data and multivariate tabular data. We show how to analyze the structure of the synthesized programs to compute, for key qualitative properties of interest, the probability that the underlying data generating process exhibits each of these properties. Second, we translate probabilistic programs in the domain-specific language into probabilistic programs in Venture, a general-purpose probabilistic programming system. The translated Venture programs are then executed to obtain predictions of new time series data and new multivariate data records. Experimental results show that our techniques can accurately infer qualitative structure in multiple real-world data sets and outperform standard data analysis methods in forecasting and predicting new data.
Supplemental Material
Available for Download
This file contains the supplementary material (appendix) for the main paper.
- Martin L. Abbott. 2016. Correlation. In Using Statistics in the Social and Health Sciences with SPSS and Excel. Chapter 11, 329–370.Google Scholar
- Rajeev Alur, Rastislav Alur, Garvit Juniwal, Milo M. K. Juniwal, Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. 2013. Syntax-guided synthesis. In Proceedings of the Thirteenth International Conference on Formal Methods in Computer-Aided Design (FMCAD). IEEE, 1–8.Google ScholarCross Ref
- Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. 2017. DeepCoder: Learning to write programs. In International Conference on Learning Representations (ICLR).Google Scholar
- José-Miguel Bernardo and Adrian Smith. 1994. Bayesian Theory. John Wiley & Sons, Inc.Google Scholar
- Taylor L. Booth and Richard A. Thompson. 1973. Applying probability measures to abstract languages. IEEE Trans. Comput. 22, 5 (1973), 442–450. Google ScholarDigital Library
- Matko Bošnak, Pushmeet Kohli, Tejas Kulkrani, Sebastian Riedel, Tim Rocktäschel, Dawn Song, and Robert Zinkov (Eds.). 2018. Neural Abstract Machines and Program Induction Workshop v2.Google Scholar
- George E. P. Box and Gwilym Jenkins. 1976. Time Series Analysis: Forecasting and Control. Holden-Day, Inc. Google ScholarDigital Library
- Richard M. Burstall and John Darlington. 1977. A transformation system for developing recursive programs. J. ACM 24, 1 (1977), 44–67. Google ScholarDigital Library
- Bob Carpenter, Andrew Gelman, Matthew Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. Stan: A probabilistic programming language. Journal of Statistical Software 76, 1 (2017), 1–32.Google ScholarCross Ref
- Sarah Chasins and Phitchaya M. Phothilimthana. 2017. Data-driven synthesis of full probabilistic programs. In Proceedings of the Twenty-Ninth International Conference on Computer Aided Verification (CAV), Rupak Majumdar and Viktor Kunčak (Eds.), Vol. 10426. Springer, 279–304.Google Scholar
- Ron P. Cody and Jeffrey K. Smith. 2005. Applied Statistics and the SAS Programming Language (5 ed.). Prentice-Hall. Google ScholarDigital Library
- Dua Dheeru and Efi Karra Taniskidou. 2017. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml . (2017).Google Scholar
- Norman R. Draper and Harry Smith. 1966. Applied Regression Analysis. John Wiley & Sons, Inc.Google Scholar
- David Duvenaud, James Lloyd, Roger Grosse, Joshua B. Tenenbaum, and Zoubin Ghahramani. 2013. Structure discovery in nonparametric regression through compositional kernel search. In Proceedings of the Thirtieth International Conference on Machine Learning (ICML), Sanjoy Dasgupta and David McAllester (Eds.), Vol. 28. PMLR, 1166–1174. Google ScholarDigital Library
- Kevin Ellis, Armando Solar-Lezama, and Josh Tenenbaum. 2015. Unsupervised learning by program synthesis. In Advances in Neural Information Processing Systems 28 (NIPS), Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett (Eds.). Curran Associates, 973–981. Google ScholarDigital Library
- Kevin Ellis, Armando Solar-Lezama, and Josh Tenenbaum. 2016. Sampling for Bayesian program learning. In Advances in Neural Information Processing Systems 29 (NIPS), Daniel D. Lee, Masashi Sugiyama, Ulrich von Luxburg, I. Guyon, and Roman Garnett (Eds.). Curran Associates, 1297–1305. Google ScholarDigital Library
- Mordecai Ezekiel. 1941. Methods of Correlation Analysis. John Wiley & Sons, Inc.Google Scholar
- John K. Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing data structure transformations from input-output examples. In Proceedings of the Thirty-Sixth ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, 229–239. Google ScholarDigital Library
- Hong Ge, Kai Xu, and Zoubin Ghahramani. 2018. Turing: A language for flexible probabilistic inference. In Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics (AISTATS), Amos Storkey and Fernando Perez-Cruz (Eds.), Vol. 84. PMLR, 1682–1690.Google Scholar
- Roland Gecse and Attila Kovács. 2010. Consistency of stochastic context-free grammars. Mathematical and Computer Modelling 52, 3 (2010), 490–500. Google ScholarDigital Library
- Andrew Gelman and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.Google Scholar
- Noah Goodman, Vikash Mansinghka, Daniel Roy, Keith Bonawitz, and Joshua Tenenbaum. 2008. Church: A language for generative models. In Proceedings of the Twenty-Fourth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI). AUAI Press, 220–229. Google ScholarDigital Library
- Noah D. Goodman and Andreas Stuhlmüller. 2014. The Design and Implementation of Probabilistic Programming Languages. http://dippl.org . (2014).Google Scholar
- Alex Graves, Greg Wayne, and Ivo Danihelka. 2014. Neural Turing machines. (2014). arXiv: 1410.5401Google Scholar
- Roger Grosse, Ruslan Salakhutdinov, William Freeman, and Joshua B. Tenenbaum. 2012. Exploiting compositionality to explore a large space of model structures. In Proceedings of the Twenty-Eighth Annual Conference on Uncertainty in Artificial Intelligence (UAI). AUAI Press, 306–31. Google ScholarDigital Library
- Sumit Gulwani. 2011. Automating string processing in spreadsheets using input-output examples. In Proceedings of the Thirty-Eighth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). ACM, 317–330. Google ScholarDigital Library
- Sumit Gulwani, Susmit Jha, Ashish Tiwari, and Ramarathnam Venkatesan. 2011. Synthesis of loop-free programs. In Proceedings of the Thirty-Second ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, 62–73. Google ScholarDigital Library
- Irvin Hwang, Andreas Stuhlmüller, and Noah D. Goodman. 2011. Inducing probabilistic programs by Bayesian program merging. (2011). arXiv: 1110.5667Google Scholar
- Rob Hyndman and Yeasmin Khandakar. 2008. Automatic time series forecasting: The forecast package for R. Journal of Statistical Software 27, 3 (2008), 1–22.Google ScholarCross Ref
- Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: with Applications in R. Springer. Google Scholar
- Frederick Jelinek, John D. Lafferty, and Robert L. Mercer. 1992. Basic methods of probabilistic context free grammars. In Speech Recognition and Understanding (NATO ASI Series, Sub-Series F: Computer and Systems Sciences), Pietro Laface and Renato De Mori (Eds.), Vol. 75. Springer, 345–360.Google Scholar
- Sumit Jha, Sumit Gulwani, Sanjit A. Jha, and Ashish Tiwari. 2010. Oracle-guided component-based program synthesis. In Proceedings of the Thirty-Second ACM/IEEE International Conference on Software Engineering (ICSE), Vol. 1. ACM, 215–224. Google ScholarDigital Library
- Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. 2007. Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models. In Advances in Neural Information Processing Systems 19 (NIPS), Bernard Schölkopf, John C. Platt, and Thomas Hoffman (Eds.). Curran Associates, 641–648. Google ScholarDigital Library
- Matthew J. Johnson and Alan S. Willsky. 2013. Bayesian nonparametric hidden semi-Markov models. Journal of Machine Learning Research 14, Feb (2013), 673–701. Google ScholarDigital Library
- John R. Koza. 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press. Google ScholarDigital Library
- John R. Koza, Forest H. Bennett III, Andre David, Martin A. Keane, and Frank Dunlap. 1997. Automated synthesis of analog electrical circuits by means of genetic programming. IEEE Transactions on Evolutionary Computation 1, 2 (1997), 109–128. Google ScholarDigital Library
- Woosuk Lee, Kihong Heo, Rajeev Alur, and Mayur Naik. 2018. Accelerating search-based program synthesis using learned probabilistic models. In Proceedings of the Thirty-Ninth ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, 436–449. Google ScholarDigital Library
- Percy Liang, Michael I. Jordan, and Dan Klein. 2010. Learning programs: A hierarchical Bayesian approach. In Proceedings of the Twenty-Seventh International Conference on Machine Learning (ICML), Johannes Fürnkranz and Thorsten Joachims (Eds.). Omnipress, 639–646. Google ScholarDigital Library
- James R. Lloyd. 2014. Kernel structure discovery research code. https://github.com/jamesrobertlloyd/gpss- research/tree/ master/data . (2014).Google Scholar
- James R. Lloyd, David Duvenaud, Roger Grosse, Joshua B. Tenenbaum, and Zoubin Ghahramani. 2014. Automatic construction and natural-language description of nonparametric regression models. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI). AAAI Publications. Google ScholarDigital Library
- Zohar Manna and Richard Waldinger. 1979. Synthesis: Dreams to programs. IEEE Transactions on Software Engineering 5, 4 (1979), 294–328. Google ScholarDigital Library
- Zohar Manna and Richard Waldinger. 1980. A deductive approach to program synthesis. ACM Transactions on Programming Languages and Systems 2, 1 (1980), 90–121. Google ScholarDigital Library
- Vikash Mansinghka, Charles Kemp, Thomas Griffiths, and Joshua B. Tenenbaum. 2006. Structured priors for structure learning. In Proceedings of the Twenty-Second Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI). AUAI Press, 324–331. Google ScholarDigital Library
- Vikash Mansinghka, Daniel Selsam, and Yura Perov. 2014. Venture: A higher-order probabilistic programming platform with programmable inference. (2014). arXiv: 1404.0099Google Scholar
- Vikash Mansinghka, Patrick Shafto, Eric Jonas, Cap Petschulat, Max Gasner, and Joshua B. Tenenbaum. 2016. CrossCat: A fully Bayesian nonparametric method for analyzing heterogeneous, high dimensional data. Journal of Machine Learning Research 17, 138 (2016), 1–49. Google ScholarDigital Library
- Vikash K. Mansinghka, Ulrich Schaechtle, Shivam Handa, Alexey Radul, Yutian Chen, and Martin Rinard. 2018. Probabilistic programming with programmable inference. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, 603–616. Google ScholarDigital Library
- Andrew McCallum, Karl Schultz, and Sameer Singh. 2009. FACTORIE: Probabilistic programming via imperatively defined factor graphs. In Advances in Neural Information Processing Systems 22 (NIPS), Yoshua Bengio, Dale Schuurmans, John D. Lafferty, Christopher K. Williams, and Aron Culotta (Eds.). Curran Associates, 1249–1257. Google ScholarDigital Library
- Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, and Andrey Kolobov. 2005. BLOG: Probabilistic models with unknown objects. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI). Morgan Kaufmann Publishers Inc., 1352–1359. Google ScholarDigital Library
- Kevin P. Murphy. 2012. Machine Learning: A Probabilistic Perspective. MIT Press. Google ScholarDigital Library
- Norman H. Nie. 1975. SPSS: Statistical Package for the Social Sciences. McGraw-Hill.Google Scholar
- Antinus Nijholt. 1980. Context-Free Grammars: Covers, Normal Forms, and Parsing. Springer. Google ScholarDigital Library
- Aditya V. Nori, Sherjil Ozair, Sriram K. Rajamani, and Deepak Vijaykeerthy. 2015. Efficient synthesis of probabilistic programs. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, 208–217. Google ScholarDigital Library
- Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Edouard Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, Oct (2011), 2825–2830. Google ScholarDigital Library
- Yura N. Perov and Frank D. Wood. 2014. Learning probabilistic programs. (2014). arXiv: 1407.2646Google Scholar
- Avi Pfeffer. 2001. IBAL: A probabilistic rational programming language. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), Vol. 1. Morgan Kaufmann Publishers Inc., 733–740. Google ScholarDigital Library
- Avi Pfeffer. 2016. Practical Probabilistic Programming (1 ed.). Manning Publications Co. Google ScholarDigital Library
- Martyn Plummer. 2003. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the Third International Workshop on Distributed Statistical Computing (DSC), Kurt Hornik, Friedrich Leisch, and Achim Zeileis (Eds.). Austrian Association for Statistical Computing.Google Scholar
- Joaquin Quiñonero-Candela and Carl E. Rasmussen. 2005. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research 6, Dec (2005), 1939–1959. Google ScholarDigital Library
- Jeff Racine and Qi Li. 2004. Nonparametric estimation of regression functions with both categorical and continuous data. Journal of Econometrics 119, 1 (2004), 99–130.Google ScholarCross Ref
- Carl Rasmussen and Christopher Williams. 2006. Gaussian Processes for Machine Learning. The MIT Press. Google ScholarDigital Library
- Carl E. Rasmussen and Hannes Nickisch. 2010. Gaussian processes for machine learning (GPML) toolbox. Journal of Machine Learning Research 11, Nov (2010), 3011–3015. Google ScholarDigital Library
- Scott Reed and Nando de Freitas. 2016. Neural programmer-interpreters. In International Conference on Learning Representations (ICLR).Google Scholar
- Feras Saad and Vikash Mansinghka. 2016. Probabilistic data analysis with probabilistic programming. (2016). arXiv: 1608.05347Google Scholar
- Feras Saad and Vikash Mansinghka. 2017. Detecting dependencies in sparse, multivariate databases using probabilistic programming and non-parametric Bayes. In Artificial Intelligence and Statistics (AISTATS), Aarti Singh and Jerry Zhu (Eds.), Vol. 54. PMLR, 632–641.Google Scholar
- Feras A. Saad and Vikash K. Mansinghka. 2018. Temporally-reweighted Chinese restaurant process mixtures for clustering, imputing, and forecasting multivariate time series. In Proceedings of the Twenty-First Conference on Artificial Intelligence and Statistics (AISTATS), Amos Storkey and Fernando Perez-Cruz (Eds.), Vol. 84. PMLR, 755–764.Google Scholar
- John Salvatier, Thomas V. Wiecki, and Christopher Fonnesbeck. 2016. Probabilistic programming in python using PyMC3. PeerJ Computer Science 2 (2016), e55.Google ScholarCross Ref
- Eric Schkufza, Rahul Sharma, and Alex Aiken. 2013. Stochastic superoptimization. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM, 305–316. Google ScholarDigital Library
- Moses Schönfinkel. 1924. Über die Bausteine der mathematischen Logik. Math. Ann. 92 (1924), 305–316.Google ScholarCross Ref
- David W. Scott. 2009. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, Inc.Google Scholar
- Skipper Seabold and Josef Perktold. 2010. Statsmodels: Econometric and statistical modeling with python. In Proceedings of the Ninth Python in Science Conference (SciPy), Stêfan van der Walt and Jarrod Millman (Eds.). 57–61.Google ScholarCross Ref
- Bernard W. Silverman. 1986. Density Estimation for Statistics and Data Analysis. CRC Press.Google Scholar
- Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and Vijay Saraswat. 2006. Combinatorial sketching for finite programs. In Proceedings of the Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM, 404–415. Google ScholarDigital Library
- David Spiegelhalter, Andrew Thomas, Nicky Best, and Wally Gilks. 1996. BUGS 0.5: Bayesian Inference Using Gibbs Sampling Manual (version ii). MRC Biostatistics Unit, Institute of Public Health, Cambridge, United Kingdom.Google Scholar
- Sean J. Taylor and Benjamin Letham. 2018. Forecasting at scale. The American Statistician 72, 1 (2018), 37–45.Google ScholarCross Ref
- Luke Tierney. 1994. Markov chains for exploring posterior distributions. The Annals of Statistics 22, 4 (1994), 1701–1728.Google ScholarCross Ref
- Anh Tong and Jaesik Choi. 2016. Automatic generation of probabilistic programming from time series data. (2016). arXiv: 1607.00710Google Scholar
- Dustin Tran, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, and David M. Blei. 2017. Deep probabilistic programming. In International Conference on Learning Representations (ICLR).Google Scholar
- John W. Tukey. 1977. Exploratory Data Analysis. Addison-Wesley Publishing Company.Google Scholar
- Franklyn Turbak, David Gifford, and Mark A. Sheldon. 2008. Design Concepts in Programming Languages. MIT Press. Google ScholarDigital Library
- Frank Wood, Jan Willem van de Meent, and Vikash Mansinghka. 2014. A new approach to probabilistic programming inference. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 33. PMLR, 1024–1032.Google Scholar
Index Terms
- Bayesian synthesis of probabilistic programs for automatic data modeling
Recommendations
Efficient synthesis of probabilistic programs
PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and ImplementationWe show how to automatically synthesize probabilistic programs from real-world datasets. Such a synthesis is feasible due to a combination of two techniques: (1) We borrow the idea of ``sketching'' from synthesis of deterministic programs, and allow ...
Efficient synthesis of probabilistic programs
PLDI '15We show how to automatically synthesize probabilistic programs from real-world datasets. Such a synthesis is feasible due to a combination of two techniques: (1) We borrow the idea of ``sketching'' from synthesis of deterministic programs, and allow ...
Declarative Modeling and Bayesian Inference of Dark Matter Halos
Revised Selected Papers of the 14th International Conference on Computer Aided Systems Theory - EUROCAST 2013 - Volume 8111Probabilistic programming allows specification of probabilistic models in a declarative manner. Recently, several new software systems and languages for probabilistic programming have been developed in the on the basis of newly developed and improved ...
Comments