Abstract
Probabilistic programming systems make machine learning more modular by automating inference. Recent work by Shan and Ramsey makes inference more modular by automating conditioning. Their technique uses a symbolic program transformation that treats conditioning generally via the measure-theoretic notion of disintegration. This technique, however, is limited to conditioning on a single scalar variable. As a step towards modular inference for realistic machine learning applications, we have extended the disintegration algorithm to symbolically condition arrays in probabilistic programs. The extended algorithm implements lifted disintegration, where repetition is treated symbolically and without unrolling loops. The technique uses a language of index variables for tracking expressions at various array levels. We find that the method works well for arbitrarily sized arrays of independent random choices, with the conditioning step taking time linear in the number of indices needed to select an element.
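To make the abstract's idea concrete, here is a small hand-worked sketch in Haskell (the language Hakaru is embedded in). It is our illustration, not output of the paper's transformation, and the names normalPdf and posteriorDensity are hypothetical. It shows the kind of conditional that disintegration produces for a simple array model: a prior x ~ normal 0 1 with observations t[i] ~ normal x 1 for each index i. Conditioning on the observed array t yields an unnormalized posterior density over x, expressed as one indexed product rather than n unrolled factors.

```haskell
-- A hand-worked sketch of the conditional that disintegration yields
-- for an array model, assuming a density-based view of measures.
-- Not the paper's algorithm; names here are illustrative only.

import Data.List (foldl')

-- Density of normal(mu, sigma) at y.
normalPdf :: Double -> Double -> Double -> Double
normalPdf mu sigma y =
  exp (negate ((y - mu) ^ 2) / (2 * sigma * sigma))
    / (sigma * sqrt (2 * pi))

-- Model: x ~ normal 0 1;  t[i] ~ normal x 1  for each i.
-- Disintegrating the joint measure over (t, x) along the observed
-- array t gives, for each t, an unnormalized density over x:
-- one symbolic factor per array, indexed by i, not n separate factors.
posteriorDensity :: [Double] -> Double -> Double
posteriorDensity ts x =
  normalPdf 0 1 x * foldl' (\acc t -> acc * normalPdf x 1 t) 1 ts

main :: IO ()
main = do
  let obs = [0.9, 1.1, 1.3]
  -- Evaluate the unnormalized posterior at a few candidate values of x.
  mapM_ (\x -> print (x, posteriorDensity obs x)) [0.0, 0.5, 1.0, 1.5]
```

The point of the lifted treatment is that the factor under the product is manipulated once, for a symbolic index i, so the transformation's cost grows with the number of index levels needed to select an element, not with the array's length.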
Supplemental Material
Source code of the Hakaru probabilistic programming system, bundled with examples that illustrate disintegration on array probabilistic programs.
References
- Nathanael L. Ackerman, Cameron E. Freer, and Daniel M. Roy. 2011. Noncomputable Conditional Distributions. In Proceedings of the 2011 IEEE 26th Annual Symposium on Logic in Computer Science (LICS ’11). IEEE Computer Society, Washington, DC, USA, 107–116.
- Arthur Asuncion, Max Welling, Padhraic Smyth, and Yee-Whye Teh. 2009. On Smoothing and Inference for Topic Models. In Proceedings of Uncertainty in Artificial Intelligence (UAI). http://www.ics.uci.edu/~asuncion/pubs/UAI_09.pdf
- Patrick Billingsley. 1995. Probability and Measure. John Wiley & Sons, New York.
- Wray L. Buntine. 1994. Operations for Learning with Graphical Models. Journal of Artificial Intelligence Research 2, 1 (Dec. 1994), 159–225.
- Jacques Carette and Chung-chieh Shan. 2016. Simplifying Probabilistic Programs Using Computer Algebra. Springer International Publishing, Cham, 135–152.
- George Casella and Edward I. George. 1992. Explaining the Gibbs Sampler. The American Statistician 46, 3 (1992), 167–174. http://www.jstor.org/stable/2685208
- Joseph T. Chang and David Pollard. 1997. Conditioning as Disintegration. Statistica Neerlandica 51, 3 (1997), 287–317.
- Guillaume Claret, Sriram K. Rajamani, Aditya V. Nori, Andrew D. Gordon, and Johannes Borgström. 2013. Bayesian Inference Using Data Flow Analysis. Technical Report MSR-TR-2013-27. Microsoft Research. http://research.microsoft.com/apps/pubs/default.aspx?id=171611
- Sebastian Fischer, Oleg Kiselyov, and Chung-chieh Shan. 2011. Purely Functional Lazy Nondeterministic Programming. Journal of Functional Programming 21, 4–5 (2011), 413–465.
- Sebastian Fischer, Josep Silva, Salvador Tamarit, and Germán Vidal. 2008. Preserving Sharing in the Partial Evaluation of Lazy Functional Programs. In Revised Selected Papers from LOPSTR 2007: 17th International Symposium on Logic-Based Program Synthesis and Transformation (Lecture Notes in Computer Science). Springer, Berlin, 74–89.
- Timon Gehr, Sasa Misailovic, and Martin Vechev. 2016. PSI: Exact Symbolic Inference for Probabilistic Programs. Springer International Publishing, Cham, 62–83.
- Andrew Gelman, Daniel Lee, and Jiqiang Guo. 2015. Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization. Journal of Educational and Behavioral Statistics 40, 5 (2015), 530–543.
- Noah D. Goodman, Vikash K. Mansinghka, Daniel M. Roy, Keith Bonawitz, and Joshua B. Tenenbaum. 2008. Church: A Language for Generative Models. In Proceedings of Uncertainty in Artificial Intelligence (UAI). http://danroy.org/papers/church_GooManRoyBonTen-UAI-2008.pdf
- Alp Kucukelbir, Rajesh Ranganath, Andrew Gelman, and David M. Blei. 2015. Automatic Variational Inference in Stan. arXiv e-prints (June 2015). arXiv:1506.03431 [stat.ML]
- John Launchbury. 1993. A Natural Semantics for Lazy Evaluation. In POPL ’93: Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM Press, New York, 144–154.
- Vikash K. Mansinghka, Daniel Selsam, and Yura N. Perov. 2014. Venture: A Higher-Order Probabilistic Programming Platform with Programmable Inference. CoRR abs/1404.0099 (2014). http://arxiv.org/abs/1404.0099
- Andrew McCallum, Karl Schultz, and Sameer Singh. 2009. FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs. In Neural Information Processing Systems (NIPS).
- Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, and Andrey Kolobov. 2007. BLOG: Probabilistic Models with Unknown Objects. In Statistical Relational Learning, Lise Getoor and Ben Taskar (Eds.). MIT Press. http://sites.google.com/site/bmilch/papers/blog-chapter.pdf
- Tom Minka, John M. Winn, John P. Guiver, Sam Webster, Yordan Zaykov, Boris Yangel, Alexander Spengler, and John Bronskill. 2014. Infer.NET 2.6. Microsoft Research Cambridge. http://research.microsoft.com/infernet
- Praveen Narayanan, Jacques Carette, Wren Romano, Chung-chieh Shan, and Robert Zinkov. 2016. Probabilistic Inference by Program Transformation in Hakaru (System Description). In Functional and Logic Programming: 13th International Symposium, FLOPS 2016 (Lecture Notes in Computer Science). Springer, Berlin, 62–79.
- Aditya V. Nori, Chung-Kil Hur, Sriram K. Rajamani, and Selva Samuel. 2014. R2: An Efficient MCMC Sampler for Probabilistic Programs. In Proceedings of the 28th AAAI Conference on Artificial Intelligence. AAAI Press, 2476–2482.
- David Pollard. 2001. A User’s Guide to Measure Theoretic Probability. Cambridge University Press, Cambridge.
- Philip Resnik and Eric Hardisty. 2009. Gibbs Sampling for the Uninitiated. Technical Report. University of Maryland, College Park.
- Chung-chieh Shan and Norman Ramsey. 2017. Exact Bayesian Inference by Symbolic Disintegration. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017). ACM, New York, NY, USA, 130–144.
- Mark Steyvers and Tom Griffiths. 2006. Probabilistic Topic Models. In Latent Semantic Analysis: A Road to Meaning, T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (Eds.). Lawrence Erlbaum. http://cocosci.berkeley.edu/tom/papers/SteyversGriffiths.pdf
- Luke Tierney. 1998. A Note on Metropolis-Hastings Kernels for General State Spaces. The Annals of Applied Probability 8, 1 (1998), 1–9.
- David Wingate, Andreas Stuhlmüller, and Noah D. Goodman. 2011. Lightweight Implementations of Probabilistic Programming Languages via Transformational Compilation. In Proceedings of AISTATS 2011: 14th International Conference on Artificial Intelligence and Statistics (JMLR Workshop and Conference Proceedings). 770–778.
- David Wingate and Theo Weber. 2013. Automated Variational Inference in Probabilistic Programming. arXiv e-prints (Jan. 2013). arXiv:1301.1299 [stat.ML]
- Frank Wood, Jan Willem van de Meent, and Vikash Mansinghka. 2014. A New Approach to Probabilistic Programming Inference. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics. 1024–1032.