ABSTRACT
We establish a link between evolutionary algorithms (EAs) and learning of probabilistic generative models with binary hidden variables. Learning is formulated as approximate maximum likelihood optimization using variational expectation maximization. When choosing truncated posteriors as variational distributions, the variational parameters take the form of sets of latent states. By (A) identifying these sets with populations of genomes, and by (B) defining the fitness of individuals as the joint probability of the chosen generative model, the sets of latent states can be optimized using EAs. We obtain scalable learning algorithms that effectively improve the tractable free energy objective of truncated posteriors. While this novel approach is applicable to any model with binary latents and tractable joint probability, as a proof of concept we here apply it to the optimization of parameters of noisy-OR Bayes nets (modeling binary data) and Binary Sparse Coding (modeling continuous data). We find that the data likelihood is efficiently improved by employing genetic algorithms with point mutations and single-point crossover as EAs. More generally, we believe that, with the link established here, both standard and recent results in evolutionary optimization can be leveraged to address the difficult problem of parameter optimization in generative models.
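The core loop described above can be sketched in code. The following is a minimal, hypothetical illustration (not the authors' implementation): for one data point, a population of binary latent states is evolved with point mutations and single-point crossover, with fitness given by the log joint probability of a Binary Sparse Coding model (Bernoulli priors on the latents, Gaussian noise on the observation). All names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

def log_joint_bsc(s, y, W, sigma2, pi):
    """Log joint log p(y, s) under Binary Sparse Coding:
    each latent s_h ~ Bernoulli(pi), observation y ~ N(W s, sigma2 * I)."""
    log_prior = np.sum(s * np.log(pi) + (1 - s) * np.log(1 - pi))
    resid = y - W @ s
    D = y.shape[0]
    log_lik = -0.5 * (D * np.log(2 * np.pi * sigma2) + resid @ resid / sigma2)
    return log_prior + log_lik

def evolve_states(K, y, W, sigma2, pi, n_gen=20, rng=None):
    """Evolve a set K of binary latent states (the 'population of genomes')
    using point mutations and single-point crossover; fitness is the log
    joint probability. The current states always compete with their
    offspring, so the best fitness never decreases (elitism)."""
    rng = np.random.default_rng() if rng is None else rng
    K = K.copy()
    S, H = K.shape  # population size, number of latents
    for _ in range(n_gen):
        children = []
        # point mutations: flip one randomly chosen bit of each parent
        for s in K:
            c = s.copy()
            c[rng.integers(H)] ^= 1
            children.append(c)
        # single-point crossover between randomly chosen parent pairs
        for _ in range(S):
            a, b = K[rng.integers(S)], K[rng.integers(S)]
            cut = rng.integers(1, H)
            children.append(np.concatenate([a[:cut], b[cut:]]))
        # selection: keep the S fittest unique states from parents + children
        pool = np.unique(np.vstack([K] + children), axis=0)
        fitness = np.array([log_joint_bsc(s, y, W, sigma2, pi) for s in pool])
        K = pool[np.argsort(fitness)[-S:]]
    return K
```

In a full EM scheme, the evolved set K would define a truncated posterior for the E-step, and the model parameters (W, sigma2, pi) would then be updated in the M-step; the sketch above only shows the EA-based optimization of the variational sets.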
Index Terms
- Evolutionary expectation maximization