DOI: 10.1145/3205455.3205588
GECCO Conference Proceedings · Research article

Evolutionary expectation maximization

Published: 02 July 2018

ABSTRACT

We establish a link between evolutionary algorithms (EAs) and learning of probabilistic generative models with binary hidden variables. Learning is formulated as approximate maximum likelihood optimization using variational expectation maximization. When choosing truncated posteriors as variational distributions, the variational parameters take the form of sets of latent states. By (A) identifying these sets with populations of genomes, and by (B) defining the fitness of individuals as the joint probability of the chosen generative model, the sets of latent states can be optimized using EAs. We obtain scalable learning algorithms that effectively improve the tractable free energy objective of truncated posteriors. While this novel approach is applicable to any model with binary latents and tractable joint probability, as a proof of concept, we here apply it to the optimization of parameters of noisy-OR Bayes Nets (modeling binary data) and Binary Sparse Coding (modeling continuous data). We find that the data likelihood is efficiently improved by employing genetic algorithms with point mutations and single-point crossover as EAs. In general, we believe that, with the novel link established here, standard as well as recent results in evolutionary optimization can be leveraged to address the difficult problem of parameter optimization in generative models.
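
To make the link concrete, the following sketch (hypothetical Python, not the authors' code) illustrates the evolutionary E-step for a Binary Sparse Coding model: for a single data point, a population of K binary latent states is evolved with point mutations and single-point crossover, using the model's log joint probability as fitness; the K fittest states then serve as the support of the truncated posterior. All function and variable names (log_joint_bsc, evolve_latent_states, W, sigma2, pi) are illustrative assumptions.

```python
import numpy as np

def log_joint_bsc(y, s, W, sigma2, pi):
    """Log joint probability log p(y, s) for Binary Sparse Coding:
    s_h ~ Bernoulli(pi), y ~ N(W s, sigma2 * I).
    Serves as the fitness of the binary latent state s (the "genome")."""
    log_prior = np.sum(s * np.log(pi) + (1 - s) * np.log(1 - pi))
    resid = y - W @ s
    log_lik = -0.5 * resid @ resid / sigma2 - 0.5 * y.size * np.log(2 * np.pi * sigma2)
    return log_prior + log_lik

def evolve_latent_states(y, W, sigma2, pi, K=10, n_generations=20, rng=None):
    """Sketch of an evolutionary E-step for one data point y:
    evolve a population of K binary latent states with point mutations and
    single-point crossover, selecting by the log joint probability."""
    rng = np.random.default_rng() if rng is None else rng
    H = W.shape[1]
    pop = (rng.random((K, H)) < pi).astype(float)   # initial genomes drawn from the prior
    for _ in range(n_generations):
        # point mutation: flip one randomly chosen bit in each parent
        mutants = pop.copy()
        flip = rng.integers(H, size=K)
        mutants[np.arange(K), flip] = 1.0 - mutants[np.arange(K), flip]
        # single-point crossover between shuffled parent pairs
        partners = pop[rng.permutation(K)]
        cut = rng.integers(1, H)
        offspring = np.hstack([pop[:, :cut], partners[:, cut:]])
        # selection: keep the K fittest states among parents, mutants, and offspring
        candidates = np.vstack([pop, mutants, offspring])
        fitness = np.array([log_joint_bsc(y, s, W, sigma2, pi) for s in candidates])
        pop = candidates[np.argsort(fitness)[-K:]]
    return pop  # the K states defining the truncated posterior support
```

In a full evolutionary EM loop, expectations under the truncated posterior built from these K states would feed the usual M-step updates of W, sigma2, and pi; the sketch covers only the variational (E-step) optimization described in the abstract.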


Published in

GECCO '18: Proceedings of the Genetic and Evolutionary Computation Conference
July 2018, 1578 pages
ISBN: 9781450356183
DOI: 10.1145/3205455
Copyright © 2018 ACM


Publisher

Association for Computing Machinery, New York, NY, United States



Acceptance Rates

Overall acceptance rate: 1,669 of 4,410 submissions, 38%

