ABSTRACT
We establish a link between evolutionary algorithms (EAs) and learning of probabilistic generative models with binary hidden variables. Learning is formulated as approximate maximum likelihood optimization using variational expectation maximization. When choosing truncated posteriors as variational distributions, the variational parameters take the form of sets of latent states. By (A) identifying these sets with populations of genomes, and by (B) defining the fitness of individuals as the joint probability of the chosen generative model, the sets of latent states can be optimized using EAs. We obtain scalable learning algorithms that effectively improve the tractable free energy objective of truncated posteriors. While this novel approach is applicable to any model with binary latents and tractable joint probability, as a proof of concept we here apply it to the optimization of parameters of noisy-OR Bayes nets (modeling binary data) and Binary Sparse Coding (modeling continuous data). We find that the data likelihood is efficiently improved by employing genetic algorithms with point mutations and single-point crossover as EAs. More generally, we believe that, with the link established here, both standard and recent results in evolutionary optimization can be leveraged to address the difficult problem of parameter optimization in generative models.
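The core loop described above can be sketched in code. The following is a minimal, hypothetical illustration (not the authors' implementation): for one data point, a population of binary latent states is evolved with point mutations and single-point crossover, with fitness given by the log joint probability of a Binary Sparse Coding model (Bernoulli priors on the latents, Gaussian noise on the observation). All names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

def log_joint_bsc(s, y, W, sigma2, pi):
    """Log joint log p(y, s) under Binary Sparse Coding:
    each latent s_h ~ Bernoulli(pi), observation y ~ N(W s, sigma2 * I)."""
    log_prior = np.sum(s * np.log(pi) + (1 - s) * np.log(1 - pi))
    resid = y - W @ s
    D = y.shape[0]
    log_lik = -0.5 * (D * np.log(2 * np.pi * sigma2) + resid @ resid / sigma2)
    return log_prior + log_lik

def evolve_states(K, y, W, sigma2, pi, n_gen=20, rng=None):
    """Evolve a set K of binary latent states (the 'population of genomes')
    using point mutations and single-point crossover; fitness is the log
    joint probability. The current states always compete with their
    offspring, so the best fitness never decreases (elitism)."""
    rng = np.random.default_rng() if rng is None else rng
    K = K.copy()
    S, H = K.shape  # population size, number of latents
    for _ in range(n_gen):
        children = []
        # point mutations: flip one randomly chosen bit of each parent
        for s in K:
            c = s.copy()
            c[rng.integers(H)] ^= 1
            children.append(c)
        # single-point crossover between randomly chosen parent pairs
        for _ in range(S):
            a, b = K[rng.integers(S)], K[rng.integers(S)]
            cut = rng.integers(1, H)
            children.append(np.concatenate([a[:cut], b[cut:]]))
        # selection: keep the S fittest unique states from parents + children
        pool = np.unique(np.vstack([K] + children), axis=0)
        fitness = np.array([log_joint_bsc(s, y, W, sigma2, pi) for s in pool])
        K = pool[np.argsort(fitness)[-S:]]
    return K
```

In a full EM scheme, the evolved set K would define a truncated posterior for the E-step, and the model parameters (W, sigma2, pi) would then be updated in the M-step; the sketch above only shows the EA-based optimization of the variational sets.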
Index Terms
- Evolutionary expectation maximization