DOI: 10.1145/3205455.3205539
research-article

Where are we now?: a large benchmark study of recent symbolic regression methods

Published: 02 July 2018

ABSTRACT

In this paper we provide a broad benchmarking of recent genetic programming approaches to symbolic regression in the context of state-of-the-art machine learning approaches. We use a set of nearly 100 regression benchmark problems culled from open source repositories across the web. We conduct a rigorous benchmarking of four recent symbolic regression approaches as well as nine machine learning approaches from scikit-learn. The results suggest that symbolic regression performs strongly compared to state-of-the-art gradient boosting algorithms, although in terms of running time it is among the slowest of the available methodologies. We discuss the results in detail and point to future research directions that may allow symbolic regression to gain wider adoption in the machine learning community.
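
The kind of comparison the abstract describes, scoring several regressors on the same problem while recording how long each takes, might be sketched as follows. This is only an illustration and not the paper's protocol: the synthetic dataset, the four scikit-learn models, their hyperparameters, the R^2 metric, the 5-fold split, and the `benchmark` helper are all assumptions chosen for brevity.

```python
import time

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score


def benchmark(models, X, y, cv=5):
    """Return {name: (mean cross-validated R^2, wall-clock seconds)}."""
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
        elapsed = time.perf_counter() - start
        results[name] = (scores.mean(), elapsed)
    return results


# Synthetic stand-in for a single benchmark problem (assumption: this is
# NOT one of the datasets used in the paper).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Illustrative subset of scikit-learn regressors (assumption: the models and
# settings here are hypothetical, not the paper's configuration).
models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.1),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

results = benchmark(models, X, y)

# Report methods from best to worst mean R^2, alongside their run time.
for name, (r2, seconds) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:>18}: R^2 = {r2:.3f}, time = {seconds:.2f}s")
```

A study at the scale described would repeat such a loop over many datasets and tune each method's hyperparameters; the sketch only shows how accuracy and run time can be recorded together for one problem.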

          • Published in

            GECCO '18: Proceedings of the Genetic and Evolutionary Computation Conference
            July 2018
            1578 pages
            ISBN:9781450356183
            DOI:10.1145/3205455

            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%
