DOI: 10.1145/3219819.3220104
Research article, Public Access

Generalized Score Functions for Causal Discovery

Published: 19 July 2018

ABSTRACT

Discovery of causal relationships from observational data is a fundamental problem. Roughly speaking, there are two types of methods for causal discovery: constraint-based and score-based. Score-based methods avoid the multiple testing problem and enjoy certain advantages over constraint-based ones. However, most of them require strong assumptions on the functional forms of causal mechanisms and on data distributions, which limits their applicability. In practice, precise information about the underlying model class is usually unavailable, and if these assumptions are violated, the learned graph may contain both spurious and missing edges. In this paper, we introduce generalized score functions for causal discovery based on the characterization of general (conditional) independence relationships between random variables, without assuming particular model classes. In particular, we exploit regression in a reproducing kernel Hilbert space (RKHS) to capture dependence nonparametrically. The resulting causal discovery approach produces asymptotically correct results in rather general cases, which may involve nonlinear causal mechanisms, a wide class of data distributions, mixed continuous and discrete data, and multidimensional variables. Experimental results on both synthetic and real-world data demonstrate the efficacy of the proposed approach.
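The abstract describes scoring candidate causal structures by how well each variable is explained by its candidate parents through nonparametric regression in an RKHS. The following is a minimal NumPy sketch of that general idea under simplifying assumptions of our own: kernel ridge regression with an RBF kernel, fixed hyperparameters (`lam`, `width`, `n_folds`), and a Gaussian likelihood on held-out residuals. The names `rbf_kernel` and `cv_kernel_score` are hypothetical. This is not the authors' score: per the abstract, their scores characterize general (conditional) independence relations without assuming a particular model class, whereas this sketch does assume additive Gaussian residuals.

```python
import numpy as np


def rbf_kernel(A, B, width):
    """RBF (Gaussian) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * width**2))


def cv_kernel_score(x, parents, n_folds=5, lam=1e-2, width=1.0, seed=0):
    """Average held-out Gaussian log-likelihood of x given candidate parents.

    Hypothetical, simplified local score (assumption, not the paper's exact
    score): regress x on its parents with kernel ridge regression, then score
    held-out residuals under a Gaussian whose variance is estimated on the
    training fold. Higher is better.
    """
    n = x.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    total = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        x_tr, x_te = x[train], x[test]
        mu = x_tr.mean()
        if parents.shape[1] == 0:
            # Empty parent set: predict the training mean everywhere.
            pred_tr = np.full(len(train), mu)
            pred_te = np.full(len(test), mu)
        else:
            # Kernel ridge regression on the centered target.
            K = rbf_kernel(parents[train], parents[train], width)
            reg = lam * len(train) * np.eye(len(train))
            alpha = np.linalg.solve(K + reg, x_tr - mu)
            pred_tr = mu + K @ alpha
            pred_te = mu + rbf_kernel(parents[test], parents[train], width) @ alpha
        # Residual variance fit on the training fold only.
        var = np.mean((x_tr - pred_tr) ** 2) + 1e-8
        resid = x_te - pred_te
        total += np.sum(-0.5 * np.log(2.0 * np.pi * var) - 0.5 * resid**2 / var)
    return total / n


# Toy data: Y depends nonlinearly on X.
rng = np.random.default_rng(1)
n = 300
X = rng.uniform(-3.0, 3.0, size=(n, 1))
Y = np.sin(X[:, 0]) + 0.2 * rng.standard_normal(n)

score_no_parents = cv_kernel_score(Y, np.empty((n, 0)))
score_with_x = cv_kernel_score(Y, X)
print(score_no_parents, score_with_x)
```

On this toy data the score for the parent set {X} should be clearly higher than for the empty set, even though the dependence is nonlinear. Because the likelihood is evaluated on held-out folds, a parent that only improves the fit on training data is not rewarded, so no separate complexity penalty is added. A score-based search (for example, a greedy equivalence search) would evaluate a local score like this for each variable and candidate parent set and sum the results over the graph.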


Supplemental Material

huang_causal_discovery.mp4 (MP4, 408 MB)

Published in: KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, July 2018, 2925 pages
ISBN: 9781450355520
DOI: 10.1145/3219819

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance rates: KDD '18 paper acceptance rate 107 of 983 submissions (11%); overall acceptance rate 1,133 of 8,635 submissions (13%).
