ABSTRACT
Discovery of causal relationships from observational data is a fundamental problem. Roughly speaking, there are two types of methods for causal discovery: constraint-based ones and score-based ones. Score-based methods avoid the multiple testing problem and enjoy certain advantages over constraint-based ones. However, most of them need strong assumptions on the functional forms of causal mechanisms, as well as on data distributions, which limits their applicability. In practice, the precise underlying model class is usually unknown, and if these assumptions are violated, the result may contain both spurious and missing edges. In this paper, we introduce generalized score functions for causal discovery based on the characterization of general (conditional) independence relationships between random variables, without assuming particular model classes. In particular, we exploit regression in a reproducing kernel Hilbert space (RKHS) to capture the dependence in a nonparametric way. The resulting causal discovery approach produces asymptotically correct results in rather general cases, which may involve nonlinear causal mechanisms, a wide class of data distributions, mixed continuous and discrete data, and multidimensional variables. Experimental results on both synthetic and real-world data demonstrate the efficacy of our proposed approach.
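The core idea of scoring a candidate parent set by how well a nonparametric RKHS regression explains a variable can be illustrated with a small sketch. This is not the paper's exact score function. It assumes a simplified local score: the K-fold cross-validated Gaussian log-likelihood of kernel ridge regression residuals with an RBF kernel. The names `cv_local_score`, `krr_predict`, and `solve`, the interleaved fold split, and the hyperparameters `lam` and `width` are all hypothetical choices made for illustration. The code uses only the Python standard library.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def rbf_kernel(a: Sequence[float], b: Sequence[float], width: float) -> float:
    """Gaussian (RBF) kernel between two points of the parent space."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def krr_predict(rows: Optional[List[Tuple[float, ...]]], y: List[float],
                train_idx: List[int], query_idx: List[int],
                lam: float, width: float) -> List[float]:
    """Kernel ridge regression fitted on train_idx, evaluated at query_idx.
    With an empty parent set (rows is None) it predicts the training mean."""
    y_train = [y[i] for i in train_idx]
    mean = sum(y_train) / len(y_train)
    if rows is None:
        return [mean] * len(query_idx)
    m = len(train_idx)
    # Regularized Gram matrix K + lam * m * I on the training fold.
    gram = [[rbf_kernel(rows[i], rows[j], width) + (lam * m if i == j else 0.0)
             for j in train_idx] for i in train_idx]
    alpha = solve(gram, [v - mean for v in y_train])
    return [mean + sum(a * rbf_kernel(rows[j], rows[q], width)
                       for a, j in zip(alpha, train_idx))
            for q in query_idx]


def cv_local_score(target: List[float], parents: List[List[float]],
                   folds: int = 5, lam: float = 1e-2, width: float = 1.0) -> float:
    """Hypothetical local score: mean held-out Gaussian log-likelihood of the
    kernel ridge residuals of `target` given `parents` (higher is better)."""
    n = len(target)
    rows = list(zip(*parents)) if parents else None
    total = 0.0
    for k in range(folds):
        test_idx = [i for i in range(n) if i % folds == k]
        train_idx = [i for i in range(n) if i % folds != k]
        preds = krr_predict(rows, target, train_idx, train_idx + test_idx, lam, width)
        fitted, held_out = preds[:len(train_idx)], preds[len(train_idx):]
        # Residual variance is estimated on the training fold only.
        var = sum((target[i] - f) ** 2 for i, f in zip(train_idx, fitted)) / len(train_idx)
        var = max(var, 1e-6)
        for i, p in zip(test_idx, held_out):
            total += -0.5 * (math.log(2.0 * math.pi * var) + (target[i] - p) ** 2 / var)
    return total / n


if __name__ == "__main__":
    # Toy nonlinear mechanism: Y = sin(X) + small Gaussian noise.
    rng = random.Random(0)
    xs = [rng.uniform(-3.0, 3.0) for _ in range(100)]
    ys = [math.sin(v) + 0.1 * rng.gauss(0.0, 1.0) for v in xs]
    print("score(Y | {X}):", cv_local_score(ys, [xs]))
    print("score(Y | {}): ", cv_local_score(ys, []))
```

Under these assumptions, the parent set {X} should receive a higher score than the empty set because the kernel regression captures the nonlinear dependence without a parametric model. Scoring on held-out folds, rather than on the fitting data, keeps the sketch from automatically rewarding larger parent sets. A score-based search would compare such local scores across candidate graphs.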
Index Terms
- Generalized Score Functions for Causal Discovery