skip to main content
article
Free Access

Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions

Published:01 December 2009Publication History
Skip Abstract Section

Abstract

A Boolean function f is correlation immune if each input variable is independent of the output, under the uniform distribution on inputs. For example, the parity function is correlation immune. We consider the problem of identifying relevant variables of a correlation immune function, in the presence of irrelevant variables. We address this problem in two different contexts. First, we analyze Skewing, a heuristic method that was developed to improve the ability of greedy decision tree algorithms to identify relevant variables of correlation immune Boolean functions, given examples drawn from the uniform distribution (Page and Ray, 2003). We present theoretical results revealing both the capabilities and limitations of skewing. Second, we explore the problem of identifying relevant variables in the Product Distribution Choice (PDC) learning model, a model in which the learner can choose product distributions and obtain examples from them. We prove a lemma establishing a property of Boolean functions that may be of independent interest. Using this lemma, we give two new algorithms for finding relevant variables of correlation immune functions in the PDC model.

References

  1. T. Akutsu, S. Miyano, and S. Kuhara. A simple greedy algorithm for finding functional relations: Efficient implementation and average case analysis. Theor. Comput. Sci., 292(2):481-495, 2003. doi: http://dx.doi.org/10.1016/S0304-3975(02)00183-4 Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. N. Alon. Derandomization via small sample spaces (abstract). In SWAT '96: Proceedings of the 5th Scandinavian Workshop on Algorithm Theory, pages 1-3, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. J. Arpe and E. Mossel. Application of a generalization of Russo's formula to learning from multiple random oracles. Combinatorics, Probability and Computing, to appear. Published online by Cambridge University Press 09 Jul 2009, doi:10.1017/S0963548309990277. Preliminary version published at http://arxiv.org/abs/0804.3817, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. J. Arpe and R. Reischuk. When does greedy learning of relevant attributes succeed? In COCOON '07: Proceedings of the 13th Annual International Conference on Computing and Combinatorics, volume 4598 of Lecture Notes in Computer Science, pages 296-306. Springer, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. E. Bach. Improved asymptotic formulas for counting correlation immune Boolean functions. SIAM J. Discrete Math., to appear. Preliminary version appeared as Technical Report Number 1616, Computer Sciences Dept., University of Wisconsin-Madison, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Blum. Learning a function of r relevant variables. In COLT/Kernel '03: Learning Theory and KernelMachines, Proceedings of the 16th Annual Conference on Learning Theory and 7th Kernel Workshop, Lecture Notes In Artificial Intelligence: Vol. 2777, pages 731-733. Springer Verlag, 2003.Google ScholarGoogle Scholar
  7. A. Blum, M. Furst, J. Jackson, M. Kearns, Y. Mansour, and S. Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In STOC '94: Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing, pages 253-262, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Blum, L. Hellerstein, and N. Littlestone. Learning in the presence of finitely or infinitely many irrelevant attributes. J. Comput. Syst. Sci., 50(1):32-40, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.Google ScholarGoogle Scholar
  10. L. Brynielsson. A short proof of the Xiao-Massey lemma. IEEE Transactions on Information Theory, 35(6):1344-1344, 1989.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. N. H. Bshouty and V. Feldman. On using extended statistical queries to avoid membership queries. J. Mach. Learn. Res., 2:359-395, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. N. H. Bshouty and L. Hellerstein. Attribute-efficient learning in query and mistake-bound models. J. Comput. Syst. Sci., 56(3):310-319, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. P. Camion, C. Carlet, P. Charpin, and N. Sendrier. On correlation-immune functions. In CRYPTO '91: Advances in Cryptology, pages 86-100. Springer-Verlag, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. P. Damaschke. Adaptive versus nonadaptive attribute-efficient learning. Machine Learning, 41(2): 197-215, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. O. V. Denisov. An asymptotic formula for the number of correlation-immune of order k Boolean functions. Discrete Math. Appls., 2(4):407-426, 1992.Google ScholarGoogle ScholarCross RefCross Ref
  16. V. Feldman, P. Gopalan, S. Khot, and A. K. Ponnuswami. New results for learning noisy parities and halfspaces. In FOCS '06: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 563-574, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. D. Fukagawa and T. Akutsu. Performance analysis of a greedy algorithm for inferring Boolean functions. Inf. Process. Lett., 93(1):7-12, 2005. doi: http://dx.doi.org/10.1016/j.ipl.2004.09.017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. L. Furst, J. C. Jackson, and S. W. Smith. Improved learning of AC0functions. In COLT '91: Proceedings of the Fourth Annual Workshop on Computational Learning Theory, pages 317-325, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S.W. Golomb. On the classification of boolean functions. IRE Transactions on Information Theory, IT-5:176-186, 1959.Google ScholarGoogle ScholarCross RefCross Ref
  20. S. W. Golomb. On the cryptanalysis of nonlinear sequences. In IMA - Cryptography and Coding '99, Lecture Notes In Computer Science: Vol. 1746, pages 236-242. Springer-Verlag, 1999.Google ScholarGoogle Scholar
  21. D. Guijarro, J. Tarui, and T. Tsukiji. Finding relevant variables in PAC model with membership queries. In ALT '99: Proceedings of the 10th International Conference on Algorithmic Learning Theory, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Jackson. On the efficiency of noise-tolerant PAC algorithms derived from statistical queries. Annals of Math. and Artificial Intell., 39(3):291-313, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. E. Lantz, S. Ray, and D. Page. Learning Bayesian network structure from correlation immune data. In UAI '07: Proceedings of the 23rd International Conference on Uncertainty in Artificial Intelligence, 2007.Google ScholarGoogle Scholar
  24. Y. Mansour. Learning Boolean functions via the Fourier transform. Theoretical Advances in Neural Computation and Learning, pages 391-424, 1994.Google ScholarGoogle ScholarCross RefCross Ref
  25. H. L. Montgomery. Ten Lectures on the Interface Between Analytic Number Theory and Harmonic Analysis. AMS, 1994.Google ScholarGoogle ScholarCross RefCross Ref
  26. E. Mossel, R. O'Donnell, and R. A. Servedio. Learning juntas. In STOC '03: Proceedings of the 35th Annual Symposium on the Theory of Computing, pages 206-212, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. A. M. Odlyzko. The rise and fall of knapsack cryptosystems. In Cryptology and Computational Number Theory: Proceedings of Symposia in Applied Mathematics, volume 42, pages 79-88. AMS, 1980.Google ScholarGoogle Scholar
  28. D. Page and S. Ray. Skewing: An efficient alternative to lookahead for decision tree induction. In IJCAI: Proceedings of the 18th International Joint Conference on Artificial Intelligence, pages 601-612, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. E. M. Palmer, R. C. Read, and R. W. Robinson. Balancing the n-cube: a census of colorings. J. Algebraic Combin., 1:257-273, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. I. Pinelis. An approach to inequalities for the distributions of infinite-dimensional martingales. In R. M. Dudley, M. G. Hahn, and J. Kuelbs, editors, Probability in Banach Spaces, pages 128-134. Birkhauser, 1992.Google ScholarGoogle Scholar
  31. J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. S. Ray and D. Page. Sequential skewing: An improved skewing algorithm. In ICML '04: Proceedings of the 21st International Conference on Machine Learning, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. S. Ray and D. Page. Generalized skewing for functions with continuous and nominal variables. In ICML '05: Proceedings of the 22nd International Conference on Machine Learning, pages 705-712, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. S. Ray, E. Lantz, B. Rosell, L. Hellerstein, and D. Page. Learning correlation immune functions by skewing: An empirical evaluation. Unpublished manuscript, 2009.Google ScholarGoogle Scholar
  35. B. Rosell, L. Hellerstein, S. Ray, and D. Page. Why skewing works: Learning difficult functions with greedy tree learners. In ICML '05: Proceedings of the 22nd International Conference on Machine Learning, pages 728-735, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. J. B. Rosser and L. Schoenfeld. Approximate formulas for some functions of prime numbers. Illinois J. Math., 6:64-94, 1962.Google ScholarGoogle Scholar
  37. B. Roy. A brief outline of research on correlation immune functions. In Proceedings of the 7th Australian Conference on Information Security and Privacy, Lecture Notes In Computer Science: Vol. 2384, pages 379-384. Springer Verlag, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. T. Siegenthaler. Correlation-immunity of nonlinear combining functions for cryptographic applications. IEEE Transactions on Information Theory, IT-30(5):776-780, 1984.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. G.-Z. Xiao and J. L. Massey. A spectral characterization of correlation-immune combining functions. IEEE Transactions on Information Theory, 34(3):569-570, 1988.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. K. Yang. On learning correlated boolean functions using statistical queries. In ALT '01: Proceedings of the 12th International Conference on Algorithmic Learning Theory, volume 2225 of Lecture Notes in Computer Science, pages 59-76. Springer, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. K. Yang. New lower bounds for statistical query learning. J. Comput. Syst. Sci., 70(4):485-509, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image The Journal of Machine Learning Research
        The Journal of Machine Learning Research  Volume 10, Issue
        12/1/2009
        2936 pages
        ISSN:1532-4435
        EISSN:1533-7928
        Issue’s Table of Contents

        Publisher

        JMLR.org

        Publication History

        • Published: 1 December 2009
        Published in jmlr Volume 10, Issue

        Qualifiers

        • article
      • Article Metrics

        • Downloads (Last 12 months)5
        • Downloads (Last 6 weeks)1

        Other Metrics

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader