article

Free Access

Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions

The Journal of Machine Learning Research Volume 10pp 2374–2411

Published:01 December 2009Publication History

The Journal of Machine Learning Research

Abstract

A Boolean function f is correlation immune if each input variable is independent of the output, under the uniform distribution on inputs. For example, the parity function is correlation immune. We consider the problem of identifying relevant variables of a correlation immune function, in the presence of irrelevant variables. We address this problem in two different contexts. First, we analyze Skewing, a heuristic method that was developed to improve the ability of greedy decision tree algorithms to identify relevant variables of correlation immune Boolean functions, given examples drawn from the uniform distribution (Page and Ray, 2003). We present theoretical results revealing both the capabilities and limitations of skewing. Second, we explore the problem of identifying relevant variables in the Product Distribution Choice (PDC) learning model, a model in which the learner can choose product distributions and obtain examples from them. We prove a lemma establishing a property of Boolean functions that may be of independent interest. Using this lemma, we give two new algorithms for finding relevant variables of correlation immune functions in the PDC model.

References

T. Akutsu, S. Miyano, and S. Kuhara. A simple greedy algorithm for finding functional relations: Efficient implementation and average case analysis. Theor. Comput. Sci., 292(2):481-495, 2003. doi: http://dx.doi.org/10.1016/S0304-3975(02)00183-4 Google ScholarDigital Library
N. Alon. Derandomization via small sample spaces (abstract). In SWAT '96: Proceedings of the 5th Scandinavian Workshop on Algorithm Theory, pages 1-3, 1996. Google ScholarDigital Library
J. Arpe and E. Mossel. Application of a generalization of Russo's formula to learning from multiple random oracles. Combinatorics, Probability and Computing, to appear. Published online by Cambridge University Press 09 Jul 2009, doi:10.1017/S0963548309990277. Preliminary version published at http://arxiv.org/abs/0804.3817, 2008. Google ScholarDigital Library
J. Arpe and R. Reischuk. When does greedy learning of relevant attributes succeed? In COCOON '07: Proceedings of the 13th Annual International Conference on Computing and Combinatorics, volume 4598 of Lecture Notes in Computer Science, pages 296-306. Springer, 2007. Google ScholarDigital Library
E. Bach. Improved asymptotic formulas for counting correlation immune Boolean functions. SIAM J. Discrete Math., to appear. Preliminary version appeared as Technical Report Number 1616, Computer Sciences Dept., University of Wisconsin-Madison, 2007. Google ScholarDigital Library
A. Blum. Learning a function of r relevant variables. In COLT/Kernel '03: Learning Theory and KernelMachines, Proceedings of the 16th Annual Conference on Learning Theory and 7th Kernel Workshop, Lecture Notes In Artificial Intelligence: Vol. 2777, pages 731-733. Springer Verlag, 2003.Google Scholar
A. Blum, M. Furst, J. Jackson, M. Kearns, Y. Mansour, and S. Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In STOC '94: Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing, pages 253-262, 1994. Google ScholarDigital Library
A. Blum, L. Hellerstein, and N. Littlestone. Learning in the presence of finitely or infinitely many irrelevant attributes. J. Comput. Syst. Sci., 50(1):32-40, 1995. Google ScholarDigital Library
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.Google Scholar
L. Brynielsson. A short proof of the Xiao-Massey lemma. IEEE Transactions on Information Theory, 35(6):1344-1344, 1989.Google ScholarDigital Library
N. H. Bshouty and V. Feldman. On using extended statistical queries to avoid membership queries. J. Mach. Learn. Res., 2:359-395, 2002. Google ScholarDigital Library
N. H. Bshouty and L. Hellerstein. Attribute-efficient learning in query and mistake-bound models. J. Comput. Syst. Sci., 56(3):310-319, 1998. Google ScholarDigital Library
P. Camion, C. Carlet, P. Charpin, and N. Sendrier. On correlation-immune functions. In CRYPTO '91: Advances in Cryptology, pages 86-100. Springer-Verlag, 1991. Google ScholarDigital Library
P. Damaschke. Adaptive versus nonadaptive attribute-efficient learning. Machine Learning, 41(2): 197-215, 2000. Google ScholarDigital Library
O. V. Denisov. An asymptotic formula for the number of correlation-immune of order k Boolean functions. Discrete Math. Appls., 2(4):407-426, 1992.Google ScholarCross Ref
V. Feldman, P. Gopalan, S. Khot, and A. K. Ponnuswami. New results for learning noisy parities and halfspaces. In FOCS '06: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 563-574, 2006. Google ScholarDigital Library
D. Fukagawa and T. Akutsu. Performance analysis of a greedy algorithm for inferring Boolean functions. Inf. Process. Lett., 93(1):7-12, 2005. doi: http://dx.doi.org/10.1016/j.ipl.2004.09.017. Google ScholarDigital Library
M. L. Furst, J. C. Jackson, and S. W. Smith. Improved learning of AC⁰functions. In COLT '91: Proceedings of the Fourth Annual Workshop on Computational Learning Theory, pages 317-325, 1991. Google ScholarDigital Library
S.W. Golomb. On the classification of boolean functions. IRE Transactions on Information Theory, IT-5:176-186, 1959.Google ScholarCross Ref
S. W. Golomb. On the cryptanalysis of nonlinear sequences. In IMA - Cryptography and Coding '99, Lecture Notes In Computer Science: Vol. 1746, pages 236-242. Springer-Verlag, 1999.Google Scholar
D. Guijarro, J. Tarui, and T. Tsukiji. Finding relevant variables in PAC model with membership queries. In ALT '99: Proceedings of the 10th International Conference on Algorithmic Learning Theory, 1999. Google ScholarDigital Library
J. Jackson. On the efficiency of noise-tolerant PAC algorithms derived from statistical queries. Annals of Math. and Artificial Intell., 39(3):291-313, 2003. Google ScholarDigital Library
E. Lantz, S. Ray, and D. Page. Learning Bayesian network structure from correlation immune data. In UAI '07: Proceedings of the 23rd International Conference on Uncertainty in Artificial Intelligence, 2007.Google Scholar
Y. Mansour. Learning Boolean functions via the Fourier transform. Theoretical Advances in Neural Computation and Learning, pages 391-424, 1994.Google ScholarCross Ref
H. L. Montgomery. Ten Lectures on the Interface Between Analytic Number Theory and Harmonic Analysis. AMS, 1994.Google ScholarCross Ref
E. Mossel, R. O'Donnell, and R. A. Servedio. Learning juntas. In STOC '03: Proceedings of the 35th Annual Symposium on the Theory of Computing, pages 206-212, 2003. Google ScholarDigital Library
A. M. Odlyzko. The rise and fall of knapsack cryptosystems. In Cryptology and Computational Number Theory: Proceedings of Symposia in Applied Mathematics, volume 42, pages 79-88. AMS, 1980.Google Scholar
D. Page and S. Ray. Skewing: An efficient alternative to lookahead for decision tree induction. In IJCAI: Proceedings of the 18th International Joint Conference on Artificial Intelligence, pages 601-612, 2003. Google ScholarDigital Library
E. M. Palmer, R. C. Read, and R. W. Robinson. Balancing the n-cube: a census of colorings. J. Algebraic Combin., 1:257-273, 1992. Google ScholarDigital Library
I. Pinelis. An approach to inequalities for the distributions of infinite-dimensional martingales. In R. M. Dudley, M. G. Hahn, and J. Kuelbs, editors, Probability in Banach Spaces, pages 128-134. Birkhauser, 1992.Google Scholar
J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1997. Google ScholarDigital Library
S. Ray and D. Page. Sequential skewing: An improved skewing algorithm. In ICML '04: Proceedings of the 21st International Conference on Machine Learning, 2004. Google ScholarDigital Library
S. Ray and D. Page. Generalized skewing for functions with continuous and nominal variables. In ICML '05: Proceedings of the 22nd International Conference on Machine Learning, pages 705-712, 2005. Google ScholarDigital Library
S. Ray, E. Lantz, B. Rosell, L. Hellerstein, and D. Page. Learning correlation immune functions by skewing: An empirical evaluation. Unpublished manuscript, 2009.Google Scholar
B. Rosell, L. Hellerstein, S. Ray, and D. Page. Why skewing works: Learning difficult functions with greedy tree learners. In ICML '05: Proceedings of the 22nd International Conference on Machine Learning, pages 728-735, 2005. Google ScholarDigital Library
J. B. Rosser and L. Schoenfeld. Approximate formulas for some functions of prime numbers. Illinois J. Math., 6:64-94, 1962.Google Scholar
B. Roy. A brief outline of research on correlation immune functions. In Proceedings of the 7th Australian Conference on Information Security and Privacy, Lecture Notes In Computer Science: Vol. 2384, pages 379-384. Springer Verlag, 2002. Google ScholarDigital Library
T. Siegenthaler. Correlation-immunity of nonlinear combining functions for cryptographic applications. IEEE Transactions on Information Theory, IT-30(5):776-780, 1984.Google ScholarDigital Library
L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, 1984. Google ScholarDigital Library
G.-Z. Xiao and J. L. Massey. A spectral characterization of correlation-immune combining functions. IEEE Transactions on Information Theory, 34(3):569-570, 1988.Google ScholarDigital Library
K. Yang. On learning correlated boolean functions using statistical queries. In ALT '01: Proceedings of the 12th International Conference on Algorithmic Learning Theory, volume 2225 of Lecture Notes in Computer Science, pages 59-76. Springer, 2001. Google ScholarDigital Library
K. Yang. New lower bounds for statistical query learning. J. Comput. Syst. Sci., 70(4):485-509, 2005. Google ScholarDigital Library

Index Terms

Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions
1. Computing methodologies
  1. Machine learning
2. Theory of computation
  1. Design and analysis of algorithms

Index terms have been assigned to the content through auto-classification.

Recommendations

On constructions and nonlinearity of correlation immune functions
EUROCRYPT '93: Workshop on the theory and application of cryptographic techniques on Advances in cryptology

A Boolean function is said to be correlation immune if its output leaks no information about its input values. Such functions have many applications in computer security practices including the construction of key stream generators from a set of shift ...
Read More
Autocorrelation Properties of Correlation Immune Boolean Functions
INDOCRYPT '01: Proceedings of the Second International Conference on Cryptology in India: Progress in Cryptology

In this paper we study the autocorrelation values of correlation immune and resilient Boolean functions. We provide new lower bounds and related results on absolute indicator and sum of square indicator of autocorrelation values for low orders of ...
Read More
On Correlation Immune Boolean Functions With Minimum Hamming Weight Power of 2
The notion of correlation immune functions has been introduced by Siegenthaler (1984) in symmetric cryptography in the framework of stream ciphers. At the conference CRYPTO’91 by Camion et al., it has been pointed out that this notion existed in ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

The Journal of Machine Learning Research Volume 10, Issue
12/1/2009
2936 pages
ISSN:1532-4435
EISSN:1533-7928
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
JMLR.org
Publication History
- Published: 1 December 2009
Published in jmlr Volume 10, Issue
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 0
  Total Citations
  View Citations
- 61
  Total Downloads
- Downloads (Last 12 months)5
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
This publication has not been cited yet

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions

The Journal of Machine Learning Research

Abstract

References

Cited By

Index Terms

Recommendations

On constructions and nonlinearity of correlation immune functions

Autocorrelation Properties of Correlation Immune Boolean Functions

On Correlation Immune Boolean Functions With Minimum Hamming Weight Power of 2

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions

The Journal of Machine Learning Research

Abstract

References

Cited By

Index Terms

Recommendations

On constructions and nonlinearity of correlation immune functions

Autocorrelation Properties of Correlation Immune Boolean Functions

On Correlation Immune Boolean Functions With Minimum Hamming Weight Power of 2

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media