Analysis of breast feeding data using data mining methods

Authors:
Hongxing He, CSIRO Mathematical and Information Sciences, Canberra ACT, Australia
Huidong Jin, CSIRO Mathematical and Information Sciences, Canberra ACT, Australia and National ICT Australia (NICTA), Canberra Lab, Canberra, Australia
Jie Chen, CSIRO Mathematical and Information Sciences, Canberra ACT, Australia
Damien McAullay, CSIRO Mathematical and Information Sciences, Canberra ACT, Australia
Jiuyong Li, University of Southern Queensland, Toowoomba QLD, Australia
Tony Fallon, University of Southern Queensland, Toowoomba QLD, Australia

AusDM '06: Proceedings of the fifth Australasian conference on Data mining and analytics - Volume 61, November 2006, Pages 47–52

Published: 01 November 2006

ABSTRACT

The purpose of this study is to demonstrate the benefit of applying common data mining techniques to survey data, where statistical analysis is routinely applied. Statistical surveys are commonly used to collect quantitative information about items in a population, and statistical analysis is usually carried out on the survey data to test hypotheses. In this paper we report an application of data mining methodologies to data from a breast feeding survey that was conducted and analysed by statisticians. The aim of the research is to study the factors that lead to the decision whether or not to breast feed a newborn baby. Various data mining methods are applied to the data. Feature (variable) selection is conducted to select the most discriminative and least redundant features, using an information-theory-based method and a statistical approach. Decision tree and regression approaches are tested on classification tasks using the selected features. A risk pattern mining method is also applied to identify groups with a high risk of not breast feeding. The success of data mining in this study suggests that data mining approaches will be applicable to other, similar survey data. Data mining methods, which enable a search for hypotheses, may be used as a survey data analysis tool complementary to traditional statistical analysis.
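
The abstract does not spell out the algorithms used, so the following is only a minimal sketch of two of the ideas it names: ranking categorical survey features by an information-theoretic score, and scoring a candidate risk pattern. Plain mutual information stands in here for the paper's information theory based selection method, and relative risk stands in for its risk pattern measure; both are assumptions. The rows, the attribute names `attr_a` and `attr_b`, and the helper functions are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X;Y), in bits, between two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def relative_risk(rows, pattern, outcome_key, outcome_value):
    """Risk of the outcome among rows matching `pattern`, divided by the
    risk among rows that do not match it."""
    def matches(row):
        return all(row[k] == v for k, v in pattern.items())

    def risk(group):
        return sum(r[outcome_key] == outcome_value for r in group) / len(group)

    exposed = [r for r in rows if matches(r)]
    unexposed = [r for r in rows if not matches(r)]
    return risk(exposed) / risk(unexposed)


# Hypothetical toy survey rows, invented purely for illustration.
rows = [
    {"attr_a": "yes", "attr_b": "yes", "breastfed": "no"},
    {"attr_a": "yes", "attr_b": "no", "breastfed": "no"},
    {"attr_a": "yes", "attr_b": "yes", "breastfed": "yes"},
    {"attr_a": "yes", "attr_b": "no", "breastfed": "no"},
    {"attr_a": "no", "attr_b": "yes", "breastfed": "yes"},
    {"attr_a": "no", "attr_b": "no", "breastfed": "yes"},
    {"attr_a": "no", "attr_b": "yes", "breastfed": "no"},
    {"attr_a": "no", "attr_b": "no", "breastfed": "yes"},
]

outcome = [r["breastfed"] for r in rows]

# Score each feature by how much information it carries about the outcome.
scores = {
    name: mutual_information([r[name] for r in rows], outcome)
    for name in ("attr_a", "attr_b")
}
ranking = sorted(scores, key=scores.get, reverse=True)

# Relative risk of not breast feeding for the pattern attr_a == "yes".
rr = relative_risk(rows, {"attr_a": "yes"}, "breastfed", "no")
```

On this toy data `attr_a` ranks above `attr_b`, and the pattern `attr_a == "yes"` has a relative risk of 3.0 for not breast feeding. A fuller treatment would also penalise redundancy between selected features, which this sketch leaves out.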

Published in
AusDM '06: Proceedings of the fifth Australasian conference on Data mining and analytics - Volume 61
November 2006
216 pages
ISBN: 1920682414
Editors: Peter Christen, Paul J. Kennedy, Jiuyong Li, Simeon J. Simoff, Graham J. Williams
Publisher: Australian Computer Society, Inc., Australia
Author Tags: association rule, classification, data mining, features selection, survey data

Acceptance Rates
AusDM '06 paper acceptance rate: 25 of 58 submissions, 43%. Overall acceptance rate: 98 of 232 submissions, 42%.