AusDM Conference Proceedings (DOI: 10.5555/1273808.1273815)

Analysis of breast feeding data using data mining methods

Published: 01 November 2006

ABSTRACT

The purpose of this study is to demonstrate the benefit of applying common data mining techniques to survey data, on which statistical analysis is routinely performed. Statistical surveys are commonly used to collect quantitative information about an item in a population, and statistical analysis is usually carried out on the resulting data to test hypotheses. In this paper we report an application of data mining methodologies to breast feeding survey data that had previously been collected and analysed by statisticians. The aim of the research is to study the factors that influence the decision whether or not to breast feed a newborn baby. Various data mining methods are applied to the data. Feature (variable) selection is conducted with an information theory based method and with a statistical approach, in order to select the most discriminative and least redundant features. Decision tree and regression approaches are then tested on classification tasks using the selected features. A risk pattern mining method is also applied to identify groups at high risk of not breast feeding. The success of data mining in this study suggests that data mining approaches are likely to be applicable to other, similar survey data. Because they enable a search for hypotheses, data mining methods may be used as a tool for analysing survey data that complements traditional statistical analysis.
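The abstract does not say which information-theoretic criterion was used for feature selection. As an illustration only, the sketch below implements conditional mutual information maximization (CMIM) from Fleuret (2004), one of the works in the reference list, which pursues the same goal of picking features that are discriminative but not redundant. The first pick is the feature with the highest mutual information I(Y; X) with the class; each later pick is the candidate whose worst-case conditional mutual information I(Y; X | S), taken over the already-selected features S, is largest. It assumes discrete (categorical) survey variables; the helper names, variable names and toy data are hypothetical and are not taken from the survey.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Empirical joint entropy, in bits, of one or more equal-length columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum(c / n * log2(c / n) for c in counts.values())


def mutual_info(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x, y, z):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def cmim_select(features, target, k):
    """Greedy CMIM: choose up to k feature names from a dict {name: column}."""
    selected = []
    remaining = list(features)

    def score(name):
        if not selected:
            # First pick: plain relevance to the class.
            return mutual_info(features[name], target)
        # Later picks: class information that survives conditioning on
        # each already-selected feature (the minimum, i.e. worst case).
        return min(cond_mutual_info(target, features[name], features[s])
                   for s in selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 1 = breast fed, 0 = not breast fed.
target = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "smoker":      [0, 0, 0, 0, 1, 1, 1, 0],
    "smoker_copy": [0, 0, 0, 0, 1, 1, 1, 0],  # fully redundant with "smoker"
    "parity":      [0, 1, 0, 1, 0, 1, 0, 1],
}

print(cmim_select(features, target, k=2))  # prints ['smoker', 'parity']
```

On this toy data the duplicated column `smoker_copy` carries no extra information once `smoker` is chosen (its conditional mutual information is zero), so the second pick is `parity`, even though `parity` on its own is independent of the class. This is the "least redundant" behaviour the abstract refers to, shown under the assumptions above.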
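The abstract also applies risk pattern mining to find groups at high risk of not breast feeding. Li et al. (2005), in the reference list, define risk patterns by relative risk: the outcome rate among records that match a pattern divided by the rate among records that do not. The brute-force sketch below only illustrates that measure; it enumerates every attribute-value pattern up to a small length and does not reproduce the pruning of a dedicated risk pattern miner. The function names, thresholds (`min_support`, `min_rr`), attributes and records are all hypothetical.

```python
from itertools import combinations


def matches(record, pattern):
    """True if the record has every (attribute, value) pair in the pattern."""
    return all(record.get(attr) == value for attr, value in pattern)


def risk_patterns(records, outcome_attr, outcome_value,
                  max_len=2, min_support=2, min_rr=1.5):
    """Brute-force search for attribute-value patterns with high relative risk.

    Relative risk = P(outcome | pattern) / P(outcome | not pattern).
    Returns (pattern, support, relative_risk) tuples, highest risk first.
    """
    items = sorted({(a, v) for r in records for a, v in r.items()
                    if a != outcome_attr})
    results = []
    for length in range(1, max_len + 1):
        for pattern in combinations(items, length):
            # Skip patterns that use the same attribute twice.
            if len({a for a, _ in pattern}) < length:
                continue
            covered = [r for r in records if matches(r, pattern)]
            rest = [r for r in records if not matches(r, pattern)]
            if len(covered) < min_support or not rest:
                continue
            p_in = sum(r[outcome_attr] == outcome_value for r in covered) / len(covered)
            if p_in == 0:
                continue  # pattern never co-occurs with the outcome
            p_out = sum(r[outcome_attr] == outcome_value for r in rest) / len(rest)
            rr = p_in / p_out if p_out > 0 else float("inf")
            if rr >= min_rr:
                results.append((pattern, len(covered), rr))
    return sorted(results, key=lambda t: t[2], reverse=True)


# Hypothetical survey records; the outcome of interest is breastfed == "no".
records = [
    {"smoker": "yes", "education": "school",   "breastfed": "no"},
    {"smoker": "yes", "education": "school",   "breastfed": "no"},
    {"smoker": "yes", "education": "tertiary", "breastfed": "yes"},
    {"smoker": "no",  "education": "school",   "breastfed": "yes"},
    {"smoker": "no",  "education": "tertiary", "breastfed": "yes"},
    {"smoker": "no",  "education": "tertiary", "breastfed": "yes"},
    {"smoker": "no",  "education": "school",   "breastfed": "no"},
    {"smoker": "no",  "education": "tertiary", "breastfed": "no"},
]

for pattern, support, rr in risk_patterns(records, "breastfed", "no"):
    print(pattern, support, round(rr, 2))
```

The minimum support threshold matters on real survey data: very small subgroups can show extreme relative risks by chance, so a pattern's support should be read alongside its relative risk.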

References

  1. Chen, J., He, H., Li, J., Jin, H., McAullay, D., Williams, G., Sparks, R. & Kelman, C. (2005), Representing association classification rules mined from health data, in Proceedings of 9th International Conference on Knowledge-Based & Intelligent Information & Engineering Systems (KES2005), Melbourne, Australia, pp. 1225--1231.
  2. Cover, T. M. & Thomas, J. A. (1991), Elements of Information Theory, Wiley-Interscience.
  3. Fleiss, J. L. (1981), Statistical Methods for Rates and Proportions, Wiley.
  4. Fleuret, F. (2004), 'Fast binary feature selection with conditional mutual information', Journal of Machine Learning Research 5, 1531--1555.
  5. Gu, L., Li, J., He, H., Williams, G., Hawkins, S. & Kelman, C. (2003), Association rule discovery with unbalanced class, in Proceedings of the 16th Australian Joint Conference on Artificial Intelligence (AI03), Lecture Notes in Artificial Intelligence, Perth, Western Australia, pp. 221--232.
  6. He, H., Jin, H. & Chen, J. (2005), Automatic feature selection for classification of health data, in Proceedings of The 18th Australian Joint Conference on Artificial Intelligence (AI2005), Sydney, Australia, pp. 910--913.
  7. Hegney, D., Fallon, T., O'Brien, M., Plank, A., Doolan, J., Brodribb, W., Hennessy, J., Laurent, K. & Baker, S. (2003), The Toowoomba Infant Feeding Support Service Project: Report on Phase 1 A Longitudinal Needs Analysis of Breastfeeding Behaviours and Supports in the Toowoomba Region.
  8. Jin, H., Chen, J., Kelman, C., He, H., McAullay, D. & O'Keefe, C. M. (2006), Mining unexpected associations for signalling potential adverse drug reactions from administrative health databases, in PAKDD'06, pp. 867--876.
  9. Jin, H.-D., Shum, W., Leung, K.-S. & Wong, M.-L. (2004), 'Expanding self-organizing map for data visualization and cluster analysis', Information Sciences 163, 157--173.
  10. Jin, H., Wong, M.-L. & Leung, K.-S. (2005), 'Scalable model-based clustering for large databases based on data summarization', IEEE Transactions on Pattern Analysis and Machine Intelligence 27(11), 1710--1719.
  11. Kohavi, R. & John, G. (1997), 'Wrappers for feature subset selection', Artificial Intelligence 97(1--2), 273--324.
  12. Kramer, M. S. & Kakuma, R. (2003), Optimal duration of exclusive breastfeeding, The Cochrane Library.
  13. Li, J., Fu, A. W.-C., He, H., Chen, J., Jin, H., McAullay, D., Williams, G., Sparks, R. & Kelman, C. (2005), Mining risk patterns in medical data, in Proceedings of KDD'05, pp. 770--775.
  14. McAullay, D., Williams, G., Chen, J., Jin, H., He, H., Sparks, R. & Kelman, C. (2005), A delivery framework for health data mining and analytics, in V. Estivill-Castro, ed., Twenty-Eighth Australasian Computer Science Conference (ACSC2005), Vol. 38 of CRPIT, ACS, Newcastle, Australia, pp. 381--390.
  15. Quinlan, J. (1993), C4.5: Programs for Machine Learning, Morgan Kaufmann.
  16. Riordan, J. M. (1997), 'Commentary. The cost of not breastfeeding: a commentary.', Journal of Human Lactation 13(2), 93--97.
  17. Shannon, C. E. (1948), 'A mathematical theory of communication', Bell System Technical Journal 27, 379--423, 623--656.
  18. Smith, J. (2001), Mothers milk, money and markets, Ann Congress Perinatal Society Australia and New Zealand.
  19. Smith, J. P., Thompson, J. F. & Ellwood, D. A. (2002), 'Hospital system costs of artificial infant feeding: Estimates for the Australian Capital Territory', Australian and New Zealand Journal of Public Health 26(6), 543--551.
  20. Wang, G., Lochovsky, F. H. & Yang, Q. (2004), Feature selection with conditional mutual information maxmin in text categorization, in Proceedings of CIKM'04, Washington, US, pp. 8--13.
  21. WHO (2001), The optimal duration of exclusive breastfeeding, World Health Organization.
  22. Yang, Y. & Pedersen, J. O. (1997), A comparative study on feature selection in text categorization, in Proceedings of International Conference on Machine Learning, Nashville, TN, USA.
  23. Yu, L. & Liu, H. (2004), Redundancy based feature selection for microarray data, in Proceedings of KDD'04, ACM Press, New York, NY, USA, pp. 737--742.
