DOI: 10.1145/1089827.1089837
Article

Interruptible anytime algorithms for iterative improvement of decision trees

Published: 21 August 2005

ABSTRACT

Finding a minimal decision tree consistent with the examples is an NP-complete problem. Most existing algorithms for decision tree induction therefore use a greedy approach based on local heuristics. These algorithms usually require a fixed, small amount of time and produce trees that are not globally optimal. Recently, the LSID3 contract anytime algorithm was introduced to exploit additional resources for building better decision trees. A contract anytime algorithm, however, must receive its resource allocation a priori; when the time allocation is not known in advance, contract algorithms cannot be used. To overcome this problem, we present two interruptible anytime algorithms for inducing decision trees. Interruptible anytime algorithms do not require their resource allocation in advance and must therefore be ready to be interrupted and to return a valid solution at any moment. The first algorithm is based on a general technique for converting a contract algorithm into an interruptible one by sequencing. The second is an iterative improvement algorithm that repeatedly selects the subtree whose reconstruction is estimated to yield the highest marginal utility and rebuilds it with a higher resource allocation. Empirical evaluation shows good anytime behavior for both algorithms; the iterative improvement algorithm produces smoother performance profiles, which allow more refined control.
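
As a rough illustration of the first approach, here is a minimal sketch (in Python, which the paper does not prescribe) of converting a contract learner into an interruptible one by sequencing. The `contract_learner` callable, its `(examples, attributes, contract)` signature, and the use of KeyboardInterrupt to model interruption are assumptions for illustration, not the authors' implementation.

```python
def make_interruptible(contract_learner, examples, attributes,
                       initial_contract=1.0, growth=2.0):
    """Convert a contract tree learner into an interruptible one by
    sequencing: run it with contracts t, growth*t, growth^2*t, ...
    and always hold the tree from the last completed run."""
    best_tree = None
    contract = initial_contract
    try:
        while True:
            # One full contract run under the current allocation; larger
            # contracts are expected to yield better trees.
            best_tree = contract_learner(examples, attributes, contract)
            contract *= growth
    except KeyboardInterrupt:
        # An interruption may arrive mid-run; the tree from the last
        # completed run is always a valid answer.
        return best_tree
```

With a doubling schedule (growth=2.0), this is the classic sequencing construction of Russell and Zilberstein: the interruptible version needs at most a constant factor (four, for doubling) more time than a contract run that knew the deadline in advance.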

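The second algorithm admits a similarly hedged sketch. The tree interface (`internal_nodes`, `replace`, `node.examples`), the gain and cost estimators, and the doubling of per-node allocations below are hypothetical stand-ins for the paper's actual selection and reconstruction procedures.

```python
def iterative_improvement(tree, attributes, estimated_gain,
                          estimated_cost, rebuild_subtree):
    """Anytime iterative improvement: repeatedly choose the subtree whose
    reconstruction promises the highest marginal utility and rebuild it
    with a doubled resource allocation. The tree stays valid between
    steps, so the algorithm can be interrupted at any moment."""
    allocation = {}  # allocation last used per node (bookkeeping simplified)
    try:
        while True:
            # Rank candidate subtrees by estimated utility per unit cost.
            node = max(tree.internal_nodes(),
                       key=lambda n: estimated_gain(n) / estimated_cost(n))
            r = 2 * allocation.get(node, 1)
            allocation[node] = r
            # Re-induce the chosen subtree under the larger allocation
            # and splice it back into the tree.
            tree.replace(node, rebuild_subtree(node.examples, attributes, r))
    except KeyboardInterrupt:
        return tree
```

Because the tree is left in a consistent state between rebuilds, every interruption point yields a valid classifier, which is what gives this approach its smoother performance profile.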

Published in

UBDM '05: Proceedings of the 1st International Workshop on Utility-Based Data Mining
August 2005, 104 pages
ISBN: 1595932089
DOI: 10.1145/1089827

Copyright © 2005 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States

