Mining high-speed data streams

Authors:
Pedro Domingos, Dept. of Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA
Geoff Hulten, Dept. of Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA

KDD '00: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, August 2000, Pages 71–80. https://doi.org/10.1145/347090.347107

Published: 01 August 2000



Published in
KDD '00: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
August 2000
537 pages
ISBN: 1581132336
DOI: 10.1145/347090
Chairmen: Raghu Ramakrishnan (Univ. of Wisconsin), Sal Stolfo (Columbia Univ., New York, NY), Roberto Bayardo (IBM Almaden Research Center, San Jose, CA), Ismail Parsa (Epsilon)
Copyright © 2000 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Hoeffding bounds
decision trees
disk-based algorithms
incremental learning
subsampling