research-article

Fast and Accurate Time-Series Clustering

Authors:
John Paparrizos, Columbia University, Amsterdam Avenue, New York
Luis Gravano, Columbia University, Amsterdam Avenue, New York

ACM Transactions on Database Systems, Volume 42, Issue 2, Article No. 8, pp. 1–49. https://doi.org/10.1145/3044711

Published: 01 June 2017

Abstract

The proliferation and ubiquity of temporal data across many disciplines has generated substantial interest in the analysis and mining of time series. Clustering is one of the most popular data-mining methods, not only due to its exploratory power but also because it is often a preprocessing step or subroutine for other techniques. In this article, we present k-Shape and k-MultiShapes (k-MS), two novel algorithms for time-series clustering. k-Shape and k-MS rely on a scalable iterative refinement procedure. As their distance measure, k-Shape and k-MS use shape-based distance (SBD), a normalized version of the cross-correlation measure, to consider the shapes of time series while comparing them. Based on the properties of SBD, we develop two new methods, namely ShapeExtraction (SE) and MultiShapesExtraction (MSE), to compute cluster centroids that are used in every iteration to update the assignment of time series to clusters. k-Shape relies on SE to compute a single centroid per cluster based on all time series in each cluster. In contrast, k-MS relies on MSE to compute multiple centroids per cluster to account for the proximity and spatial distribution of time series in each cluster. To demonstrate the robustness of SBD, k-Shape, and k-MS, we perform an extensive experimental evaluation on 85 datasets against state-of-the-art distance measures and clustering methods for time series using rigorous statistical analysis. SBD, our efficient and parameter-free distance measure, achieves similar accuracy to Dynamic Time Warping (DTW), a highly accurate but computationally expensive distance measure that requires parameter tuning. For clustering, we compare k-Shape and k-MS against scalable and non-scalable partitional, hierarchical, spectral, density-based, and shapelet-based methods, with combinations of the most competitive distance measures. k-Shape outperforms all scalable methods in terms of accuracy. 
Furthermore, k-Shape also outperforms all non-scalable approaches, with one exception, namely k-medoids with DTW, which achieves similar accuracy. However, unlike k-Shape, this approach requires tuning of its distance measure and is significantly slower than k-Shape. k-MS performs similarly to k-Shape in comparison to rival methods, but k-MS is significantly more accurate than k-Shape. Beyond clustering, we demonstrate the effectiveness of k-Shape to reduce the search space of one-nearest-neighbor classifiers for time series. Overall, SBD, k-Shape, and k-MS emerge as domain-independent, highly accurate, and efficient methods for time-series comparison and clustering with broad applications.
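
The abstract describes SBD as a normalized version of cross-correlation that compares the shapes of time series. The following is a minimal sketch of such a distance, not the authors' implementation. It assumes that the normalization divides the cross-correlation at every shift by the product of the two series' Euclidean norms, and that the distance is one minus the maximum normalized value over all shifts. The function name `sbd`, the FFT padding scheme, and the convention of returning 1.0 for an all-zero series are illustrative choices, and the sketch performs no z-normalization of its inputs.

```python
import numpy as np


def sbd(x, y):
    """Sketch of a shape-based distance: 1 - max normalized cross-correlation.

    Assumption: normalization by ||x|| * ||y|| (coefficient normalization).
    The result lies in [0, 2]; 0 means identical shape up to a shift.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    norm = np.linalg.norm(x) * np.linalg.norm(y)
    if norm == 0.0:
        # Illustrative convention: an all-zero series is uncorrelated
        # with everything.
        return 1.0

    # Zero-pad to a power of two at least len(x) + len(y) - 1 so the
    # circular correlation computed via the FFT equals the linear one.
    n = len(x) + len(y) - 1
    fft_size = 1 << (n - 1).bit_length()

    # Cross-correlation theorem: corr = IFFT(FFT(x) * conj(FFT(y))).
    cc = np.fft.irfft(
        np.fft.rfft(x, fft_size) * np.conj(np.fft.rfft(y, fft_size)),
        fft_size,
    )

    # Negative shifts sit at the end of the circular result, and
    # non-negative shifts sit at the start; keep only the valid ones.
    cc = np.concatenate((cc[fft_size - (len(y) - 1):], cc[:len(x)]))

    return 1.0 - cc.max() / norm


if __name__ == "__main__":
    a = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
    b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # same shape as a, shifted by one step
    print(sbd(a, a))  # close to 0
    print(sbd(a, b))  # also close to 0: the shift is compensated
```

Computing the correlation through the FFT keeps the cost at O(m log m) for series of length m, rather than the O(m^2) of evaluating every shift directly, which is one plausible reason such a measure can be much cheaper than DTW.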

Index Terms

Fast and Accurate Time-Series Clustering
1. Information systems
  1. Information systems applications
    1. Data mining
      1. Clustering
      2. Nearest-neighbor search
2. Mathematics of computing
  1. Probability and statistics
    1. Statistical paradigms
      1. Cluster analysis
      2. Time series analysis

Published in
ACM Transactions on Database Systems Volume 42, Issue 2
Invited Paper from SIGMOD 2015, Invited Paper from PODS 2015 and Regular Papers
June 2017
251 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/3086510
Editor:
Christian S. Jensen
Aalborg University, Denmark
Copyright © 2017 ACM
© 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 June 2017
- Accepted: 1 January 2017
- Revised: 1 December 2016
- Received: 1 March 2016

Author Tags
Time-series clustering
distance measures
time-series classification
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 131
- Total Downloads: 2,926
- Downloads (Last 12 months): 310
- Downloads (Last 6 weeks): 50