Abstract
The proliferation and ubiquity of temporal data across many disciplines have generated substantial interest in the analysis and mining of time series. Clustering is one of the most popular data-mining methods, not only due to its exploratory power but also because it is often a preprocessing step or subroutine for other techniques. In this article, we present k-Shape and k-MultiShapes (k-MS), two novel algorithms for time-series clustering. k-Shape and k-MS rely on a scalable iterative refinement procedure. As their distance measure, k-Shape and k-MS use shape-based distance (SBD), a normalized version of the cross-correlation measure, to consider the shapes of time series while comparing them. Based on the properties of SBD, we develop two new methods, namely ShapeExtraction (SE) and MultiShapesExtraction (MSE), to compute cluster centroids that are used in every iteration to update the assignment of time series to clusters. k-Shape relies on SE to compute a single centroid per cluster based on all time series in each cluster. In contrast, k-MS relies on MSE to compute multiple centroids per cluster to account for the proximity and spatial distribution of time series in each cluster. To demonstrate the robustness of SBD, k-Shape, and k-MS, we perform an extensive experimental evaluation on 85 datasets against state-of-the-art distance measures and clustering methods for time series using rigorous statistical analysis. SBD, our efficient and parameter-free distance measure, achieves similar accuracy to Dynamic Time Warping (DTW), a highly accurate but computationally expensive distance measure that requires parameter tuning. For clustering, we compare k-Shape and k-MS against scalable and non-scalable partitional, hierarchical, spectral, density-based, and shapelet-based methods, with combinations of the most competitive distance measures. k-Shape outperforms all scalable methods in terms of accuracy.
Furthermore, k-Shape also outperforms all non-scalable approaches, with one exception, namely k-medoids with DTW, which achieves similar accuracy. However, unlike k-Shape, this approach requires tuning of its distance measure and is significantly slower than k-Shape. k-MS performs similarly to k-Shape in comparison to rival methods, but k-MS is significantly more accurate than k-Shape. Beyond clustering, we demonstrate the effectiveness of k-Shape in reducing the search space of one-nearest-neighbor classifiers for time series. Overall, SBD, k-Shape, and k-MS emerge as domain-independent, highly accurate, and efficient methods for time-series comparison and clustering with broad applications.
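To make the idea of a shape-based distance concrete, the following is a minimal sketch of one way a normalized cross-correlation distance can be computed. The abstract only states that SBD is a normalized version of cross-correlation; this sketch assumes the normalization divides by the geometric mean of the two series' lag-zero autocorrelations, and it uses a plain quadratic-time loop rather than any faster formulation. The function names `cross_correlation` and `sbd` are illustrative and are not taken from the article.

```python
import math


def cross_correlation(x, y):
    """Return the sliding dot product of x and y for every shift.

    The shift s ranges over -(m-1) .. (m-1). At shift s, x[i] is paired
    with y[i - s]; pairs that fall outside either series are skipped.
    """
    m = len(x)
    values = []
    for s in range(-(m - 1), m):
        total = 0.0
        for i in range(m):
            j = i - s
            if 0 <= j < m:
                total += x[i] * y[j]
        values.append(total)
    return values


def sbd(x, y):
    """Illustrative shape-based distance between two equal-length series.

    Assumption: the best cross-correlation over all shifts is divided by
    sqrt(<x, x> * <y, y>), so the result lies in [0, 2] and is 0 when one
    series is a shifted copy of the other with nothing cut off.
    """
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0.0:
        raise ValueError("an all-zero series has no defined shape")
    return 1.0 - max(cross_correlation(x, y)) / norm


if __name__ == "__main__":
    # The same pulse, placed two positions apart.
    a = [0, 0, 1, 2, 1, 0, 0, 0]
    b = [0, 0, 0, 0, 1, 2, 1, 0]
    print("shape-based distance:", sbd(a, b))
    print("Euclidean distance:  ", math.dist(a, b))
```

In this toy example the two series contain the same pulse at different positions, so a point-by-point measure such as Euclidean distance treats them as different, while the shift-searching distance treats them as the same shape. A practical implementation would typically compute the cross-correlation with a fast Fourier transform instead of the nested loops shown here.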