research-article

Fast and Accurate Time-Series Clustering

Published: 01 June 2017

Abstract

The proliferation and ubiquity of temporal data across many disciplines have generated substantial interest in the analysis and mining of time series. Clustering is one of the most popular data-mining methods, not only due to its exploratory power but also because it is often a preprocessing step or subroutine for other techniques. In this article, we present k-Shape and k-MultiShapes (k-MS), two novel algorithms for time-series clustering. k-Shape and k-MS rely on a scalable iterative refinement procedure. As their distance measure, k-Shape and k-MS use shape-based distance (SBD), a normalized version of the cross-correlation measure, to consider the shapes of time series while comparing them. Based on the properties of SBD, we develop two new methods, namely ShapeExtraction (SE) and MultiShapesExtraction (MSE), to compute cluster centroids that are used in every iteration to update the assignment of time series to clusters. k-Shape relies on SE to compute a single centroid per cluster based on all time series in each cluster. In contrast, k-MS relies on MSE to compute multiple centroids per cluster to account for the proximity and spatial distribution of time series in each cluster. To demonstrate the robustness of SBD, k-Shape, and k-MS, we perform an extensive experimental evaluation on 85 datasets against state-of-the-art distance measures and clustering methods for time series using rigorous statistical analysis. SBD, our efficient and parameter-free distance measure, achieves similar accuracy to Dynamic Time Warping (DTW), a highly accurate but computationally expensive distance measure that requires parameter tuning. For clustering, we compare k-Shape and k-MS against scalable and non-scalable partitional, hierarchical, spectral, density-based, and shapelet-based methods, with combinations of the most competitive distance measures. k-Shape outperforms all scalable methods in terms of accuracy. Furthermore, k-Shape also outperforms all non-scalable approaches, with one exception, namely k-medoids with DTW, which achieves similar accuracy. However, unlike k-Shape, this approach requires tuning of its distance measure and is significantly slower than k-Shape. k-MS performs similarly to k-Shape in comparison to rival methods, but k-MS is significantly more accurate than k-Shape. Beyond clustering, we demonstrate the effectiveness of k-Shape in reducing the search space of one-nearest-neighbor classifiers for time series. Overall, SBD, k-Shape, and k-MS emerge as domain-independent, highly accurate, and efficient methods for time-series comparison and clustering with broad applications.
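
The abstract describes SBD only as "a normalized version of the cross-correlation measure" and does not give a formula. As a minimal sketch, the code below assumes the commonly cited form, SBD(x, y) = 1 - max over shifts of CC(x, y) / (||x|| ||y||), for two equal-length sequences. The function names `cross_correlation` and `sbd` are illustrative and not taken from the article, and the sketch omits any z-normalization of the inputs that a full pipeline might apply beforehand.

```python
import math


def cross_correlation(x, y, shift):
    """Sum of x[i + shift] * y[i] over the indices where both terms exist."""
    m = len(x)
    total = 0.0
    for i in range(m):
        j = i + shift
        if 0 <= j < m:
            total += x[j] * y[i]
    return total


def sbd(x, y):
    """Assumed SBD form: 1 minus the maximum normalized cross-correlation."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    # Normalize by the product of the Euclidean norms of the two sequences.
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    if norm == 0.0:
        raise ValueError("all-zero sequence: normalization is undefined")
    m = len(x)
    # Try every relative shift from -(m - 1) to m - 1 and keep the best match.
    best = max(cross_correlation(x, y, s) for s in range(-(m - 1), m))
    return 1.0 - best / norm


if __name__ == "__main__":
    bump = [0, 0, 1, 2, 1, 0, 0, 0]
    shifted_bump = [0, 0, 0, 0, 1, 2, 1, 0]
    # A shifted copy of the same shape should score close to zero.
    print(sbd(bump, shifted_bump))
```

Under this assumed definition the distance lies between 0 and 2, and a shifted copy of the same shape scores near 0, which is the sense in which the measure compares shapes rather than aligned values. This direct version takes quadratic time in the sequence length; computing the cross-correlation with a fast Fourier transform is the usual way to speed it up.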

Published in

ACM Transactions on Database Systems, Volume 42, Issue 2
Invited Paper from SIGMOD 2015, Invited Paper from PODS 2015 and Regular Papers
June 2017
251 pages
ISSN: 0362-5915
EISSN: 1557-4644
DOI: 10.1145/3086510

Copyright © 2017 ACM

© 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 1 June 2017
• Accepted: 1 January 2017
• Revised: 1 December 2016
• Received: 1 March 2016

Qualifiers

• research-article
• Research
• Refereed
