skip to main content
10.1145/3357223.3362721acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Open Access

Matrix Profile XIV: Scaling Time Series Motif Discovery with GPUs to Break a Quintillion Pairwise Comparisons a Day and Beyond

Published:20 November 2019Publication History

ABSTRACT

The discovery of conserved (repeated) patterns in time series is arguably the most important primitive in time series data mining. Called time series motifs, these primitive patterns are useful in their own right, and are also used as inputs into classification, clustering, segmentation, visualization, and anomaly detection algorithms. Recently the Matrix Profile has emerged as a promising representation to allow the efficient exact computation of the top-k motifs in a time series. State-of-the-art algorithms for computing the Matrix Profile are fast enough for many tasks. However, in a handful of domains, including astronomy and seismology, there is an insatiable appetite to consider ever larger datasets. In this work we show that with several novel insights we can push the motif discovery envelope using a novel scalable framework in conjunction with a deployment to commercial GPU clusters in the cloud. We demonstrate the utility of our ideas with detailed case studies in seismology, demonstrating that the efficiency of our algorithm allows us to exhaustively consider datasets that are currently only approximately searchable, allowing us to find subtle precursor earthquakes that had previously escaped attention, and other novel seismic regularities.

References

  1. K. Aki and B. Chouet. Origin of coda waves: source, attenuation, and scattering effects. Geophysical Research, 80(23): 3322--3342, 1975.Google ScholarGoogle ScholarCross RefCross Ref
  2. B. Atwater, et al. Summary of coastal geologic evidence for past great earthquakes at the Cascadia subduction zone. Earthquake spectra, 11(1): 1--18, 1995.Google ScholarGoogle Scholar
  3. N. Begum, B. Hu, T. Rakthanmanon, and E. J. Keogh. Towards a minimum description length-based stopping criterion for semi-supervised time series classification. IRI, 333--340, 2013.Google ScholarGoogle Scholar
  4. K. J. Bergen and G. C. Beroza. Detecting earthquakes over a seismic network using single-station similarity measures. Geophysical Journal International, 213(3): 1984--1998, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  5. D. Boyarko, Det al. (2015). Automated detection and location of tectonic tremor along the entire Cascadia margin from 2005 to 2011. Earth and Planetary Science Letters, 430, 160--170.Google ScholarGoogle ScholarCross RefCross Ref
  6. J. Brown, G Beroza, & D. Shelly (2008). An autocorrelation method to detect low frequency earthquakes within tremor. Geophysical Research Letters, 35(16).Google ScholarGoogle Scholar
  7. Yoon, C. E, et al. Earthquake detection through computationally efficient similarity search. Science advances, 1(11): e1501057, 2015.Google ScholarGoogle Scholar
  8. B. Castello, M. Olivieri, and G. Selvaggi. Local and duration magnitude determination for the Italian earthquake catalog, 1981-2002. Seismological Society of America, 97(1B): 128--139, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  9. H. Dau and E. Keogh. Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery. KDD, 125--134, 2017.Google ScholarGoogle Scholar
  10. J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1): 107--113, 2008.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Nvidia Tesla V100 Whitepaper: http://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdfGoogle ScholarGoogle Scholar
  12. E. H. Field, et al. Uniform California earthquake rupture forecast, version 3 (UCERF3)---The time-independent model. Bulletin of the Seismological Society of America, 104(3): 1122--1180, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  13. I. Fox, L. Ang, M. Jaiswal, R. Pop-Busui, and J. Wiens. Contextual motifs: Increasing the utility of motifs using contextual data. In Proceedings of the 23rd ACM SIGKDD (2017), pages 155--164.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Han, S., Mao, H. and Dally, W.J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. ICLR, 2016Google ScholarGoogle Scholar
  15. S. Gupta, et al. Deep learning with limited numerical precision. In Proceedings of the 32nd ICML, pages. 1737--1746. JMLR, 2015.Google ScholarGoogle Scholar
  16. S. Hainzl and D. Marsan. (2008). Dependence of the OmoriUtsu law parameters on main shock magnitude: Observations and modeling. Journal of Geophysical Research: Solid Earth, 113(B10), 2008.Google ScholarGoogle ScholarCross RefCross Ref
  17. D. Hill, et al. The 1989 earthquake swarm beneath Mammoth Mountain, California: An initial look at the 4 May through 30 September activity. Seismological Society of America, 80(2): 1990.Google ScholarGoogle Scholar
  18. N. M. Ho and W. F. Wong. Exploiting half precision arithmetic in Nvidia GPUs. HPEC, 1--7, 2017.Google ScholarGoogle Scholar
  19. A. Hutko, M. Bahavar, C. Trabant, R. Weekly, M. Fossen, and T. Ahern. Data products at the IRIS-DMC: Growth and usage. Seismological Research Letters, 88(3): 892--903, 2017.Google ScholarGoogle ScholarCross RefCross Ref
  20. R. Hyndman and K. Wang. The rupture zone of Cascadia great earthquakes from current deformation and the thermal regime. Journal of Geophysical Research: Solid Earth, B11: 22133, 1995.Google ScholarGoogle ScholarCross RefCross Ref
  21. F. Klein. (2002). User's guide to HYPOINVERSE-2000, a Fortran program to solve for earthquake locations and magnitudes. US Geological Survey, 02-171(1.0), 2002.Google ScholarGoogle Scholar
  22. I. Kolb et al. Evidence for long-timescale patterns of synaptic inputs in CA1 of awake behaving mice. Neuroscience, 1519--17, 2017.Google ScholarGoogle Scholar
  23. K. Mauck. (2018) Personal communicationGoogle ScholarGoogle Scholar
  24. A. Murillo (2018). Personal Communication.Google ScholarGoogle Scholar
  25. R. Nadeau, W. Foxall, and T. McEvilly. Clustering and periodic recurrence of microearthquakes on the San Andreas fault at Parkfield, California. Science, 267: 503--7, 1995.Google ScholarGoogle Scholar
  26. S. E. J. Nippress, A. Rietbrock, and A. E. Heath. Optimized automatic pickers: application to the ANCORP data set. Geophysical Journal International, 181(2): 911--925, 2010.Google ScholarGoogle Scholar
  27. SCAMP Supporting Webpage: https://sites.google.com/view/2019scampGoogle ScholarGoogle Scholar
  28. T. Omi, Y. Ogata, Y. Hirata, and K. Aihara. Forecasting large aftershocks within one day after the main shock. Scientific reports, 3: 2218, 2013.Google ScholarGoogle Scholar
  29. Z. Peng and P. Zhao. Migration of early aftershocks following the 2004 Parkfield earthquake. Nature Geoscience, 2(12): 877, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  30. https://devblogs.nvidia.com/gpu-pro-tip-fast-histograms-using-shared-atomics-maxwell/Google ScholarGoogle Scholar
  31. G. Poupinet, et al. Monitoring velocity variations in the crust using earthquake doublets: An application to the Calaveras Fault, California. Geophysical Research: Solid Earth, 89(B7): 1984.Google ScholarGoogle Scholar
  32. K. Rong, et al. Locality sensitive hashing for earthquake detection: a case study of scaling data-driven science. In VLDB 2017.Google ScholarGoogle Scholar
  33. Z. Ross and Y. Ben-Zion. Automatic picking of direct P, S seismic phases and fault zone head waves. Geophysical Journal International, 199(1): 368--381, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  34. A. Royer and M. Bostock. A comparative study of low frequency earthquake templates in northern Cascadia. Earth and Planetary Science Letters, 402: 247--256, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  35. W. Sandanayaka, Y. Jia, and J. G. Charles. EPG technique as a tool to reveal host plant acceptance by xylem sap-feeding insects. Journal of Applied Entomology, 137: 519--529, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  36. D. P. Schaff and F. Waldhauser. One magnitude unit reduction in detection threshold by cross correlation applied to Parkfield (California) and China seismicity. Bulletin of the Seismological Society of America, 100(6): 3224--3238, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  37. D. Schaff and F. Waldhauser. Waveform cross-correlation-based differential traveltime measurements at the Northern California Seismic Network. Seismological Society of America, 95(6): 2005.Google ScholarGoogle Scholar
  38. D. Silva, C-C M. Yeh, G. Batista, E. Keogh: SiMPle: Assessing Music Similarity Using Subsequences Joins. ISMIR 2016: 23--29.Google ScholarGoogle Scholar
  39. https://aws.amazon.com/ec2/spot/Google ScholarGoogle Scholar
  40. R. D. Vatavu. Small gestures go a long way: how many bits per gesture do recognizers actually need? In DIS '12, pp 328--337, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Y. Zhu, et al. Exploiting a novel algorithm and GPUs to break the ten quadrillion pairwise comparisons barrier for time series motifs and joins. KAIS 1--34, 2018.Google ScholarGoogle Scholar
  42. What-if. https://what-if.xkcd.com/63/.Google ScholarGoogle Scholar
  43. C. C. M. Yeh, et al. Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets. In ICDM, pages 1317--1322. IEEE, 2016.Google ScholarGoogle Scholar
  44. Y. Zhu, et al. Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins. In ICDM, pages 739--748. IEEE, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  45. Bakun, W. H., et al. The Parkfield, California, earthquake prediction Experiment. Science, 229(4714): 619--624, 1984.Google ScholarGoogle Scholar
  46. Utsu, T., & Ogata, Y. The centenary of the Omori formula for a decay law of aftershock activity. Journal of Physics of the Earth, 43(1): 1--33, 1995.Google ScholarGoogle ScholarCross RefCross Ref
  47. "HRSN (2014), High Resolution Seismic Network. UC Berkeley Seismological Laboratory. Dataset. doi:10.7932/HRSN."Google ScholarGoogle Scholar
  48. Brodsky, E. E. (2019). The importance of studying small earthquakes. Science, 364(6442), 736--737.Google ScholarGoogle Scholar
  49. Boyarko, D. C., & Brudzinski, M. R. (2010). Spatial and temporal patterns of nonvolcanic tremor along the southern Cascadia subduction zone. Journal of Geophysical Research: Solid Earth, 115(B8).Google ScholarGoogle Scholar
  50. Shelly, D. R. (2010). Migrating tremors illuminate complex deformation beneath the seismogenic San Andreas fault. Nature, 463(7281), 648.Google ScholarGoogle Scholar
  51. Shelly, D. R., G. C. Beroza, and S. Ide (2007). Non-volcanic tremor and low-frequency earthquake swarms, Nature 446, no. 7133, 305.Google ScholarGoogle Scholar
  52. Obara, K., & Kato, A. (2016). Connecting slow earthquakes to huge earthquakes. Science, 353(6296), 253--257.Google ScholarGoogle Scholar
  53. Rubinstein, J. L., Shelly, D. R., & Ellsworth, W. L. (2009). Non-volcanic tremor: A window into the roots of fault zones. In New Frontiers in Integrated Solid Earth Sciences (pp. 287--314). Springer, DordrechGoogle ScholarGoogle Scholar
  54. The TileDB Array Data Storage Manager, VLDB'16. https://tiledb.io/Google ScholarGoogle Scholar

Index Terms

  1. Matrix Profile XIV: Scaling Time Series Motif Discovery with GPUs to Break a Quintillion Pairwise Comparisons a Day and Beyond

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SoCC '19: Proceedings of the ACM Symposium on Cloud Computing
        November 2019
        503 pages
        ISBN:9781450369732
        DOI:10.1145/3357223

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 20 November 2019

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        SoCC '19 Paper Acceptance Rate39of157submissions,25%Overall Acceptance Rate169of722submissions,23%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader