ABSTRACT
The discovery of conserved (repeated) patterns in time series is arguably the most important primitive in time series data mining. Called time series motifs, these primitive patterns are useful in their own right, and are also used as inputs into classification, clustering, segmentation, visualization, and anomaly detection algorithms. Recently the Matrix Profile has emerged as a promising representation to allow the efficient exact computation of the top-k motifs in a time series. State-of-the-art algorithms for computing the Matrix Profile are fast enough for many tasks. However, in a handful of domains, including astronomy and seismology, there is an insatiable appetite to consider ever larger datasets. In this work we show that with several novel insights we can push the motif discovery envelope using a novel scalable framework in conjunction with a deployment to commercial GPU clusters in the cloud. We demonstrate the utility of our ideas with detailed case studies in seismology, demonstrating that the efficiency of our algorithm allows us to exhaustively consider datasets that are currently only approximately searchable, allowing us to find subtle precursor earthquakes that had previously escaped attention, and other novel seismic regularities.
- K. Aki and B. Chouet. Origin of coda waves: source, attenuation, and scattering effects. Geophysical Research, 80(23): 3322--3342, 1975.Google ScholarCross Ref
- B. Atwater, et al. Summary of coastal geologic evidence for past great earthquakes at the Cascadia subduction zone. Earthquake spectra, 11(1): 1--18, 1995.Google Scholar
- N. Begum, B. Hu, T. Rakthanmanon, and E. J. Keogh. Towards a minimum description length-based stopping criterion for semi-supervised time series classification. IRI, 333--340, 2013.Google Scholar
- K. J. Bergen and G. C. Beroza. Detecting earthquakes over a seismic network using single-station similarity measures. Geophysical Journal International, 213(3): 1984--1998, 2018.Google ScholarCross Ref
- D. Boyarko, Det al. (2015). Automated detection and location of tectonic tremor along the entire Cascadia margin from 2005 to 2011. Earth and Planetary Science Letters, 430, 160--170.Google ScholarCross Ref
- J. Brown, G Beroza, & D. Shelly (2008). An autocorrelation method to detect low frequency earthquakes within tremor. Geophysical Research Letters, 35(16).Google Scholar
- Yoon, C. E, et al. Earthquake detection through computationally efficient similarity search. Science advances, 1(11): e1501057, 2015.Google Scholar
- B. Castello, M. Olivieri, and G. Selvaggi. Local and duration magnitude determination for the Italian earthquake catalog, 1981-2002. Seismological Society of America, 97(1B): 128--139, 2007.Google ScholarCross Ref
- H. Dau and E. Keogh. Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery. KDD, 125--134, 2017.Google Scholar
- J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1): 107--113, 2008.Google ScholarDigital Library
- Nvidia Tesla V100 Whitepaper: http://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdfGoogle Scholar
- E. H. Field, et al. Uniform California earthquake rupture forecast, version 3 (UCERF3)---The time-independent model. Bulletin of the Seismological Society of America, 104(3): 1122--1180, 2014.Google ScholarCross Ref
- I. Fox, L. Ang, M. Jaiswal, R. Pop-Busui, and J. Wiens. Contextual motifs: Increasing the utility of motifs using contextual data. In Proceedings of the 23rd ACM SIGKDD (2017), pages 155--164.Google ScholarDigital Library
- Han, S., Mao, H. and Dally, W.J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. ICLR, 2016Google Scholar
- S. Gupta, et al. Deep learning with limited numerical precision. In Proceedings of the 32nd ICML, pages. 1737--1746. JMLR, 2015.Google Scholar
- S. Hainzl and D. Marsan. (2008). Dependence of the OmoriUtsu law parameters on main shock magnitude: Observations and modeling. Journal of Geophysical Research: Solid Earth, 113(B10), 2008.Google ScholarCross Ref
- D. Hill, et al. The 1989 earthquake swarm beneath Mammoth Mountain, California: An initial look at the 4 May through 30 September activity. Seismological Society of America, 80(2): 1990.Google Scholar
- N. M. Ho and W. F. Wong. Exploiting half precision arithmetic in Nvidia GPUs. HPEC, 1--7, 2017.Google Scholar
- A. Hutko, M. Bahavar, C. Trabant, R. Weekly, M. Fossen, and T. Ahern. Data products at the IRIS-DMC: Growth and usage. Seismological Research Letters, 88(3): 892--903, 2017.Google ScholarCross Ref
- R. Hyndman and K. Wang. The rupture zone of Cascadia great earthquakes from current deformation and the thermal regime. Journal of Geophysical Research: Solid Earth, B11: 22133, 1995.Google ScholarCross Ref
- F. Klein. (2002). User's guide to HYPOINVERSE-2000, a Fortran program to solve for earthquake locations and magnitudes. US Geological Survey, 02-171(1.0), 2002.Google Scholar
- I. Kolb et al. Evidence for long-timescale patterns of synaptic inputs in CA1 of awake behaving mice. Neuroscience, 1519--17, 2017.Google Scholar
- K. Mauck. (2018) Personal communicationGoogle Scholar
- A. Murillo (2018). Personal Communication.Google Scholar
- R. Nadeau, W. Foxall, and T. McEvilly. Clustering and periodic recurrence of microearthquakes on the San Andreas fault at Parkfield, California. Science, 267: 503--7, 1995.Google Scholar
- S. E. J. Nippress, A. Rietbrock, and A. E. Heath. Optimized automatic pickers: application to the ANCORP data set. Geophysical Journal International, 181(2): 911--925, 2010.Google Scholar
- SCAMP Supporting Webpage: https://sites.google.com/view/2019scampGoogle Scholar
- T. Omi, Y. Ogata, Y. Hirata, and K. Aihara. Forecasting large aftershocks within one day after the main shock. Scientific reports, 3: 2218, 2013.Google Scholar
- Z. Peng and P. Zhao. Migration of early aftershocks following the 2004 Parkfield earthquake. Nature Geoscience, 2(12): 877, 2009.Google ScholarCross Ref
- https://devblogs.nvidia.com/gpu-pro-tip-fast-histograms-using-shared-atomics-maxwell/Google Scholar
- G. Poupinet, et al. Monitoring velocity variations in the crust using earthquake doublets: An application to the Calaveras Fault, California. Geophysical Research: Solid Earth, 89(B7): 1984.Google Scholar
- K. Rong, et al. Locality sensitive hashing for earthquake detection: a case study of scaling data-driven science. In VLDB 2017.Google Scholar
- Z. Ross and Y. Ben-Zion. Automatic picking of direct P, S seismic phases and fault zone head waves. Geophysical Journal International, 199(1): 368--381, 2014.Google ScholarCross Ref
- A. Royer and M. Bostock. A comparative study of low frequency earthquake templates in northern Cascadia. Earth and Planetary Science Letters, 402: 247--256, 2014.Google ScholarCross Ref
- W. Sandanayaka, Y. Jia, and J. G. Charles. EPG technique as a tool to reveal host plant acceptance by xylem sap-feeding insects. Journal of Applied Entomology, 137: 519--529, 2013.Google ScholarCross Ref
- D. P. Schaff and F. Waldhauser. One magnitude unit reduction in detection threshold by cross correlation applied to Parkfield (California) and China seismicity. Bulletin of the Seismological Society of America, 100(6): 3224--3238, 2010.Google ScholarCross Ref
- D. Schaff and F. Waldhauser. Waveform cross-correlation-based differential traveltime measurements at the Northern California Seismic Network. Seismological Society of America, 95(6): 2005.Google Scholar
- D. Silva, C-C M. Yeh, G. Batista, E. Keogh: SiMPle: Assessing Music Similarity Using Subsequences Joins. ISMIR 2016: 23--29.Google Scholar
- https://aws.amazon.com/ec2/spot/Google Scholar
- R. D. Vatavu. Small gestures go a long way: how many bits per gesture do recognizers actually need? In DIS '12, pp 328--337, 2012.Google ScholarDigital Library
- Y. Zhu, et al. Exploiting a novel algorithm and GPUs to break the ten quadrillion pairwise comparisons barrier for time series motifs and joins. KAIS 1--34, 2018.Google Scholar
- What-if. https://what-if.xkcd.com/63/.Google Scholar
- C. C. M. Yeh, et al. Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets. In ICDM, pages 1317--1322. IEEE, 2016.Google Scholar
- Y. Zhu, et al. Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins. In ICDM, pages 739--748. IEEE, 2016.Google ScholarCross Ref
- Bakun, W. H., et al. The Parkfield, California, earthquake prediction Experiment. Science, 229(4714): 619--624, 1984.Google Scholar
- Utsu, T., & Ogata, Y. The centenary of the Omori formula for a decay law of aftershock activity. Journal of Physics of the Earth, 43(1): 1--33, 1995.Google ScholarCross Ref
- "HRSN (2014), High Resolution Seismic Network. UC Berkeley Seismological Laboratory. Dataset. doi:10.7932/HRSN."Google Scholar
- Brodsky, E. E. (2019). The importance of studying small earthquakes. Science, 364(6442), 736--737.Google Scholar
- Boyarko, D. C., & Brudzinski, M. R. (2010). Spatial and temporal patterns of nonvolcanic tremor along the southern Cascadia subduction zone. Journal of Geophysical Research: Solid Earth, 115(B8).Google Scholar
- Shelly, D. R. (2010). Migrating tremors illuminate complex deformation beneath the seismogenic San Andreas fault. Nature, 463(7281), 648.Google Scholar
- Shelly, D. R., G. C. Beroza, and S. Ide (2007). Non-volcanic tremor and low-frequency earthquake swarms, Nature 446, no. 7133, 305.Google Scholar
- Obara, K., & Kato, A. (2016). Connecting slow earthquakes to huge earthquakes. Science, 353(6296), 253--257.Google Scholar
- Rubinstein, J. L., Shelly, D. R., & Ellsworth, W. L. (2009). Non-volcanic tremor: A window into the roots of fault zones. In New Frontiers in Integrated Solid Earth Sciences (pp. 287--314). Springer, DordrechGoogle Scholar
- The TileDB Array Data Storage Manager, VLDB'16. https://tiledb.io/Google Scholar
Index Terms
- Matrix Profile XIV: Scaling Time Series Motif Discovery with GPUs to Break a Quintillion Pairwise Comparisons a Day and Beyond
Recommendations
Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery
KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data MiningTime series motif discovery has emerged as perhaps the most used primitive for time series data mining, and has seen applications to domains as diverse as robotics, medicine and climatology. There has been recent significant progress on the scalability ...
Matrix Profile X: VALMOD - Scalable Discovery of Variable-Length Motifs in Data Series
SIGMOD '18: Proceedings of the 2018 International Conference on Management of DataIn the last fifteen years, data series motif discovery has emerged as one of the most useful primitives for data series mining, with applications to many domains, including robotics, entomology, seismology, medicine, and climatology. Nevertheless, the ...
An Enhanced Time Series Motif Discovery Using Approximated Matrix Profile
IPMV '20: Proceedings of the 2020 2nd International Conference on Image Processing and Machine VisionMotif discovery of time series data is one of the most prevalent data mining tasks in finding repeated patterns that contain important information in a time series sequence. In particular, its purpose is to find the most similar non-overlapping ...
Comments