skip to main content
10.1145/1066157.1066161acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

Holistic aggregates in a networked world: distributed tracking of approximate quantiles

Published:14 June 2005Publication History

ABSTRACT

While traditional database systems optimize for performance on one-shot queries, emerging large-scale monitoring applications require continuous tracking of complex aggregates and data-distribution summaries over collections of physically-distributed streams. Thus, effective solutions have to be simultaneously space efficient (at each remote site), communication efficient (across the underlying communication network), and provide continuous, guaranteed-quality estimates. In this paper, we propose novel algorithmic solutions for the problem of continuously tracking complex holistic aggregates in such a distributed-streams setting --- our primary focus is on approximate quantile summaries, but our approach is more broadly applicable and can handle other holistic-aggregate functions (e.g., "heavy-hitters" queries). We present the first known distributed-tracking schemes for maintaining accurate quantile estimates with provable approximation guarantees, while simultaneously optimizing the storage space at each remote site as well as the communication cost across the network. In a nutshell, our algorithms employ a combination of local tracking at remote sites and simple prediction models for local site behavior in order to produce highly communication- and space-efficient solutions. We perform extensive experiments with real and synthetic data to explore the various tradeoffs and understand the role of prediction models in our schemes. The results clearly validate our approach, revealing significant savings over naive solutions as well as our analytical worst-case guarantees.

References

  1. N. Alon, P. B. Gibbons, Y. Matias, and M. Szegedy. "Tracking Join and Self-Join Sizes in Limited Storage". In Proceedings of ACM PODS, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. N. Alon, Y. Matias, and M. Szegedy. "The Space Complexity of Approximating the Frequency Moments". In Proceedings of ACM STOC, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. B. Babcock and C. Olston. "Distributed Top-K Monitoring". In Proceedings of ACM SIGMOD, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. G. Cormode and S. Muthukrishnan. "An improved data stream summary: The count-min sketch and its applications". In Proceedings of Latin American Informatics, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  5. G. Cormode and S. Muthukrishnan. "What's Hot and What's Not: Tracking Most Frequent Items Dynamically". In Proceedings of ACM PODS, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. Cranor, T. Johnson, O. Spatscheck, and V. Shkapenyuk. "Gigascope: A Stream Database for Network Applications". In Proceedings of ACM SIGMOD, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Dartmouth wireless traces. (http://cmc.cs.dartmouth.-edu/data/dartmouth.html).Google ScholarGoogle Scholar
  8. A. Das, S. Ganguly, M. Garofalakis, and R. Rastogi. "Distributed Set-Expression Cardinality Estimation". In Proceedings of VLDB, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Deshpande, C. Guestrin, S. R. Madden, J. M. Hellerstein, and W. Hong. "Model-Driven Data Acquisition in Sensor Networks". In Proceedings of VLDB, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. S. Ganguly, M. Garofalakis, and R. Rastogi. "Processing Set Expressions over Continuous Update Streams". In Proceedings of ACM SIGMOD, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. P. B. Gibbons. "Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports". In Proceedings of VLDB, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss. "Surfing Wavelets on Streams: One-pass Summaries for Approximate Aggregate Queries". In Proceedings of VLDB, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss. "How to Summarize the Universe: Dynamic Maintenance of Quantiles". In Proceedings of VLDB, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. B. Greenwald and S. Khanna. "Space-Efficient Online Computation of Quantile Summaries". In Proceedings of ACM SIGMOD, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. B. Greenwald and S. Khanna. "Power-Conserving Computation of Order-Statistics over Sensor Networks". In Proceedings of ACM PODS, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. G. H. Hardy, J. E. Littlewood, and G. Pólya. "Inequalities". Cambridge University Press, 1988. (Second Edition).Google ScholarGoogle Scholar
  17. Internet traffic archive. (http://ita.ee.lbl.gov/).Google ScholarGoogle Scholar
  18. S. R. Madden, M. J. Franklin, J. M. Hellerstein, and H. Wei. "The Design of an Acquisitional Query Processor for Sensor Networks". In Proceedings of ACM SIGMOD, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. A. Manjhi, V. Shkapenyuk, K. Dhamdhere, and C. Olston. "Finding (Recently) Frequent Items in Distributed Data Streams". In Proceedings of IEEE ICDE, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. G. S. Manku, S. Rajagopalan, and B. G. Lindsey. "Approximate medians and other quantiles in one pass and with limited memory". In Proceedings of ACM SIGMOD, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. G. S. Manku and R. Motwani. "Approximate Frequency Counts over Data Streams". In Proceedings of VLDB, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. C. Olston, J. Jiang, and J. Widom. "Adaptive Filters for Continuous Queries over Distributed Data Streams". In Proceedings of ACM SIGMOD, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. N. Shrivastava, C. Buragohain, D. Agrawal, and S. Suri. "Medians and beyond: New aggregation techniques for sensor networks". In Proceedings of ACM SenSys, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Conferences
    SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data
    June 2005
    990 pages
    ISBN:1595930604
    DOI:10.1145/1066157
    • Conference Chair:
    • Fatma Ozcan

    Copyright © 2005 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 14 June 2005

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • Article

    Acceptance Rates

    Overall Acceptance Rate785of4,003submissions,20%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader