Many applications process high volumes of streaming data, among them Internet traffic analysis, financial tickers, and transaction log mining. In general, a data stream is an unbounded data set that is produced incrementally over time, rather than being available in full before its processing begins. In this lecture, we give an overview of recent research in stream processing, ranging from answering simple queries on high-speed streams to loading real-time data feeds into a streaming warehouse for off-line analysis. We will discuss two types of systems for end-to-end stream processing: Data Stream Management Systems (DSMSs) and Streaming Data Warehouses (SDWs). A traditional database management system typically processes a stream of ad-hoc queries over relatively static data. In contrast, a DSMS evaluates static (long-running) queries on streaming data, making a single pass over the data and using limited working memory. In the first part of this lecture, we will discuss research problems in DSMSs, such as continuous query languages, non-blocking query operators that continually react to new data, and continuous query optimization. The second part covers SDWs, which combine the real-time response of a DSMS, achieved by loading new data as soon as they arrive, with a data warehouse's ability to manage terabytes of historical data on secondary storage.
Table of Contents: Introduction / Data Stream Management Systems / Streaming Data Warehouses / Conclusions
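As a rough illustration of what a non-blocking operator in a long-running query might look like, the sketch below computes an average over a time-based sliding window, emitting an updated result for each arriving tuple after a single pass over the input. It is not taken from the lecture: the function name `sliding_window_average`, the `(timestamp, value)` tuple layout, and the assumptions that tuples arrive in non-decreasing timestamp order and that the window length is positive are all hypothetical choices made for this example.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_window_average(
    stream: Iterable[Tuple[float, float]], window_seconds: float
) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, average) after each arriving tuple.

    Assumption: tuples arrive in non-decreasing timestamp order and
    window_seconds is positive. Working memory is bounded by the number
    of tuples inside the current window, not by the stream length.
    """
    window: Deque[Tuple[float, float]] = deque()
    running_sum = 0.0
    for timestamp, value in stream:
        # Incorporate the new tuple incrementally.
        window.append((timestamp, value))
        running_sum += value
        # Expire tuples that have fallen out of the window.
        while window and window[0][0] <= timestamp - window_seconds:
            _, old_value = window.popleft()
            running_sum -= old_value
        # Emit a result immediately instead of waiting for end of input.
        yield timestamp, running_sum / len(window)


if __name__ == "__main__":
    readings = [(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)]
    for ts, avg in sliding_window_average(readings, window_seconds=3):
        print(ts, avg)
```

Because the function is a generator, it can consume an unbounded source and never needs the full data set to be available before producing output, which is the property that separates this style of operator from a blocking aggregate.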