ABSTRACT
Modern users demand analytical features on fresh, real-time data. Offering these features to hundreds of millions of users is a problem that many large-scale web companies face.
Relational databases and key-value stores can be scaled to provide point lookups for a large number of users, but they fall apart under the combination of high ingest rates and high rates of low-latency analytical queries. Online analytical databases typically rely on bulk data loads and are not built for nonstop operation in demanding web environments. Offline analytical systems have high throughput but neither offer low query latencies nor scale to serving tens of thousands of queries per second.
We present Pinot, a single system used in production at LinkedIn that can serve tens of thousands of analytical queries per second, offer near-real-time data ingestion from streaming data sources, and handle the operational requirements of large web properties. We also provide a performance comparison with Druid, a system similar to Pinot.
Index Terms
- Pinot: Realtime OLAP for 530 Million Users