ABSTRACT
We present a method to compress relations close to their entropy while still allowing efficient queries. Column values are encoded into variable length codes to exploit skew in their frequencies. The codes in each tuple are concatenated and the resulting tuplecodes are sorted and delta-coded to exploit the lack of ordering in a relation. Correlation is exploited either by co-coding correlated columns, or by using a sort order that leverages the correlation. We prove that this method leads to near-optimal compression (within 4.3 bits/tuple of entropy), and in practice, we obtain up to a 40 fold compression ratio on vertical partitions tuned for TPC-H queries.We also describe initial investigations into efficient querying over compressed data. We present a novel Huffman coding scheme, called segregated coding, that allows range and equality predicates on compressed data, without accessing the full dictionary. We also exploit the delta coding to speed up scans, by reusing computations performed on nearly identical records. Initial results from a prototype suggest that with these optimizations, we can efficiently scan, tokenize and apply predicates on compressed relations.
- {1} Bala Iyer, David Wilhite. Data Compression Support in Data Bases. In VLDB 1994. Google ScholarDigital Library
- {2} G. Antoshenkov, D. Lomet, J. Murray. Order Preserving String Compression. In ICDE 1996. Google ScholarDigital Library
- {3} T. Cover, J. Thomas. Elements of Information Theory. John Wiley, 1991. Google ScholarDigital Library
- {4} Eric W. Weisstein et al. "Catalan Number." In MathWorld. http://mathworld.wolfram.com/CatalanNumber.html.Google Scholar
- {5} A. Ailamaki, D. J. DeWitt, M. D. Hill, M. Skounakis Weaving relations for cache performance. In VLDB 2001. Google ScholarDigital Library
- {6} M. Stonebraker et al. C-Store: A Column Oriented DBMS. In VLDB 2005. Google ScholarDigital Library
- {7} S. Babu et al. SPARTAN: A model-based semantic compression system for massive tables. SIGMOD 2001. Google ScholarDigital Library
- {8} J. Goldstein, R. Ramakrishnan, U. Shaft. Compressing Relations and Indexes. In ICDE 1998. Google ScholarDigital Library
- {9} G. V. Cormack and R. N. Horspool. Data Compression using Dynamic Markov Modelling, Computer Journal 1987. Google ScholarDigital Library
- {10} G. V. Cormack, Data Compression In a Database System, Comm. of the ACM 28(12), 1985. Google ScholarDigital Library
- {11} G. Copeland and S. Khoshafian. A decomposition storage model. In SIGMOD 1985. Google ScholarDigital Library
- {12} Sybase IQ. www.sybase.com/products/informationmanagementGoogle Scholar
- {13} D. Knuth. The Art of Computer Programming. Addison Wesley, 1998. Google ScholarDigital Library
- {14} S. Barnard and J. M. Child. Higher Algebra, Macmillan India Ltd., 1994.Google Scholar
- {15} T. C. Hu, A. C. Tucker, Optimal computer search trees and variable-length alphabetic cods, SIAM J. Appl. Math, 1971.Google Scholar
- {16} Huffman, D. A method for the construction of minimum redundancy codes. Proc. I.R.E. 40. 9. 1952.Google Scholar
- {17} Jim Gray. Commentary on 2005 Datamation benchmark. http://research.microsoft.com/barc/SortBenchmarkGoogle Scholar
- {18} A. Zandi et al. Sort Order Preserving Data Compresion for Extended Alphabets. Data Compression Conference 1993.Google Scholar
- {19} K. Bharat et al. The Connectivity server: Fast access to Linkage information on the web. In WWW 1998. Google ScholarDigital Library
- {20} G. Graefe and L. Shapiro. Data Compression and Database Performance. In Symp on Applied Computing, 1991.Google Scholar
- {21} M. Poess et al. Data compression in Oracle. In VLDB 2003. Google ScholarDigital Library
Index Terms
- How to wring a table dry: entropy compression of relations and querying of compressed relations
Recommendations
Constrained and recursive hierarchical table-lookup vector quantization
DCC '96: Proceedings of the Conference on Data CompressionThis paper presents techniques for the design of generic constrained and recursive vector quantizer encoders implemented by table-lookups. These vector quantizers include entropy-constrained VQ, tree structured VQ, classified VQ, product VQ, mean-...
Finite state hierarchical table-lookup vector quantization for images
ICASSP '96: Proceedings of the Acoustics, Speech, and Signal Processing, 1996. on Conference Proceedings., 1996 IEEE International Conference - Volume 04This paper presents an algorithm for image compression using finite state hierarchical table-lookup vector quantization. Finite state vector quantizers are vector quantizers with memory. Finite state vector quantization (FSVQ) takes advantage of the ...
Code compression for VLIW embedded systems using a self-generating table
We propose a new class of methods for VLIW code compression using variable-sized branch blocks with self-generating tables. Code compression traditionally works on fixed-sized blocks with its efficiency limited by their small size. A branch block, a ...
Comments