ABSTRACT
Approximate Membership Query (AMQ) data structures, such as the Bloom filter, quotient filter, and cuckoo filter, have found numerous applications in databases, storage systems, networks, computational biology, and other domains. However, many applications must work around limitations in the capabilities or performance of current AMQs, making these applications more complex and less performant. For example, many current AMQs cannot delete or count the number of occurrences of each input item, take up large amounts of space, are slow, cannot be resized or merged, or have poor locality of reference and hence perform poorly when stored on SSD or disk. This paper proposes a new general-purpose AMQ, the counting quotient filter (CQF). The CQF supports approximate membership testing and counting the occurrences of items in a data set. This general-purpose AMQ is small and fast, has good locality of reference, scales out of RAM to SSD, and supports deletions, counting (even on skewed data sets), resizing, merging, and highly concurrent access. The paper reports on the structure's performance on both manufactured and application-generated data sets.
In our experiments, the CQF performs in-memory inserts and queries up to an order-of magnitude faster than the original quotient filter, several times faster than a Bloom filter, and similarly to the cuckoo filter, even though none of these other data structures support counting. On SSD, the CQF outperforms all structures by a factor of at least 2 because the CQF has good data locality.
The CQF achieves these performance gains by restructuring the metadata bits of the quotient filter to obtain fast lookups at high load factors (i.e., even when the data structure is almost full). As a result, the CQF offers good lookup performance even up to a load factor of 95%. Counting is essentially free in the CQF in the sense that the structure is comparable or more space efficient even than non-counting data structures (e.g., Bloom, quotient, and cuckoo filters).
The paper also shows how to speed up CQF operations by using new x86 bit-manipulation instructions introduced in Intel's Haswell line of processors. The restructured metadata transforms many quotient filter metadata operations into rank-and-select bit-vector operations. Thus, our efficient implementations of rank and select may be useful for other rank-and-select-based data structures.
- F. vesca genome read dataset. ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA020/SRA020125/SRX030576/SRR072006.fastq.bz2. {Online; accessed 19-February-2016}.Google Scholar
- P. K. Agarwal, G. Cormode, Z. Huang, J. M. Phillips, Z. Wei, and K. Yi. Mergeable summaries. ACM Transactions on Database Systems (TODS), 38(4):26, 2013. Google ScholarDigital Library
- P. S. Almeida, C. Baquero, N. Preguiça, and D. Hutchison. Scalable Bloom filters. Journal of Information Processing Letters, 101(6):255--261, 2007. Google ScholarDigital Library
- S. Alsubaiee, A. Behm, V. Borkar, Z. Heilbron, Y.-S. Kim, M. J. Carey, M. Dreseler, and C. Li. Storage management in AsterixDB. Proceedings of the VLDB Endowment, 7(10):841--852, 2014. Google ScholarDigital Library
- M. A. Bender, M. Farach-Colton, R. Johnson, R. Kaner, B. C. Kuszmaul, D. Medjedovic, P. Montes, P. Shetty, R. P. Spillane, and E. Zadok. Don't thrash: How to cache your hash on flash. Proceedings of the VLDB Endowment, 5(11), 2012. Google ScholarDigital Library
- M. A. Bender, M. Farach-Colton, and M. A. Mosteiro. Insertion sort is O(n log n). Theory of Computing Systems, 39(3):391--397, 2006. Special Issue on FUN '04. Google ScholarDigital Library
- B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422--426, 1970. Google ScholarDigital Library
- F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese. An improved construction for counting Bloom filters. In European Symposium on Algorithms (ESA), pages 684--695. Springer, 2006. Google ScholarDigital Library
- A. Broder and M. Mitzenmacher. Network applications of Bloom filters: A survey. Internet Mathematics, 1(4):485--509, 2004.Google ScholarCross Ref
- M. Canim, G. A. Mihaila, B. Bhattacharjee, C. A. Lang, and K. A. Ross. Buffered Bloom filters on solid state storage. In Proceedings of the International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures (ADMS), pages 1--8, 2010.Google Scholar
- S. Cohen and Y. Matias. Spectral Bloom filters. In Proceedings of the ACM International Conference on Management of Data (SIGMOD), pages 241--252, 2003. Google ScholarDigital Library
- G. Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. Journal of Algorithms, 55(1):58--75, 2005. Google ScholarDigital Library
- B. Corominas-Murtra and R. V. Solé. Universality of Zipf's law. Physical Review E, 82(1):011102, 2010.Google ScholarCross Ref
- B. Debnath, S. Sengupta, J. Li, D. J. Lilja, and D. H. Du. BloomFlash: Bloom filter on flash-based storage. In Proceedings of the 31st International Conference on Distributed Computing Systems (ICDCS), pages 635--644, 2011. Google ScholarDigital Library
- B. K. Debnath, S. Sengupta, and J. Li. Chunkstash: Speeding up inline storage deduplication using flash memory. In Proceedings of the USENIX Annual Technical Conference (ATC), 2010. Google ScholarDigital Library
- B. Fan. Cuckoo filter source code in C++. https://github.com/efficient/cuckoofilter, 2014. {Online; accessed 19-July-2014}.Google Scholar
- B. Fan, D. G. Andersen, M. Kaminsky, and M. D. Mitzenmacher. Cuckoo filter: Practically better than Bloom. In Proceedings of the 10th ACM International on Conference on Emerging Networking Experiments and Technologies, pages 75--88, 2014. Google ScholarDigital Library
- L. Fan, P. Cao, J. Almeida, and A. Z. Broder. Summary cache: A scalable wide-area web cache sharing protocol. IEEE/ACM Transactions on Networking (TON), 8(3):281--293, 2000. Google ScholarDigital Library
- A. Geil. Quotient filters: Approximate membership queries on the GPU. http://on-demand.gputechconf.com/gtc/2016/presentation/s6464-afton-geil-quoetient-filters.pdf, 2016.Google Scholar
- E. Georganas, A. Buluç, J. Chapman, L. Oliker, D. Rokhsar, and K. Yelick. Parallel de Bruijn graph construction and traversal for de novo genome assembly. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pages 437--448, 2014. Google ScholarDigital Library
- R. González, S. Grabowski, V. Mäkinen, and G. Navarro. Practical implementation of rank and select queries. In Poster Proceedings Volume of 4th Workshop on Efficient and Experimental Algorithms (WEA), pages 27--38, 2005.Google Scholar
- S. P. Karl Anderson. Firehose. http://firehose.sandia.gov/, 2013. {Online; accessed 19-Dec-2015}.Google Scholar
- G. Lu, B. Debnath, and D. H. Du. A forest-structured Bloom filter with flash memory. In Proceedings of the 27th Symposium on Mass Storage Systems and Technologies (MSST), pages 1--6, 2011. Google ScholarDigital Library
- P. Melsted and J. K. Pritchard. Efficient counting of k-mers in DNA sequences using a Bloom filter. BMC Bioinformatics, 12(1):1, 2011.Google ScholarCross Ref
- P. O'Neil, E. Cheng, D. Gawlic, and E. O'Neil. The log-structured merge-tree (LSM-tree). Acta Informatica, 33(4):351--385, 1996. Google ScholarDigital Library
- A. Partow. C++Bloom filter library. https://code.google.com/p/bloom/. {Online; accessed 19-July-2014}.Google Scholar
- F. Putze, P. Sanders, and J. Singler. Cache-, hash-and space-efficient bloom filters. In International Workshop on Experimental and Efficient Algorithms, pages 108--121, 2007. Google ScholarDigital Library
- Y. Qiao, T. Li, and S. Chen. Fast Bloom filters and their generalization. IEEE Transactions on Parallel and Distributed Systems (TPDS), 25(1):93--103, 2014. Google ScholarDigital Library
- R. S. Roy, D. Bhattacharya, and A. Schliep. Turtle: Identifying frequent k-mers with cache-efficient algorithms. Bioinformatics, 30:1950--1957, 2014.Google ScholarCross Ref
- S. Tarkoma, C. E. Rothenberg, and E. Lagerspetz. Theory and practice of Bloom filters for distributed systems. IEEE Communications Surveys & Tutorials, 14(1):131--155, 2012.Google ScholarCross Ref
- M. Vallentin. Counting Bloom filter source code in C++. https://github.com/mavam/libbf, 2014. {Online; accessed 19-July-2015}.Google Scholar
- P. Wang, G. Sun, S. Jiang, J. Ouyang, S. Lin, C. Zhang, and J. Cong. An efficient design and implementation of LSM-tree based key-value store on open-channel SSD. In Proceedings of the 9th European Conference on Computer Systems (EuroSys), pages 16:1--16:14, 2014. Google ScholarDigital Library
- Q. Zhang, J. Pell, R. Canino-Koning, A. C. Howe, and C. T. Brown. These are not the k-mers you are looking for: Efficient online k-mer counting using a probabilistic data structure. PLoS One, 9(7):e101271, 2014.Google ScholarCross Ref
- B. Zhu, K. Li, and R. H. Patterson. Avoiding the disk bottleneck in the data domain deduplication file system. In Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST), pages 1--14, 2008. Google ScholarDigital Library
Index Terms
- A General-Purpose Counting Filter: Making Every Bit Count
Recommendations
A performance study of general-purpose applications on graphics processors using CUDA
Graphics processors (GPUs) provide a vast number of simple, data-parallel, deeply multithreaded cores and high memory bandwidths. GPU architectures are becoming increasingly programmable, offering the potential for dramatic speedups for a variety of ...
Computing prestack Kirchhoff time migration on general purpose GPU
This paper introduces how to optimize a practical prestack Kirchhoff time migration program by the Compute Unified Device Architecture (CUDA) on a general purpose GPU (GPGPU). A few useful optimization methods on GPGPU are demonstrated, such as how to ...
Comments