
Accelerating generalized linear models with MLWeaving: a one-size-fits-all system for any-precision learning

Published: 01 March 2019

Abstract

Learning from the data stored in a database is an important function increasingly available in relational engines. Methods using lower-precision input data are of special interest given their overall higher efficiency. However, in databases, these methods have a hidden cost: quantizing real values into smaller representations is an expensive step. To address this issue, we present MLWeaving, a data structure and hardware acceleration technique intended to speed up learning of generalized linear models over low-precision data. MLWeaving provides a compact in-memory representation that enables the retrieval of data at any level of precision. MLWeaving also provides a highly efficient implementation of stochastic gradient descent on FPGAs and enables the dynamic tuning of precision, instead of using a fixed precision level during learning. Experimental results show that MLWeaving converges up to 16× faster than low-precision implementations of first-order methods on CPUs.
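The abstract's key idea, a compact representation that yields data "at any level of precision," can be illustrated with a bit-weaving-style layout: quantized values are stored as bit-planes, most-significant bit first, so reading only the first k planes reconstructs a k-bit version of every value without touching the rest of the data. The sketch below is an illustrative assumption about the layout (the function names `weave`/`unweave` and the NumPy formulation are ours), not the paper's FPGA implementation.

```python
import numpy as np

def weave(X, bits=8):
    """Quantize values in [0, 1) to `bits` bits and store them as bit-planes,
    most-significant plane first, so a prefix of planes gives lower precision."""
    Q = np.minimum((X * (1 << bits)).astype(np.uint32), (1 << bits) - 1)
    # planes[b] holds bit (bits-1-b) of every value: MSB plane comes first.
    return np.array([(Q >> (bits - 1 - b)) & 1 for b in range(bits)],
                    dtype=np.uint8)

def unweave(planes, precision):
    """Reconstruct approximate values reading only the first `precision` planes."""
    q = np.zeros(planes.shape[1:], dtype=np.uint32)
    for b in range(precision):
        q = (q << 1) | planes[b]
    return q.astype(np.float64) / (1 << precision)

# Reading 2 planes yields a 2-bit view; reading all 4 recovers full precision.
X = np.array([0.5, 0.25, 0.8125])
P = weave(X, bits=4)
low  = unweave(P, 2)   # coarse view from a prefix of the stored bits
full = unweave(P, 4)   # full 4-bit precision
```

Because lower precision means reading fewer memory lines, a training loop can start coarse and add bit-planes as it converges, which is the dynamic precision tuning the abstract describes.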



  • Published in

    Proceedings of the VLDB Endowment, Volume 12, Issue 7, March 2019, 112 pages
    ISSN: 2150-8097
    Publisher: VLDB Endowment

