Abstract
Learning from the data stored in a database is an important function increasingly available in relational engines. Methods using lower-precision input data are of special interest given their overall higher efficiency. However, in databases, these methods have a hidden cost: quantizing each real value into a lower-precision representation is an expensive step. To address this issue, we present MLWeaving, a data structure and hardware acceleration technique intended to speed up learning of generalized linear models over low-precision data. MLWeaving provides a compact in-memory representation that enables the retrieval of data at any level of precision. MLWeaving also provides a highly efficient implementation of stochastic gradient descent on FPGAs and enables the dynamic tuning of precision, instead of using a fixed precision level during learning. Experimental results show that MLWeaving converges up to 16× faster than low-precision implementations of first-order methods on CPUs.
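To make "retrieval of data at any level of precision" concrete, here is a minimal sketch that assumes a simple bit-plane layout: the most significant bit of every value is stored first, then the next bit, and so on. The abstract does not describe the actual MLWeaving memory layout, so this layout, the function names `weave` and `read_at_precision`, and the 8-bit unsigned inputs are illustrative assumptions rather than the authors' design.

```python
def weave(values, bits=8):
    """Split unsigned integers into bit planes, most significant bit first.

    planes[k][i] holds bit (bits - 1 - k) of values[i].
    Hypothetical helper: illustrates a bit-plane layout only.
    """
    planes = []
    for k in range(bits):
        shift = bits - 1 - k
        planes.append([(v >> shift) & 1 for v in values])
    return planes


def read_at_precision(planes, precision):
    """Rebuild values from only the first `precision` planes.

    Bits below the requested precision are left as zero, so no
    separate quantization pass over the original data is needed.
    """
    bits = len(planes)
    out = [0] * len(planes[0])
    for k in range(precision):
        shift = bits - 1 - k
        for i, bit in enumerate(planes[k]):
            out[i] |= bit << shift
    return out


if __name__ == "__main__":
    stored = weave([200, 13, 255, 0])
    print(read_at_precision(stored, 8))  # full precision
    print(read_at_precision(stored, 4))  # top 4 bits only
```

Under this assumed layout, a reader that wants fewer bits simply stops after fewer planes, so one stored copy serves every precision level and the precision can change between passes over the data.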