Designing Future Warehouse-Scale Computers for Sirius, an End-to-End Voice and Vision Personal Assistant

Authors: Johann Hauswald, Michael A. Laurenzano, Yunqi Zhang, Hailong Yang, Yiping Kang, Cheng Li, Austin Rovinski, Arjun Khurana, Ronald G. Dreslinski, Trevor Mudge, Vinicius Petrucci, Lingjia Tang, Jason Mars

Affiliations: Clarity Lab, University of Michigan at Ann Arbor; Beihang University

ACM Transactions on Computer Systems, Volume 34, Issue 1, Article No. 2, pp. 1–32
https://doi.org/10.1145/2870631

Published: 06 April 2016

Abstract

As user demand scales for intelligent personal assistants (IPAs) such as Apple’s Siri, Google’s Google Now, and Microsoft’s Cortana, we are approaching the computational limits of current datacenter (DC) architectures. It is an open question how future server architectures should evolve to enable this emerging class of applications, and the lack of an open-source IPA workload is an obstacle in addressing this question. In this article, we present the design of Sirius, an open end-to-end IPA Web-service application that accepts queries in the form of voice and images, and responds with natural language. We then use this workload to investigate the implications of four points in the design space of future accelerator-based server architectures spanning traditional CPUs, GPUs, manycore throughput co-processors, and FPGAs. To investigate future server designs for Sirius, we decompose Sirius into a suite of eight benchmarks (Sirius Suite) comprising the computationally intensive bottlenecks of Sirius. We port Sirius Suite to a spectrum of accelerator platforms and use the performance and power trade-offs across these platforms to perform a total cost of ownership (TCO) analysis of various server design points. In our study, we find that accelerators are critical for the future scalability of IPA services. Our results show that GPU- and FPGA-accelerated servers improve the query latency on average by 8.5× and 15×, respectively. For a given throughput, GPU- and FPGA-accelerated servers can reduce the TCO of DCs by 2.3× and 1.3×, respectively.

Index Terms

1. Computer systems organization
  1. Architectures

Published in
ACM Transactions on Computer Systems Volume 34, Issue 1
April 2016
91 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/2912578
Editor: Todd C. Mowry, Carnegie Mellon University, Pittsburgh, PA
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States
Publication History
- Published: 6 April 2016
- Accepted: 1 December 2015
- Received: 1 October 2015

Author Tags
Datacenters
emerging workloads
intelligent personal assistants
warehouse-scale computers
Qualifiers
- research-article
- Research
- Refereed