DOI: 10.1145/3341301.3359654
Research Article · Open Access · Artifacts Available · Artifacts Evaluated & Functional

Parity models: erasure-coded resilience for prediction serving systems

Published: 27 October 2019

ABSTRACT

Machine learning models are becoming the primary workhorses for many applications. Services deploy models through prediction serving systems that take in queries and return predictions by performing inference on models. Prediction serving systems are commonly run on many machines in cluster settings, and thus are prone to slowdowns and failures that inflate tail latency. Erasure coding is a popular technique for achieving resource-efficient resilience to data unavailability in storage and communication systems. However, existing approaches for imparting erasure-coded resilience to distributed computation apply only to a severely limited class of functions, precluding their use for many serving workloads, such as neural network inference.
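
For context, here is a minimal sketch (illustrative, not from the paper) of the classic single-parity erasure code underlying this style of resilience: a parity block computed as the elementwise sum of k data blocks lets any one unavailable block be reconstructed from the parity and the remaining k-1 blocks.

```python
import numpy as np

def encode(blocks):
    # Parity block: elementwise sum of the k data blocks.
    return np.sum(blocks, axis=0)

def reconstruct_missing(available_blocks, parity):
    # Recover a single unavailable block by subtracting the
    # remaining blocks from the parity block.
    return parity - np.sum(available_blocks, axis=0)

# Example: k = 3 data blocks; block 1 becomes unavailable.
blocks = [np.random.rand(4) for _ in range(3)]
parity = encode(blocks)
recovered = reconstruct_missing([blocks[0], blocks[2]], parity)
assert np.allclose(recovered, blocks[1])
```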

We introduce parity models, a new approach for enabling erasure-coded resilience in prediction serving systems. A parity model is a neural network trained to transform erasure-coded queries into a form that enables a decoder to reconstruct slow or failed predictions. We implement parity models in ParM, a prediction serving system that makes use of erasure-coded resilience. ParM encodes multiple queries into a "parity query," performs inference over parity queries using parity models, and decodes approximations of unavailable predictions using the outputs of parity models. We showcase the applicability of parity models to image classification, speech recognition, and object localization tasks. Using parity models, ParM reduces the gap between 99.9th-percentile and median latency by up to 3.5x while maintaining the same median latency. These results demonstrate the potential of parity models to open a new avenue for imparting resource-efficient resilience to prediction serving systems.
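
One simple instantiation consistent with this description uses addition as the encoder and subtraction as the decoder: the parity model is trained so that its prediction on the sum of k queries approximates the sum of the deployed model's k predictions. The PyTorch-style sketch below illustrates that flow for k = 2; the model and parity_model networks, input sizes, and training are placeholders for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Placeholder networks (illustrative only). The parity model is assumed to be
# trained so that parity_model(x1 + ... + xk) ≈ model(x1) + ... + model(xk).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
parity_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# Two queries (k = 2), e.g., CIFAR-sized images.
x1 = torch.randn(1, 3, 32, 32)
x2 = torch.randn(1, 3, 32, 32)

# Encode: the parity query is the elementwise sum of the k queries.
parity_query = x1 + x2

# x1, x2, and parity_query are dispatched to separate model instances.
# Suppose the server computing model(x2) is slow or has failed.
pred1 = model(x1)
parity_pred = parity_model(parity_query)

# Decode: approximate the unavailable prediction by subtraction.
pred2_approx = parity_pred - pred1
```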


Published in

SOSP '19: Proceedings of the 27th ACM Symposium on Operating Systems Principles
October 2019, 615 pages
ISBN: 9781450368735
DOI: 10.1145/3341301

Copyright © 2019 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 131 of 716 submissions, 18%
