ABSTRACT
Machine learning models are becoming the primary workhorses for many applications. Services deploy models through prediction serving systems that take in queries and return predictions by performing inference on models. Prediction serving systems are commonly run on many machines in cluster settings, and thus are prone to slowdowns and failures that inflate tail latency. Erasure coding is a popular technique for achieving resource-efficient resilience to data unavailability in storage and communication systems. However, existing approaches for imparting erasure-coded resilience to distributed computation apply only to a severely limited class of functions, precluding their use for many serving workloads, such as neural network inference.
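A toy illustration of that limitation follows. It assumes a simple addition code over two queries (one parity query P = X1 + X2) and an arbitrary 2x2 weight matrix chosen here for illustration; neither is taken from the paper, and the helper names are hypothetical. When F is linear, F(P) - F(X1) recovers an unavailable F(X2) exactly. Adding an elementwise ReLU after the same matrix, a basic building block of neural networks, makes the same decoding produce a wrong answer.

```python
# Toy sketch: decoding an unavailable output with an addition code.
# Works exactly for a linear F, fails for a non-linear F (ReLU).
from typing import Callable, List

Vector = List[float]

# Arbitrary weights chosen for illustration only (hypothetical).
W = [[2.0, -1.0], [0.5, 3.0]]

def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]

def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]

def linear_f(x: Vector) -> Vector:
    # Linear function: F(x) = W x
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu_f(x: Vector) -> Vector:
    # Non-linear function: F(x) = max(W x, 0), elementwise
    return [max(v, 0.0) for v in linear_f(x)]

def recover_second(f: Callable[[Vector], Vector], x1: Vector, x2: Vector) -> Vector:
    """Treat F(x2) as unavailable and rebuild it from F(x1) and F(x1 + x2)."""
    parity_query = add(x1, x2)       # encoder: P = X1 + X2
    parity_output = f(parity_query)  # the same F applied to the parity query
    return sub(parity_output, f(x1))  # decoder: F(P) - F(X1)

x1, x2 = [1.0, 2.0], [3.0, -4.0]
print(recover_second(linear_f, x1, x2), linear_f(x2))  # [10.0, -10.5] [10.0, -10.5]
print(recover_second(relu_f, x1, x2), relu_f(x2))      # [10.0, -6.5] [10.0, 0.0]
```

The second line shows why a fixed arithmetic code cannot simply be reused for neural network inference: the decoded value no longer matches the true prediction once F is non-linear.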
We introduce parity models, a new approach for enabling erasure-coded resilience in prediction serving systems. A parity model is a neural network trained to transform erasure-coded queries into a form that enables a decoder to reconstruct slow or failed predictions. We implement parity models in ParM, a prediction serving system that makes use of erasure-coded resilience. ParM encodes multiple queries into a "parity query," performs inference over parity queries using parity models, and decodes approximations of unavailable predictions by using the output of a parity model. We showcase the applicability of parity models to image classification, speech recognition, and object localization tasks. Using parity models, ParM reduces the gap between 99.9th percentile and median latency by up to 3.5×, while maintaining the same median. These results demonstrate the potential of parity models to open a new avenue for imparting resource-efficient resilience to prediction serving systems.
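The encode, infer, and decode path described above can be sketched without any ML framework. This is a minimal sketch under stated assumptions: one parity query per group of k queries, a summation encoder (P = X1 + ... + Xk), and a subtraction decoder; the function names (`encode`, `parity_target`, `decode`, `serve`) and the linear stand-in model are hypothetical, not ParM's API. The parity model is trained so that its output on P approximates the sum of the k true predictions, which `parity_target` computes as a training label. For a linear deployed model, the deployed model is itself an exact parity model, which the usage below relies on; for a neural network, `parity_model` would be a separately trained network and the decoded prediction would be an approximation.

```python
# Sketch of an encode / parity-inference / decode path for k queries.
# Assumes a summation encoder and a subtraction decoder (hypothetical names).
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Vector], Vector]

def vsum(vectors: Sequence[Vector]) -> Vector:
    """Elementwise sum of equally sized vectors."""
    return [sum(column) for column in zip(*vectors)]

def encode(queries: Sequence[Vector]) -> Vector:
    """Encoder: combine k queries into one parity query P."""
    return vsum(queries)

def parity_target(deployed: Model, queries: Sequence[Vector]) -> Vector:
    """Training label for a parity model: the sum of the k true predictions."""
    return vsum([deployed(q) for q in queries])

def decode(parity_output: Vector, available: Sequence[Vector]) -> Vector:
    """Decoder: parity-model output minus the k-1 predictions that arrived."""
    missing = list(parity_output)
    for prediction in available:
        missing = [m - p for m, p in zip(missing, prediction)]
    return missing

def serve(deployed: Model, parity_model: Model,
          queries: Sequence[Vector], slow_index: int) -> List[Vector]:
    """Return k predictions, reconstructing the one at slow_index."""
    available = [deployed(q) for i, q in enumerate(queries) if i != slow_index]
    reconstructed = decode(parity_model(encode(queries)), available)
    return available[:slow_index] + [reconstructed] + available[slow_index:]

def linear_model(x: Vector) -> Vector:
    # Stand-in deployed model (hypothetical weights): F(x) = W x
    w = [[2.0, -1.0], [0.5, 3.0]]
    return [sum(a * b for a, b in zip(row, x)) for row in w]

queries = [[1.0, 2.0], [3.0, -4.0], [0.5, 0.5]]  # k = 3
outputs = serve(linear_model, linear_model, queries, slow_index=1)
print(outputs[1], linear_model(queries[1]))  # [10.0, -10.5] [10.0, -10.5]
```

Keeping the decoder as plain subtraction puts all of the learning into the parity model, so the encoder and decoder stay cheap regardless of which deployed model is being protected.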
Index Terms
- Parity models: erasure-coded resilience for prediction serving systems