DOI: 10.1145/3219819.3219861
research-article

Rosetta: Large Scale System for Text Detection and Recognition in Images

Published: 19 July 2018

ABSTRACT

In this paper we present a deployed, scalable optical character recognition (OCR) system, which we call Rosetta, designed to process images uploaded daily at Facebook scale. Sharing of image content has become one of the primary ways to communicate information among internet users within social networks such as Facebook, and the understanding of such media, including its textual information, is of paramount importance to facilitate search and recommendation applications. We present modeling techniques for efficient detection and recognition of text in images and describe Rosetta's system architecture. We perform extensive evaluation of the presented technologies, explain useful practical approaches to building an OCR system at scale, and provide intuitions as to why and how certain components work, based on the lessons learnt during the development and deployment of the system.

    • Published in

      KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
      July 2018
      2925 pages
      ISBN: 9781450355520
      DOI: 10.1145/3219819

      Copyright © 2018 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      KDD '18 paper acceptance rate: 107 of 983 submissions, 11%
      Overall acceptance rate: 1,133 of 8,635 submissions, 13%