DOI: 10.1145/1815330.1815348
research-article

Handwritten Arabic text line segmentation using affinity propagation

Published: 09 June 2010

ABSTRACT

In this paper, we present a novel graph-based method for extracting handwritten text lines from monochromatic Arabic document images. Our approach consists of two steps: coarse text line estimation using primary components, which define the line, and assignment of diacritic components, which are more difficult to associate with a given line. We first estimate the local orientation at each primary component to build a sparse similarity graph. We then use a shortest-path algorithm to compute similarities between non-neighboring components. From this graph, we obtain coarse text lines using two estimates obtained from affinity propagation and breadth-first search. In the second step, we assign secondary components to each text line. The proposed method is very fast and robust to the non-uniform skew and character size variations normally present in handwritten text lines. We evaluate our method using a pixel-matching criterion and report 96% accuracy on a dataset of 125 Arabic document images. We also present a proximity analysis on datasets generated by artificially decreasing the spacing between text lines to demonstrate the robustness of our approach.
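The graph steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The linking rule (`max_dx`, `max_dy`), the use of centroid distance as edge cost, and all function names are assumptions made for illustration. Local orientation estimation and affinity propagation are omitted. The sketch only shows how a sparse neighbor graph supports shortest-path costs between non-neighboring components and how breadth-first search groups components into coarse lines.

```python
import heapq
import math
from collections import deque

def build_sparse_graph(centroids, max_dx, max_dy):
    """Link components that are horizontally close and nearly level.
    Hypothetical rule: the paper uses local orientation instead."""
    graph = {i: {} for i in range(len(centroids))}
    for i, (xi, yi) in enumerate(centroids):
        for j, (xj, yj) in enumerate(centroids):
            if i < j and abs(xi - xj) <= max_dx and abs(yi - yj) <= max_dy:
                cost = math.hypot(xi - xj, yi - yj)  # assumed edge cost
                graph[i][j] = cost
                graph[j][i] = cost
    return graph

def dijkstra(graph, source):
    """Shortest-path cost from source to every reachable component."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def bfs_components(graph):
    """Group components into coarse lines via breadth-first search."""
    seen, lines = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, line = deque([start]), []
        while queue:
            u = queue.popleft()
            line.append(u)
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        lines.append(sorted(line))
    return lines

# Toy centroids (x, y) for two slightly skewed text lines.
centroids = [(0, 0), (10, 1), (20, 2), (0, 30), (10, 31), (20, 29)]
graph = build_sparse_graph(centroids, max_dx=12, max_dy=5)
lines = bfs_components(graph)
dist_from_0 = dijkstra(graph, 0)
```

In this toy input, components 0 and 2 are not direct neighbors, yet the shortest path through component 1 gives them a finite cost, while components on the other line remain unreachable from component 0.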


Published in

DAS '10: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
June 2010
490 pages
ISBN: 9781605587738
DOI: 10.1145/1815330

Copyright © 2010 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
