ABSTRACT
In this paper, we present a novel graph-based method for extracting handwritten text lines in monochromatic Arabic document images. Our approach consists of two steps: coarse text line estimation using the primary components that define each line, and assignment of diacritic components, which are more difficult to associate with a given line. We first estimate the local orientation at each primary component to build a sparse similarity graph. We then use a shortest-path algorithm to compute similarities between non-neighboring components. From this graph, we obtain coarse text lines using two estimates obtained from affinity propagation and breadth-first search. In the second step, we assign secondary components to each text line. The proposed method is very fast and robust to non-uniform skew and character size variations, normally present in handwritten text lines. We evaluate our method using a pixel-matching criterion and report 96% accuracy on a dataset of 125 Arabic document images. We also present a proximity analysis on datasets generated by artificially decreasing the spacing between text lines to demonstrate the robustness of our approach.
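The first step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each primary component is reduced to a centroid and a local orientation angle, and the function names, the `radius` and `max_angle` thresholds, and the edge-weight formula are all hypothetical choices made for illustration. Edge weights here are dissimilarities (lower means more similar). Affinity propagation and the diacritic assignment step are omitted.

```python
import heapq
import math
from collections import deque


def build_sparse_graph(centers, angles, radius=30.0, max_angle=math.radians(20)):
    """Link components that are close and aligned with the local orientation.

    centers: list of (x, y) centroids; angles: local orientation in radians.
    The thresholds and the weight formula are illustrative assumptions.
    """
    n = len(centers)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            dx = centers[j][0] - centers[i][0]
            dy = centers[j][1] - centers[i][1]
            dist = math.hypot(dx, dy)
            if dist > radius:
                continue  # keeps the graph sparse
            link = math.atan2(dy, dx)
            mean = 0.5 * (angles[i] + angles[j])
            # Deviation between link direction and orientation, folded to [0, pi/2].
            dev = abs((link - mean + math.pi / 2) % math.pi - math.pi / 2)
            if dev <= max_angle:
                weight = dist * (1.0 + dev)
                graph[i][j] = weight
                graph[j][i] = weight
    return graph


def shortest_path_costs(graph, source):
    """Dijkstra: path costs from source to every reachable component."""
    costs = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > costs.get(u, math.inf):
            continue  # stale heap entry
        for v, weight in graph[u].items():
            new_cost = cost + weight
            if new_cost < costs.get(v, math.inf):
                costs[v] = new_cost
                heapq.heappush(heap, (new_cost, v))
    return costs


def coarse_lines_bfs(graph):
    """Breadth-first search: each connected group is one coarse text line."""
    seen, lines = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, line = deque([start]), []
        while queue:
            u = queue.popleft()
            line.append(u)
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        lines.append(sorted(line))
    return lines


if __name__ == "__main__":
    # Two synthetic horizontal lines of four components each.
    centers = [(x, y) for y in (0, 25) for x in (0, 20, 40, 60)]
    graph = build_sparse_graph(centers, [0.0] * len(centers))
    print(coarse_lines_bfs(graph))
    print(shortest_path_costs(graph, 0))
```

In this sketch the shortest-path costs give a dissimilarity between components that share no direct edge, which is the role the abstract assigns to the shortest-path step; a clustering method such as affinity propagation could then consume those costs as (negated) similarities.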