research-article

An information extraction system from patient historical documents

Authors:
Eirini Matthaiou, University of the Aegean, Samos, Greece
Ergina Kavallieratou, University of the Aegean, Samos, Greece

SAC '12: Proceedings of the 27th Annual ACM Symposium on Applied Computing, March 2012, Pages 787–791. https://doi.org/10.1145/2245276.2245428

Published: 26 March 2012

ABSTRACT

Document image retrieval systems are increasingly used by businesses and by governmental and academic organizations. ELEPAP (Hellenic Protection and Rehabilitation Centre for Disabled Children) is an organization that needs more efficient ways of managing its huge volume of archived documents. This paper examines the preprocessing procedures of well-known OCR systems with the aim of extracting specific features from ELEPAP's patient cards. The proposed methodology is shown to give ELEPAP a workable way of extracting information from its old archives.

Index Terms

An information extraction system from patient historical documents
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Image segmentation
        Video segmentation

Published in
SAC '12: Proceedings of the 27th Annual ACM Symposium on Applied Computing
March 2012, 2179 pages
ISBN: 9781450308571
DOI: 10.1145/2245276
Conference Chairs: Sascha Ossowski (University Rey Juan Carlos, Spain), Paola Lecca (The Microsoft Research - University of Trento COSBI, Italy)
Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
binarization
line segmentation
printed-handwritten text discrimination
skew angle correction
word segmentation
Acceptance Rates
SAC '12 Paper Acceptance Rate: 270 of 1,056 submissions, 26%
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%