ABSTRACT
In this paper, we propose a practical approach for extracting the most relevant paragraphs from an original document to form a summary of Thai text. The idea of our approach is to exploit both the local and global properties of paragraphs. The local property can be considered as clusters of significant words within each paragraph, while the global property can be thought of as the relations among all paragraphs in a document. These two properties are combined to rank paragraphs and extract summaries. Experimental results on real-world data sets are encouraging.
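As an illustration only, the combination of local and global properties can be sketched as follows. The abstract gives no formulas, so every detail here is an assumption rather than the authors' method: the local score is a Luhn-style cluster measure, the global score is the summed cosine similarity of a paragraph to all other paragraphs, and the names `luhn_cluster_score`, `rank_paragraphs`, `max_gap`, `min_freq`, and `alpha` are hypothetical. The input is assumed to be already word-segmented, since Thai text has no explicit word boundaries.

```python
import math
from collections import Counter


def luhn_cluster_score(tokens, significant, max_gap=4):
    """Local property (assumed form): best cluster score in one paragraph.

    A cluster is a run of significant words whose neighbours are at most
    max_gap positions apart; its score is count**2 / span.
    """
    positions = [i for i, t in enumerate(tokens) if t in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = 0  # index into positions where the current cluster begins
    for k in range(1, len(positions) + 1):
        # Close the cluster at the end of the paragraph or at a large gap.
        if k == len(positions) or positions[k] - positions[k - 1] > max_gap:
            count = k - start
            span = positions[k - 1] - positions[start] + 1
            best = max(best, count * count / span)
            start = k
    return best


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(values):
    """Scale scores to [0, 1] by the maximum; all zeros stay zero."""
    top = max(values, default=0.0)
    return [v / top if top else 0.0 for v in values]


def rank_paragraphs(paragraphs, min_freq=2, alpha=0.5):
    """Return paragraph indices, best first.

    paragraphs: list of token lists (assumed pre-segmented).
    min_freq:   assumed threshold for calling a word significant.
    alpha:      assumed weight of the local score in the combination.
    """
    freq = Counter(t for p in paragraphs for t in p)
    significant = {t for t, c in freq.items() if c >= min_freq}

    local = [luhn_cluster_score(p, significant) for p in paragraphs]

    # Global property (assumed form): similarity to every other paragraph.
    vectors = [Counter(t for t in p if t in significant) for p in paragraphs]
    global_ = [
        sum(cosine(v, w) for j, w in enumerate(vectors) if j != i)
        for i, v in enumerate(vectors)
    ]

    local, global_ = normalize(local), normalize(global_)
    scores = [alpha * l + (1 - alpha) * g for l, g in zip(local, global_)]
    return sorted(range(len(paragraphs)), key=lambda i: scores[i], reverse=True)
```

A summary would then be formed by taking the first few indices returned by `rank_paragraphs` and emitting those paragraphs in their original order. The linear weighting by `alpha` is only one plausible way to combine the two scores.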
- A practical text summarizer by paragraph extraction for Thai