Tabular abstraction, editing, and formatting
Publisher:
  • University of Waterloo
  • Computer Science Dept. University Avenue Waterloo, Ont. N2L 3G1
  • Canada
ISBN:978-0-612-09397-3
Order Number:AAINN09397
Pages: 205
Abstract

This dissertation investigates the composition of high-quality tables with the use of electronic tools. A generic model is designed to support the different stages of tabular composition, including the editing of logical structure, the specification of layout structure, and the formatting of concrete tables. The model separates a table's logical structure from its layout structure, which consists of tabular topology and typographic style. The notion of an abstract table, which describes the logical relationships among tabular items, is formally defined, and a set of logical operations is proposed to manipulate tables based on these logical relationships. An abstract table can be visualized through a layout structure specified by a set of topological rules, which determine the relative placement of tabular items in two dimensions, and a set of style rules, which determine the final appearance of different items. The absolute placement of a concrete table can be automatically generated by applying a layout specification to an abstract table. An NP-complete problem arises in the formatting process, which uses automatic line breaking and determines the physical dimensions of a table to satisfy user-specified size constraints. An algorithm has been designed to solve the formatting problem in polynomial time for typical tables. Based on the tabular model, a prototype tabular composition system has been implemented in a UNIX and X Windows environment. This prototype provides an interactive interface to edit the logical structure, the topology, and the styles of tables. It allows us to manipulate tables based on the logical relationships among tabular items, regardless of where the items are placed in the layout structure, and it is capable of presenting a table in different topologies and styles so that we can select a high-quality layout structure.
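The separation of logical structure from layout structure described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's formal definition or its prototype: the data representation, the sample values, and the names `categories`, `entries`, `lookup`, `layout`, and `render` are all assumptions made for illustration. The sketch stores a table only as labels and entries, then derives two different topologies from the same logical content.

```python
# Hypothetical abstract table: each category has labels, and each entry is
# keyed by one label per category. No row or column positions are stored.
categories = {
    "Year": ["1991", "1992"],
    "Term": ["Winter", "Fall"],
}
entries = {
    frozenset({("Year", "1991"), ("Term", "Winter")}): "85",
    frozenset({("Year", "1991"), ("Term", "Fall")}): "80",
    frozenset({("Year", "1992"), ("Term", "Winter")}): "78",
    frozenset({("Year", "1992"), ("Term", "Fall")}): "82",
}


def lookup(table_entries, **labels):
    """Logical access: find an entry by its labels, independent of layout."""
    return table_entries[frozenset(labels.items())]


def layout(cats, table_entries, row_cat, col_cat):
    """Assumed topology rule: row_cat labels go down the side,
    col_cat labels go across the top."""
    grid = [[""] + list(cats[col_cat])]
    for r in cats[row_cat]:
        row = [r]
        for c in cats[col_cat]:
            row.append(table_entries[frozenset({(row_cat, r), (col_cat, c)})])
        grid.append(row)
    return grid


def render(grid):
    """Assumed style rule: right-align every column to its widest cell."""
    widths = [max(len(row[i]) for row in grid) for i in range(len(grid[0]))]
    return "\n".join(
        "  ".join(cell.rjust(w) for cell, w in zip(row, widths)) for row in grid
    )


by_year = layout(categories, entries, "Year", "Term")
by_term = layout(categories, entries, "Term", "Year")
print(render(by_year))
print(render(by_term))
```

Both grids come from the same `entries` mapping, and `lookup(entries, Year="1991", Term="Fall")` returns the same item whichever topology is displayed. The formatting problem mentioned in the abstract (line breaking under size constraints) is not modeled here.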

Cited By

  1. Embley D, Krishnamoorthy M, Nagy G and Seth S (2016). Converting heterogeneous statistical tables on the web to searchable databases, International Journal on Document Analysis and Recognition, 19:2, (119-138), Online publication date: 1-Jun-2016.
  2. Shigarov A (2015). Table understanding using a rule engine, Expert Systems with Applications: An International Journal, 42:2, (929-937), Online publication date: 1-Feb-2015.
  3. Rastan R, Paik H and Shepherd J TEXUS Proceedings of the 2015 ACM Symposium on Document Engineering, (25-34)
  4. Chen J and Lopresti D Ruling-based table analysis for noisy handwritten documents Proceedings of the 4th International Workshop on Multilingual OCR, (1-5)
  5. Bilauca M and Healy P Splitting wide tables optimally Proceedings of the 2013 ACM symposium on Document engineering, (249-252)
  6. Göbel M, Hassan T, Oro E and Orsi G A methodology for evaluating algorithms for table understanding in PDF documents Proceedings of the 2012 ACM symposium on Document engineering, (45-48)
  7. Bilauca M and Healy P Building table formatting tools Proceedings of the 11th ACM symposium on Document engineering, (13-22)
  8. Chiousemoglou M and Jürgensen H Setting the table for the blind Proceedings of the 4th International Conference on PErvasive Technologies Related to Assistive Environments, (1-8)
  9. Embley D, Krishnamoorthy M, Nagy G and Seth S Factoring web tables Proceedings of the 24th international conference on Industrial engineering and other applications of applied intelligent systems conference on Modern approaches in applied intelligence - Volume Part I, (253-263)
  10. Seth S, Jandhyala R, Krishnamoorthy M and Nagy G Analysis and taxonomy of column header categories for web tables Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, (81-88)
  11. Doush I and Pontelli E Detecting and recognizing tables in spreadsheets Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, (471-478)
  12. Bilauca M and Healy P A new model for automated table layout Proceedings of the 10th ACM symposium on Document engineering, (169-176)
  13. Jandhyala R, Krishnamoorthy M, Nagy G, Padmanabhan R, Seth S and Silversmith W From Tessellations to Table Interpretation Proceedings of the 16th Symposium, 8th International Conference. Held as Part of CICM '09 on Intelligent Computer Mathematics, (422-437)
  14. Padmanabhan R, Jandhyala R, Krishnamoorthy M, Nagy G, Seth S and Silversmith W Interactive conversion of web tables Proceedings of the 8th international conference on Graphics recognition: achievements, challenges, and evolution, (25-36)
  15. Gatterbauer W, Bohunsky P, Herzog M, Krüpl B and Pollak B Towards domain-independent information extraction from web tables Proceedings of the 16th international conference on World Wide Web, (71-80)
  16. Liu Y, Bai K, Mitra P and Giles C TableSeer Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries, (91-100)
  17. Tao C and Embley D Automatic hidden-web table interpretation by sibling page comparison Proceedings of the 26th international conference on Conceptual modeling, (566-581)
  18. Pivk A, Cimiano P, Sure Y, Gams M, Rajkovič V and Studer R (2007). Transforming arbitrary tables into logical form with TARTAR, Data & Knowledge Engineering, 60:3, (567-595), Online publication date: 1-Mar-2007.
  19. Xue Y, Hu Y, Xin G, Song R, Shi S, Cao Y, Lin C and Li H (2007). Web page title extraction and its application, Information Processing and Management: an International Journal, 43:5, (1332-1347), Online publication date: 1-Sep-2007.
  20. Wu D and Lee K A grammatical approach to understanding textual tables using two-dimensional SCFGs Proceedings of the COLING/ACL on Main conference poster sessions, (905-912)
  21. Embley D, Lopresti D and Nagy G Notes on contemporary table recognition Proceedings of the 7th international conference on Document Analysis Systems, (164-175)
  22. Holzinger W, Krüpl B and Herzog M Using ontologies for extracting product features from web pages Proceedings of the 5th international conference on The Semantic Web, (286-299)
  23. Xia S, Sun D, Sun C and Chen D A collaborative table editing technique based on transparent adaptation Proceedings of the 2005 Confederated international conference on On the Move to Meaningful Internet Systems - Volume Part I, (576-592)
  24. Lin X Active Document Layout Synthesis Proceedings of the Eighth International Conference on Document Analysis and Recognition, (86-90)
  25. Jacobs C, Li W, Schrier E, Bargeron D and Salesin D Adaptive grid-based document layout ACM SIGGRAPH 2003 Papers, (838-847)
  26. Jacobs C, Li W, Schrier E, Bargeron D and Salesin D (2003). Adaptive grid-based document layout, ACM Transactions on Graphics, 22:3, (838-847), Online publication date: 1-Jul-2003.
  27. Cohen W, Hurst M and Jensen L A flexible learning system for wrapping tables and lists in HTML documents Proceedings of the 11th international conference on World Wide Web, (232-241)
  28. Silberhorn H TabulaMagica Proceedings of the 2001 ACM Symposium on Document engineering, (68-75)
  29. Wang H, Wu S, Wang I, Sung C, Hsu W and Shih W Semantic search on Internet tabular information extraction for answering queries Proceedings of the ninth international conference on Information and knowledge management, (243-249)
  30. Anderson R and Sobti S The table layout problem Proceedings of the fifteenth annual symposium on Computational geometry, (115-123)
  31. Hurst M and Douglas S Layout & language Proceedings of the fifth conference on Applied natural language processing, (217-220)
Contributors
  • University of Waterloo
