Subtopic Structuring for Full-Length Document Access
January 1993
1993 Technical Report
Publisher:
  • University of California at Berkeley
  • Computer Science Division 571 Evans Hall Berkeley, CA
  • United States
Published: 01 January 1993
Abstract

We argue that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents; that is, a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics that comprise the text. Using this structure, we can make a distinction between the main topics, which occur throughout the length of the text, and the subtopics, which are of only limited extent. We discuss why recognition of subtopic structure is important and how, to some degree of accuracy, it can be found. We describe a new way of specifying queries on full-length documents and then describe an experiment in which making use of the recognition of local structure achieves better results on a typical information retrieval task than does a standard IR measure.
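
To make the idea of partitioning a text into multi-paragraph units concrete, here is a minimal sketch in Python. It is not the report's algorithm, which the abstract does not specify. It assumes a simple stand-in technique: compare the vocabulary of adjacent paragraphs with cosine similarity and start a new unit wherever the similarity falls below a cutoff. The function names (`term_counts`, `cosine`, `segment`) and the `threshold` value of 0.1 are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def term_counts(paragraph):
    """Return lowercased word counts for one paragraph."""
    return Counter(re.findall(r"[a-z]+", paragraph.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(paragraphs, threshold=0.1):
    """Group consecutive paragraphs into units.

    A new unit starts when a paragraph's similarity to the previous
    paragraph is below `threshold` (an assumed, untuned value).
    """
    if not paragraphs:
        return []
    units = [[paragraphs[0]]]
    previous = term_counts(paragraphs[0])
    for paragraph in paragraphs[1:]:
        current = term_counts(paragraph)
        if cosine(previous, current) < threshold:
            units.append([paragraph])    # vocabulary shift: new unit
        else:
            units[-1].append(paragraph)  # same unit continues
        previous = current
    return units


paragraphs = [
    "volcano lava erupts from the volcano",
    "lava flows down the volcano slope",
    "stock market prices fell",
    "market prices and stock trading",
]
units = segment(paragraphs)
print(len(units))  # 2
```

In this toy input the first two paragraphs share vocabulary and form one unit, and the last two form another. A realistic version would likely need stopword removal and smoothing over windows larger than a single paragraph.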

Contributors
  • University of California, Berkeley
  • NASA Ames Research Center
