skip to main content
Hardware for searching very large text databases
Publisher:
  • University of Illinois at Urbana-Champaign
  • Champaign, IL
  • United States
Order Number:AAI8108534
Pages:
155
Bibliometrics
Skip Abstract Section
Abstract

The problems involved in searching very large text databases are discussed. It is shown that conventional techniques for searching current databases cannot be scaled up to larger ones, and that it is necessary to build hardware to search the database in parallel if reasonable search times are expected. The part of the search process requiring the highest bandwidth is scanning the database to detect instances of search terms. Methods from the literature of doing this in hardware are examined, problems with using them in large systems are discussed, and design criteria to be met by a successful search architecture are defined.

A new model for text searching, using a nondeterministic finite state automaton (NFSA) to control matching, is introduced. First the NFSA model itself is discussed. Examples are given showing how it can be used to search for a wide variety of textual patterns. Next, implementation of the NFSA Searcher in hardware is discussed. By partitioning the nondeterministic state table on the basis of pairwise compatibilities and assigning blocks of states to interconnected sub-machines, it is shown how the NFSA can be built with simple logic in a manner that lends itself to LSI implementation.

It is of critical importance that it be possible to quickly partition that state table for a group of search patterns. Methods for partitioning tables efficiently are developed and their performance is analyzed. Methods of detecting instances of higher-level search expressions from instances of their component patterns detected by the NFSA Searcher are also discussed. Finally, the configuration and performance of the search system as a function of user load and other paramenters is discussed. By comparing the hardware required and response time afforded by the NFSA Searcher with that for an alternative implementation, it is seen that the NFSA Searcher is a significant advance in architecture for text search applications.

Contributors
  • IBM Research - Almaden

Recommendations