skip to main content
10.5555/1109557.1109690acmconferencesArticle/Chapter ViewAbstractPublication PagessodaConference Proceedingsconference-collections
Article

Efficient algorithms for substring near neighbor problem

Published:22 January 2006Publication History

ABSTRACT

In this paper we consider the problem of finding the approximate nearest neighbor when the data set points are the substrings of a given text T. Specifically, for a string T of length n, we present a data structure which does the following: given a pattern P, if there is a substring of T within the distance R from P, it reports a (possibly different) substring of T within distance cR from P. The length of the pattern P, denoted by m, is not known in advance. For the case where the distances are measured using the Hamming distance, we present a data structure which uses Õ(n1+1/c) space1 and with Õ(n1/c + mno(1)) query time. This essentially matches the earlier bounds of [Ind98], which assumed that the pattern length m is fixed in advance. In addition, our data structure can be constructed in time Õ(n1+1/c + n1+o(1)M1/3), where M is an upper bound for m. This essentially matches the preprocessing bound of [Ind98] as long as the term Õ(n1+1/c) dominates the running time, which is the case when, e.g., c < 3.We also extend our results to the case where the distances are measured according to the l1 distance. The query time and the space bound are essentially the same, while the preprocessing time becomes Õ(n1+1/c + n1+o(1)M2/3).

References

References are not available

Index Terms

  1. Efficient algorithms for substring near neighbor problem

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
            January 2006
            1261 pages
            ISBN:0898716055

            Publisher

            Society for Industrial and Applied Mathematics

            United States

            Publication History

            • Published: 22 January 2006

            Check for updates

            Qualifiers

            • Article

            Acceptance Rates

            Overall Acceptance Rate411of1,322submissions,31%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader