The sequence memoizer

Published: 01 February 2011

Abstract

Probabilistic models of sequences play a central role in most machine translation, automated speech recognition, lossless compression, spell-checking, and gene identification applications, to name but a few. Unfortunately, real-world sequence data often exhibit long-range dependencies that can only be captured by computationally challenging, complex models. Sequence data arising from natural processes also often exhibit power-law properties, yet common sequence models do not capture such properties. The sequence memoizer is a new hierarchical Bayesian model for discrete sequence data that captures long-range dependencies and power-law characteristics while remaining computationally attractive. Its utility as a language model and general-purpose lossless compressor is demonstrated.
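The mechanism behind these properties can be sketched briefly: the sequence memoizer places hierarchical Pitman-Yor priors over the conditional distribution of the next symbol in every context, so a prediction in a long context recursively interpolates discounted counts with the prediction of the next-shorter suffix context, and the discounts give rise to power-law behavior. The Python below is a minimal, hypothetical sketch of that back-off recursion, not the authors' implementation: it truncates contexts to a fixed order (the memoizer's contexts are unbounded and stored in a structure akin to a suffix tree), fixes a single discount and zero concentration, and uses a one-table-per-distinct-symbol shortcut under which the update resembles interpolated Kneser-Ney. The class name `PYBackoffLM` and all parameter values are illustrative.

```python
from collections import defaultdict

class PYBackoffLM:
    """Hypothetical sketch of Pitman-Yor back-off smoothing over suffix
    contexts. Simplifying assumptions: contexts truncated to a fixed
    order, one constant discount, zero concentration, and one Chinese-
    restaurant 'table' per distinct (context, symbol) pair."""

    def __init__(self, order=4, discount=0.8, vocab_size=256):
        self.order = order               # max context length (the memoizer is unbounded)
        self.d = discount                # shared discount parameter
        self.V = vocab_size              # uniform base distribution over symbols
        self.counts = defaultdict(dict)  # context tuple -> {symbol: count}

    def prob(self, context, symbol):
        """P(symbol | context), recursively backing off to the suffix context."""
        context = tuple(context)[-self.order:] if context else ()
        # shorter-suffix prediction; the empty context backs off to uniform
        backoff = 1.0 / self.V if not context else self.prob(context[1:], symbol)
        c = self.counts.get(context)
        if not c:
            return backoff
        total = sum(c.values())
        t = len(c)                       # distinct symbols seen after this context
        # Pitman-Yor predictive rule: discounted counts, with mass d*t
        # interpolated from the parent (suffix) distribution
        return (max(c.get(symbol, 0) - self.d, 0.0) + self.d * t * backoff) / total

    def observe(self, seq):
        """Add each symbol of seq in its (truncated) preceding context."""
        seq = tuple(seq)
        for i, s in enumerate(seq):
            self._add(seq[max(0, i - self.order):i], s)

    def _add(self, context, symbol):
        # A new (context, symbol) pair opens a 'table' and sends one
        # pseudo-observation to the parent, as in interpolated Kneser-Ney.
        c = self.counts[context]
        if symbol not in c:
            c[symbol] = 1
            if context:
                self._add(context[1:], symbol)
        else:
            c[symbol] += 1

# Illustrative use: train on bytes and query conditional probabilities.
model = PYBackoffLM(order=4)
model.observe(b"abracadabra abracadabra")
print(model.prob(b"abra", ord("c")))   # high: 'c' follows 'abra' in training
print(model.prob(b"abra", ord("z")))   # small but nonzero via back-off
```

Because every probability interpolates down to the uniform base, the sketch never assigns zero probability to a symbol, which is the property that makes this family of models usable for lossless compression as well as language modeling.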

References

  1. Bartlett, N., Pfau, D., Wood, F. Forgetting counts: Constant memory inference for a dependent hierarchical Pitman--Yor process. In 27th International Conference on Machine Learning, to appear (2010).Google ScholarGoogle Scholar
  2. Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C. A neural probabilistic language model. J. Mach. Learn. Res. 3 (2003), 1137--1155. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Chen, S.F., Goodman, J. An empirical study of smoothing techniques for language modeling. Comput. Speech Lang. 13, 4 (1999), 359--394.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Cleary, J.G., Teahan, W.J. Unbounded length contexts for PPM. Comput. J. 40 (1997), 67--75.Google ScholarGoogle ScholarCross RefCross Ref
  5. Doucet, A., de Freitas, N., Gordon, N.J. Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science. Springer-Verlag, New York, May 2001.Google ScholarGoogle Scholar
  6. Gasthaus, J., Teh, Y.W. Improvements to the sequence memoizer. In Advances in Neural Information Processing Systems 23, to appear (2010).Google ScholarGoogle Scholar
  7. Gasthaus, J., Wood, F., Teh, Y.W. Lossless compression based on the sequence memoizer. Data Compression Conference 2010. J.A. Storer, M.W. Marcellin, eds. Los Alamitos, CA, USA, 2010, 337--345. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B. Bayesian data analysis. Chapman & Hall, CRC, 2nd edn, 2004.Google ScholarGoogle Scholar
  9. Giegerich, R., Kurtz, S. From Ukkonen to McCreight and Weiner: A unifying view of linear-lime suffix tree construction. Algorithmica 19, 3 (1997), 331--353.Google ScholarGoogle ScholarCross RefCross Ref
  10. Goldwater, S., Griffiths, T.L., Johnson, M. Interpolating between types and tokens by estimating power law generators. In Advances in Neural Information Processing Systems 18 (2006), MIT Press, 459--466.Google ScholarGoogle Scholar
  11. MacKay, D.J.C., Peto, L.B. A hierarchical Dirichlet language model. Nat. Lang. Eng. 1, 2 (1995), 289--307.Google ScholarGoogle ScholarCross RefCross Ref
  12. Mahoney, M. Large text compression benchmark. URL: http://www.mattmahoney.net/text/text.html (2009).Google ScholarGoogle Scholar
  13. Mnih, A., Yuecheng, Z., Hinton, G. Improving a statistical language model through non-linear prediction. Neurocomputing 72, 7--9 (2009), 1414--1418. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Pitman, J. Coalescents with multiple collisions. Ann. Probab. 27 (1999), 1870--1902.Google ScholarGoogle ScholarCross RefCross Ref
  15. Pitman, J., Yor, M. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 25 (1997), 855--900.Google ScholarGoogle ScholarCross RefCross Ref
  16. Robert, C.P., Casella, G. Monte Carlo Statistical Methods. Springer Verlag, 2004. Google ScholarGoogle ScholarCross RefCross Ref
  17. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. (reprinted in ACM SIGMOBILE Mobile Computing and Communications Review 2001) (1948). Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Teh, Y.W. A hierarchical Bayesian language model based on Pitman-Yor processes. In Proceedings of the Association for Computational Linguistics (2006), 985--992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Willems, F.M.J. The context-tree weighting method: Extensions. IEEE Trans. Inform. Theory 44, 2 (1998), 792--798. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Willems, F.M.J. CTW website. URL: http://www.ele.tue.nl/ctw/ (2009).Google ScholarGoogle Scholar
  21. Wood, F., Archambeau, C., Gasthaus, J., James, L., Teh, Y.W. A stochastic memoizer for sequence data. In 26th International Conference on Machine Learning (2009), 1129--1136. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Wu, P., Teahan, W.J. A new PPM variant for Chinese text compression. Nat. Lang. Eng. 14, 3 (2007), 417--430. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Zipf, G. Selective Studies and the Principle of Relative Frequency in Language. Harvard University Press, Cambridge, MA, 1932.Google ScholarGoogle ScholarCross RefCross Ref

Published in

Communications of the ACM, Volume 54, Issue 2 (February 2011), 115 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/1897816
Copyright © 2011 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
