Abstract
Probabilistic models of sequences play a central role in machine translation, automated speech recognition, lossless compression, spell-checking, and gene identification, to name but a few applications. Unfortunately, real-world sequence data often exhibit long-range dependencies that can be captured only by complex, computationally challenging models. Sequence data arising from natural processes also often exhibit power-law properties, yet common sequence models do not capture them. The sequence memoizer is a new hierarchical Bayesian model for discrete sequence data that captures both long-range dependencies and power-law characteristics while remaining computationally attractive. Its utility as a language model and as a general-purpose lossless compressor is demonstrated.
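The power-law behavior the abstract refers to stems from the Pitman-Yor process on which the sequence memoizer is built. As a minimal sketch of that mechanism (not the paper's implementation), the Python snippet below simulates the Chinese restaurant process representation of a single two-parameter Pitman-Yor process; the function name `pitman_yor_crp` and the parameter settings are illustrative assumptions. With discount greater than zero, the sorted table sizes decay roughly as a power law.

```python
import random

def pitman_yor_crp(n, discount=0.8, concentration=1.0, seed=0):
    """Simulate the Chinese restaurant process representation of a
    two-parameter Pitman-Yor process and return table occupancy counts.
    Table sizes are heavy-tailed (power-law) whenever discount > 0.
    Illustrative sketch only, not the paper's inference code."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of customers seated at table k
    total = 0    # customers seated so far
    for _ in range(n):
        k_tables = len(counts)
        # Open a new table with probability
        # (concentration + k_tables * discount) / (total + concentration).
        if rng.random() * (total + concentration) < concentration + k_tables * discount:
            counts.append(1)
        else:
            # Otherwise join table k with probability proportional to (counts[k] - discount).
            r = rng.random() * (total - k_tables * discount)
            acc = 0.0
            for k, c in enumerate(counts):
                acc += c - discount
                if r < acc:
                    counts[k] += 1
                    break
        total += 1
    return counts

sizes = sorted(pitman_yor_crp(20_000), reverse=True)
# On log-log axes, rank vs. size is roughly linear: a power law.
for rank in (1, 10, 100, 1000):
    print(f"rank {rank:>4}: table size {sizes[rank - 1]}")
```

In the sequence memoizer itself, one such process sits at every context in a suffix-tree-shaped hierarchy, with each context's predictive distribution smoothed toward its parent's; this is what lets the model share statistics across contexts of unbounded length while keeping inference tractable.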