Statistical machine translation enhancements through linguistic levels: A survey

Authors:
Marta R. Costa-Jussà

Institute for Infocomm Research, Singapore

Mireia Farrús

Universitat Pompeu Fabra, Barcelona


ACM Computing Surveys, Volume 46, Issue 3, Article No.: 42, pp 1–28. https://doi.org/10.1145/2518130

Published: 01 January 2014

Abstract

Machine translation can be considered a highly interdisciplinary field because it is approached from the points of view of human translators, engineers, computer scientists, mathematicians, and linguists. One of the most popular approaches is Statistical Machine Translation (SMT), which tries to cover translation in a holistic manner by learning from parallel corpora aligned at the sentence level. However, with this basic approach, some issues remain unsolved at each written linguistic level (i.e., orthographic, morphological, lexical, syntactic, and semantic). Research in SMT has continuously focused on solving the challenges at these different linguistic levels. This article presents a survey of how SMT has been enhanced to perform translation correctly at all linguistic levels.

References

A. Ahmed and G. Hanneman. 2005. Syntax-based Statistical Machine Translation: A Review. Technical Report. Carnegie Mellon University. Retrieved from http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/cmt-55/lti/Courses/734/Spring-08/Amr&percnt;2BGreg-survey-SSMT.pdfGoogle Scholar
Y. Al-Onaizan and K. Knight. 2002. Translating named entities using monolingual and bilingual resources. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL'02). Association for Computational Linguistics, Stroudsburg, PA, USA, 400--408. DOI: http://dx.doi.org/10.3115/1073083.1073150 Google ScholarDigital Library
H. Alshawi, S. Douglas, and S. Bangalore. 2000. Learning dependency translation models as collections of finite-state head transducers. Comput. Linguist. 26, 1 (March 2000), 45--60. DOI: http://dx.doi.org/10.1162/089120100561629 Google ScholarDigital Library
A. Aue, A. Menezes, B. Moore, C. Quirk, and E. Ringger. 2004. Statistical Machine Translation Using Labeled Semantic Dependency Graphs. In Proceedings of TMI 2004. 125--134.Google Scholar
E. Avramidis and P. Koehn. 2008. Enriching morphologically poor languages for statistical machine translation. In Proceedings of the Conference of the Association for Computational Linguistics and Human Language Technology (ACL-HLT'08). Association for Computational Linguistics, Stroudsburg, PA, 763--770.Google Scholar
A. Aw, M. Zhang, J. Xiao, and J. Su. 2006. A phrase-based statistical model for SMS text normalization. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics. Association for Computer Linguistics, Stroudsburg, PA. DOI: http://dx.doi.org/P/P06/P06-2005.pdf Google ScholarDigital Library
N. Bach. 2012. Dependency Structures for Statistical Machine Translation. PhD dissertation. Carnegie Mellon University. Google ScholarDigital Library
I. Badr, R. Zbib, and J. Glass. 2009. Syntactic phrase reordering for English-to-Arabic statistical machine translation. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL'09). Association for Computational Linguistics, Stroudsburg, PA, 86--93. Google ScholarDigital Library
L. Banarescu, C. Bonial, S. Cai, M. Georgescu, K. Griffitt, U. Hermjakob, K. Knight, P. Koehn, M. Palmer, and N. Schneider. 2013. Abstract meaning representation for sembanking. In Proceedings of the Linguistic Annotation Workshop. Association for Computational Linguistics, Stroudsburg, PA.Google Scholar
R. E. Banchs and M. R. Costa-jussà. 2011. A semantic feature for statistical machine translation. In Proceedings of the 5th Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-5). Association for Computational Linguistics, Stroudsburg, PA, 126--134. Google ScholarDigital Library
S. Bangalore, P. Haffner, and S. Kanthak. 2007. Statistical Machine Translation through Global Lexical Selection and Sentence Reconstruction. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL'07). Association for Computational Linguistics, Stroudsburg, PA, 152--159.Google Scholar
A. L. Berger, S. A. D. Pietra, and V. J. D. Pietra. 1996. A maximum entropy approach to natural language processing. Computational Linguistics 22, 1 (March 1996), 39--72. Google ScholarDigital Library
N. Bertoldi, M. Cettolo, and M. Federico. 2010. Statistical machine translation of texts with misspelled words. In Proceedings of the NAACL. 412--419. Google ScholarDigital Library
J. A. Bilmes and K. Kirchhoff. 2003. Factored language models and generalized parallel backoff. In Proceedings of the Conference of the Association for Computational Linguistics and Human Language Technology (NAACL-HLT'03). Association for Computational Linguistics, Stroudsburg, PA, 4--6. Google ScholarDigital Library
A. Birch and M. Osborne. 2010. LRscore for evaluating lexical and reordering quality in MT. In Proceedings of the Joint 5th Workshop on Statistical Machine Translation and MetricsMATR (WMT'10). Association for Computational Linguistics, Stroudsburg, PA, 327--332. Google ScholarDigital Library
A. Birch, M. Osborne, and P. Koehn. 2007. CCG Supertags in Factored Translation Models. In Proceedings of the Workshop on Statistical Machine Translation. Association for Computational Linguistics, Stroudsburg, PA. Google ScholarDigital Library
H. C. Boas. 2002. Bilingual FrameNet dictionaries for machine translation. In Proceedings of the 3rd International Conference on Language Resources and Evaluation. 1364--1371.Google Scholar
O. Bojar, M. Ercegovčević, M. Popel, and O. Zaidan. 2011. A grain of salt for the WMT manual evaluation output. In Proceedings of the EMNLP 6th Workshop on Statistical Machine Translation (WMT'11). 1--11. Google ScholarDigital Library
O. Bojar and A. Tamchyna. 2011. Forms wanted: Training SMT on monolingual Data. In Proceedings of the Workshop of Machine Translation and Morphologically-Rich Languages.Google Scholar
T. Brants. 2000. A statistical part-of-speech tagger. In Proceedings of the 6th Applied Natural Language Processing Conference. Google ScholarDigital Library
P. F. Brown, S. A. D. Pietra, V. J. D. Pietra, and R. L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics 19, 2 (1993), 263--311. Google ScholarDigital Library
M. Carpuat and D. Wu. 2007. Context-dependent phrasal translation lexicons for statistical machine translation. In Proceedings of the Machine Translation Summit XI.Google Scholar
M. Carpuat and D. Wu. 2008. Evaluation of Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC'08).Google Scholar
Y. S. Chan, H. T. Ng, and D. Chiang. 2007. Word Sense disambiguation improves statistical machine translation. In Proceedings of the 45th Annual meeting of the Association for Computational Linguistics (ACL'07). Association for Computational Linguistics, Stroudsburg, PA, 33--40.Google Scholar
P. Charoenpornsawat, V. Sornlertlamvanich, and T. Charoenporn. 2002. Improving translation quality of rule-based machine translation. In Proceedings of the 2002 COLING Workshop on Machine translation in Asia, Volume 16 (COLING-MTIA'02). Association for Computational Linguistics, Stroudsburg, PA, 1--6. DOI: http://dx.doi.org/10.3115/1118794.1118799 Google ScholarDigital Library
S. F. Chen and J. Goodman. 1996. An empirical study of smoothing techniques for language modeling. In Proceedings of the 34th Annual Meeting on Association for Computational Linguistics (ACL'96). Association for Computational Linguistics, Stroudsburg, PA, 310--318. DOI: http://dx.doi.org/10.3115/981863.981904 Google ScholarDigital Library
D. Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (ACL'05). Association for Computational Linguistics, Stroudsburg, PA, 263--270. Google ScholarDigital Library
D. Chiang. 2007. Hierarchical phrase-based translation. Comput. Linguist. 33, 2 (June 2007), 201--228. DOI: http://dx.doi.org/10.1162/coli.2007.33.2.201 Google ScholarDigital Library
D. Chiang, K. Knight, and W. Wang. 2009. 11,001 new features for statistical machine translation. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL'09). Association for Computational Linguistics, Stroudsburg, PA, 218--226. Google ScholarDigital Library
M. Collins, P. Koehn, and I. Kucerova. 2005. Clause restructuring for statistical machine translation. In Proceedings of the Annual Conference of the Association for Computational Lingusitics (ACL'05). Association for Computational Linguistics, Stroudsburg, PA. Google ScholarDigital Library
M. R. Costa-Jussà. 2012. An overview of the phrase-based statistical machine translation techniques. Knowledge Eng. Review 27, 4 (2012), 413--431. Google ScholarDigital Library
M. R. Costa-jussà, R. E. Banchs, E. Rapp, P. Lambert, K. Eberle, and B. Babych. 2013. Workshop on hybrid approaches to translation: Overview and developments. In Proceedings of the ACL 2nd Workshop on Hybrid Approaches to Translation (HyTra'13). Association for Computational Linguistics, Stroudsburg, PA.Google Scholar
M. R. Costa-Jussà and J. A. R. Fonollosa. 2009. State-of-the-art word reordering approaches in statistical machine translation: A survey. IEICE Transactions on Information and Systems 92, 11 (2009), 2179--2185.Google ScholarCross Ref
B. A. Cowan. 2008. A Tree-to-Tree Model for Statistical Machine Translation. Ph.D. Dissertation. Standford University. Google ScholarDigital Library
M. Creutz and K. Lagus. 2005. Inducing the morphological lexicon of a natural language from unannotated text. In Proceedings of the International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR'05).Google Scholar
A. de Gispert, S. Virpioja, M. Kurimo, and W. Byrne. 2009. Minimum Bayes risk combination of translation hypotheses from alternative morphological decompositions. In Proceedings of the 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers. Association for Computational Linguistics, Stroudsburg, PA, 73--76. Google ScholarDigital Library
M. Diab, M. Ghoneim, and N. Habash. 2007. Arabic diacritization in the context of statistical machine translation. In Proceedings of the Machine Translation Summit XI. 143--149.Google Scholar
Y. Ding and M. Palmer. 2005. Machine translation using probabilistic synchronous dependency insertion grammars. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (ACL'05). Association for Computational Linguistics, Stroudsburg, PA, 541--548. DOI: http://dx.doi.org/10.3115/1219840.1219907 Google ScholarDigital Library
A. Eisele, C. Federmann, H. Saint-Amand, M. Jellinghaus, T. Herrmann, and Y. Chen. 2008. Using Moses to integrate multiple rule-based machine translation engines into a hybrid system. In Proceedings of the 3rd Workshop on Statistical Machine Translation. Association for Computational Linguistics, Stroudsburg, PA, 179--182. Google ScholarDigital Library
I. D. El-Kahlout and K. Oflazer. 2010. Exploiting morphology and local wword reordering in English-to-Turkish phrase-based statistical machine translation. IEEE Transactions on Audio, Speech & Language Processing 18, 6 (2010), 1313--1322. Google ScholarDigital Library
A. El Kholy and N. Habash. 2012. Orthographic and morphological processing for English-Arabic statistical machine translation. Machine Translation 26, 1--2 (2012), 25--45. DOI: http://dx.doi.org/10.1007/s10590-011-9110-0 Google ScholarDigital Library
J. Elming. 2008. Syntactic Reordering in Statistical Machine Translation. PhD dissertation. Copenhaguen Business School.Google Scholar
C. España-Bonet, J. Giménez, and L. Màrquez. 2009. Discriminative phrase-based models for Arabic machine yranslation. ACM Transactions on Asian Language Information Processing Journal 8, 4 (March 2009), Article 15. 20 pages. DOI: http://dx.doi.org/10.1145/1644879.1644882 Google ScholarDigital Library
C. España-Bonet, G. Labaka, A. D. de Ilarraza, L. Màrquez, and K. Sarasola. 2011. Hybrid Machine Translation Guided by a Rule-Based System. In Proceedings of the 13th Machine Translation Summit. 554--561.Google Scholar
M. Farrús, M. R. Costa-Jussà, J. B. Marino, M. Poch, A. Hernandez, C. Henríquez, and J. A. R. Fonollosa. 2011. Overcoming statistical machine translation limitations: Error analysis and proposed solutions for the Catalan-Spanish language pair. Language Resources and Evaluation (2011), 181--208. Google ScholarDigital Library
M. Farrús, M. R. Costa-jussà, J. B. Marino, and J. A. R. Fonollosa. 2010. Linguistic-based evaluation criteria to identify statistical machine translation errors. In Proceedings of the 14th Annual Conference of the European Association for Machine Translation (EAMT'10). 167--173.Google Scholar
M. Farrús, M. R. Costa-jussà, and M. Popović. 2012. Study and correlation analysis of linguistic, perceptual, and automatic machine translation evaluations. J. Am. Soc. Inf. Sci. Technol. 63, 1 (Jan. 2012), 174--184. DOI: http://dx.doi.org/10.1002/asi.21674 Google ScholarDigital Library
M. Felice and L. Specia. 2012. Linguistic features for quality estimation. In Proceedings of the 7th Workshop on Statistical Machine Translation. Association for Computational Linguistics, Stroudsburg, PA, 96--103. Google ScholarDigital Library
M. Flanagan. 1994. Error classification for MT evaluation. In Proceedings of the 1st Conference of the Association for Machine Translation in the Americas (1994), 65--72.Google Scholar
M. L. Forcada, F. M. Tyers, and G. Ramírez-Sánchez. 2009. The Apertium machine translation platform: Five years on. In Proceedings of the 1st International Workshop on Free/Open-Source Rule-Based Machine Translation, Juan Antonio Prez-Ortiz, Felipe Snchez-Martnez, and Francis M. Tyers (Eds.). Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, 3--10.Google Scholar
Ll. Formiga, A. Hernández, J. B. Mariño, and E. Monte. 2012. Improving English to Spanish out-of-domain translations by morphology generalization and generation. In Proceedings of the AMTA Workshop on Monolingual Machine Translation.Google Scholar
G. Foster, P. Isabelle, and R. Kuhn. 2010. Translating structured documents. In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas.Google Scholar
A. Fraser and D. Marcu. 2007. Measuring word alignment quality for statistical machine translation. Computational Linguistics (2007), 293--303. Google ScholarDigital Library
P. Fung and P. Cheung. 2004. Mining very-non-parallel corpora: Parallel sentence and lexicon extraction via bootstrapping and E. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP'04). 57--63.Google Scholar
M. Galley, J. Graehl, K. Knight, D. Marcu, S. DeNeefe, W. Wang, and I. Thayer. 2006. Scalable inference and training of context-rich syntactic translation models. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (ACL-44). Association for Computational Linguistics, Stroudsburg, PA, 961--968. DOI: http://dx.doi.org/10.3115/1220175.1220296 Google ScholarDigital Library
M. Galley, M. Hopkins, K. Knight, and D. Marcu. 2004. What's in a translation rule&quest; In Proceedings of the 2004 Annual Conference of the North American Chapter of the Association for Computational Linsuitics (NAACL HLT 2004), Daniel Marcu Susan Dumais and Salim Roukos (Eds.). Association for Computational Linguistics, Stroudsburg, PA, 273--280.Google Scholar
I. García-Varea, F. J. Och, H. Ney, and F. Casacuberta. 2001. Refined lexicon models for statistical machine translation using a maximum entropy approach. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics and 10th Conference of the European Chapter of the ASsociation for Computational Linguistics (ACL/EACL'01). Association for Computational Linguistics, Stroudsburg, PA. Google ScholarDigital Library
D. Genzel. 2010. Automatically learning source-side reordering rules for large scale machine translation. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING'10). Association for Computational Linguistics, Stroudsburg, PA, 376--384. Google ScholarDigital Library
U. Germann. 2012. Syntax-aware phrase-based statistical machine translation: System description. In Proceedings of the 7th Workshop on Statistical Machine Translation. Association for Computational Linguistics, Stroudsburg, PA, 292--297. Google ScholarDigital Library
J. Giménez and L. Màrquez. 2007. Linguistic features for automatic evaluation of heterogenous MT systems. In Proceedings of the 2nd Workshop on Statistical Machine Translation (StatMT'07). Association for Computational Linguistics, Stroudsburg, PA, 256--264. Google ScholarDigital Library
J. Graehl, K. Knight, and J. May. 2008. Training tree transducers. Comput. Linguist. 34, 3 (Sept. 2008), 391--427. DOI: http://dx.doi.org/10.1162/coli.2008.07-051-R2-03-57 Google ScholarDigital Library
S. Green and J. DeNero. 2012. A Class-based agreement model for generating accurately inflected translations. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA. Google ScholarDigital Library
R. Haque. 2011. Integrating Source-Language Context into Log-linear Models of Statistical Machine Translation. Ph.D. Dissertation. Dublin City University.Google Scholar
C. Hardmeier. 2012. Discourse in Statistical Machine Translation: A Survey and a Case Study. Discours 11 (2012). http://discours.revues.org/8726.Google Scholar
C. Hardmeier and M. Federico. 2010. Modelling pronominal anaphora in statistical machine translation. In Proceedings of the 7th International Workshop on Spoken Language Translation (IWSLT'10), Marcello Federico, Ian Lane, Michael Paul, and François Yvon (Eds.). 283--289.Google Scholar
R. R. Hausser. 2001. Foundations of Computational Linguistics: Human-Computer Communication in Natural Language. Springer. Google ScholarDigital Library
S. Helmreich and D. Farwell. 1998. Translation differences and pragmatics-based MT. Machine Translation 13, 1 (1998), 17--39. DOI: http://dx.doi.org/10.1023/A:1008062303478 Google ScholarDigital Library
H. Hoang and A. Lopez. 2009. A unified framework for phrase-based, hierarchical, and syntax-based statistical machine translation. In Proceedings of the International Workshop on Spoken Language Translation (IWSLT'09). 152--159.Google Scholar
C. Huang, H. Yen, P. Yang, S. Huang, and J. S. Chang. 2011. Using sublexical translations to handle the OOV problem in machine translation. 10, 3, Article 16 (Sept. 2011), 20 pages. http://dx.doi.org/10.1145/2002980.2002986 Google ScholarDigital Library
L. Huang, K. Knight, and A. Joshi. 2006. A syntax-directed translator with extended domain of locality. In Proceedings of the Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing (CHSLP'06). Association for Computational Linguistics, Stroudsburg, PA, 1--8. Google ScholarDigital Library
W. J. Hutchins. 1995. Machine translation: A brief history. In Concise History of the Language Sciences: From the Sumerians to the Cognitivists. Pergamon Press, 431--445.Google Scholar
W. J. Hutchins. 2005. The History of Machine Translation in a Nutshell. Retrieved from http://ourworld.compuserve.com/homepages/WJHutchins/Nutshell.htm.Google Scholar
V. Istvan and Y. Shoichi. 2009. Bilingual dictionary generation for low-resourced language pairs. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Vol. 2. 862--870. Google ScholarDigital Library
P. Karageorgakis, A. Potamianos, and K. Ioannis. 2005. Towards incorporating language morphology into statistical machine translation systems. In Proceedings of the Automatic Speech Recognition and Understanding Workshop.Google Scholar
M. Khalilov and J. A. R. Fonollosa. 2011. Syntax-based reordering for statistical machine translation. Computer Speech and Language Journal 25, 4 (October 2011). Google ScholarDigital Library
R. Kneser and H. Ney. 1995. Improved backing-off for n-gram language modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. 49--52.Google Scholar
K. Knight and J. Graehl. 1998. Machine transliteration. Comput. Linguist. 24, 4 (Dec. 1998), 599--612. Google ScholarDigital Library
Catherine Kobus, François Yvon, and Géraldine Damnati. 2008. Normalizing SMS: Are two metaphors better than one&quest; In Proceedings of the 22nd International Conference on Computational Linguistics, Proceedings of the Conference (COLING'08). 441--448. DOI: http://dx.doi.org/anthology/C08-1056 Google ScholarDigital Library
P. Koehn and H. Hoang. 2007. Factored translation models. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL'07). Association for Computational Linguistics, Stroudsburg, PA, 868--876.Google Scholar
P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions (ACL'07). Association for Computational Linguistics, Stroudsburg, PA, 177--180. Google ScholarDigital Library
P. Koehn and K. Knight. 2003. Empirical methods for compound splitting. In Proceedings of the 10th Conference on European Chapter of the Association for Computational Linguistics, Volume 1 (EACL'03). Association for Computational Linguistics, Stroudsburg, PA, 187--193. DOI: http://dx.doi.org/10.3115/1067807.1067833 Google ScholarDigital Library
P. Koehn, F. J. Och, and D. Marcu. 2003. Statistical phrase-based translation. In Proceedings of the Annual Conference of the Association for Computational Lingusitics (ACL03). Association for Computational Linguistics, Stroudsburg, PA, USA. Google ScholarDigital Library
G. Kondrak. 2005. Cognates and word alignment in bitexts. In Proceedings of the 10th Machine Translation Summit. 305--312.Google Scholar
G. Kondrak, D. Marcu, and K. Knight. 2003. Cognates can improve statistical translation models. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Companion Volume of the Proceedings of HLT-NAACL 2003--Short Papers, Volume 2 (NAACL-Short'03). Association for Computational Linguistics, Stroudsburg, PA, 46--48. DOI: http://dx.doi.org/10.3115/1073483.1073499 Google ScholarDigital Library
A. Kumaran and T. Kellner. 2007. A generic framework for machine transliteration. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'07). ACM, New York, NY, 721--722. DOI: http://dx.doi.org/10.1145/1277741.1277876 Google ScholarDigital Library
J. D. Lafferty, A. McCallum, and F. C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning (ICML'01). Morgan Kaufmann, San Francisco, CA, 282--289. Google ScholarDigital Library
T. K. Landauer, D. Laham, and P. Foltz. 1998. Learning human-like knowledge by singular value decomposition: A progress report. In Proceedings of the Conference on Advances in Neural Information Processing Systems. 45--51. Google ScholarDigital Library
P. Langlais and F. Gotti. 2006. Phrase-based SMT with shallow tree-phrases. In Proceedings of the Workshop on Statistical Machine Translation (StatMT'06). Association for Computational Linguistics, Stroudsburg, PA, 39--46. Google ScholarDigital Library
P. Langlais and A. Patry. 2007. Translating unknown words by analogical learning. In EMNLP-CoNLL (2010-06-04). ACL, 877--886.Google Scholar
A. Lavie and A. Agarwal. 2007. METEOR: an automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of the 2nd Workshop on Statistical Machine Translation (StatMT'07). Association for Computational Linguistics, Stroudsburg, PA, 228--231. Google ScholarDigital Library
R. L. Nagard and P. Koehn. 2010. Aiding pronoun translation with co-reference resolution. In Proceedings of the Joint 5th Workshop on Statistical Machine Translation and Metrics (MATR'10). Association for Computational Linguistics, Stroudsburg, PA, 258--267. Google ScholarDigital Library
C. Li, N. Duan, Y. Zhao, S. Liu, L. Cui, M. Hwang, A. Axelrod, J. Gao, Y. Zhang, and L. Deng. 2010. The MSRA machine translation system for IWSLT 2010. In Proceedings of the 7th International Workshop on Spoken Language Translation (IWSLT'10), 135--138.Google Scholar
C. Li, D. Zhang, M. Li, M. Zhou, M. Li, and Y. Guan. 2007. A probabilistic approach to syntax-based reordering for statistical machine translation. In Proceedings of the Annual Conference of the Association for Computational Lingusitics (ACL'07). Association for Computational Linguistics, Stroudsburg, PA, 720--727.Google Scholar
Z. Li and D. Yarowsky. 2008. Unsupervised translation induction for Chinese abbreviations using monolingual corpora. In ACL, Kathleen McKeown, Johanna D. Moore, Simone Teufel, James Allan, and Sadaoki Furui (Eds.). Association for Computer Linguistics, Stroudsburg, PA, 425--433.Google Scholar
L. V. Lita, A. Ittycheriah, S. Roukos, and N. Kambhatla. 2003. tRuEcasIng. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics. 152--159. Google ScholarDigital Library
Y. Liu, Q. Liu, and S. Lin. 2006. Tree-to-string alignment template for statistical machine translation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics (ACL-44). Association for Computational Linguistics, Stroudsburg, PA, 609--616. DOI: http://dx.doi.org/10.3115/1220175.1220252 Google ScholarDigital Library
C. Lo and D. Wu. 2011. MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility via semantic frames. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (HLT'11). Association for Computational Linguistics, Stroudsburg, PA, 220--229. Google ScholarDigital Library
LSA. 2013. Linguistic Society of America Homepage. Retrieved from http://www.linguisticsociety.org.Google Scholar
J. B. Mariño, R. E. Banchs, J. M. Crego, A. de Gispert, P. Lambert, J. A. R. Fonollosa, and M. R. Costa-jussà. 2006. N-gram-based Machine Translation. Comput. Linguist. 32, 4 (Dec. 2006), 527--549. DOI: http://dx.doi.org/10.1162/coli.2006.32.4.527 Google ScholarDigital Library
Y. Marton, C. Callison-Burch, and P. Resnik. 2009. Improved statistical machine translation using monolingually-derived paraphrases. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1 (EMNLP'09). Association for Computational Linguistics, Stroudsburg, PA, 381--390. Google ScholarDigital Library
I. A. McCowan, D. Moore, J. Dines, D. Gatica-Perez, M. Flynn, P. Wellner, and H. Bourlard. 2004. On the Use of Information Retrieval Measures for Speech Recognition Evaluation. Idiap-RR Idiap-RR-73-2004. IDIAP, Martigny, Switzerland.Google Scholar
A. Menezes and C. Quirk. 2008. Syntactic models for structural word insertion and deletion. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP'08). Association for Computational Linguistics, Stroudsburg, PA, 735--744. Google ScholarDigital Library
A. Menezes and S. D. Richardson. 2001. A best-first alignment algorithm for automatic extraction of transfer mappings from bilingual corpora. In Proceedings of the Workshop on Data-driven Methods in Machine Translation, Volume 14 (DMMT'01). Association for Computational Linguistics, Stroudsburg, PA, 1--8. DOI: http://dx.doi.org/10.3115/1118037.1118043 Google ScholarDigital Library
T. Meyer, A. Popescu-Belis, N. Hajlaoui, and A. Gesmundo. 2012. Machine translation of labeled discourse connectives. In Proceedings of the 10th Conference of the Association for Machine Translation in the Americas (AMTA'12). Retrieved from http://www.mt-archive.info/AMTA-2012-Meyer.pdf.Google Scholar
E. Minkov, K. Toutanova, and H. Suzuki. 2007. Generating complex morphology for machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA.Google Scholar
S. Mirkin, L. Specia, N. Cancedda, I. Dagan, M. Dymetman, and I. Szpektor. 2009. Source-language entailment modeling for translating unknown terms. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 (ACL'09). Association for Computational Linguistics, Stroudsburg, PA, 791--799. http://dl.acm.org/citation.cfm&quest;id=1690219.1690257 Google ScholarDigital Library
R. Mitkov, V. Pekar, D. Blagoev, and A. Mulloni. 2007. Methods for extracting and classifying pairs of cognates and false friends. Machine Translation 21, 1 (March 2007), 29--53. DOI: http://dx.doi.org/10.1007/s10590-008-9034-5 Google ScholarDigital Library
A. Mulloni and A. Pekar. 2006. Automatic detection of orthographic cues for cognate recognition. In Proceedings of the Conference on Language Resources and Evaluation.Google Scholar
P. Nakov and H. T. Ng. 2009. Improved statistical machine translation for resource-poor languages using related resource-rich languages. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP'09). ACL, 1358--1367. DOI: http://dx.doi.org/anthology/D09-1141 Google ScholarDigital Library
F. J. Och. 2003. Minimum Error Rate Training In Statistical Machine Translation. In 41th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, 160--167. Google ScholarDigital Library
F. J. Och and H. Ney. 2002. Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, 295--302. Google ScholarDigital Library
F. J. Och and H. Ney. 2004. The alignment template approach to statistical machine translation. Comput. Linguist. 30, 4 (Dec. 2004), 417--449. DOI: http://dx.doi.org/10.1162/0891201042544884 Google ScholarDigital Library
K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL'02). Association for Computational Linguistics, Stroudsburg, PA, 311--318. DOI: http://dx.doi.org/10.3115/1073083.1073135 Google ScholarDigital Library
M. Popović, A. de Gispert, D. Gupta, P. Lambert, H. Ney, J. B. Mariño, M. Federico, and R. Banchs. 2006. Morpho-syntactic information for automatic error analysis of statistical machine translation output. In Proceedings of the Workshop on Statistical Machine Translation. Association for Computational Linguistics, Stroudsburg, PA, 1--6.
M. Popović and H. Ney. 2007. Word error rates: Decomposition over POS classes and applications for error analysis. In Proceedings of the 2nd Workshop on Statistical Machine Translation (StatMT'07). Association for Computational Linguistics, Stroudsburg, PA, 48--55.
M. Popović and H. Ney. 2009. Syntax-oriented evaluation measures for machine translation output. In Proceedings of the 4th Workshop on Statistical Machine Translation (StatMT'09). Association for Computational Linguistics, Stroudsburg, PA, 29--32.
M. Popović and H. Ney. 2011. Towards automatic error analysis of machine translation output. Comput. Linguist. 37, 4 (Dec. 2011), 657--688. DOI: http://dx.doi.org/10.1162/COLI_a_00072
C. Quirk, A. Menezes, and C. Cherry. 2005. Dependency treelet translation: Syntactically informed phrasal SMT. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (ACL'05). Association for Computational Linguistics, Stroudsburg, PA, 271--279. DOI: http://dx.doi.org/10.3115/1219840.1219874
A. Razmara. 2011. Application of Tree Transducers in Statistical Machine Translation. Technical Report. Depth Report, Simon Fraser University.
J. Riesa, B. Mohit, K. Knight, and D. Marcu. 2006. Building an English-Iraqi Arabic machine translation system for spoken utterances with limited resources. In Proceedings of the 9th International Conference on Spoken Language Processing (INTERSPEECH'06).
E. Ringger, M. Gamon, R. C. Moore, D. Rojas, M. Smets, and S. Corston-Oliver. 2004. Linguistically informed statistical models of constituent structure for ordering in sentence realization. In Proceedings of the 20th International Conference on Computational Linguistics (COLING'04). Association for Computational Linguistics, Stroudsburg, PA, article 673. DOI: http://dx.doi.org/10.3115/1220355.1220452
R. Rosa, D. Mareček, and O. Dušek. 2012. DEPFIX: A system for automatic correction of Czech MT outputs. In Proceedings of the 7th Workshop on Statistical Machine Translation. Association for Computational Linguistics, Stroudsburg, PA, 362--368.
G. Salton and M. McGill. 1983. Introduction to Modern Information Retrieval. McGraw-Hill.
L. Shao and H. T. Ng. 2004. Mining new word translations from comparable corpora. In Proceedings of the 20th International Conference on Computational Linguistics (COLING'04). Association for Computational Linguistics, Stroudsburg, PA, article 618. DOI: http://dx.doi.org/10.3115/1220355.1220444
L. Shen, J. Xu, and R. Weischedel. 2010. String-to-dependency statistical machine translation. Comput. Linguist. 36, 4 (Dec. 2010), 649--671. DOI: http://dx.doi.org/10.1162/coli_a_00015
L. Shen, B. Zhang, S. Matsoukas, and R. Weischedel. 2009. Effective use of linguistic and contextual information for statistical machine translation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP'09). Association for Computational Linguistics, Stroudsburg, PA, 72--80.
M. Simard, N. Cancedda, B. Cavestro, M. Dymetman, E. Gaussier, C. Goutte, K. Yamada, P. Langlais, and A. Mauser. 2005. Translating with non-contiguous phrases. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT'05). Association for Computational Linguistics, Stroudsburg, PA, 755--762. DOI: http://dx.doi.org/10.3115/1220575.1220670
D. A. Smith and J. Eisner. 2006. Quasi-synchronous grammars: Alignment by soft projection of syntactic dependencies. In Proceedings of the Workshop on Statistical Machine Translation (StatMT'06). Association for Computational Linguistics, Stroudsburg, PA, 23--30.
M. G. Snover, N. Madnani, B. Dorr, and R. Schwartz. 2009. TER-Plus: Paraphrase, semantic, and alignment enhancements to translation edit rate. Machine Translation 23, 2--3 (Sept. 2009), 117--127. DOI: http://dx.doi.org/10.1007/s10590-009-9062-9
S. Stymne. 2011. BLAST: A tool for error analysis of machine translation output. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Systems Demonstrations. Association for Computational Linguistics, 56--61.
G. Thurmair. 2009. Comparing different architectures of hybrid machine translation systems. In Proceedings of the MT-Summit XII.
C. Tillman. 2004. A block orientation model for statistical machine translation. In Proceedings of the Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics Annual Meeting (HLT-NAACL'04). Association for Computational Linguistics, Stroudsburg, PA.
J. P. Turian, B. Wellington, and I. D. Melamed. 2006. Scalable discriminative learning for natural language parsing and translation. In Proceedings of the 2006 Neural Information Processing Systems (NIPS'06). Bernhard Schölkopf, John Platt, and Thomas Hoffman (Eds.). MIT Press, 1409--1416.
N. Ueffing and H. Ney. 2003. Using POS information for statistical machine translation into morphologically rich languages. In Proceedings of the 10th Conference on European Chapter of the Association for Computational Linguistics (EACL'03). Association for Computational Linguistics, Stroudsburg, PA, 347--354.
A. Venugopal and A. Zollmann. 2009. Grammar based statistical MT on Hadoop: An end-to-end toolkit for large scale PSCFG based MT. In The Prague Bulletin of Mathematical Linguistics No. 91. 67--78.
D. Vilar. 2011. Investigations on Hierarchical Phrase-based Machine Translation. Ph.D. Dissertation. RWTH Aachen University, Aachen, Germany.
D. Vilar, J. Xu, L. Fernando-D'Haro, and H. Ney. 2006. Error analysis of statistical machine translation output. In Proceedings of the International Conference on Language Resources and Evaluation (LREC'06). 697--702.
P. Virga and S. Khudanpur. 2003. Transliteration of proper names in cross-lingual information retrieval. In Proceedings of the ACL 2003 Workshop on Multilingual and Mixed-language Named Entity Recognition, Volume 15 (MultiNER'03). Association for Computational Linguistics, Stroudsburg, PA, 57--64. DOI: http://dx.doi.org/10.3115/1119384.1119392
S. Virpioja, J. J. Väyrynen, M. Creutz, and M. Sadeniemi. 2007. Morphology-aware statistical machine translation based on morphs induced in an unsupervised manner. In Proceedings of the Machine Translation Summit XI. 491--498.
C. Wang, M. Collins, and P. Koehn. 2007. Chinese syntactic reordering for statistical machine translation. In Empirical Methods in Natural Language Processing (EMNLP'07). Association for Computational Linguistics, Stroudsburg, PA.
W. Wang, K. Knight, and D. Marcu. 2006. Capitalizing machine translation. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference. Association for Computational Linguistics, New York, NY, 1--8.
B. Webber. 2012. Discourse and SMT: Where and How? (Sept. 2012). Seventh Machine Translation Marathon 2012. Invited talk.
D. Wu. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Comput. Linguist. 23, 3 (Sept. 1997), 377--403.
D. Wu. 2009. Toward machine translation with statistics and syntax and semantics. In Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU'09). 12--21.
D. Wu and P. Fung. 2009. Semantic roles for SMT: A hybrid two pass model. In Proceedings of the 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT'09). Association for Computational Linguistics, Stroudsburg, PA.
F. Xia and M. McCord. 2004. Improving a statistical MT system with automatically learned rewrite patterns. In Proceedings of the 20th International Conference on Computational Linguistics (COLING'04). Association for Computational Linguistics, Stroudsburg, PA.
K. Yamada and K. Knight. 2002. A decoder for syntax-based statistical MT. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL'02). Association for Computational Linguistics, Stroudsburg, PA, 303--310. DOI: http://dx.doi.org/10.3115/1073083.1073134
R. Zens, F. J. Och, and H. Ney. 2002. Phrase-based statistical machine translation. In Proceedings of the German Conference on Artificial Intelligence (KI'02). Springer-Verlag.
H. Zhang and D. Gildea. 2005. Stochastic lexicalized inversion transduction grammar for alignment. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (ACL'05). Association for Computational Linguistics, Stroudsburg, PA, USA, 475--482. DOI: http://dx.doi.org/10.3115/1219840.1219899
J. Zhang, F. Zhai, and C. Zong. 2012. Handling unknown words in statistical machine translation from a new perspective. In Proceedings of the NLPCC.
M. Zhang, A. Aw, H. Jiang, J. Sun, S. Li, and C. Tan. 2007. A tree-to-tree alignment-based model for SMT. In Proceedings of the MT-Summit. 535--542.
M. Zhang, H. Li, and J. Su. 2004. Direct orthographical mapping for machine transliteration. In Proceedings of the 20th International Conference on Computational Linguistics (COLING'04). Association for Computational Linguistics, Stroudsburg, PA, article 716. DOI: http://dx.doi.org/10.3115/1220355.1220458

Index Terms

Statistical machine translation enhancements through linguistic levels: A survey
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing


Published in

ACM Computing Surveys Volume 46, Issue 3
January 2014
507 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/2578702

Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 January 2014
- Accepted: 1 September 2013
- Revised: 1 June 2013
- Received: 1 March 2013
Author Tags
Lexis
Linguistics
morphology
orthography
semantics
statistical machine translation
syntax
Qualifiers
- research-article
- Research
- Refereed