Statistical Machine Translation
January 2010
Publisher:
  • Cambridge University Press
  • 40 W. 20 St. New York, NY
  • United States
ISBN: 978-0-521-87415-1
Published: 18 January 2010
Pages: 446
Abstract

This introductory text to statistical machine translation (SMT) provides all of the theories and methods needed to build a statistical machine translator, such as Google Language Tools and Babelfish. In general, statistical techniques allow automatic translation systems to be built quickly for any language-pair using only translated texts and generic software. With increasing globalization, statistical machine translation will be central to communication and commerce. Based on courses and tutorials, and classroom-tested globally, it is ideal for instruction or self-study, for advanced undergraduates and graduate students in computer science and/or computational linguistics, and researchers in natural language processing. The companion website provides open-source corpora and tool-kits.

Cited By

  1. Lai H and Nissim M (2024). A Survey on Automatic Generation of Figurative Language: From Rule-based Systems to Large Language Models, ACM Computing Surveys, 56:10, (1-34), Online publication date: 31-Oct-2024.
  2. Huang Z, Chen J, Jiang J, Liang Y, You H and Li F (2024). Mapping APIs in Dynamic-typed Programs by Leveraging Transfer Learning, ACM Transactions on Software Engineering and Methodology, 33:4, (1-29), Online publication date: 31-May-2024.
  3. Mondal S, Zhang H, Kabir H, Ni K and Dai H (2023). Machine translation and its evaluation: a study, Artificial Intelligence Review, 56:9, (10137-10226), Online publication date: 1-Sep-2023.
  4. Chakrabarty A, Dabre R, Ding C, Utiyama M and Sumita E (2023). Low-resource Multilingual Neural Translation Using Linguistic Feature-based Relevance Mechanisms, ACM Transactions on Asian and Low-Resource Language Information Processing, 22:7, (1-36), Online publication date: 31-Jul-2023.
  5. Bala Das S, Biradar A, Kumar Mishra T and Kr. Patra B (2023). Improving Multilingual Neural Machine Translation System for Indic Languages, ACM Transactions on Asian and Low-Resource Language Information Processing, 22:6, (1-24), Online publication date: 30-Jun-2023.
  6. Shi X, Huang H, Jian P and Tang Y (2023). Approximating to the Real Translation Quality for Neural Machine Translation via Causal Motivated Methods, ACM Transactions on Asian and Low-Resource Language Information Processing, 22:5, (1-26), Online publication date: 31-May-2023.
  7. Liu F, Li J and Zhang L Syntax and Domain Aware Model for Unsupervised Program Translation Proceedings of the 45th International Conference on Software Engineering, (755-767)
  8. Li D, Chen T, Zadikian A, Tung A and Chilton L Improving Automatic Summarization for Browsing Longform Spoken Dialog Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, (1-20)
  9. Kumar A, Mundotiya R, Pratap A and Singh A (2022). TLSPG, Journal of King Saud University - Computer and Information Sciences, 34:9, (6552-6563), Online publication date: 1-Oct-2022.
  10. Rivera-Trigueros I (2022). Machine translation systems and quality assessment: a systematic review, Language Resources and Evaluation, 56:2, (593-619), Online publication date: 1-Jun-2022.
  11. Satir E and Bulut H (2021). Preventing translation quality deterioration caused by beam search decoding in neural machine translation using statistical machine translation, Information Sciences: an International Journal, 581:C, (791-807), Online publication date: 1-Dec-2021.
  12. Premjith B and Soman K (2021). Deep Learning Approach for the Morphological Synthesis in Malayalam and Tamil at the Character Level, ACM Transactions on Asian and Low-Resource Language Information Processing, 20:6, (1-17), Online publication date: 30-Nov-2021.
  13. Wang Y, Wang Y, Dang K, Liu J and Liu Z (2021). A Comprehensive Survey of Grammatical Error Correction, ACM Transactions on Intelligent Systems and Technology, 12:5, (1-51), Online publication date: 31-Oct-2021.
  14. Shi X, Huang H, Jian P and Tang Y Reducing Length Bias in Scoring Neural Machine Translation via a Causal Inference Method Chinese Computational Linguistics, (3-15)
  15. Lalrempuii C, Soni B and Pakray P (2021). An Improved English-to-Mizo Neural Machine Translation, ACM Transactions on Asian and Low-Resource Language Information Processing, 20:4, (1-21), Online publication date: 31-Jul-2021.
  16. Chakravarthi B, Rani P, Arcan M and McCrae J (2021). A Survey of Orthographic Information in Machine Translation, SN Computer Science, 2:4, Online publication date: 1-Jul-2021.
  17. Dhamala J, Sun T, Kumar V, Krishna S, Pruksachatkun Y, Chang K and Gupta R BOLD Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, (862-872)
  18. Wang R and Ding B (2021). Research on Intelligent English Translation Method Based on the Improved Attention Mechanism Model, Scientific Programming, 2021, Online publication date: 1-Jan-2021.
  19. Duan C, Chen K, Wang R, Utiyama M, Sumita E, Zhu C and Zhao T (2021). Modeling Future Cost for Neural Machine Translation, IEEE/ACM Transactions on Audio, Speech and Language Processing, 29, (770-781), Online publication date: 1-Jan-2021.
  20. Duanzhu S, Zhang R and Jia C Bidirectional Boost: On Improving Tibetan-Chinese Neural Machine Translation With Back-Translation and Self-Learning Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, (1-6)
  21. Gros D, Sezhiyan H, Devanbu P and Yu Z Code to comment "translation" Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, (746-757)
  22. Daneshgar N and Sarmad M (2020). word.alignment: an R package for computing statistical word alignment and its evaluation, Computational Statistics, 35:4, (1597-1619), Online publication date: 1-Dec-2020.
  23. Phan H and Jannesari A Statistical machine translation outperforms neural machine translation in software engineering: why and how Proceedings of the 1st ACM SIGSOFT International Workshop on Representation Learning for Software Engineering and Program Languages, (3-12)
  24. Ameur M, Meziane F and Guessoum A (2020). Arabic Machine Translation, Computer Science Review, 38:C, Online publication date: 1-Nov-2020.
  25. Yang M, Wang X, Zhang M and Zhao T Incorporating Phrase-Level Agreement into Neural Machine Translation Natural Language Processing and Chinese Computing, (416-428)
  26. Deng Y, Huang H, Chen X, Liu Z, Wu S, Xuan J and Li Z From Code to Natural Language: Type-Aware Sketch-Based Seq2Seq Learning Database Systems for Advanced Applications, (352-368)
  27. Jónsson H, Símonarson H, Snæbjarnarson V, Steingrímsson S and Loftsson H Experimenting with Different Machine Translation Models in Medium-Resource Settings Text, Speech, and Dialogue, (95-103)
  28. Balashov Y (2020). The Translator’s Extended Mind, Minds and Machines, 30:3, (349-383), Online publication date: 1-Sep-2020.
  29. Yao K, Li H, Shang W and Hassan A (2020). A study of the performance of general compressors on log files, Empirical Software Engineering, 25:5, (3043-3085), Online publication date: 1-Sep-2020.
  30. Sulubacak U, Caglayan O, Grönroos S, Rouhe A, Elliott D, Specia L and Tiedemann J (2020). Multimodal machine translation through visuals and speech, Machine Translation, 34:2-3, (97-147), Online publication date: 1-Sep-2020.
  31. Lyons S (2020). A review of Thai–English machine translation, Machine Translation, 34:2-3, (197-230), Online publication date: 1-Sep-2020.
  32. Jabeen S, Gao X and Andreae P (2019). Semantic association computation: a comprehensive survey, Artificial Intelligence Review, 53:6, (3849-3899), Online publication date: 1-Aug-2020.
  33. Modarresi K Detecting the Most Insightful Parts of Documents Using a Regularized Attention-Based Model Computational Science – ICCS 2020, (272-281)
  34. Ranta A, Angelov K, Gruzitis N and Kolachina P (2020). Abstract Syntax as Interlingua, Computational Linguistics, 46:2, (425-486), Online publication date: 1-Jun-2020.
  35. Prates M, Avelar P and Lamb L (2019). Assessing gender bias in machine translation: a case study with Google Translate, Neural Computing and Applications, 32:10, (6363-6381), Online publication date: 1-May-2020.
  36. Li H, Huang G, Cai D and Liu L (2020). Neural Machine Translation With Noisy Lexical Constraints, IEEE/ACM Transactions on Audio, Speech and Language Processing, 28, (1864-1874), Online publication date: 1-Jan-2020.
  37. Mehndiratta A and Asawa K Recent Advances and Challenges in Design of Non-goal-Oriented Dialogue Systems Big Data Analytics, (33-43)
  38. Lacomis J, Yin P, Schwartz E, Allamanis M, Goues C, Neubig G and Vasilescu B DIRE Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, (628-639)
  39. Chinea-Rios M, Sanchis-Trilles G and Casacuberta F (2019). Discriminative ridge regression algorithm for adaptation in statistical machine translation, Pattern Analysis & Applications, 22:4, (1293-1305), Online publication date: 1-Nov-2019.
  40. Pathak A, Pakray P and Bentham J (2019). English–Mizo Machine Translation using neural and statistical approaches, Neural Computing and Applications, 31:11, (7615-7631), Online publication date: 1-Nov-2019.
  41. Tufano M, Watson C, Bavota G, Penta M, White M and Poshyvanyk D (2019). An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation, ACM Transactions on Software Engineering and Methodology, 28:4, (1-29), Online publication date: 31-Oct-2019.
  42. Berrichi S and Mazroui A Guiding word alignment with prior knowledge to improve English-Arabic Machine Translation Proceedings of the 4th International Conference on Big Data and Internet of Things, (1-5)
  43. Azpeitia A and Etchegoyhen T (2019). Efficient document alignment across scenarios, Machine Translation, 33:3, (205-237), Online publication date: 1-Sep-2019.
  44. Fan H, Wang J, Zhuang B, Wang S and Xiao J Automatic Acrostic Couplet Generation with Three-Stage Neural Network Pipelines PRICAI 2019: Trends in Artificial Intelligence, (314-324)
  45. Chen M, Lee B, Bansal G, Cao Y, Zhang S, Lu J, Tsay J, Wang Y, Dai A, Chen Z, Sohn T and Wu Y Gmail Smart Compose Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, (2287-2295)
  46. Chinea-Rios M, Sanchis-Trilles G and Casacuberta F (2019). Vector sentences representation for data selection in statistical machine translation, Computer Speech and Language, 56:C, (1-16), Online publication date: 1-Jul-2019.
  47. Khan Jadoon N, Anwar W, Bajwa U and Ahmad F (2019). Statistical machine translation of Indian languages: a survey, Neural Computing and Applications, 31:7, (2455-2467), Online publication date: 1-Jul-2019.
  48. Marzouk S and Hansen-Schirra S (2019). Evaluation of the impact of controlled language on neural machine translation compared to other MT architectures, Machine Translation, 33:1-2, (179-203), Online publication date: 1-Jun-2019.
  49. Calixto I and Liu Q (2019). An error analysis for image-based multi-modal neural machine translation, Machine Translation, 33:1-2, (155-177), Online publication date: 1-Jun-2019.
  50. Rahman M, Palani D and Rigby P Natural software revisited Proceedings of the 41st International Conference on Software Engineering, (37-48)
  51. Tran N, Tran H, Nguyen S, Nguyen H and Nguyen T Does BLEU score work for code migration? Proceedings of the 27th International Conference on Program Comprehension, (165-176)
  52. Ruder S, Vulić I and Søgaard A (2019). A survey of cross-lingual word embedding models, Journal of Artificial Intelligence Research, 65:1, (569-630), Online publication date: 1-May-2019.
  53. Ostaszewski M, Miszczak J, Banchi L and Sadowski P (2019). Approximation of quantum control correction scheme using deep neural networks, Quantum Information Processing, 18:5, (1-13), Online publication date: 1-May-2019.
  54. Kinghorn P, Zhang L and Shao L (2019). A hierarchical and regional deep learning architecture for image description generation, Pattern Recognition Letters, 119:C, (77-85), Online publication date: 1-Mar-2019.
  55. Schluter R, Beck E and Ney H (2019). Upper and Lower Tight Error Bounds for Feature Omission with an Extension to Context Reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence, 41:2, (502-514), Online publication date: 1-Feb-2019.
  56. Xia M, Huang G, Liu L and Shi S Graph based translation memory for neural machine translation Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, (7297-7304)
  57. Jain P, Mishra A, Azad A and Sankaranarayanan K Unsupervised controllable text formalization Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, (6554-6561)
  58. Maučec M and Brest J (2019). Slavic languages in phrase-based statistical machine translation, Artificial Intelligence Review, 51:1, (77-117), Online publication date: 1-Jan-2019.
  59. Coughlin R, Setthawong R and Setthawong P An Improved English-Thai Translation Framework for Non-timing Aligned Parallel Corpora Using Bleualign with Explicit Feedback Proceedings of the 10th International Conference on Advances in Information Technology, (1-8)
  60. Anderson P, Gould S and Johnson M Partially-supervised image captioning Proceedings of the 32nd International Conference on Neural Information Processing Systems, (1879-1890)
  61. Barmpoutis A Learning Programming Languages as Shortcuts to Natural Language Token Replacements Proceedings of the 18th Koli Calling International Conference on Computing Education Research, (1-10)
  62. He P, Chen Z, He S and Lyu M Characterizing the natural language descriptions in software logging statements Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, (178-189)
  63. Chen M Stochastic Gradient Descent Combines Second-Order Information for Training Neural Network Proceedings of the 2018 1st International Conference on Mathematics and Statistics, (69-73)
  64. Yin P, Deng B, Chen E, Vasilescu B and Neubig G Learning to mine aligned code and natural language pairs from stack overflow Proceedings of the 15th International Conference on Mining Software Repositories, (476-486)
  65. Phan H, Nguyen H, Tran N, Truong L, Nguyen A and Nguyen T Statistical learning of API fully qualified names in code snippets of online forums Proceedings of the 40th International Conference on Software Engineering, (632-642)
  66. Munigala V, Mishra A, Tamilselvam S, Khare S, Dasgupta R and Sankaran A PersuAIDE ! An Adaptive Persuasive Text Generation System for Fashion Domain Companion Proceedings of the The Web Conference 2018, (335-342)
  67. Fujita A and Isabelle P (2018). Expanding Paraphrase Lexicons by Exploiting Generalities, ACM Transactions on Asian and Low-Resource Language Information Processing, 17:2, (1-36), Online publication date: 5-Feb-2018.
  68. Zhou Q, Yang N, Wei F and Zhou M Sequential copying networks Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, (4987-4994)
  69. Grami G, Alkazemi B, Nour M, Naseer A and Al-Doobi H A Proposed Model to Address Current Errors in English into Arabic Machine Translation Proceedings of the 3rd International Conference on Robotics and Artificial Intelligence, (116-120)
  70. Chen C, Xing Z and Liu Y (2017). By the Community & For the Community, Proceedings of the ACM on Human-Computer Interaction, 1:CSCW, (1-21), Online publication date: 6-Dec-2017.
  71. Revanuru K, Turlapaty K and Rao S Neural Machine Translation of Indian Languages Proceedings of the 10th Annual ACM India Compute Conference, (11-20)
  72. Kazemi A, Toral A, Way A, Monadjemi A and Nematbakhsh M (2017). Syntax- and semantic-based reordering in hierarchical phrase-based statistical machine translation, Expert Systems with Applications: An International Journal, 84:C, (186-199), Online publication date: 30-Oct-2017.
  73. Liu L, Fujita A, Utiyama M, Finch A and Sumita E (2017). Translation Quality Estimation Using Only Bilingual Corpora, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:9, (1762-1772), Online publication date: 1-Sep-2017.
  74. Gulcehre C, Firat O, Xu K, Cho K and Bengio Y (2017). On integrating a language model into neural machine translation, Computer Speech and Language, 45:C, (137-148), Online publication date: 1-Sep-2017.
  75. Dauphin Y, Fan A, Auli M and Grangier D Language modeling with gated convolutional networks Proceedings of the 34th International Conference on Machine Learning - Volume 70, (933-941)
  76. Phan H, Nguyen H, Nguyen T and Rajan H Statistical learning for inference between implementations and documentation Proceedings of the 39th International Conference on Software Engineering: New Ideas and Emerging Results Track, (27-30)
  77. Phan H, Nguyen A, Nguyen T and Nguyen T Statistical migration of API usages Proceedings of the 39th International Conference on Software Engineering Companion, (47-50)
  78. Kim K, Park E, Shin J, Kwon O and Kim Y (2017). Divergence-based fine pruning of phrase-based statistical translation model, Computer Speech and Language, 41:C, (146-160), Online publication date: 1-Jan-2017.
  79. Poirier É Meaning-based content word alignment heuristic Proceedings of the 8th International Conference on Management of Digital EcoSystems, (208-214)
  80. Nguyen T, Rigby P, Nguyen A, Karanfil M and Nguyen T T2API: synthesizing API code usage templates from English texts with statistical translation Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, (1013-1017)
  81. Nguyen T Code migration with statistical machine translation Proceedings of the 5th International Workshop on Software Mining, (2-2)
  82. Saha Roy R, Agarwal S, Ganguly N and Choudhury M (2016). Syntactic complexity of Web search queries through the lenses of language models, networks and users, Information Processing and Management: an International Journal, 52:5, (923-948), Online publication date: 1-Sep-2016.
  83. Maletti A Compositions of Tree-to-Tree Statistical Machine Translation Models Proceedings of the 20th International Conference on Developments in Language Theory - Volume 9840, (293-305)
  84. Hindle A, Barr E, Gabel M, Su Z and Devanbu P (2016). On the naturalness of software, Communications of the ACM, 59:5, (122-131), Online publication date: 26-Apr-2016.
  85. Abdul-Rauf S, Schwenk H, Lambert P and Nawaz M (2016). Empirical use of information retrieval to build synthetic data for SMT domain adaptation, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24:4, (745-754), Online publication date: 1-Apr-2016.
  86. Chu C, Nakazawa T and Kurohashi S (2015). Integrated Parallel Sentence and Fragment Extraction from Comparable Corpora, ACM Transactions on Asian and Low-Resource Language Information Processing, 15:2, (1-22), Online publication date: 1-Feb-2016.
  87. Bentivogli L, Bertoldi N, Cettolo M, Federico M, Negri M and Turchi M (2016). On the evaluation of adaptive machine translation for human post-editing, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24:2, (388-399), Online publication date: 1-Feb-2016.
  88. Nguyen A, Nguyen T and Nguyen T Divide-and-conquer approach for multi-phase statistical migration for source code Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, (585-596)
  89. Oda Y, Fudaba H, Neubig G, Hata H, Sakti S, Toda T and Nakamura S Learning to generate pseudo-code from source code using statistical machine translation Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, (574-584)
  90. Fudaba H, Oda Y, Akabe K, Neubig G, Hata H, Sakti S, Toda T and Nakamura S Pseudogen Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, (824-829)
  91. Sordoni A, Bengio Y, Vahabi H, Lioma C, Grue Simonsen J and Nie J A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, (553-562)
  92. Wołk K and Marasek K Tuned and GPU-Accelerated Parallel Data Mining from Comparable Corpora Proceedings of the 18th International Conference on Text, Speech, and Dialogue - Volume 9302, (32-40)
  93. Guo J, Liu J, Chen X, Han Q and Zhou K Tunable Discounting Mechanisms for Language Modeling Revised Selected Papers, Part II, of the 5th International Conference on Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques - Volume 9243, (585-594)
  94. Liu X, Duh K and Matsumoto Y (2015). Multilingual Topic Models for Bilingual Dictionary Extraction, ACM Transactions on Asian and Low-Resource Language Information Processing, 14:3, (1-22), Online publication date: 12-Jun-2015.
  95. White M, Vendome C, Linares-Vásquez M and Poshyvanyk D Toward deep learning software repositories Proceedings of the 12th Working Conference on Mining Software Repositories, (334-345)
  96. White M Deep representations for software engineering Proceedings of the 37th International Conference on Software Engineering - Volume 2, (781-783)
  97. Tambouratzis G (2015). Conditional random fields versus template-matching in MT phrasing tasks involving sparse training data, Pattern Recognition Letters, 53:C, (44-52), Online publication date: 1-Feb-2015.
  98. Turner A, Brownstein M, Cole K, Karasz H and Kirchhoff K (2015). Modeling workflow to design machine translation applications for public health practice, Journal of Biomedical Informatics, 53:C, (136-146), Online publication date: 1-Feb-2015.
  99. Piqueras S, Del-Agua M, Giménez A, Civera J and Juan A Statistical Text-to-Speech Synthesis of Spanish Subtitles Proceedings of the Second International Conference on Advances in Speech and Language Technologies for Iberian Languages - Volume 8854, (40-48)
  100. Bredin H, Roy A, Pécheux N and Allauzen A "Sheldon speaking, Bonjour!" Proceedings of the 22nd ACM international conference on Multimedia, (137-146)
  101. Ture F and Lin J (2014). Exploiting Representations from Statistical Machine Translation for Cross-Language Information Retrieval, ACM Transactions on Information Systems, 32:4, (1-32), Online publication date: 28-Oct-2014.
  102. Karaivanov S, Raychev V and Vechev M Phrase-Based Statistical Translation of Programming Languages Proceedings of the 2014 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, (173-184)
  103. Green S, Chuang J, Heer J and Manning C Predictive translation memory Proceedings of the 27th annual ACM symposium on User interface software and technology, (177-187)
  104. Nguyen A, Nguyen H, Nguyen T and Nguyen T Statistical learning approach for mining API usage mappings for code migration Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, (457-468)
  105. Igarashi T TransDocument Proceedings of the 5th ACM international conference on Collaboration across boundaries: culture, distance & technology, (53-62)
  106. Sokolov A, Hieber F and Riezler S Learning to translate queries for CLIR Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, (1179-1182)
  107. Nguyen A, Nguyen H, Nguyen T and Nguyen T Statistical learning of API mappings for language migration Companion Proceedings of the 36th International Conference on Software Engineering, (618-619)
  108. Nguyen A, Nguyen T and Nguyen T Migrating code with statistical machine translation Companion Proceedings of the 36th International Conference on Software Engineering, (544-547)
  109. Tiedemann J Improved Text Extraction from PDF Documents for Large-Scale Natural Language Processing Proceedings of the 15th International Conference on Computational Linguistics and Intelligent Text Processing - Volume 8403, (102-112)
  110. Lohar P, Bhaskar P, Pal S and Bandyopadhyay S Cross Lingual Snippet Generation Using Snippet Translation System Proceedings of the 15th International Conference on Computational Linguistics and Intelligent Text Processing - Volume 8404, (331-342)
  111. Alabau V, Sanchis A and Casacuberta F (2014). Improving on-line handwritten recognition in interactive machine translation, Pattern Recognition, 47:3, (1217-1228), Online publication date: 1-Mar-2014.
  112. Sokolov A, Wisniewski G and Yvon F (2014). Lattice BLEU oracles in machine translation, ACM Transactions on Speech and Language Processing, 10:4, (1-29), Online publication date: 1-Dec-2013.
  113. Nguyen A, Nguyen T and Nguyen T Lexical statistical machine translation for language migration Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, (651-654)
  114. Sudoh K, Wu X, Duh K, Tsukada H and Nagata M (2013). Syntax-Based Post-Ordering for Efficient Japanese-to-English Translation, ACM Transactions on Asian Language Information Processing, 12:3, (1-15), Online publication date: 1-Aug-2013.
  115. Madnani N and Dorr B (2013). Generating targeted paraphrases for improved translation, ACM Transactions on Intelligent Systems and Technology, 4:3, (1-25), Online publication date: 1-Jun-2013.
  116. Brkić M, Seljan S and Vičić T Automatic and human evaluation on english-croatian legislative test set Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2, (311-317)
  117. Guo J, Liu J, Walsh M and Schmid H Class-Based language models for chinese-english parallel corpus Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2, (264-275)
  118. Pinnis M, Skadiņa I and Vasiļjevs A Domain adaptation in statistical machine translation using comparable corpora Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2, (224-235)
  119. Rychtyckyj N and Plesco C (2013). Applying Automated Language Translation at a Global Enterprise Level, AI Magazine, 34:1, (43-54), Online publication date: 1-Mar-2013.
  120. López-Ludeña V, San-Segundo R, González Morcillo C, López J and Pardo Muñoz J (2013). Increasing adaptability of a speech into sign language translation system, Expert Systems with Applications: An International Journal, 40:4, (1312-1322), Online publication date: 1-Mar-2013.
  121. Xiao T, Zhu J and Liu T (2013). Bagging and Boosting statistical machine translation systems, Artificial Intelligence, 195, (496-527), Online publication date: 1-Feb-2013.
  122. Wróblewska A and Przepiórkowski A Induction of dependency structures based on weighted projection Proceedings of the 4th international conference on Computational Collective Intelligence: technologies and applications - Volume Part I, (364-374)
  123. Isozaki H, Sudoh K, Tsukada H and Duh K (2012). HPSG-Based Preprocessing for English-to-Japanese Translation, ACM Transactions on Asian Language Information Processing, 11:3, (1-16), Online publication date: 1-Sep-2012.
  124. Büchse M, Maletti A and Vogler H Unidirectional derivation semantics for synchronous tree-adjoining grammars Proceedings of the 16th international conference on Developments in Language Theory, (368-379)
  125. Sofianopoulos S, Vassiliou M and Tambouratzis G Implementing a language-independent MT methodology Proceedings of the First Workshop on Multilingual Modeling, (1-10)
  126. Balahur A and Turchi M Multilingual sentiment analysis using machine translation? Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis, (52-60)
  127. Wang R, Osenova P and Simov K Linguistically-enriched models for Bulgarian-to-English machine translation Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation, (10-19)
  128. Fujita A, Isabelle P and Kuhn R Enlarging paraphrase collections through generalization and instantiation Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, (631-642)
  129. Pinnis M, Ion R, Ştefănescu D, Su F, Skadiņa I, Vasiļjevs A and Babych B ACCURAT toolkit for multi-level alignment and information extraction from comparable corpora Proceedings of the ACL 2012 System Demonstrations, (91-96)
  130. Ganitkevitch J, Van Durme B and Callison-Burch C Monolingual distributional similarity for text-to-text generation Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, (256-264)
  131. Maletti A Every sensible extended top-down tree transducer is a multi bottom-up tree transducer Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, (263-273)
  132. Hindle A, Barr E, Su Z, Gabel M and Devanbu P On the naturalness of software Proceedings of the 34th International Conference on Software Engineering, (837-847)
  133. Mayer T and Cysouw M Language comparison through sparse multilingual word alignment Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH, (54-62)
  134. Vu Hoang C and Aw A An unsupervised and data-driven approach for spell checking in Vietnamese OCR-scanned texts Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data, (36-44)
  135. Wang R, Osenova P and Simov K Linguistically-augmented Bulgarian-to-English statistical machine translation model Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra), (119-128)
  136. Dandapat S, Morrissey S, Way A and van Genabith J Combining EBMT, SMT, TM and IR technologies for quality and scale Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra), (48-58)
  137. Harriehausen-Mühlbauer B and Heuss T Semantic web based machine translation Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra), (1-9)
  138. Nikoulina V, Kovachev B, Lagos N and Monz C Adaptation of statistical machine translation model for cross-lingual information retrieval in a service context Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, (109-119)
  139. Martzoukos S and Monz C Power-law distributions for paraphrases extracted from bilingual corpora Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, (2-11)
  140. Okita T and van Genabith J Minimum bayes risk decoding with enlarged hypothesis space in system combination Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II, (40-51)
  141. ACM
    Carpineto C and Romano G (2012). A Survey of Automatic Query Expansion in Information Retrieval, ACM Computing Surveys, 44:1, (1-50), Online publication date: 1-Jan-2012.
  142. ACM
    Xiao T, Zhu J and Zhu M (2011). Language Modeling for Syntax-Based Machine Translation Using Tree Substitution Grammars, ACM Transactions on Asian Language Information Processing, 10:4, (1-29), Online publication date: 1-Dec-2011.
  143. Maletti A Tree transformations and dependencies Proceedings of the 12th biennial conference on The mathematics of language, (1-20)
  144. ACM
    Greengard S (2011). Life, translated, Communications of the ACM, 54:8, (19-21), Online publication date: 1-Aug-2011.
  145. Galanis D and Androutsopoulos I A new sentence compression dataset and its use in an abstractive generate-and-rank sentence compressor Proceedings of the UCNLG+Eval: Language Generation and Evaluation Workshop, (1-11)
  146. López-Ludeña V, San-Segundo R, Lutfi S, Lucas-Cuesta J, Echevarry J and Martínez-González B Source language categorization for improving a speech into sign language translation system Proceedings of the Second Workshop on Speech and Language Processing for Assistive Technologies, (84-93)
  147. Sánchez-Cartagena V, Sánchez-Martínez F and Pérez-Ortiz J The Universitat d'Alacant hybrid machine translation system for WMT 2011 Proceedings of the Sixth Workshop on Statistical Machine Translation, (457-463)
  148. López-Ludeña V and San-Segundo R UPM system for the translation task Proceedings of the Sixth Workshop on Statistical Machine Translation, (420-425)
  149. Zhang Y and Clark S Syntax-based grammaticality improvement using CCG and guided search Proceedings of the Conference on Empirical Methods in Natural Language Processing, (1147-1157)
  150. Malakasiotis P and Androutsopoulos I A generate and rank approach to sentence paraphrasing Proceedings of the Conference on Empirical Methods in Natural Language Processing, (96-106)
  151. ACM
    Ture F, Elsayed T and Lin J No free lunch Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, (943-952)
  152. ACM
    Na S and Ng H Enriching document representation via translation for improved monolingual information retrieval Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, (853-862)
  153. McCrae J, Espinoza M, Montiel-Ponsoda E, Aguado-de-Cea G and Cimiano P Combining statistical and semantic approaches to the translation of ontologies and taxonomies Proceedings of the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation, (116-125)
  154. Attardi G, Chanev A and Miceli Barone A A dependency based statistical translation model Proceedings of the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation, (79-87)
  155. Liu Z, Chen X, Zheng Y and Sun M Automatic keyphrase extraction by bridging vocabulary gap Proceedings of the Fifteenth Conference on Computational Natural Language Learning, (135-144)
  156. Schwartz L, Callison-Burch C, Schuler W and Wu S Incremental syntactic language models for phrase-based translation Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, (620-631)
  157. Ravi S and Knight K Deciphering foreign language Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, (12-21)
  158. Silvestre-Cerdà J, Andrés-Ferrer J and Civera J Explicit length modelling for statistical machine translation Proceedings of the 5th Iberian conference on Pattern recognition and image analysis, (273-280)
  159. Maletti A (2011). Survey: Weighted Extended Top-down Tree Transducers Part II—Application in Machine Translation, Fundamenta Informaticae, 112:2-3, (239-261), Online publication date: 1-Apr-2011.
  160. Fülöp Z, Maletti A and Vogler H (2011). Weighted Extended Tree Transducers, Fundamenta Informaticae, 111:2, (163-202), Online publication date: 1-Apr-2011.
  161. Lagoutte A and Maletti A Survey Algebraic Foundations in Computer Science, (272-308)
  162. Son L, Allauzen A, Wisniewski G and Yvon F Training continuous space language models Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, (778-788)
  163. de Gispert A, Pino J and Byrne W Hierarchical phrase-based translation grammars extracted from alignment posterior probabilities Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, (545-554)
  164. Kuhn R, Chen B, Foster G and Stratford E Phrase clustering for smoothing TM probabilities Proceedings of the 23rd International Conference on Computational Linguistics, (608-616)
  165. Isozaki H, Sudoh K, Tsukada H and Duh K Head finalization Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, (244-251)
  166. Abney S and Bird S The human language project Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, (88-97)
  167. ACM
    Lopez A (2008). Statistical machine translation, ACM Computing Surveys, 40:3, (1-49), Online publication date: 1-Aug-2008.
  168. ACM
    Liebeskind C, Liebeskind S and Bouhnik D Machine Translation for Historical Research: A case study of Aramaic-Ancient Hebrew Translations, Journal on Computing and Cultural Heritage, 0:0
  169. Tambouratzis G Applying PSO to natural language processing tasks: Optimizing the identification of syntactic phrases 2016 IEEE Congress on Evolutionary Computation (CEC), (1831-1838)
Contributors
  • Johns Hopkins University

Reviews

Jeffrey B. Putnam

Natural languages (English or French, for example) are messy. In fact, they are really messy. Programming languages usually have syntax that was designed to be easy for computers to understand, and the meaning of any bit of program text is usually well defined. In contrast, natural languages are hard to parse, and different languages are often very different when it comes to word order and sentence structure. To compound the difficulties, it may even be hard to figure out what a word is. And, naturally enough, natural languages are almost always changing: new words are added, such as LOL (laughing out loud), along with new punctuation, like the smiley emoticon. And just when you think things can't get any worse, there are all these odd idioms that don't mean what they look like they should mean. Basically, it's raining cats and dogs with difficulties.

Still, we'd like to have programs that can do something reasonable with natural language. In particular, given the deluge of written texts that are now available electronically (and growing fast), and the number of languages involved, we'd like to be able to automatically translate one language to another. Statistical machine translation methods take parallel corpora (essentially the same content in two languages) and, using statistical models and some knowledge of how languages are generally structured, derive translation engines. Frequently, these parallel corpora are representative of a single knowledge domain. As just one example, the Europarl corpus is the proceedings of the European Parliament. These translation engines are built specifically to translate; they don't necessarily get at (or even need) the meaning of the texts in question. This effort has met with some success, and this text provides an excellent overview of how it is done.

The book is broken up into three parts. The first part introduces the topic, provides some basic notions (what is a sentence, for example), and gives a basic overview of the math required. The second part, "Core Methods," is the real meat of the book. One chapter is devoted to words and the problem of deciding which words in parallel corpora correspond to each other. One chapter covers phrases (which may require reordering of the constituent words), and another discusses decoding, the process by which the best translation is generated, which is an NP-complete (nondeterministic polynomial-time complete) problem. The next chapter covers language models, which score sentences on how likely they are to occur in text generated by humans fluent in the target language. Finally, there is a chapter on how to evaluate constructed translation engines. Part 3, "Advanced Topics," covers discriminative training, adding linguistic information (transliteration, capitalization, and other factors), and tree-based models that construct parallel tree rewriting systems; that is, this kind of phrase as a tree translates to that kind of phrase as a tree, with nodes labeled to correspond.

Each chapter comes with an extensive bibliographic summary and exercises, which are usually based on software and corpora that are available online. There are some oddities. Entropy is defined twice: once in the preliminaries and once in a later chapter. A number of words and phrases are in boldface and noted in the margins; while this is a nice touch, it would have been even more useful to list the words in a comprehensive index.

This is an excellent introduction for someone interested in statistical translation. It is quite readable, although a few dense sections could use a bit more explanation and a few more examples. As it is, it might be difficult to use for self-study. While the book might be appropriate for advanced undergraduates, it is certainly suitable for graduate-level courses.

Online Computing Reviews Service
