Wikipedia

Machine translation

Machine translation is the use of either rule-based or probabilistic (i.e. statistical and, most recently, neural network-based) machine learning approaches to the translation of text or speech from one language to another, including the contextual, idiomatic and pragmatic nuances of both languages.

A mobile phone app translating Spanish text into English

History

Origins

The origins of machine translation can be traced back to the work of Al-Kindi, a ninth-century Arabic cryptographer who developed techniques for systemic language translation, including cryptanalysis, frequency analysis, and probability and statistics, which are used in modern machine translation.[1] The idea of machine translation later appeared in the 17th century. In 1629, René Descartes proposed a universal language, with equivalent ideas in different tongues sharing one symbol.[2]

The idea of using digital computers for translation of natural languages was proposed as early as 1947 by England's A. D. Booth[3] and, in the same year, by Warren Weaver at the Rockefeller Foundation. "The memorandum written by Warren Weaver in 1949 is perhaps the single most influential publication in the earliest days of machine translation."[4][5] Others followed. A demonstration of a rudimentary translation of English into French was made in 1954 on the APEXC machine at Birkbeck College (University of London). Several papers on the topic were published at the time, and even articles in popular journals (for example an article by Cleave and Zacharov in the September 1955 issue of Wireless World). A similar application, also pioneered at Birkbeck College at the time, was reading and composing Braille texts by computer.

1950s

The first researcher in the field, Yehoshua Bar-Hillel, began his research at MIT (1951). A Georgetown University MT research team, led by Professor Michael Zarechnak, followed (1951) with a public demonstration of its Georgetown-IBM experiment system in 1954. MT research programs began in Japan[6][7] and Russia (1955), and the first MT conference was held in London (1956).[8][9]

David G. Hays "wrote about computer-assisted language processing as early as 1957" and "was project leader on computational linguistics at Rand from 1955 to 1968."[10]

1960–1975

Researchers continued to join the field as the Association for Machine Translation and Computational Linguistics was formed in the U.S. (1962) and the National Academy of Sciences formed the Automatic Language Processing Advisory Committee (ALPAC) to study MT (1964). Real progress was much slower, however, and after the ALPAC report (1966), which found that the ten-year-long research had failed to fulfill expectations, funding was greatly reduced.[11] According to a 1972 report by the Director of Defense Research and Engineering (DDR&E), the feasibility of large-scale MT was reestablished by the success of the Logos MT system in translating military manuals into Vietnamese during the Vietnam War.

The French Textile Institute also used MT to translate abstracts from and into French, English, German and Spanish (1970); Brigham Young University started a project to translate Mormon texts by automated translation (1971).

1975 and beyond

SYSTRAN, which "pioneered the field under contracts from the U.S. government"[12] in the 1960s, was used by Xerox to translate technical manuals (1978). Beginning in the late 1980s, as computational power increased and became less expensive, more interest was shown in statistical models for machine translation. MT became more popular after the advent of computers.[13] SYSTRAN's first implementation was deployed in 1988 by Minitel, the online service of the French Postal Service.[14] Various computer-based translation companies were also launched, including Trados (1984), which was the first to develop and market Translation Memory technology (1989), though this is not the same as MT. The first commercial MT system for Russian / English / German-Ukrainian was developed at Kharkov State University (1991).

By 1998, "for as little as $29.95" one could "buy a program for translating in one direction between English and a major European language of your choice" to run on a PC.[12]

MT on the web started with SYSTRAN offering free translation of small texts (1996) and then providing this via AltaVista Babelfish,[12] which racked up 500,000 requests a day (1997).[15] The second free translation service on the web was Lernout & Hauspie's GlobaLink.[12] Atlantic Magazine wrote in 1998 that "Systran's Babelfish and GlobaLink's Comprende" handled "Don't bank on it" with a "competent performance."[16]

Franz Josef Och (the future head of Translation Development at Google) won DARPA's speed MT competition (2003).[17] More innovations during this time included MOSES, the open-source statistical MT engine (2007), a text/SMS translation service for mobiles in Japan (2008), and a mobile phone with built-in speech-to-speech translation functionality for English, Japanese and Chinese (2009). In 2012, Google announced that Google Translate translates roughly enough text to fill 1 million books in one day.

Approaches

Before the advent of deep learning methods, statistical methods required a lot of rules accompanied by morphological, syntactic, and semantic annotations.

Rule-based

The rule-based machine translation approach was used mostly in the creation of dictionaries and grammar programs. Its biggest downfall was that everything had to be made explicit: orthographical variation and erroneous input must be made part of the source language analyser in order to cope with it, and lexical selection rules must be written for all instances of ambiguity.

Transfer-based machine translation

Transfer-based machine translation was similar to interlingual machine translation in that it created a translation from an intermediate representation that simulated the meaning of the original sentence. Unlike interlingual MT, it depended partially on the language pair involved in the translation.

Interlingual

Interlingual machine translation was one instance of rule-based machine-translation approaches. In this approach, the source language, i.e. the text to be translated, was transformed into an interlingual language, i.e. a "language neutral" representation that is independent of any language. The target language was then generated out of the interlingua. The only interlingual machine translation system that was made operational at the commercial level was the KANT system (Nyberg and Mitamura, 1992), which was designed to translate Caterpillar Technical English (CTE) into other languages.

Dictionary-based

Dictionary-based machine translation relied on dictionary entries: words were translated one at a time, as a dictionary would render them.
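
As a minimal sketch of such word-for-word lookup, the Python snippet below translates each token through a small dictionary and passes unknown words through unchanged. The Spanish-to-English entries and the function name `translate_word_by_word` are illustrative assumptions, not part of any real system.

```python
# Toy Spanish-to-English lexicon; the entries are illustrative assumptions.
LEXICON = {
    "el": "the",
    "gato": "cat",
    "negro": "black",
    "come": "eats",
    "pescado": "fish",
}


def translate_word_by_word(sentence, lexicon=LEXICON):
    """Translate each token independently; unknown words pass through unchanged."""
    tokens = sentence.lower().split()
    return " ".join(lexicon.get(token, token) for token in tokens)


print(translate_word_by_word("El gato negro come pescado"))
# the cat black eats fish
```

The output keeps the Spanish word order ("cat black"), which illustrates why a pure dictionary lookup needs additional rules for reordering and agreement.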

Statistical

Statistical machine translation tried to generate translations using statistical methods based on bilingual text corpora, such as the Canadian Hansard corpus, the English-French record of the Canadian parliament and EUROPARL, the record of the European Parliament. Where such corpora were available, good results were achieved translating similar texts, but such corpora were rare for many language pairs. The first statistical machine translation software was CANDIDE from IBM. In 2005, Google improved its internal translation capabilities by using approximately 200 billion words from United Nations materials to train their system; translation accuracy improved.[18]

SMT's main weaknesses included its dependence on huge amounts of parallel text, its problems with morphology-rich languages (especially when translating into such languages), and its inability to correct singleton errors.
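
To illustrate how translation probabilities can be estimated from a bilingual corpus, the sketch below runs expectation-maximization in the style of IBM Model 1 on four made-up sentence pairs. The choice of Model 1 is an assumption made for illustration; the text above does not say which models CANDIDE or Google used. The corpus and the function names `train_model1` and `best_translation` are likewise hypothetical.

```python
from collections import defaultdict

# Tiny English-French parallel corpus; the sentence pairs are made up for illustration.
CORPUS = [
    ("the house", "la maison"),
    ("the book", "le livre"),
    ("a house", "une maison"),
    ("a book", "un livre"),
]


def train_model1(pairs, iterations=20):
    """Estimate t(f | e) by expectation-maximization (IBM Model 1 style, no NULL word)."""
    pairs = [(e.split(), f.split()) for e, f in pairs]
    # Uniform start: every co-occurring word pair begins equally likely.
    t = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of times f aligns to e
        total = defaultdict(float)  # expected number of alignments involving e
        for e_words, f_words in pairs:
            for f in f_words:
                # E-step: share one unit of count among the source words that could explain f.
                norm = sum(t[(f, e)] for e in e_words)
                for e in e_words:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: re-estimate the probabilities from the expected counts.
        t = defaultdict(float, {fe: c / total[fe[1]] for fe, c in count.items()})
    return t


def best_translation(t, e_word):
    """Return the target word f with the highest t(f | e_word)."""
    candidates = {f: p for (f, e), p in t.items() if e == e_word}
    return max(candidates, key=candidates.get)


table = train_model1(CORPUS)
print(best_translation(table, "house"))  # maison
print(best_translation(table, "book"))   # livre
```

Even on this tiny corpus, "maison" wins for "house" because it co-occurs with "house" in two sentence pairs; with only one pair per word, the model would have nothing to distinguish the candidates, which mirrors the dependence on large parallel corpora.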

Neural MT

A deep learning-based approach to MT, neural machine translation has made rapid progress in recent years. However, the current consensus is that the so-called human parity achieved is not real, being based wholly on limited domains, language pairs, and certain test benchmarks;[19] that is, the claim lacks statistical power.[20]

Translations by neural MT tools like DeepL Translator, which is thought to usually deliver the best machine translation results as of 2022, typically still need post-editing by a human.[21][22][23]

Prompt engineering is required in order to steer the GPT-3-generated translations.[24][25]

Major issues

 
Machine translation could produce some non-understandable phrases, such as "鸡枞" (Macrolepiota albuminosa) being rendered as "Wikipedia".
 
Broken Chinese "沒有進入" from machine translation in Bali, Indonesia. The broken Chinese sentence sounds like "there does not exist an entry" or "have not entered yet".

Studies using human evaluation (e.g. by professional literary translators or human readers) have systematically identified various issues with the latest advanced MT outputs.[25] Common issues include the translation of ambiguous parts whose correct translation requires common sense-like semantic language processing or context.[25] Source texts can also contain errors, high-quality training data may be missing, and the severity and frequency of several types of problems may not be reduced by the techniques used to date, so some level of active human participation is required.

Disambiguation

Word-sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by Yehoshua Bar-Hillel.[26] He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word.[27] Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches.

Shallow approaches assume no knowledge of the text. They simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful.[28]
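
A shallow approach of this kind can be sketched in a few lines of Python: count the words that surround each sense in labelled examples, then pick the sense whose counts overlap most with the words around the ambiguous word. The sense labels, example sentences, stopword list and function names below are all illustrative assumptions.

```python
from collections import Counter

# Hand-labelled example contexts for two senses of "bank"; the data is an
# illustrative assumption, not drawn from any real corpus.
TRAINING = {
    "bank/finance": ["deposit money in the bank account", "the bank approved the loan"],
    "bank/river": ["fishing on the river bank", "the muddy bank of the stream"],
}

STOPWORDS = {"the", "in", "on", "of", "a"}


def build_profiles(training):
    """Count the content words seen around each sense."""
    profiles = {}
    for sense, sentences in training.items():
        words = Counter()
        for sentence in sentences:
            words.update(w for w in sentence.split() if w not in STOPWORDS)
        profiles[sense] = words
    return profiles


def disambiguate(sentence, profiles):
    """Pick the sense whose profile overlaps most with the surrounding words."""
    context = [w for w in sentence.lower().split() if w not in STOPWORDS]
    scores = {sense: sum(profile[w] for w in context) for sense, profile in profiles.items()}
    return max(scores, key=scores.get)


profiles = build_profiles(TRAINING)
print(disambiguate("She asked the bank for a loan", profiles))     # bank/finance
print(disambiguate("They walked along the river bank", profiles))  # bank/river
```

The sketch uses no knowledge of what a bank is; it only compares neighbouring words, which is what makes it "shallow".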

Claude Piron, a long-time translator for the United Nations and the World Health Organization, wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved:

Why does a translator need a whole workday to translate five pages, and not an hour or two? ..... About 90% of an average text corresponds to these simple conditions. But unfortunately, there's the other 10%. It's that part that requires six [more] hours of work. There are ambiguities one has to resolve. For instance, the author of the source text, an Australian physician, cited the example of an epidemic which was declared during World War II in a "Japanese prisoners of war camp". Was he talking about an American camp with Japanese prisoners or a Japanese camp with American prisoners? The English has two senses. It's necessary therefore to do research, maybe to the extent of a phone call to Australia.[29]

The ideal deep approach would require the translation software to do all the research necessary for this kind of disambiguation on its own; but this would require a higher degree of AI than has yet been attained. A shallow approach which simply guessed at the sense of the ambiguous English phrase that Piron mentions (based, perhaps, on which kind of prisoner-of-war camp is more often mentioned in a given corpus) would have a reasonable chance of guessing wrong fairly often. A shallow approach that involves "ask the user about each ambiguity" would, by Piron's estimate, only automate about 25% of a professional translator's job, leaving the harder 75% still to be done by a human.
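
The 25% figure can be reconstructed from Piron's example with a little arithmetic. The sketch below assumes an eight-hour workday in which about two hours go to the straightforward 90% of the text and six hours to the remaining 10%; these hour values are a reading of the quotation, not figures Piron states exactly.

```python
# Assumed figures read off Piron's example: roughly two hours for the
# straightforward 90% of the text and six more hours for the remaining 10%.
easy_hours = 2
hard_hours = 6

workday = easy_hours + hard_hours
automated_share = easy_hours / workday
print(f"automatable: {automated_share:.0%}, left to the human: {1 - automated_share:.0%}")
# automatable: 25%, left to the human: 75%
```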

Non-standard speech

One of the major pitfalls of MT is its inability to translate non-standard language with the same accuracy as standard language. Heuristic or statistics-based MT takes input from various sources in the standard form of a language. Rule-based translation, by nature, does not include common non-standard usages. This causes errors when translating from a vernacular source or into colloquial language. Limitations on translation from casual speech present issues for the use of machine translation on mobile devices.

Named entities

In information extraction, named entities, in a narrow sense, refer to concrete or abstract entities in the real world such as people, organizations, companies, and places that have a proper name: George Washington, Chicago, Microsoft. The term also covers expressions of time, space and quantity, such as 1 July 2011 or $500.

In the sentence "Smith is the president of Fabrionix" both Smith and Fabrionix are named entities, and can be further qualified via first name or other information; "president" is not, since Smith could have earlier held another position at Fabrionix, e.g. Vice President. The term rigid designator is what defines these usages for analysis in statistical machine translation.

Named entities must first be identified in the text; if not, they may be erroneously translated as common nouns, which would most likely not affect the BLEU rating of the translation but would change the text's human readability.[30] They may also be omitted from the output translation, which would likewise have implications for the text's readability and message.

Transliteration involves finding the letters in the target language that most closely correspond to the name in the source language. This, however, has been cited as sometimes worsening the quality of translation.[31] For "Southern California" the first word should be translated directly, while the second word should be transliterated. Machines often transliterate both because they treat them as one entity. Words like these are hard for machine translators, even those with a transliteration component, to process.

Use of a "do-not-translate" list, which has the same end goal (transliteration as opposed to translation),[32] still relies on correct identification of named entities.

A third approach is a class-based model. Named entities are replaced with a token that represents their "class"; "Ted" and "Erica" would both be replaced with a "person" class token. The statistical distribution and use of person names in general can then be analyzed, instead of the distributions of "Ted" and "Erica" individually, so that the probability of a given name in a specific language does not affect the assigned probability of a translation. A Stanford study on improving this area of translation gives the example that different probabilities will be assigned to "David is going for a walk" and "Ankit is going for a walk" with English as the target language, because the two names occur a different number of times in the training data. A frustrating outcome of the same study (and of other attempts to improve named-entity translation) is that including methods for named-entity translation often lowers the BLEU scores for translation.[32]
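
The class-based idea can be sketched as a preprocessing step. In the Python snippet below, a small hard-coded name list stands in for a real named-entity recognizer, and the `<PERSON>` token, the training sentences and the function name are hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical gazetteer standing in for a real named-entity recognizer.
KNOWN_PERSONS = {"David", "Ankit", "Ted", "Erica"}


def replace_with_class_tokens(sentence):
    """Swap each known person name for a shared <PERSON> class token."""
    return " ".join("<PERSON>" if w in KNOWN_PERSONS else w for w in sentence.split())


training_sentences = [
    "David is going for a walk",
    "David is reading",
    "David is going home",
    "Ankit is going for a walk",
]

# Without class tokens the two names have very different counts ...
raw_counts = Counter(s.split()[0] for s in training_sentences)
print(raw_counts["David"], raw_counts["Ankit"])  # 3 1

# ... with class tokens both sentences map to the same string and share statistics.
class_counts = Counter(replace_with_class_tokens(s) for s in training_sentences)
print(class_counts["<PERSON> is going for a walk"])  # 2
```

After replacement, a model trained on these counts would score the "David" and "Ankit" sentences identically, since the name frequency no longer enters the estimate.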

Somewhat related are the phrases "drinking tea with milk" vs. "drinking tea with Molly."

Translation from multiparallel sources

Some work has been done on the use of multiparallel corpora, that is, bodies of text that have been translated into three or more languages. Using these methods, a text that has been translated into two or more languages may be used in combination to provide a more accurate translation into a third language than if just one of those source languages were used alone.[33][34][35]

Ontologies in MT

An ontology is a formal representation of knowledge that includes the concepts (such as objects, processes etc.) in a domain and some relations between them. If the stored information is of linguistic nature, one can speak of a lexicon.[36] In NLP, ontologies can be used as a source of knowledge for machine translation systems. With access to a large knowledge base, systems can be enabled to resolve many (especially lexical) ambiguities on their own. In the following classic examples, as humans, we are able to interpret the prepositional phrase according to the context because we use our world knowledge, stored in our lexicons:

I saw a man/star/molecule with a microscope/telescope/binoculars.[36]

A machine translation system initially would not be able to differentiate between the meanings because syntax does not change. With a large enough ontology as a source of knowledge however, the possible interpretations of ambiguous words in a specific context can be reduced. Other areas of usage for ontologies within NLP include information retrieval, information extraction and text summarization.[36]
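
How world knowledge narrows the readings of "I saw a X with a Y" can be sketched with a toy knowledge base. The two tables below (what each instrument can be used to observe, and what kinds of things can hold an object) are invented for illustration and are far smaller than any real ontology; the function name is likewise hypothetical.

```python
# Toy world knowledge; every entry here is an illustrative assumption.
OBSERVABLE_WITH = {
    "microscope": {"molecule"},
    "telescope": {"star"},
    "binoculars": {"star", "man"},
}
CAN_HOLD_THINGS = {"man"}


def plausible_readings(obj, instrument):
    """Return the readings of 'I saw a <obj> with a <instrument>' the knowledge base allows."""
    readings = []
    if obj in OBSERVABLE_WITH.get(instrument, set()):
        readings.append("seeing by means of the " + instrument)
    if obj in CAN_HOLD_THINGS:
        readings.append(obj + " who has the " + instrument)
    return readings


print(plausible_readings("molecule", "microscope"))
# ['seeing by means of the microscope']
print(plausible_readings("man", "microscope"))
# ['man who has the microscope']
print(plausible_readings("man", "binoculars"))
# ['seeing by means of the binoculars', 'man who has the binoculars']
```

The last case stays ambiguous: the knowledge base reduces the set of interpretations but does not always leave exactly one.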

Building ontologies

The ontology generated for the PANGLOSS knowledge-based machine translation system in 1993 may serve as an example of how an ontology for NLP purposes can be compiled:[37][38]

  • A large-scale ontology is necessary to help parsing in the active modules of the machine translation system.
  • In the PANGLOSS example, about 50,000 nodes were intended to be subsumed under the smaller, manually-built upper (abstract) region of the ontology. Because of its size, it had to be created automatically.
  • The goal was to merge the two resources LDOCE online and WordNet to combine the benefits of both: concise definitions from Longman, and semantic relations allowing for semi-automatic taxonomization to the ontology from WordNet.
    • A definition match algorithm was created to automatically merge the correct meanings of ambiguous words between the two online resources, based on the words that the definitions of those meanings have in common in LDOCE and WordNet. Using a similarity matrix, the algorithm delivered matches between meanings including a confidence factor. This algorithm alone, however, did not match all meanings correctly on its own.
    • A second hierarchy match algorithm was therefore created which uses the taxonomic hierarchies found in WordNet (deep hierarchies) and partially in LDOCE (flat hierarchies). This works by first matching unambiguous meanings, then limiting the search space to only the respective ancestors and descendants of those matched meanings. Thus, the algorithm matched locally unambiguous meanings (for instance, while the word seal as such is ambiguous, there is only one meaning of seal in the animal subhierarchy).
  • Both algorithms complemented each other and helped construct a large-scale ontology for the machine translation system. The WordNet hierarchies, coupled with the matching definitions of LDOCE, were subordinated to the ontology's upper region. As a result, the PANGLOSS MT system was able to make use of this knowledge base, mainly in its generation element.
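
The definition match step described in the list above can be sketched as follows. The similarity measure used here (Jaccard overlap of definition words) is an assumption, since the text only says the match was based on shared definition words; the glosses are made up and are not taken from LDOCE or WordNet, and the sense identifiers and function names are hypothetical.

```python
# Made-up glosses for two senses of "seal"; they are NOT taken from LDOCE or WordNet.
RESOURCE_A = {
    "seal_1": "a large sea animal that eats fish and lives on coasts",
    "seal_2": "an official mark stamped on a document",
}
RESOURCE_B = {
    "seal.n.01": "marine animal with flippers that lives in the sea and eats fish",
    "seal.n.02": "a stamp or mark fixed to a document to show it is official",
}

STOPWORDS = {"a", "an", "and", "in", "is", "it", "on", "or", "that", "the", "to", "with"}


def content_words(definition):
    """Set of non-stopword tokens in a definition."""
    return {w for w in definition.lower().split() if w not in STOPWORDS}


def similarity_matrix(res_a, res_b):
    """Jaccard overlap of definition words for every pair of senses."""
    matrix = {}
    for sense_a, def_a in res_a.items():
        words_a = content_words(def_a)
        for sense_b, def_b in res_b.items():
            words_b = content_words(def_b)
            matrix[(sense_a, sense_b)] = len(words_a & words_b) / len(words_a | words_b)
    return matrix


def match_senses(res_a, res_b):
    """Pair each sense in res_a with its best match in res_b, plus a confidence score."""
    matrix = similarity_matrix(res_a, res_b)
    matches = {}
    for sense_a in res_a:
        best = max(res_b, key=lambda sense_b: matrix[(sense_a, sense_b)])
        matches[sense_a] = (best, round(matrix[(sense_a, best)], 2))
    return matches


print(match_senses(RESOURCE_A, RESOURCE_B))
```

Because the score depends only on exact word overlap ("stamped" and "stamp" do not match here), such a matcher can miss or confuse senses, which is consistent with the need for a second, hierarchy-based pass.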

Applications

While no system provides the ideal of fully automatic high-quality machine translation of unrestricted text, many fully automated systems produce reasonable output.[39][40][41] The quality of machine translation is substantially improved if the domain is restricted and controlled.[42] This enables using machine translation as a tool to speed up and simplify translations, as well as producing flawed but useful low-cost or ad-hoc translations.

Travel

Machine translation applications have also been released for most mobile devices, including mobile telephones, pocket PCs, PDAs, etc. Due to their portability, such instruments have come to be designated as mobile translation tools enabling mobile business networking between partners speaking different languages, or facilitating both foreign language learning and unaccompanied traveling to foreign countries without the need of the intermediation of a human translator.

For example, the Google Translate app allows foreigners to quickly translate text in their surroundings via augmented reality, using the smartphone camera to overlay the translated text onto the original text.[43] It can also recognize speech and then translate it.[44]

Public administration

Despite their inherent limitations, MT programs are used around the world. Probably the largest institutional user is the European Commission. In 2012, aiming to replace its rule-based MT with the newer, statistics-based MT@EC, the European Commission contributed 3.072 million euros (via its ISA programme).[45]

Wikipedia

Machine translation has also been used for translating Wikipedia articles and could play a larger role in creating, updating, expanding, and generally improving articles in the future, especially as the MT capabilities may improve. There is a "content translation tool" which allows editors to more easily translate articles across several select languages.[46][47][48] English-language articles are thought to usually be more comprehensive and less biased than their non-translated equivalents in other languages.[49] As of 2022, English Wikipedia has over 6.5 million articles while the German and Swedish Wikipedias each only have over 2.5 million articles,[50] each often far less comprehensive.

Surveillance and military

Following terrorist attacks in Western countries, including 9/11, the U.S. and its allies have been most interested in developing Arabic machine translation programs, but also in translating the Pashto and Dari languages.[citation needed] Within these languages, the focus is on key phrases and quick communication between military members and civilians through the use of mobile phone apps.[51] The Information Processing Technology Office in DARPA hosted programs like TIDES and Babylon translator. The US Air Force has awarded a $1 million contract to develop language translation technology.[52]

Social media

The notable rise of social networking on the web in recent years has created yet another niche for the application of machine translation software – in utilities such as Facebook, or instant messaging clients such as Skype, GoogleTalk, MSN Messenger, etc. – allowing users speaking different languages to communicate with each other.

Online games

Lineage W gained popularity in Japan because of its machine translation features allowing players from different countries to communicate.[53]

Medicine

Despite being labelled as an unworthy competitor to human translation in 1966 by the Automatic Language Processing Advisory Committee put together by the United States government,[54] the quality of machine translation has now improved to such a level that its application in online collaboration and in the medical field is being investigated. The application of this technology in medical settings where human translators are absent is another topic of research, but difficulties arise due to the importance of accurate translations in medical diagnoses.[55]

Ancient languages

The advancements in convolutional neural networks in recent years and in low-resource machine translation (when only a very limited amount of data and examples is available for training) have enabled machine translation for ancient languages, such as Akkadian and its dialects Babylonian and Assyrian.[56]

Evaluation

There are many factors that affect how machine translation systems are evaluated. These factors include the intended use of the translation, the nature of the machine translation software, and the nature of the translation process.

Different programs may work well for different purposes. For example, statistical machine translation (SMT) typically outperforms example-based machine translation (EBMT), but researchers found that when evaluating English to French translation, EBMT performs better.[57] The same concept applies for technical documents, which can be more easily translated by SMT because of their formal language.

In certain applications, however, e.g., product descriptions written in a controlled language, a dictionary-based machine-translation system has produced satisfactory translations that require no human intervention save for quality inspection.[58]

There are various means for evaluating the output quality of machine translation systems. The oldest is the use of human judges[59] to assess a translation's quality. Even though human evaluation is time-consuming, it is still the most reliable method to compare different systems such as rule-based and statistical systems.[60] Automated means of evaluation include BLEU, NIST, METEOR, and LEPOR.[61]
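
To give a sense of how an automated metric such as BLEU works, the sketch below computes a simplified, sentence-level BLEU-style score: clipped n-gram precision for unigrams and bigrams against a single reference, combined by a geometric mean and multiplied by a brevity penalty. This is a deliberately reduced variant written for illustration (no smoothing, one reference, n up to 2), not the full metric as used in published evaluations; the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Sentence-level BLEU-style score against one reference (simplified sketch)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    # The brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat is on the mat"
print(simple_bleu("the cat is on the mat", reference))   # 1.0
print(simple_bleu("the cat sat on the mat", reference))  # between 0 and 1
print(simple_bleu("mat the", reference))                 # 0.0
```

The score only measures surface overlap with the reference, which is one reason human judgement is still treated as the more reliable method.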

Relying exclusively on unedited machine translation ignores the fact that communication in human language is context-embedded and that it takes a person to comprehend the context of the original text with a reasonable degree of probability. It is certainly true that even purely human-generated translations are prone to error. Therefore, to ensure that a machine-generated translation will be useful to a human being and that publishable-quality translation is achieved, such translations must be reviewed and edited by a human.[62] The late Claude Piron wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved. Such research is a necessary prelude to the pre-editing necessary in order to provide input for machine-translation software such that the output will not be meaningless.[63]

In addition to disambiguation problems, decreased accuracy can occur due to varying levels of training data for machine translating programs. Both example-based and statistical machine translation rely on a vast array of real example sentences as a base for translation, and when too many or too few sentences are analyzed accuracy is jeopardized. Researchers found that when a program is trained on 203,529 sentence pairings, accuracy actually decreases.[57] The optimal level of training data seems to be just over 100,000 sentences, possibly because as training data increases, the number of possible sentences increases, making it harder to find an exact translation match.

Flaws in machine translation have been noted for their entertainment value. Two videos uploaded to YouTube in April 2017 involve two Japanese hiragana characters えぐ (e and gu) being repeatedly pasted into Google Translate, with the resulting translations quickly degrading into nonsensical phrases such as "DECEARING EGG" and "Deep-sea squeeze trees", which are then read in increasingly absurd voices;[64][65] the full-length version of the video had 6.9 million views as of March 2022.[66]

Machine translation and signed languages

In the early 2000s, options for machine translation between spoken and signed languages were severely limited. It was a common belief that deaf individuals could use traditional translators. However, stress, intonation, pitch, and timing are conveyed much differently in spoken languages compared to signed languages. Therefore, a deaf individual may misinterpret or become confused about the meaning of written text that is based on a spoken language.[67]

Researchers Zhao et al. (2000) developed a prototype called TEAM (translation from English to ASL by machine) that completed English to American Sign Language (ASL) translations. The program would first analyze the syntactic, grammatical, and morphological aspects of the English text. Following this step, the program accessed a sign synthesizer, which acted as a dictionary for ASL. This synthesizer housed the process one must follow to complete ASL signs, as well as the meanings of these signs. Once the entire text had been analyzed and the signs necessary to complete the translation had been located in the synthesizer, a computer-generated human appeared and signed the English text to the user in ASL.[67]

Copyright

Only works that are original are subject to copyright protection, so some scholars claim that machine translation results are not entitled to copyright protection because MT does not involve creativity.[68] The copyright at issue is for a derivative work; the author of the original work in the original language does not lose his rights when a work is translated: a translator must have permission to publish a translation.

Notes

  1. ^ DuPont, Quinn (January 2018). . Amodern. Archived from the original on 14 August 2019. Retrieved 2 September 2019.
  2. ^ Knowlson, James (1975). Universal Language Schemes in England and France, 1600-1800. Toronto: University of Toronto Press. ISBN 0-8020-5296-7.
  3. ^ Booth, Andrew D. (1 May 1953). "MECHANICAL TRANSLATION". Computers and Automation 1953-05: Vol 2 Iss 4. Internet Archive. Berkeley Enterprises. p. 6.
  4. ^ J. Hutchins (2000). "Warren Weaver and the launching of MT". (PDF). Studies in the History of the Language Sciences. Vol. 97. p. 17. doi:10.1075/sihols.97.05hut. ISBN 978-90-272-4586-1. S2CID 163460375. Archived from the original (PDF) on 28 February 2020 – via Semantic Scholar.
  5. ^ "Warren Weaver, American mathematician". 13 July 2020. Archived from the original on 6 March 2021. Retrieved 7 August 2020.
  6. ^ 上野, 俊夫 (13 August 1986). パーソナルコンピュータによる機械翻訳プログラムの制作 (in Japanese). Tokyo: (株)ラッセル社. p. 16. ISBN 494762700X. わが国では1956年、当時の電気試験所が英和翻訳専用機「ヤマト」を実験している。この機械は1962年頃には中学1年の教科書で90点以上の能力に達したと報告されている。(translation: In Japan in 1956, the then Electrotechnical Laboratory tested the dedicated English-Japanese translation machine "Yamato". By around 1962 this machine was reported to have reached a score of over 90 points on a first-year junior high school textbook.)
  7. ^ "機械翻訳専用機「やまと」-コンピュータ博物館". Archived from the original on 19 October 2016. Retrieved 4 April 2017.
  8. ^ Nye, Mary Jo (2016). "Speaking in Tongues: Science's centuries-long hunt for a common language". Distillations. 2 (1): 40–43. Archived from the original on 3 August 2020. Retrieved 20 March 2018.
  9. ^ Gordin, Michael D. (2015). Scientific Babel: How Science Was Done Before and After Global English. Chicago, Illinois: University of Chicago Press. ISBN 9780226000299.
  10. ^ Wolfgang Saxon (28 July 1995). "David G. Hays, 66, a Developer Of Language Study by Computer". The New York Times. Archived from the original on 7 February 2020. Retrieved 7 August 2020. wrote about computer-assisted language processing as early as 1957.. was project leader on computational linguistics at Rand from 1955 to 1968.
  11. ^ 上野, 俊夫 (13 August 1986). パーソナルコンピュータによる機械翻訳プログラムの制作 (in Japanese). Tokyo: (株)ラッセル社. p. 16. ISBN 494762700X.
  12. ^ a b c d Budiansky, Stephen (December 1998). "Lost in Translation". Atlantic Magazine. pp. 81–84.
  13. ^ Schank, Roger C. (2014). Conceptual Information Processing. New York: Elsevier. p. 5. ISBN 9781483258799.
  14. ^ Farwell, David; Gerber, Laurie; Hovy, Eduard (29 June 2003). Machine Translation and the Information Soup: Third Conference of the Association for Machine Translation in the Americas, AMTA'98, Langhorne, PA, USA, October 28–31, 1998 Proceedings. Berlin: Springer. p. 276. ISBN 3540652590.
  15. ^ Barron, Brenda (18 November 2019). "Babel Fish: What Happened To The Original Translation Application?: We Investigate". Digital.com. Archived from the original on 20 November 2019. Retrieved 22 November 2019.
  16. ^ and gave other examples too
  17. ^ Chan, Sin-Wai (2015). Routledge Encyclopedia of Translation Technology. Oxon: Routledge. p. 385. ISBN 9780415524841.
  18. ^ "Google Translator: The Universal Language". Blog.outer-court.com. 25 January 2007. Archived from the original on 20 November 2008. Retrieved 12 June 2012.
  19. ^ Antonio Toral, Sheila Castilho, Ke Hu, and Andy Way. 2018. Attaining the unattainable? reassessing claims of human parity in neural machine translation. CoRR, abs/1808.10432.
  20. ^ Yvette, Graham; Barry, Haddow; Koehn, Philipp (2019). "Translationese in Machine Translation Evaluation". arXiv:1906.09833 [cs.CL].
  21. ^ Katsnelson, Alla (29 August 2022). "Poor English skills? New AIs help researchers to write better". Nature. 609 (7925): 208–209. Bibcode:2022Natur.609..208K. doi:10.1038/d41586-022-02767-9. PMID 36038730. S2CID 251931306.
  22. ^ Korab, Petr (18 February 2022). "DeepL: An Exceptionally Magnificent Language Translator". Medium. Retrieved 9 January 2023.
  23. ^ "DeepL outperforms Google Translate – DW – 12/05/2018". Deutsche Welle. Retrieved 9 January 2023.
  24. ^ Fadelli, Ingrid. "Study assesses the quality of AI literary translations by comparing them with human translations". techxplore.com. Retrieved 18 December 2022.
  25. ^ a b c Thai, Katherine; Karpinska, Marzena; Krishna, Kalpesh; Ray, Bill; Inghilleri, Moira; Wieting, John; Iyyer, Mohit (25 October 2022). "Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature". arXiv:2210.14250 [cs.CL].
  26. ^ Milestones in machine translation – No.6: Bar-Hillel and the nonfeasibility of FAHQT. Archived 12 March 2007 at the Wayback Machine. By John Hutchins.
  27. ^ Bar-Hillel (1960), "Automatic Translation of Languages". Available online at http://www.mt-archive.info/Bar-Hillel-1960.pdf. Archived 28 September 2011 at the Wayback Machine.
  28. ^ Hybrid approaches to machine translation. Costa-jussà, Marta R., Rapp, Reinhard, Lambert, Patrik, Eberle, Kurt, Banchs, Rafael E., Babych, Bogdan. Switzerland. 21 July 2016. ISBN 9783319213101. OCLC 953581497.
  29. ^ Claude Piron, Le défi des langues (The Language Challenge), Paris, L'Harmattan, 1994.
  30. ^ Babych, Bogdan; Hartley, Anthony (2003). (PDF). Paper presented at the 7th International EAMT Workshop on MT and Other Language Technology Tools... Archived from the original (PDF) on 14 May 2006. Retrieved 4 November 2013.
  31. ^ Hermjakob, U., Knight, K., & Daumé III, H. (2008). Name Translation in Statistical Machine Translation: Learning When to Transliterate. Archived 4 January 2018 at the Wayback Machine. Association for Computational Linguistics. 389–397.
  32. ^ a b Neeraj Agrawal; Ankush Singla. Using Named Entity Recognition to improve Machine Translation (PDF). Archived (PDF) from the original on 21 May 2013. Retrieved 4 November 2013.
  33. ^ Schwartz, Lane (2008). Multi-Source Translation Methods (PDF). Paper presented at the 8th Biennial Conference of the Association for Machine Translation in the Americas. Archived (PDF) from the original on 29 June 2016. Retrieved 3 November 2017.
  34. ^ Cohn, Trevor; Lapata, Mirella (2007). Machine Translation by Triangulation: Making Effective Use of Multi-Parallel Corpora (PDF). Paper presented at the 45th Annual Meeting of the Association for Computational Linguistics, June 23–30, 2007, Prague, Czech Republic. Archived (PDF) from the original on 10 October 2015. Retrieved 3 February 2015.
  35. ^ Nakov, Preslav; Ng, Hwee Tou (2012). "Improving Statistical Machine Translation for a Resource-Poor Language Using Related Resource-Rich Languages". Journal of Artificial Intelligence Research. 44: 179–222. doi:10.1613/jair.3540.
  36. ^ a b c Vossen, Piek: Ontologies. In: Mitkov, Ruslan (ed.) (2003): Handbook of Computational Linguistics, Chapter 25. Oxford: Oxford University Press.
  37. ^ Knight, Kevin (1993). "Building a Large Ontology for Machine Translation". Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21–24, 1993. Princeton, New Jersey: Association for Computational Linguistics. pp. 185–190. doi:10.3115/1075671.1075713. ISBN 978-1-55860-324-0.
  38. ^ Knight, Kevin; Luk, Steve K. (1994). Building a Large-Scale Knowledge Base for Machine Translation. Paper presented at the Twelfth National Conference on Artificial Intelligence. arXiv:cmp-lg/9407029.
  39. ^ Melby, Alan. The Possibility of Language (Amsterdam: Benjamins, 1995, 27–41). Benjamins.com. 1995. ISBN 9789027216144. Archived from the original on 25 May 2011. Retrieved 12 June 2012.
  40. ^ Wooten, Adam (14 February 2006). "A Simple Model Outlining Translation Technology". T&I Business. Archived from the original on 16 July 2012. Retrieved 12 June 2012.
  41. ^ (PDF). Archived from the original (PDF) on 28 September 2018. Retrieved 12 June 2012.
  42. ^ "Human quality machine translation solution by Ta with you" (in Spanish). Tauyou.com. 15 April 2009. Archived from the original on 22 September 2009. Retrieved 12 June 2012.
  43. ^ "Google Translate Adds 20 Languages To Augmented Reality App". Popular Science. 30 July 2015. Retrieved 9 January 2023.
  44. ^ Whitney, Lance. "Google Translate app update said to make speech-to-text even easier". CNET. Retrieved 9 January 2023.
  45. ^ "Machine Translation Service". 5 August 2011. Archived from the original on 8 September 2013. Retrieved 13 September 2013.
  46. ^ Wilson, Kyle (8 May 2019). "Wikipedia has a Google Translate problem". The Verge. Retrieved 9 January 2023.
  47. ^ "Wikipedia taps Google to help editors translate articles". VentureBeat. 9 January 2019. Retrieved 9 January 2023.
  48. ^ "Content translation tool helps create over half a million Wikipedia articles". Wikimedia Foundation. 23 September 2019. Retrieved 10 January 2023.
  49. ^ Magazine, Undark (12 August 2021). "Wikipedia Has a Language Problem. Here's How To Fix It". Undark Magazine. Retrieved 9 January 2023.
  50. ^ "List of Wikipedias - Meta". meta.wikimedia.org. Retrieved 9 January 2023.
  51. ^ Gallafent, Alex (26 April 2011). "Machine Translation for the Military". PRI's the World. Archived from the original on 9 May 2013. Retrieved 17 September 2013.
  52. ^ Jackson, William (9 September 2003). "GCN – Air force wants to build a universal translator". Gcn.com. Archived from the original on 16 June 2011. Retrieved 12 June 2012.
  53. ^ Young-sil, Yoon (26 June 2023). "Korean Games Growing in Popularity in Tough Japanese Game Market". BusinessKorea. Retrieved 8 August 2023.
  54. ^ Automatic Language Processing Advisory Committee, Division of Behavioral Sciences, National Academy of Sciences, National Research Council (1966). Language and Machines: Computers in Translation and Linguistics (PDF) (Report). Washington, D.C.: National Research Council, National Academy of Sciences. Archived (PDF) from the original on 21 October 2013. Retrieved 21 October 2013.
  55. ^ Randhawa, Gurdeeshpal; Ferreyra, Mariella; Ahmed, Rukhsana; Ezzat, Omar; Pottie, Kevin (April 2013). "Using machine translation in clinical practice". Canadian Family Physician. 59 (4): 382–383. PMC 3625087. PMID 23585608. Archived from the original on 4 May 2013. Retrieved 21 October 2013.
  56. ^ Gutherz, Gai; Gordin, Shai; Sáenz, Luis; Levy, Omer; Berant, Jonathan (2 May 2023). Kearns, Michael (ed.). "Translating Akkadian to English with neural machine translation". PNAS Nexus. 2 (5): pgad096. doi:10.1093/pnasnexus/pgad096. ISSN 2752-6542. PMC 10153418. PMID 37143863.
  57. ^ a b Way, Andy; Nano Gough (20 September 2005). "Comparing Example-Based and Statistical Machine Translation". Natural Language Engineering. 11 (3): 295–309. doi:10.1017/S1351324905003888. S2CID 3242163.
  58. ^ Muegge (2006), "Fully Automatic High Quality Machine Translation of Restricted Text: A Case Study" (archived 17 October 2011 at the Wayback Machine), in Translating and the computer 28. Proceedings of the twenty-eighth international conference on translating and the computer, 16–17 November 2006, London. London: Aslib. ISBN 978-0-85142-483-5.
  59. ^ . Morphologic.hu. Archived from the original on 19 April 2012. Retrieved 12 June 2012.
  60. ^ Anderson, D.D. (1995). Machine translation as a tool in second language learning. Archived 4 January 2018 at the Wayback Machine. CALICO Journal. 13(1). 68–96.
  61. ^ Han et al. (2012), "LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors" (archived 4 January 2018 at the Wayback Machine), in Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012): Posters, pages 441–450, Mumbai, India.
  62. ^ J.M. Cohen observes (p.14): "Scientific translation is the aim of an age that would reduce all activities to techniques. It is impossible however to imagine a literary-translation machine less complex than the human brain itself, with all its knowledge, reading, and discrimination."
  63. ^ See the annually performed NIST tests since 2001 (archived 22 March 2009 at the Wayback Machine) and Bilingual Evaluation Understudy.
  64. ^ Abadi, Mark. "4 times Google Translate totally dropped the ball". Business Insider.
  65. ^ "回数を重ねるほど狂っていく Google翻訳で「えぐ」を英訳すると奇妙な世界に迷い込むと話題に". ねとらぼ.
  66. ^ "えぐ" – via www.youtube.com.
  67. ^ a b Zhao, L., Kipper, K., Schuler, W., Vogler, C., & Palmer, M. (2000). A Machine Translation System from English to American Sign Language. Archived 20 July 2018 at the Wayback Machine. Lecture Notes in Computer Science, 1934: 54–67.
  68. ^ "Machine Translation: No Copyright On The Result?". SEO Translator, citing Zimbabwe Independent. Archived from the original on 29 November 2012. Retrieved 24 November 2012.

Further reading

  • Cohen, J. M. (1986), "Translation", Encyclopedia Americana, vol. 27, pp. 12–15
  • Hutchins, W. John; Somers, Harold L. (1992). An Introduction to Machine Translation. London: Academic Press. ISBN 0-12-362830-X.
  • Lewis-Kraus, Gideon (7 June 2015). "Tower of Babble". New York Times Magazine. pp. 48–52.
  • Weber, Steven; Mehandru, Nikita (2022). "The 2020s Political Economy of Machine Translation". Business and Politics. 24 (1): 96–112. arXiv:2011.01007. doi:10.1017/bap.2021.17. S2CID 226236853.

External links

  • The Advantages and Disadvantages of Machine Translation
  • International Association for Machine Translation (IAMT). Archived 24 June 2010 at the Wayback Machine.
  • Machine Translation Archive by John Hutchins. Archived 1 April 2019 at the Wayback Machine. An electronic repository (and bibliography) of articles, books and papers in the field of machine translation and computer-based translation technology
  • Machine translation (computer-based translation) – Publications by John Hutchins (includes PDFs of several books on machine translation)
  • John Hutchins 1999. Archived 7 September 2007 at the Wayback Machine.
  • Slator News & analysis of the latest developments in machine translation
  • From Classroom to Real World: How Machine Translation is Changing the Landscape of Foreign Language Learning

machine, translation, confused, with, computer, assisted, translation, interactive, machine, translation, translator, computing, either, rule, based, probabilistic, statistical, most, recently, neural, network, based, machine, learning, approaches, translation. Not to be confused with Computer assisted translation Interactive machine translation or Translator computing Machine translation is use of either rule based or probabilistic i e statistical and most recently neural network based machine learning approaches to translation of text or speech from one language to another including the contextual idiomatic and pragmatic nuances of both languages A mobile phone app translating Spanish text into English Contents 1 History 1 1 Origins 1 2 1950s 1 3 1960 1975 1 4 1975 and beyond 2 Approaches 2 1 Rule based 2 1 1 Transfer based machine translation 2 1 2 Interlingual 2 1 3 Dictionary based 2 2 Statistical 2 3 Neural MT 3 Major issues 3 1 Disambiguation 3 2 Non standard speech 3 3 Named entities 4 Translation from multiparallel sources 5 Ontologies in MT 5 1 Building ontologies 6 Applications 6 1 Travel 6 2 Public administration 6 3 Wikipedia 6 4 Surveillance and military 6 5 Social media 6 5 1 Online games 6 6 Medicine 6 7 Ancient languages 7 Evaluation 8 Machine translation and signed languages 9 Copyright 10 See also 11 Notes 12 Further reading 13 External linksHistory editMain article History of machine translation Origins edit The origins of machine translation can be traced back to the work of Al Kindi a ninth century Arabic cryptographer who developed techniques for systemic language translation including cryptanalysis frequency analysis and probability and statistics which are used in modern machine translation 1 The idea of machine translation later appeared in the 17th century In 1629 Rene Descartes proposed a universal language with equivalent ideas in different tongues sharing one symbol 2 The idea of using digital computers for translation of natural 
languages was proposed as early as 1947 by England s A D Booth 3 and Warren Weaver at Rockefeller Foundation in the same year The memorandum written by Warren Weaver in 1949 is perhaps the single most influential publication in the earliest days of machine translation 4 5 Others followed A demonstration was made in 1954 on the APEXC machine at Birkbeck College University of London of a rudimentary translation of English into French Several papers on the topic were published at the time and even articles in popular journals for example an article by Cleave and Zacharov in the September 1955 issue of Wireless World A similar application also pioneered at Birkbeck College at the time was reading and composing Braille texts by computer 1950s edit The first researcher in the field Yehoshua Bar Hillel began his research at MIT 1951 A Georgetown University MT research team led by Professor Michael Zarechnak followed 1951 with a public demonstration of its Georgetown IBM experiment system in 1954 MT research programs popped up in Japan 6 7 and Russia 1955 and the first MT conference was held in London 1956 8 9 David G Hays wrote about computer assisted language processing as early as 1957 and was project leader on computational linguistics at Rand from 1955 to 1968 10 1960 1975 edit Researchers continued to join the field as the Association for Machine Translation and Computational Linguistics was formed in the U S 1962 and the National Academy of Sciences formed the Automatic Language Processing Advisory Committee ALPAC to study MT 1964 Real progress was much slower however and after the ALPAC report 1966 which found that the ten year long research had failed to fulfill expectations funding was greatly reduced 11 According to a 1972 report by the Director of Defense Research and Engineering DDR amp E the feasibility of large scale MT was reestablished by the success of the Logos MT system in translating military manuals into Vietnamese during that conflict The French 
Textile Institute also used MT to translate abstracts from and into French English German and Spanish 1970 Brigham Young University started a project to translate Mormon texts by automated translation 1971 1975 and beyond edit SYSTRAN which pioneered the field under contracts from the U S government 12 in the 1960s was used by Xerox to translate technical manuals 1978 Beginning in the late 1980s as computational power increased and became less expensive more interest was shown in statistical models for machine translation MT became more popular after the advent of computers 13 SYSTRAN s first implementation system was implemented in 1988 by the online service of the French Postal Service called Minitel 14 Various computer based translation companies were also launched including Trados 1984 which was the first to develop and market Translation Memory technology 1989 though this is not the same as MT The first commercial MT system for Russian English German Ukrainian was developed at Kharkov State University 1991 By 1998 for as little as 29 95 one could buy a program for translating in one direction between English and a major European language of your choice to run on a PC 12 MT on the web started with SYSTRAN offering free translation of small texts 1996 and then providing this via AltaVista Babelfish 12 which racked up 500 000 requests a day 1997 15 The second free translation service on the web was Lernout amp Hauspie s GlobaLink 12 Atlantic Magazine wrote in 1998 that Systran s Babelfish and GlobaLink s Comprende handled Don t bank on it with a competent performance 16 Franz Josef Och the future head of Translation Development AT Google won DARPA s speed MT competition 2003 17 More innovations during this time included MOSES the open source statistical MT engine 2007 a text SMS translation service for mobiles in Japan 2008 and a mobile phone with built in speech to speech translation functionality for English Japanese and Chinese 2009 In 2012 Google announced 
that Google Translate translates roughly enough text to fill 1 million books in one day Approaches editSee also Hybrid machine translation and Example based machine translation Before the advent of deep learning methods statistical methods required a lot of rules accompanied by morphological syntactic and semantic annotations Rule based edit Main article Rule based machine translation The rule based machine translation approach was used mostly in the creation of dictionaries and grammar programs Its biggest downfall was that everything had to be made explicit orthographical variation and erroneous input must be made part of the source language analyser in order to cope with it and lexical selection rules must be written for all instances of ambiguity Transfer based machine translation edit Main article Transfer based machine translation Transfer based machine translation was similar to interlingual machine translation in that it created a translation from an intermediate representation that simulated the meaning of the original sentence Unlike interlingual MT it depended partially on the language pair involved in the translation Interlingual edit Main article Interlingual machine translation Interlingual machine translation was one instance of rule based machine translation approaches In this approach the source language i e the text to be translated was transformed into an interlingual language i e a language neutral representation that is independent of any language The target language was then generated out of the interlingua The only interlingual machine translation system that was made operational at the commercial level was the KANT system Nyberg and Mitamura 1992 which was designed to translate Caterpillar Technical English CTE into other languages Dictionary based edit Main article Dictionary based machine translation Machine translation used a method based on dictionary entries which means that the words were translated as they are by a dictionary 
Statistical edit Statistical machine translation tried to generate translations using statistical methods based on bilingual text corpora such as the Canadian Hansard corpus the English French record of the Canadian parliament and EUROPARL the record of the European Parliament Where such corpora were available good results were achieved translating similar texts but such corpora were rare for many language pairs The first statistical machine translation software was CANDIDE from IBM In 2005 Google improved its internal translation capabilities by using approximately 200 billion words from United Nations materials to train their system translation accuracy improved 18 SMT s biggest downfall included it being dependent upon huge amounts of parallel texts its problems with morphology rich languages especially with translating into such languages and its inability to correct singleton errors Neural MT edit Main article Neural machine translation A deep learning based approach to MT neural machine translation has made rapid progress in recent years However current consensus is that the so called human parity achieved is not real being based wholly on limited domains language pairs and certain test benchmarks 19 i e it lacks statistical significance power 20 Translations by neural MT tools like DeepL Translator which is thought to usually deliver the best machine translation results as of 2022 typically still need post editing by a human 21 22 23 Prompt engineering is required in order to steer the GPT 3 generated translations 24 25 Major issues edit nbsp Machine translation could produce some non understandable phrases such as 鸡枞 Macrolepiota albuminosa being rendered as Wikipedia nbsp Broken Chinese 沒有進入 from machine translation in Bali Indonesia The broken Chinese sentence sounds like there does not exist an entry or have not entered yet Studies using human evaluation e g by professional literary translators or human readers have systematically identified various 
issues with the latest advanced MT outputs 25 Common issues include the translation of ambiguous parts whose correct translation requires common sense like semantic language processing or context 25 There can also be errors in the source texts missing high quality training data and the severity of frequency of several types of problems may not get reduced with techniques used to date requiring some level of human active participation Disambiguation edit Main articles Word sense disambiguation and Syntactic disambiguation Word sense disambiguation concerns finding a suitable translation when a word can have more than one meaning The problem was first raised in the 1950s by Yehoshua Bar Hillel 26 He pointed out that without a universal encyclopedia a machine would never be able to distinguish between the two meanings of a word 27 Today there are numerous approaches designed to overcome this problem They can be approximately divided into shallow approaches and deep approaches Shallow approaches assume no knowledge of the text They simply apply statistical methods to the words surrounding the ambiguous word Deep approaches presume a comprehensive knowledge of the word So far shallow approaches have been more successful 28 Claude Piron a long time translator for the United Nations and the World Health Organization wrote that machine translation at its best automates the easier part of a translator s job the harder and more time consuming part usually involves doing extensive research to resolve ambiguities in the source text which the grammatical and lexical exigencies of the target language require to be resolved Why does a translator need a whole workday to translate five pages and not an hour or two About 90 of an average text corresponds to these simple conditions But unfortunately there s the other 10 It s that part that requires six more hours of work There are ambiguities one has to resolve For instance the author of the source text an Australian physician cited 
the example of an epidemic which was declared during World War II in a Japanese prisoners of war camp Was he talking about an American camp with Japanese prisoners or a Japanese camp with American prisoners The English has two senses It s necessary therefore to do research maybe to the extent of a phone call to Australia 29 The ideal deep approach would require the translation software to do all the research necessary for this kind of disambiguation on its own but this would require a higher degree of AI than has yet been attained A shallow approach which simply guessed at the sense of the ambiguous English phrase that Piron mentions based perhaps on which kind of prisoner of war camp is more often mentioned in a given corpus would have a reasonable chance of guessing wrong fairly often A shallow approach that involves ask the user about each ambiguity would by Piron s estimate only automate about 25 of a professional translator s job leaving the harder 75 still to be done by a human Non standard speech edit One of the major pitfalls of MT is its inability to translate non standard language with the same accuracy as standard language Heuristic or statistical based MT takes input from various sources in standard form of a language Rule based translation by nature does not include common non standard usages This causes errors in translation from a vernacular source or into colloquial language Limitations on translation from casual speech present issues in the use of machine translation in mobile devices Named entities edit Main article Named entity In information extraction named entities in a narrow sense refer to concrete or abstract entities in the real world such as people organizations companies and places that have a proper name George Washington Chicago Microsoft It also refers to expressions of time space and quantity such as 1 July 2011 500 In the sentence Smith is the president of Fabrionix both Smith and Fabrionix are named entities and can be further 
qualified via first name or other information president is not since Smith could have earlier held another position at Fabrionix e g Vice President The term rigid designator is what defines these usages for analysis in statistical machine translation Named entities must first be identified in the text if not they may be erroneously translated as common nouns which would most likely not affect the BLEU rating of the translation but would change the text s human readability 30 They may be omitted from the output translation which would also have implications for the text s readability and message Transliteration includes finding the letters in the target language that most closely correspond to the name in the source language This however has been cited as sometimes worsening the quality of translation 31 For Southern California the first word should be translated directly while the second word should be transliterated Machines often transliterate both because they treated them as one entity Words like these are hard for machine translators even those with a transliteration component to process Use of a do not translate list which has the same end goal transliteration as opposed to translation 32 still relies on correct identification of named entities A third approach is a class based model Named entities are replaced with a token to represent their class Ted and Erica would both be replaced with person class token Then the statistical distribution and use of person names in general can be analyzed instead of looking at the distributions of Ted and Erica individually so that the probability of a given name in a specific language will not affect the assigned probability of a translation A study by Stanford on improving this area of translation gives the examples that different probabilities will be assigned to David is going for a walk and Ankit is going for a walk for English as a target language due to the different number of occurrences for each name in the 
training data A frustrating outcome of the same study by Stanford and other attempts to improve named recognition translation is that many times a decrease in the BLEU scores for translation will result from the inclusion of methods for named entity translation 32 Somewhat related are the phrases drinking tea with milk vs drinking tea with Molly Translation from multiparallel sources editSome work has been done in the utilization of multiparallel corpora that is a body of text that has been translated into 3 or more languages Using these methods a text that has been translated into 2 or more languages may be utilized in combination to provide a more accurate translation into a third language compared with if just one of those source languages were used alone 33 34 35 Ontologies in MT editAn ontology is a formal representation of knowledge that includes the concepts such as objects processes etc in a domain and some relations between them If the stored information is of linguistic nature one can speak of a lexicon 36 In NLP ontologies can be used as a source of knowledge for machine translation systems With access to a large knowledge base systems can be enabled to resolve many especially lexical ambiguities on their own In the following classic examples as humans we are able to interpret the prepositional phrase according to the context because we use our world knowledge stored in our lexicons I saw a man star molecule with a microscope telescope binoculars 36 A machine translation system initially would not be able to differentiate between the meanings because syntax does not change With a large enough ontology as a source of knowledge however the possible interpretations of ambiguous words in a specific context can be reduced Other areas of usage for ontologies within NLP include information retrieval information extraction and text summarization 36 Building ontologies edit The ontology generated for the PANGLOSS knowledge based machine translation system in 1993 
may serve as an example of how an ontology for NLP purposes can be compiled 37 38 A large scale ontology is necessary to help parsing in the active modules of the machine translation system In the PANGLOSS example about 50 000 nodes were intended to be subsumed under the smaller manually built upper abstract region of the ontology Because of its size it had to be created automatically The goal was to merge the two resources LDOCE online and WordNet to combine the benefits of both concise definitions from Longman and semantic relations allowing for semi automatic taxonomization to the ontology from WordNet A definition match algorithm was created to automatically merge the correct meanings of ambiguous words between the two online resources based on the words that the definitions of those meanings have in common in LDOCE and WordNet Using a similarity matrix the algorithm delivered matches between meanings including a confidence factor This algorithm alone however did not match all meanings correctly on its own A second hierarchy match algorithm was therefore created which uses the taxonomic hierarchies found in WordNet deep hierarchies and partially in LDOCE flat hierarchies This works by first matching unambiguous meanings then limiting the search space to only the respective ancestors and descendants of those matched meanings Thus the algorithm matched locally unambiguous meanings for instance while the word seal as such is ambiguous there is only one meaning of seal in the animal subhierarchy Both algorithms complemented each other and helped constructing a large scale ontology for the machine translation system The WordNet hierarchies coupled with the matching definitions of LDOCE were subordinated to the ontology s upper region As a result the PANGLOSS MT system was able to make use of this knowledge base mainly in its generation element Applications editWhile no system provides the ideal of fully automatic high quality machine translation of unrestricted text 
many fully automated systems produce reasonable output 39 40 41 The quality of machine translation is substantially improved if the domain is restricted and controlled 42 This enables using machine translation as a tool to speed up and simplify translations as well as producing flawed but useful low cost or ad hoc translations Travel edit Machine translation applications have also been released for most mobile devices including mobile telephones pocket PCs PDAs etc Due to their portability such instruments have come to be designated as mobile translation tools enabling mobile business networking between partners speaking different languages or facilitating both foreign language learning and unaccompanied traveling to foreign countries without the need of the intermediation of a human translator For example the Google Translate app allows foreigners to quickly translate text in their surrounding via augmented reality using the smartphone camera that overlays the translated text onto the text 43 It can also recognize speech and then translate it 44 Public administration edit Despite their inherent limitations MT programs are used around the world Probably the largest institutional user is the European Commission In the 2012 with an aim to replace a rule based MT by newer statistical based MT EC The European Commission contributed 3 072 million euros via its ISA programme 45 Wikipedia edit Machine translation has also been used for translating Wikipedia articles and could play a larger role in creating updating expanding and generally improving articles in the future especially as the MT capabilities may improve There is a content translation tool which allows editors to more easily translate articles across several select languages 46 47 48 English language articles are thought to usually be more comprehensive and less biased than their non translated equivalents in other languages 49 As of 2022 English Wikipedia has over 6 5 million articles while the German and 
Swedish Wikipedias each only have over 2.5 million articles,[50] each often far less comprehensive.

Surveillance and military

Following terrorist attacks in Western countries, including 9/11, the U.S. and its allies have been most interested in developing Arabic machine translation programs, but also in translating the Pashto and Dari languages.[citation needed] Within these languages, the focus is on key phrases and quick communication between military members and civilians through the use of mobile phone apps.[51] The Information Processing Technology Office in DARPA hosted programs like TIDES and the Babylon translator. The US Air Force has awarded a $1 million contract to develop a language translation technology.[52]

Social media

The notable rise of social networking on the web in recent years has created yet another niche for the application of machine translation software, in utilities such as Facebook and in instant messaging clients such as Skype, Google Talk, and MSN Messenger, allowing users who speak different languages to communicate with each other.

Online games

Lineage W gained popularity in Japan because of its machine translation features, which allow players from different countries to communicate.[53]

Medicine

Despite being labelled an unworthy competitor to human translation in 1966 by the Automatic Language Processing Advisory Committee, put together by the United States government,[54] machine translation has now improved in quality to such a level that its application in online collaboration and in the medical field is being investigated. The application of this technology in medical settings where human translators are absent is another topic of research, but difficulties arise because of the importance of accurate translations in medical diagnoses.[55]

Ancient languages

The advancements in convolutional neural networks in recent years and in low-resource machine translation (when only a very limited amount of data and examples are available for training) enabled
machine translation for ancient languages, such as Akkadian and its dialects Babylonian and Assyrian.[56]

Evaluation

Main article: Evaluation of machine translation

Many factors affect how machine translation systems are evaluated. These factors include the intended use of the translation, the nature of the machine translation software, and the nature of the translation process.

Different programs may work well for different purposes. For example, statistical machine translation (SMT) typically outperforms example-based machine translation (EBMT), but researchers found that when evaluating English-to-French translation, EBMT performs better.[57] The same concept applies to technical documents, which can be more easily translated by SMT because of their formal language.

In certain applications, however, e.g., product descriptions written in a controlled language, a dictionary-based machine translation system has produced satisfactory translations that require no human intervention save for quality inspection.[58]

There are various means for evaluating the output quality of machine translation systems. The oldest is the use of human judges[59] to assess a translation's quality. Even though human evaluation is time-consuming, it is still the most reliable method to compare different systems, such as rule-based and statistical systems.[60] Automated means of evaluation include BLEU, NIST, METEOR, and LEPOR.[61]

Relying exclusively on unedited machine translation ignores the fact that communication in human language is context-embedded and that it takes a person to comprehend the context of the original text with a reasonable degree of probability. It is certainly true that even purely human-generated translations are prone to error. Therefore, to ensure that a machine-generated translation will be useful to a human being and that publishable-quality translation is achieved, such translations must be reviewed and edited by a human.[62] The late Claude Piron wrote that machine translation, at its
best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved. Such research is a necessary prelude to the pre-editing necessary in order to provide input for machine-translation software such that the output will not be meaningless.[63]

In addition to disambiguation problems, decreased accuracy can occur due to varying levels of training data for machine translation programs. Both example-based and statistical machine translation rely on a vast array of real example sentences as a base for translation, and when too many or too few sentences are analyzed, accuracy is jeopardized. Researchers found that when a program is trained on 203,529 sentence pairings, accuracy actually decreases.[57] The optimal level of training data seems to be just over 100,000 sentences, possibly because as training data increases, the number of possible sentences increases, making it harder to find an exact translation match.

Flaws in machine translation have been noted for their entertainment value. Two videos uploaded to YouTube in April 2017 involve two Japanese hiragana characters, えぐ (e and gu), being repeatedly pasted into Google Translate, with the resulting translations quickly degrading into nonsensical phrases such as "DECEARING EGG" and "Deep-sea squeeze trees", which are then read in increasingly absurd voices;[64][65] the full-length version of the video had 6.9 million views as of March 2022.[66]

Machine translation and signed languages

Main article: Machine translation of sign languages

In the early 2000s, options for machine translation between spoken and signed languages were severely limited. It was a common belief that deaf individuals could use traditional translators. However, stress, intonation, pitch, and timing are conveyed much differently in spoken languages compared to signed
languages. Therefore, a deaf individual may misinterpret or become confused about the meaning of written text that is based on a spoken language.[67]

Researchers Zhao et al. (2000) developed a prototype called TEAM (translation from English to ASL by machine) that completed English-to-American Sign Language (ASL) translations. The program would first analyze the syntactic, grammatical, and morphological aspects of the English text. Following this step, the program accessed a sign synthesizer, which acted as a dictionary for ASL. This synthesizer housed the process one must follow to complete ASL signs, as well as the meanings of these signs. Once the entire text had been analyzed and the signs necessary to complete the translation had been located in the synthesizer, a computer-generated human appeared and signed the English text to the user in ASL.[67]

Copyright

Only works that are original are subject to copyright protection, so some scholars claim that machine translation results are not entitled to copyright protection because MT does not involve creativity.[68] The copyright at issue is for a derivative work; the author of the original work in the original language does not lose their rights when a work is translated: a translator must have permission to publish a translation.

See also

AI-complete
Cache language model
Comparison of machine translation applications
Comparison of different machine translation approaches
Computational linguistics
Computer-assisted translation and Translation memory
Controlled language in machine translation
Controlled natural language
Foreign language writing aid
Fuzzy matching
History of machine translation
Human language technology
Humour in translation ("howlers")
Language and Communication Technologies
Language barrier
List of emerging technologies
List of research laboratories for machine translation
Mobile translation
Neural machine translation
OpenLogos
Phraselator
Postediting
Pseudo-translation
Round-trip translation
Statistical machine translation
Translation § Machine translation
Translation memory
ULTRA (machine translation system)
Universal Networking Language
Universal translator

Notes

1. DuPont, Quinn (January 2018). "The Cryptological Origins of Machine Translation: From al-Kindi to Weaver". Amodern. Archived from the original on 14 August 2019. Retrieved 2 September 2019.
2. Knowlson, James (1975). Universal Language Schemes in England and France, 1600-1800. Toronto: University of Toronto Press. ISBN 0-8020-5296-7.
3. Booth, Andrew D. (1 May 1953). "MECHANICAL TRANSLATION". Computers and Automation 1953-05: Vol 2 Iss 4. Internet Archive. Berkeley Enterprises. p. 6.
4. J. Hutchins (2000). "Warren Weaver and the launching of MT". Early Years in Machine Translation (PDF). Studies in the History of the Language Sciences. Vol. 97. p. 17. doi:10.1075/sihols.97.05hut. ISBN 978-90-272-4586-1. S2CID 163460375. Archived from the original (PDF) on 28 February 2020, via Semantic Scholar.
5. "Warren Weaver, American mathematician". 13 July 2020. Archived from the original on 6 March 2021. Retrieved 7 August 2020.
6. 上野 俊夫 (13 August 1986). パーソナルコンピュータによる機械翻訳プログラムの制作 (in Japanese). Tokyo: (株)ラッセル社. p. 16. ISBN 494762700X. "わが国では1956年、当時の電気試験所が英和翻訳専用機「ヤマト」を実験している。この機械は1962年頃には中学1年の教科書で90点以上の能力に達したと報告されている。" (Translation, assisted by Google Translate: In 1959 Japan, the National Institute of Advanced Industrial Science and Technology (AIST) tested the proper English-Japanese translation machine Yamato, which was reported in 1964 as having reached a level of over 90 points on a first-grade junior high school textbook.)
7. "機械翻訳専用機「やまと」- コンピュータ博物館". Archived from the original on 19 October 2016. Retrieved 4 April 2017.
8. Nye, Mary Jo (2016). "Speaking in Tongues: Science's centuries-long hunt for a common language". Distillations. 2 (1): 40-43. Archived from the original on 3 August 2020. Retrieved 20 March 2018.
9. Gordin, Michael D. (2015). Scientific Babel: How Science Was Done Before and After Global English. Chicago, Illinois: University of Chicago Press. ISBN 9780226000299.
10. Wolfgang Saxon (28 July 1995). "David G. Hays, 66, a Developer Of Language Study by Computer". The New York Times. Archived from the original on 7 February 2020. Retrieved 7 August 2020. "wrote about computer-assisted language processing as early as 1957 ... was project leader on computational linguistics at Rand from 1955 to 1968."
11. 上野 俊夫 (13 August 1986). パーソナルコンピュータによる機械翻訳プログラムの制作 (in Japanese). Tokyo: (株)ラッセル社. p. 16. ISBN 494762700X.
12. Budiansky, Stephen (December 1998). "Lost in Translation". Atlantic Magazine. pp. 81-84.
13. Schank, Roger C. (2014). Conceptual Information Processing. New York: Elsevier. p. 5. ISBN 9781483258799.
14. Farwell, David; Gerber, Laurie; Hovy, Eduard (29 June 2003). Machine Translation and the Information Soup: Third Conference of the Association for Machine Translation in the Americas, AMTA'98, Langhorne, PA, USA, October 28-31, 1998 Proceedings. Berlin: Springer. p. 276. ISBN 3540652590.
15. Barron, Brenda (18 November 2019). "Babel Fish: What Happened To The Original Translation Application?: We Investigate". Digital.com. Archived from the original on 20 November 2019. Retrieved 22 November 2019.
16. and gave other examples too.
17. Chan, Sin-Wai (2015). Routledge Encyclopedia of Translation Technology. Oxon: Routledge. p. 385. ISBN 9780415524841.
18. "Google Translator: The Universal Language". Blog.outer-court.com. 25 January 2007. Archived from the original on 20 November 2008. Retrieved 12 June 2012.
19. Antonio Toral, Sheila Castilho, Ke Hu, and Andy Way (2018). "Attaining the unattainable? Reassessing claims of human parity in neural machine translation". CoRR, abs/1808.10432.
20. Yvette Graham; Barry Haddow; Koehn, Philipp (2019). "Translationese in Machine Translation Evaluation". arXiv:1906.09833 [cs.CL].
21. Katsnelson, Alla (29 August 2022). "Poor English skills? New AIs help researchers to write better". Nature. 609 (7925): 208-209. Bibcode:2022Natur.609..208K. doi:10.1038/d41586-022-02767-9. PMID 36038730. S2CID 251931306.
22. Korab, Petr (18 February 2022). "DeepL: An Exceptionally Magnificent Language Translator". Medium. Retrieved 9 January 2023.
23. "DeepL outperforms Google Translate - DW - 12/05/2018". Deutsche Welle. Retrieved 9 January 2023.
24. Fadelli, Ingrid. "Study assesses the quality of AI literary translations by comparing them with human translations". techxplore.com. Retrieved 18 December 2022.
25. Thai, Katherine; Karpinska, Marzena; Krishna, Kalpesh; Ray, Bill; Inghilleri, Moira; Wieting, John; Iyyer, Mohit (25 October 2022). "Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature". arXiv:2210.14250 [cs.CL].
26. Milestones in machine translation - No. 6: Bar-Hillel and the nonfeasibility of FAHQT. Archived 12 March 2007 at the Wayback Machine. By John Hutchins.
27. Bar-Hillel (1960), "Automatic Translation of Languages". Available online at http://www.mt-archive.info/Bar-Hillel-1960.pdf. Archived 28 September 2011 at the Wayback Machine.
28. Hybrid approaches to machine translation. Costa-jussà, Marta R.; Rapp, Reinhard; Lambert, Patrik; Eberle, Kurt; Banchs, Rafael E.; Babych, Bogdan. Switzerland. 21 July 2016. ISBN 9783319213101. OCLC 953581497.
29. Claude Piron, Le défi des langues (The Language Challenge), Paris, L'Harmattan, 1994.
30. Babych, Bogdan; Hartley, Anthony (2003). Improving Machine Translation Quality with Automatic Named Entity Recognition (PDF). Paper presented at the 7th International EAMT Workshop on MT and Other Language Technology Tools. Archived from the original (PDF) on 14 May 2006. Retrieved 4 November 2013.
31. Hermajakob, U., Knight, K., & Hal, D. (2008). Name Translation in Statistical Machine Translation: Learning When to Transliterate. Archived 4 January 2018 at the Wayback Machine. Association for Computational Linguistics. 389-397.
32. Neeraj Agrawal; Ankush Singla. Using Named Entity Recognition to improve Machine Translation (PDF). Archived (PDF) from the original on 21 May 2013. Retrieved 4 November 2013.
33. Schwartz, Lane (2008). Multi-Source Translation Methods (PDF). Paper presented at the 8th Biennial Conference of the Association for Machine Translation in the Americas. Archived (PDF) from the original on 29 June 2016. Retrieved 3 November 2017.
34. Cohn, Trevor; Lapata, Mirella (2007). Machine Translation by Triangulation: Making Effective Use of Multi-Parallel Corpora (PDF). Paper presented at the 45th Annual Meeting of the Association for Computational Linguistics, June 23-30, 2007, Prague, Czech Republic. Archived (PDF) from the original on 10 October 2015. Retrieved 3 February 2015.
35. Nakov, Preslav; Ng, Hwee Tou (2012). "Improving Statistical Machine Translation for a Resource-Poor Language Using Related Resource-Rich Languages". Journal of Artificial Intelligence Research. 44: 179-222. doi:10.1613/jair.3540.
36. Vossen, Piek: Ontologies. In: Mitkov, Ruslan (ed.) (2003): Handbook of Computational Linguistics, Chapter 25. Oxford: Oxford University Press.
37. Knight, Kevin (1993). "Building a Large Ontology for Machine Translation". Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993. Princeton, New Jersey: Association for Computational Linguistics. pp. 185-190. doi:10.3115/1075671.1075713. ISBN 978-1-55860-324-0.
38. Knight, Kevin; Luk, Steve K. (1994). Building a Large-Scale Knowledge Base for Machine Translation. Paper presented at the Twelfth National Conference on Artificial Intelligence. arXiv:cmp-lg/9407029.
39. Melby, Alan. The Possibility of Language (Amsterdam: Benjamins, 1995, 27-41). Benjamins.com. 1995. ISBN 9789027216144. Archived from the original on 25 May 2011. Retrieved 12 June 2012.
40. Wooten, Adam (14 February 2006). "A Simple Model Outlining Translation Technology". T&I Business. Archived from the original on 16 July 2012. Retrieved 12 June 2012.
41. "Appendix III of 'The present status of automatic translation of languages', Advances in Computers, vol. 1 (1960), p. 158-163. Reprinted in Y. Bar-Hillel: Language and information (Reading, Mass.: Addison-Wesley, 1964), p. 174-179" (PDF). Archived from the original (PDF) on 28 September 2018. Retrieved 12 June 2012.
42. "Human quality machine translation solution by Ta with you" (in Spanish). Tauyou.com. 15 April 2009. Archived from the original on 22 September 2009. Retrieved 12 June 2012.
43. "Google Translate Adds 20 Languages To Augmented Reality App". Popular Science. 30 July 2015. Retrieved 9 January 2023.
44. Whitney, Lance. "Google Translate app update said to make speech-to-text even easier". CNET. Retrieved 9 January 2023.
45. "Machine Translation Service". 5 August 2011. Archived from the original on 8 September 2013. Retrieved 13 September 2013.
46. Wilson, Kyle (8 May 2019). "Wikipedia has a Google Translate problem". The Verge. Retrieved 9 January 2023.
47. "Wikipedia taps Google to help editors translate articles". VentureBeat. 9 January 2019. Retrieved 9 January 2023.
48. "Content translation tool helps create over half a million Wikipedia articles". Wikimedia Foundation. 23 September 2019. Retrieved 10 January 2023.
49. Magazine, Undark (12 August 2021). "Wikipedia Has a Language Problem. Here's How To Fix It". Undark Magazine. Retrieved 9 January 2023.
50. "List of Wikipedias - Meta". meta.wikimedia.org. Retrieved 9 January 2023.
51. Gallafent, Alex (26 April 2011). "Machine Translation for the Military". PRI's the World. Archived from the original on 9 May 2013. Retrieved 17 September 2013.
52. Jackson, William (9 September 2003). "GCN - Air force wants to build a universal translator". Gcn.com. Archived from the original on 16 June 2011. Retrieved 12 June 2012.
53. Young-sil, Yoon (26 June 2023). "Korean Games Growing in Popularity in Tough Japanese Game Market". BusinessKorea. Retrieved 8 August 2023.
54. Automatic Language Processing Advisory Committee, Division of Behavioral Sciences, National Academy of Sciences, National Research Council (1966). Language and Machines: Computers in Translation and Linguistics (PDF) (Report). Washington, D.C.: National Research Council, National Academy of Sciences. Archived (PDF) from the original on 21 October 2013. Retrieved 21 October 2013.
55. Randhawa, Gurdeeshpal; Ferreyra, Mariella; Ahmed, Rukhsana; Ezzat, Omar; Pottie, Kevin (April 2013). "Using machine translation in clinical practice". Canadian Family Physician. 59 (4): 382-383. PMC 3625087. PMID 23585608. Archived from the original on 4 May 2013. Retrieved 21 October 2013.
56. Gutherz, Gai; Gordin, Shai; Sáenz, Luis; Levy, Omer; Berant, Jonathan (2 May 2023). Kearns, Michael (ed.). "Translating Akkadian to English with neural machine translation". PNAS Nexus. 2 (5): pgad096. doi:10.1093/pnasnexus/pgad096. ISSN 2752-6542. PMC 10153418. PMID 37143863.
57. Way, Andy; Nano Gough (20 September 2005). "Comparing Example-Based and Statistical Machine Translation". Natural Language Engineering. 11 (3): 295-309. doi:10.1017/S1351324905003888. S2CID 3242163.
58. Muegge (2006), "Fully Automatic High Quality Machine Translation of Restricted Text: A Case Study". Archived 17 October 2011 at the Wayback Machine. In Translating and the computer 28. Proceedings of the twenty-eighth international conference on translating and the computer, 16-17 November 2006, London. London: Aslib. ISBN 978-0-85142-483-5.
59. "Comparison of MT systems by human evaluation, May 2008". Morphologic.hu. Archived from the original on 19 April 2012. Retrieved 12 June 2012.
60. Anderson, D.D. (1995). Machine translation as a tool in second language learning. Archived 4 January 2018 at the Wayback Machine. CALICO Journal. 13 (1): 68-96.
61. Han et al. (2012). LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors. Archived 4 January 2018 at the Wayback Machine. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012): Posters, pages 441-450. Mumbai, India.
62. J.M. Cohen observes (p. 14): "Scientific translation is the aim of an age that would reduce all activities to techniques. It is impossible however to imagine a literary-translation machine less complex than the human brain itself, with all its knowledge, reading, and discrimination."
63. See the annually performed NIST tests since 2001 (archived 22 March 2009 at the Wayback Machine) and Bilingual Evaluation Understudy.
64. Abadi, Mark. "4 times Google Translate totally dropped the ball". Business Insider.
65. "回数を重ねるほど狂っていく Google翻訳で「えぐ」を英訳すると奇妙な世界に迷い込むと話題に". ねとらぼ.
66. "えぐ", via www.youtube.com.
67. Zhao, L., Kipper, K., Schuler, W., Vogler, C., & Palmer, M. (2000). A Machine Translation System from English to American Sign Language. Archived 20 July 2018 at the Wayback Machine. Lecture Notes in Computer Science, 1934: 54-67.
68. "Machine Translation: No Copyright On The Result?". SEO Translator, citing Zimbabwe Independent. Archived from the original on 29 November 2012. Retrieved 24 November 2012.

Further reading

Cohen, J. M. (1986). "Translation". Encyclopedia Americana. Vol. 27. pp. 12-15.
Hutchins, W. John; Somers, Harold L. (1992). An Introduction to Machine Translation. London: Academic Press. ISBN 0-12-362830-X.
Lewis-Kraus, Gideon (7 June 2015). "Tower of Babble". New York Times Magazine. pp. 48-52.
Weber, Steven; Mehandru, Nikita (2022). "The 2020s Political Economy of Machine Translation". Business and Politics. 24 (1): 96-112. arXiv:2011.01007. doi:10.1017/bap.2021.17. S2CID 226236853.

External links

Wikiversity has learning resources about Topic:Computational linguistics
The Advantages and Disadvantages of Machine Translation
International Association for Machine Translation (IAMT). Archived 24 June 2010 at the Wayback Machine.
Machine Translation Archive. Archived 1 April 2019 at the Wayback Machine. By John Hutchins. An electronic repository (and bibliography) of articles, books and papers in the field of machine translation and computer-based translation technology.
Machine translation (computer-based translation): Publications by John Hutchins (includes PDFs of several books on machine translation)
Machine Translation and Minority Languages
John Hutchins 1999. Archived 7 September 2007 at the Wayback Machine.
Slator: News & analysis of the latest developments in machine translation
From Classroom to Real World: How Machine Translation is Changing the Landscape of Foreign Language Learning

Retrieved from "https://en.wikipedia.org/w/index.php?title=Machine_translation&oldid=1180071702"
