Archive
Setting up European Union translation memories and document corpora for SDL-Trados
-
The SDL-Trados installation allows the translation program to teach this industry-standard computer-aided translation application. So far, however, we had no actual translation memory loaded into this translation software.
-
The European Union is a powerhouse for translation and interpreting, at least for the wide range of its member languages, many of which are world languages. It makes some of its resources (which have been set up for translation and interpreting study use here before) available to the community free of charge, as reported at a variety of LREC conferences.
-
This spring, the Language Technology Group at the Joint Research Centre of the European Union updated their translation memory offering. DGT-TM can fill that void, at least for the European languages that have a translation component at UNC-Charlotte.
-
We download on demand (too big to store: http://langtech.jrc.ec.europa.eu/DGT-TM.html#Download)
-
Is the DGT-TM 2011 release truly a superset of the 2007 release, or should both be merged? Probably too much work?
-
-
and extract only the language pairs of English with the languages marked “1” here: “G:\myfiles\doc\education\humanities\computer_linguistics\corpus\texts\multi\DGT-tm\DGT-tm_statistics.xlsx” (using “G:\myfiles\doc\education\humanities\computer_linguistics\corpus\texts\multi\DGT-tm\TMXtract.exe”)
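To illustrate what extracting a language pair involves, here is a minimal Python sketch. It is not how TMXtract.exe works; it only assumes that a TMX file holds `<tu>` translation units whose `<tuv>` variants carry a language attribute (`xml:lang`, or `lang` in older TMX versions) and a `<seg>` with the text. The sample data, the language codes and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under this namespaced attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Tiny made-up sample; real DGT-TM files are far larger.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="EN-GB" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="sample" creationtoolversion="1" o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="EN-GB"><seg>Article 1</seg></tuv>
      <tuv xml:lang="DE-DE"><seg>Artikel 1</seg></tuv>
      <tuv xml:lang="FR-FR"><seg>Article premier</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="EN-GB"><seg>Done at Brussels</seg></tuv>
      <tuv xml:lang="FR-FR"><seg>Fait à Bruxelles</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tuv_language(tuv):
    """Return the lowercase primary language of a <tuv>, e.g. 'EN-GB' -> 'en'."""
    code = tuv.get(XML_LANG) or tuv.get("lang") or ""
    return code.split("-")[0].lower()


def extract_pair(tmx_text, lang_a, lang_b):
    """Return (lang_a text, lang_b text) for every unit that contains both languages."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                segments[tuv_language(tuv)] = "".join(seg.itertext())
        if lang_a in segments and lang_b in segments:
            pairs.append((segments[lang_a], segments[lang_b]))
    return pairs


print(extract_pair(SAMPLE_TMX, "en", "de"))
```

Units that lack one of the two languages are skipped, which is why the second sample unit does not appear in the English-German result.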
-
and convert
-
English is the source language by default, but should be the target language in our programs,
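In TMX, the translation direction is declared by the `srclang` attribute of the `<header>` element. The sketch below shows one way to write a small TMX file with the other language as source and English as target. Whether Trados needs this step, or can reverse the direction on import, is not established here; the language codes, the header values and the function name `build_tmx` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, source_lang, target_lang):
    """Build TMX text from (source text, target text) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "srclang": source_lang,        # declares the source language
        "adminlang": "en",
        "datatype": "plaintext",
        "segtype": "sentence",
        "o-tmf": "none",
        "creationtool": "sketch",      # placeholder value
        "creationtoolversion": "0",    # placeholder value
    })
    body = ET.SubElement(tmx, "body")
    for source_text, target_text in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((source_lang, source_text), (target_lang, target_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


# Pairs as they come out of an English-first memory ...
english_first = [("Article 1", "Artikel 1")]
# ... swapped so that German is the source and English the target.
german_first = [(de, en) for en, de in english_first]

tmx_text = build_tmx(german_first, "DE-DE", "EN-GB")
print(tmx_text)
```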
-
The TMX format in which this translation memory is distributed should be “upgradeable to the SDL Trados Studio 2011/2011 SP1 format in the Upgrade Translation Memories wizard”.
-
TBA:where is this component?
-
-
-
configure Trados to load the translation memory
-
how many computing resources does this use up?
-
how do you load a TM?
-
can you load on demand instead of preloading everything?
-
- Here are the statistics for the translation memories for “our” languages
-
| uncc | Language | Language code | Number of units in DGT – release 2007 | Number of units in DGT – release 2011 |
|---|---|---|---|---|
| 1 | English | EN | 2187504 | 2286514 |
| 1 | German | DE | 532668 | 1922568 |
| 1 | Greek | EL | 371039 | 1901490 |
| 1 | Spanish | ES | 509054 | 1907649 |
| 1 | French | FR | 1106442 | 1853773 |
| 1 | Italian | IT | 542873 | 1926532 |
| 1 | Polish | PL | 1052136 | 1879469 |
| 1 | Portuguese | PT | 945203 | 1922585 |
| Total | 8 | 8 | 7246919 | 15600580 |
-
-
Would it be of interest to have the document-focused JRC-Acquis distribution of the materials underlying the translation memories available on student and teacher Trados computers? Sample texts could then be loaded for which reliable translation suggestions will be available (this is not certain for texts from all domains), and the use of a translation memory could be practiced under realistic conditions.
-
“The DGT Translation Memory is a collection of translation units, from which the full text cannot be reproduced. The JRC-Acquis is mostly a collection of full texts with additional information on which sentences are aligned with each other.”
-
It remains to be seen how easily one can transfer documents from this distribution into Trados to work with the translation memory.
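One possible route, sketched here in Python, is to pull the paragraph text out of a corpus document and save it as plain text that a translation tool can open. The sketch assumes the documents are XML files whose paragraphs sit in `<p>` elements; the sample document below is made up, and the real JRC-Acquis markup should be checked before relying on this.

```python
import xml.etree.ElementTree as ET

# Made-up sample document; element and attribute names are assumptions.
SAMPLE_DOC = """<TEI.2 id="sample" lang="en">
  <text><body>
    <div type="body">
      <p n="1">First sample paragraph.</p>
      <p n="2">Second   sample
         paragraph.</p>
    </div>
  </body></text>
</TEI.2>"""


def paragraphs(xml_text):
    """Return the text of every <p> element, with whitespace normalized."""
    root = ET.fromstring(xml_text)
    result = []
    for element in root.iter():
        # Strip a namespace prefix such as '{uri}p' if one is present.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "p":
            text = " ".join("".join(element.itertext()).split())
            if text:
                result.append(text)
    return result


# One paragraph per line, ready to be written to a .txt file.
plain_text = "\n".join(paragraphs(SAMPLE_DOC))
print(plain_text)
```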
-
Here is where to download:
-
| uncc | lang | inc |
|---|---|---|
| 1 | de | |
| 1 | en | |
| 1 | es | |
| 1 | fr | |
| 1 | it | |
| 1 | pl | |
| 1 | pt | |
-
The JRC-Acquis comes with these statistics:
-
-
| uncc | Language ISO code | Number of texts | Total No words | Total No characters | Average No words |
|---|---|---|---|---|---|
| 1 | de | 23541 | 32059892 | 232748675 | 1361.87 |
| 1 | en | 23545 | 34588383 | 210692059 | 1469.03 |
| 1 | es | 23573 | 38926161 | 238016756 | 1651.3 |
| 1 | fr | 23627 | 39100499 | 234758290 | 1654.91 |
| 1 | it | 23472 | 35764670 | 230677013 | 1523.72 |
| 1 | pl | 23478 | 29713003 | 214464026 | 1265.57 |
| 1 | pt | 23505 | 37221668 | 227499418 | 1583.56 |
| Total | 7 | 164741 | 247374276 | 1588856237 | 10509.96 |
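Note that the “Average No words” value in the Total row (10509.96) is the sum of the seven per-language averages, not an average over all texts. A short Python check, using only the text and word counts from the table, recomputes the totals and the overall words-per-text figure, which comes out at about 1,500:

```python
# (language, number of texts, total number of words), copied from the table.
STATS = [
    ("de", 23541, 32059892),
    ("en", 23545, 34588383),
    ("es", 23573, 38926161),
    ("fr", 23627, 39100499),
    ("it", 23472, 35764670),
    ("pl", 23478, 29713003),
    ("pt", 23505, 37221668),
]

total_texts = sum(texts for _, texts, _ in STATS)
total_words = sum(words for _, _, words in STATS)

# Average length of a text across all seven languages.
overall_average = total_words / total_texts

print(total_texts, total_words, round(overall_average, 2))
```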
-
- What other multilingual corpora are there (for other domains and other non-European languages)?
How to use visual instead of aural cues during a Sanako oral proficiency exam
- This exam file has been authored with the Sanako Study 1200 TBA:authoring tool. It is displayed from the Sanako tutor application:
- images on a projection screen connected to the teacher computer,
- aural portion through the tutor-controlled Sanako student player and headsets.
- To protect the integrity and allow for reuse of the exam, only the initial instruction, example and collection of the results of an exam with visual cues are shown in this screencast.
Protected: Sanako Study 1200 Final oral exam for advanced Business Spanish: A Job interview
Independent study with free language learning materials from the FSI?
The Foreign Service Institute language learning materials – consisting of scanned documents and digitized audio of multiple courses per language – were still a heavily-advertised resource when I visited the Defense Language Institute in Monterey in 2006.
It is nice to see these resources made available for free. It is also nice to see the progress that has been made, and not only in the technological adaptation of textbook learning materials, since these materials were made available (post WW II?).
This, however, comes at a cost. If you shun it, and do not take a course that requires (and entitles you to the use of) a textbook, here are easily accessible learning materials for a large set of languages, including many LCTLs: Amharic, Arabic, Bulgarian, Cambodian, Cantonese, Chinese, Chinyanja, Czech, Finnish, French, Fula, German, Greek, Hausa, Hebrew, Hindi, Hungarian, Igbo, Italian, Japanese, Kirundi, Kituba, Korean, Lao, Lingala, Luganda, Moré, Polish, Portuguese, Romanian, Russian, Serbo-Croatian, Shona, Sinhala, Spanish, Swahili, Swedish, Tagalog, Thai, Turkish, Twi, Vietnamese, Yoruba.
The forums, however, seem to indicate that not too many people still use these options. The transformation into a (technologically superficially) more modern format here is limited to very few languages and courses (and crashed my web browser).
I2speak.com: Web-based IPA Keyboard
The Sciweavers Team announces http://www.i2speak.com: “an online Smart IPA Keyboard that lets you quickly type IPA phonetics without the need to memorize any symbol code. For every Roman character you type, a popup menu displays a group of phonetic symbols that share the same sound or shape beneath typed character. Use arrow keys to select the proper symbol then hit the Enter button. I2Speak also supports the following features:
1. The Sampa English Keyboard lets you type English phonetics using Roman characters according to SAMPA (Speech Assessment Methods Phonetic Alphabet) rules.
2. The IPA English Keyboard provides you with a full English phonetics keyboard. Press the symbol of interest using a suitable input device.
3. You can type directly on your physical keyboard or on the virtual on-screen keyboard using a suitable input device such as mouse or touch screen device.
4. You can change the keyboard symbols by selecting another layout from the list box located above the virtual keyboard.
5. For every keyboard layout, more symbols can be displayed by pressing the CAPS Lock.
6. When you hover the mouse over an English phonetic button, a slick tooltip will show some example English words.
7. You can save typed phonetics as an MS-Word file by clicking the Save button, copy them to clipboard using the Copy button, or post them to Twitter, Facebook, etc. by clicking the desired button.”
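To give an idea of what typing “according to SAMPA rules” means, here is a minimal Python sketch that converts a few SAMPA symbols for English into IPA characters. It is not I2Speak's implementation; the table covers only a handful of well-known SAMPA correspondences, and the function name is made up for the example.

```python
# A small subset of SAMPA symbols for English and their IPA equivalents.
SAMPA_TO_IPA = {
    "tS": "tʃ", "dZ": "dʒ",                      # affricates (two characters)
    "T": "θ", "D": "ð", "S": "ʃ", "Z": "ʒ", "N": "ŋ",
    "I": "ɪ", "U": "ʊ", "V": "ʌ", "Q": "ɒ", "{": "æ", "@": "ə",
    "A": "ɑ", "O": "ɔ", "3": "ɜ", ":": "ː",
}


def sampa_to_ipa(text):
    """Replace known SAMPA symbols; characters not in the table are kept as typed."""
    out = []
    i = 0
    while i < len(text):
        # Try two-character symbols before single characters.
        pair = text[i:i + 2]
        if pair in SAMPA_TO_IPA:
            out.append(SAMPA_TO_IPA[pair])
            i += 2
        else:
            out.append(SAMPA_TO_IPA.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(sampa_to_ipa("TIN"))     # "thing"
print(sampa_to_ipa("tS3:tS"))  # "church"
```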

