Posts Tagged ‘european-union’

Setting up European Union translation memories and document corpora for SDL-Trados

  1. Installing SDL-Trados allows the translation program to teach this industry-standard computer-aided translation application. So far, however, we have had no actual translation memory loaded into this translation software.
  2. The European Union is a powerhouse for translation and interpreting, at least for the wide range of its member languages, many of which are world languages. It makes some of its resources (which have been set up for translation and interpreting study use here before) available to the community free of charge, as reported at several LREC conferences.
    1. This spring, the Language Technology Group at the Joint Research Centre of the European Union updated its translation memory offering. The DGT-TM can fill that void, at least for the European languages that have a translation component at UNC-Charlotte.
      1. We download on demand (too big to store: http://langtech.jrc.ec.europa.eu/DGT-TM.html#Download)
        1. Is the DGT-TM 2011 truly a superset of the 2007 release, or should both be merged? Merging is probably too much work.
      2. and extract only the language pairs that combine English with the languages marked “1” here: “G:\myfiles\doc\education\humanities\computer_linguistics\corpus\texts\multi\DGT-tm\DGT-tm_statistics.xlsx” (using “G:\myfiles\doc\education\humanities\computer_linguistics\corpus\texts\multi\DGT-tm\TMXtract.exe”)
      3. and convert
        1. English is the source language by default, but should be the target language in our programs,
        2. The TMX format in which this translation memory is distributed should be “upgradeable to the SDL Trados Studio 2011/2011 SP1 format in the Upgrade Translation Memories wizard”.
          1. TBA: where is this component?
      4. configure Trados to load the translation memory
        1. How much of the computer's resources does this use up?
        2. How do you load a TM?
        3. Can you load on demand instead of preloading everything?
      5. Here are the statistics for the translation memories for “our” languages:

         | uncc | Language | Language code | Number of units in DGT (release 2007) | Number of units in DGT (release 2011) |
         |---|---|---|---|---|
         | 1 | English | EN | 2187504 | 2286514 |
         | 1 | German | DE | 532668 | 1922568 |
         | 1 | Greek | EL | 371039 | 1901490 |
         | 1 | Spanish | ES | 509054 | 1907649 |
         | 1 | French | FR | 1106442 | 1853773 |
         | 1 | Italian | IT | 542873 | 1926532 |
         | 1 | Polish | PL | 1052136 | 1879469 |
         | 1 | Portuguese | PT | 945203 | 1922585 |
         | Total | 8 | 8 | 7246919 | 15600580 |
    2. Would it be of interest to have the document-focused JRC-Acquis distribution of the materials underlying the translation memories available on student and teacher Trados computers? Sample texts could then be loaded for which reliable translation suggestions will be available (this is not certain for texts from all domains), and the use of a translation memory could be practiced under realistic conditions.
      1. “The DGT Translation Memory is a collection of translation units, from which the full text cannot be reproduced. The JRC-Acquis is mostly a collection of full texts with additional information on which sentences are aligned with each other.”
      2. It remains to be seen how easily one can transfer documents from this distribution into Trados to work with the translation memory.
      3. Here is where to download:

         | uncc | lang | inc |
         |---|---|---|
         | 1 | de | jrc-de.tgz |
         | 1 | en | jrc-en.tgz |
         | 1 | es | jrc-es.tgz |
         | 1 | fr | jrc-fr.tgz |
         | 1 | it | jrc-it.tgz |
         | 1 | pl | jrc-pl.tgz |
         | 1 | pt | jrc-pt.tgz |

      4. The JRC-Acquis comes with these statistics:

         | uncc | Language ISO code | Number of texts | Total number of words | Total number of characters | Average number of words |
         |---|---|---|---|---|---|
         | 1 | de | 23541 | 32059892 | 232748675 | 1361.87 |
         | 1 | en | 23545 | 34588383 | 210692059 | 1469.03 |
         | 1 | es | 23573 | 38926161 | 238016756 | 1651.3 |
         | 1 | fr | 23627 | 39100499 | 234758290 | 1654.91 |
         | 1 | it | 23472 | 35764670 | 230677013 | 1523.72 |
         | 1 | pl | 23478 | 29713003 | 214464026 | 1265.57 |
         | 1 | pt | 23505 | 37221668 | 227499418 | 1583.56 |
         | Total | 7 | 164741 | 247374276 | 1588856237 | 10509.96 |

  3. What other multilingual corpora are there (for other domains and for non-European languages)?
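
The extract-and-convert steps in item 2.1 (keep only the pairs with English, then make English the target) could be sketched roughly as follows in Python. This is an illustration under assumptions, not the TMXtract.exe or Trados procedure: the sample data is invented, `english_as_target` and `lang_of` are hypothetical helper names, the language is assumed to sit in a `lang` or `xml:lang` attribute on each `<tuv>`, and the direction is expressed through the TMX header's `srclang` attribute. Whether the Trados import honours that attribute has not been checked here.

```python
import xml.etree.ElementTree as ET

# ElementTree's expanded name for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def lang_of(tuv):
    """Language code of a <tuv>, taken from xml:lang or the older lang attribute."""
    return tuv.get(XML_LANG) or tuv.get("lang") or ""


def english_as_target(tmx_text, other="DE", english="EN"):
    """Keep only units that have both languages; put `other` first as source."""
    root = ET.fromstring(tmx_text)
    header, body = root.find("header"), root.find("body")
    source_code = None
    for tu in list(body.findall("tu")):
        # Index the variants by primary language subtag, e.g. "EN-GB" -> "EN".
        variants = {}
        for tuv in tu.findall("tuv"):
            variants.setdefault(lang_of(tuv).split("-")[0].upper(), tuv)
            tu.remove(tuv)
        if other in variants and english in variants:
            tu.append(variants[other])    # source segment
            tu.append(variants[english])  # target segment
            source_code = source_code or lang_of(variants[other])
        else:
            body.remove(tu)               # unit lacks one of the two languages
    if header is not None and source_code:
        header.set("srclang", source_code)
    return ET.tostring(root, encoding="unicode")


# Invented sample data, only to show the shape of the input.
SAMPLE = """<tmx version="1.4">
  <header srclang="EN-GB" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="demo" creationtoolversion="1" o-tmf="demo"/>
  <body>
    <tu>
      <tuv lang="EN-GB"><seg>Article 1</seg></tuv>
      <tuv lang="DE-DE"><seg>Artikel 1</seg></tuv>
      <tuv lang="FR-FR"><seg>Article premier</seg></tuv>
    </tu>
    <tu>
      <tuv lang="EN-GB"><seg>Done at Brussels</seg></tuv>
      <tuv lang="FR-FR"><seg>Fait à Bruxelles</seg></tuv>
    </tu>
  </body>
</tmx>"""

converted = ET.fromstring(english_as_target(SAMPLE, other="DE"))
print([lang_of(v) for v in converted.iter("tuv")])  # ['DE-DE', 'EN-GB']
```

The sketch holds the whole file in memory, which is fine for a small test file. For the real downloads, which are large, a streaming parser such as `ET.iterparse` would probably be a better fit.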

B-languages for Relay interpreting in European Parliament Plenary Video (2009)

You can do relay interpreting from European Parliament plenary videos by selecting one of the b-languages which the Parliament's interpreters provide.

The (3) video download control for videos older than 20080711 allows for the recording of only one language-track in the video. You can download, from a link emailed to you, either the a- (e.g. (1) Italian here) or one b-language (e.g. (2) German here), as you can see below:

Given that software tends to get improved over time, it is rather surprising that one does not seem to have a similar choice in the new video downloader. However, the improvement is just a bit hidden.

For videos newer than 20080710, all language tracks are automatically contained within the downloaded (how? see here) video file. To switch between a- and b-language or between b-languages in Windows Media Player, go to the menu (if the menu does not show, right-click to the left of the “Now playing” button): file / play / audio and language tracks / [now choose your language].

E.g. if you do not want to listen to Ferrero-Waldner not speaking her native tongue, choose like pictured below:

And she does not really speak “Zulu”, which seems to have been chosen by the European Parliament technicians as the designator of the original a-language, there being no such concept in Windows Media Player. Can’t have it all. Pretty close, though.

Passing around European Parliament Plenary Video Clips & Transcripts

European Parliament video clips are quite big, and it would be easier not to have to pass them around. But how do you tell somebody else which video clip to watch if the selected clip is not reflected in the browser address bar? The Flash application unfortunately forces you to provide the “bibliographic” information in pieces (start URL, date, possibly video format, debate title, speaker name). In the end, however, you get a direct link which you can pass on to save others from having to jump through the same hoops. If you just need the direct link, skip to (7) in step 6. Otherwise:

  1. Start with the calendar interface: http://www.europarl.europa.eu/wps-europarl-internet/frd/vod/research-by-date?language=en, and find your (1) date, e.g. “Wednesday 14 January 2009”.
  2. The window with the recording of that date will come up. Now you can (2) change the video format to either wmv (should work on most Windows PCs; free upgrade for Macs here: http://www.microsoft.com/windows/windowsmedia/player/wmcomponents.mspx) or mp4, an option that will show in the browser address bar. If you must change the format, do this first, as it seems to rewind the video to the beginning of the session.
  3. Click on your (4) speaker, e.g. “15:16:50 Benita Ferrero-Waldner 00:13:12 15:30:02” (start time, speaker, duration, end time).
  4. Instead of watching online (e.g. if you find the stream quality lacking), you can (5) download the video (in the format you have chosen, either wmv or mp4). UPDATE: The web site added a disclaimer that you have to 1. read and 2. check before you can 3. download, as illustrated below:
  5. Note: you can (6) change the b-language (for relay interpreting) when streaming. Plus, when you download the video, all the b-languages are downloaded together with the a-language. See here how to select the desired b-language when playing the downloaded file.
  6. Easier than providing all the bibliographical information (calendar URL, date, debate and speaker) is passing on the direct URL of the download clip. Right-click on “Download this Speech” and select (7) “Copy shortcut” from the context menu. Then paste this link, e.g. http://vod.europarl.europa.eu/nasvod01/vod0301/2009/wm/VODUnit_20090114_15165000_15300200.wmv or, if you chose mp4 format, http://vod.europarl.europa.eu/nasvod02/vod0301/2009/isma/VODUnit_20090114_15165000_15300200.mp4, into the calendar event for the exam. This completes your checklist for the exam, and at the beginning of the exam you can download the clip from this link onto the students’ computers. Or, for assigning materials to students or passing them to external examiners, email this direct link.
  7. Unfortunately, it appears that the transcripts, unlike the audio channels, do not include the relay languages and have to be accessed from a different (calendar-)interface here: http://www.europarl.europa.eu/activities/plenary/cre/calendar.do?language=EN: “The verbatim report of proceedings of each sitting (often referred to by its French abbreviation, CRE) is published (Rule 173 of the Rules of Procedure) and contains the speeches made in plenary, in the original language.”
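
The direct links in step 6 appear to encode the same “bibliographic” details that the speaker list shows: the date, the start time and the end time of the speech. A minimal Python sketch for reading them back out of a link is below. The file-name pattern is an assumption inferred only from the two example links, the two trailing digits after each time are assumed to be fractions of a second and are ignored, and `describe_clip` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumed pattern, inferred only from the two example links above:
# VODUnit_<YYYYMMDD>_<HHMMSS + 2 digits>_<HHMMSS + 2 digits>.<wmv|mp4>
CLIP_NAME = re.compile(r"VODUnit_(\d{8})_(\d{6})\d{2}_(\d{6})\d{2}\.(wmv|mp4)$")


def describe_clip(url):
    """Return start, end, duration and format encoded in a direct clip link."""
    match = CLIP_NAME.search(url)
    if match is None:
        raise ValueError("not a recognised clip link: " + url)
    day, start, end, fmt = match.groups()
    started = datetime.strptime(day + start, "%Y%m%d%H%M%S")
    ended = datetime.strptime(day + end, "%Y%m%d%H%M%S")
    return {"start": started, "end": ended,
            "duration": ended - started, "format": fmt}


clip = describe_clip(
    "http://vod.europarl.europa.eu/nasvod01/vod0301/2009/wm/"
    "VODUnit_20090114_15165000_15300200.wmv"
)
print(clip["start"], clip["duration"])  # 2009-01-14 15:16:50 0:13:12
```

The result matches the speaker entry from step 3 (start 15:16:50, duration 00:13:12). The sketch does not handle a speech that runs past midnight.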

 

Appendix: The file size of these videos is about 10MB per minute. If you feel you need to save the videos locally, use an appropriate location (one where you have sufficient space, the file will not be erased, and only appropriate users have access; consider this before using a public network share or personal drive). Not really more “local” is saving the video clip on the http://hale-interpreting.groups.live.com SkyDrive, which can also hold clips larger than 50MB [doubled to 100MB on June 20, 2011] if you pre-process them as described in the zipping instruction.
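
As a rough worked example of the rule of thumb above, the following Python sketch estimates the size of a clip from its duration and checks it against the 100MB figure. Both constants come from the appendix. The function names are hypothetical, and the estimate is only as good as the “about 10MB per minute” approximation.

```python
MB_PER_MINUTE = 10      # rule of thumb from the appendix
UPLOAD_LIMIT_MB = 100   # figure quoted in the appendix as of June 20, 2011


def estimated_size_mb(minutes, seconds=0):
    """Approximate clip size in MB for a given duration."""
    return (minutes + seconds / 60) * MB_PER_MINUTE


def needs_preprocessing(minutes, seconds=0):
    """True if the estimated size exceeds the quoted limit."""
    return estimated_size_mb(minutes, seconds) > UPLOAD_LIMIT_MB


# The 13:12 speech used as the running example in the steps above.
print(round(estimated_size_mb(13, 12)), needs_preprocessing(13, 12))  # 132 True
```

By this estimate, any speech longer than about ten minutes would need the pre-processing step before upload.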

Watch a 5-minute narrated video-clip that demonstrates the above steps.