Archive for the ‘translation-software’ Category

Skype video conference live machine translation: “way to go…”?

… as in “has a way to go”: natural language holds many more such difficulties for machine translation.

These sample screenshots from a recent demo show many of them in a nutshell.

You can probably sense that something is wrong with this company representative’s smile, even if you do not speak German:

[screenshot: skype-machine-translation-recode1]

If you do:

[screenshot: skype-machine-translation-recode2]

makes sense, NOT!

… as in “NOT!” :-) (Don’t forget, though, that there is an initial speech recognition layer in this demo, which seems to have become almost transparent as a technology by now. See Gartner’s hype cycle of 2014.)

“To rely on raw MT output is almost as bad an idea as getting a full-body tattoo in a language you don’t speak.”

“Hanzi Smatter, a blog, received a picture of a biker who got a computer-translated “Ride Hard Die Free” tattooed in huge Chinese characters down his torso. The only problem was that he got “die” in the sense of a “tool used for stamping or shaping metal” permanently inked on his body, probably because nothing like “die free” was in the translator’s training texts. (It also translated “free” as “free of charge”.)” (from: Johnson: Rise of the machine translators, Economist, Jun 4th 2014).

However, “using MT, plus post-editing, has cut translation time by 40% for” Dell. Good use of machine translation seems to be all about “blending” resources intelligently while managing expectations, like eLearning. Like most things in life :-)

How to configure Java not to check for updates in the frozen computer lab

  1. Many applications in the LRC, both web-based and standalone, rely on Java. They currently all start Java with the autoupdater:
  2. [screenshot]
  3. This may, at least temporarily, allow Java versions that have not been tested for compatibility with LRC applications,
  4. which should not cause permanent problems, since the computers are frozen, but does cause clients unnecessary hassle and delays.
  5. Switching the autoupdater off seems reasonable, provided that
    1. the LRC applications have been tested to work with the reasonably recent version of Java in the LRC image,
    2. and staying on this version for, say, a term causes no overarching security concerns (if it does, the more recent Java version should be frozen into the underlying software image anyway, after testing for compatibility with LRC applications).
  6. The answer to how to shut out the autoupdater is likely in the Java control panel. This screenshot is from version 7.51, while we have 7.45, but ours is likely similar: [screenshot]
  7. Registry keys (note 32-bit and 64-bit):
    1. HKLM\SOFTWARE\JavaSoft\Java Update\Policy    EnableAutoUpdateCheck
    2. HKLM\SOFTWARE\Wow6432Node\JavaSoft\Java Update\Policy    EnableAutoUpdateCheck
    3. HKLM\SOFTWARE\JavaSoft\Java Update\Policy    EnableJavaUpdate
    4. HKLM\SOFTWARE\Wow6432Node\JavaSoft\Java Update\Policy    EnableJavaUpdate
  8. For configuring this programmatically, a quick web search finds this property:
  9. deployment.expiration.check.enabled (type: Boolean, default: true): Must be “true” to prompt users to update the JRE when an out-of-date JRE is found on their system. Set to “false” to suppress the prompt.
  10. This setting goes into the system-level deployment.properties file: The deployment.config file is used for specifying the System-Level deployment.properties in the infrastructure. By default no deployment.config file exists; thus, no system-wide deployment.properties file exists. If deployment.config exists, it is located in one of the following directories (operating system: Windows):
    • <Windows Directory>\Sun\Java\Deployment\deployment.config
    • ${deployment.java.home}\lib\deployment.config
  11. In addition, this should likely be included: “SomeKey=SomeValue, may be locked by including another key, SomeKey.locked … so that the user cannot change it”.
  12. The information is from http://docs.oracle.com/javase/7/docs/technotes/guides/jweb/index.html, which likely contains other information needed to configure Java in the LRC environment.
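
To make items 7 to 11 more concrete, here is a minimal Python sketch that only builds the text of the three files involved: a system-level deployment.properties, the deployment.config that points to it, and a .reg file for the four registry values. It does not touch the system. Assumptions, none of them tested on the LRC image: C:\Windows stands in for <Windows Directory>; the registry values are DWORDs that switch updating off when set to 0; and deployment.system.config and deployment.system.config.mandatory are the keys that deployment.config expects. The function names are made up for illustration.

```python
from pathlib import PureWindowsPath

# Assumption: C:\Windows stands in for <Windows Directory> from the list above.
DEPLOYMENT_DIR = PureWindowsPath(r"C:\Windows\Sun\Java\Deployment")

# Registry keys and value names as listed above (32-bit and 64-bit views).
POLICY_KEYS = [
    r"HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Update\Policy",
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\JavaSoft\Java Update\Policy",
]
POLICY_VALUES = ["EnableAutoUpdateCheck", "EnableJavaUpdate"]


def build_properties(settings, lock=True):
    """Build deployment.properties text; optionally lock every key."""
    lines = []
    for key, value in settings.items():
        lines.append(f"{key}={value}")
        if lock:
            # "SomeKey.locked" keeps the user from changing SomeKey.
            lines.append(f"{key}.locked")
    return "\n".join(lines) + "\n"


def build_config(properties_path):
    """Build deployment.config text pointing at the system-level properties file."""
    return (
        f"deployment.system.config={properties_path.as_uri()}\n"
        "deployment.system.config.mandatory=true\n"
    )


def build_reg_file():
    """Build .reg text; assumption: DWORD 0 switches the update check off."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in POLICY_KEYS:
        lines.append(f"[{key}]")
        for name in POLICY_VALUES:
            lines.append(f'"{name}"=dword:00000000')
        lines.append("")
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_properties({"deployment.expiration.check.enabled": "false"}))
    print(build_config(DEPLOYMENT_DIR / "deployment.properties"))
    print(build_reg_file())
```

Keeping the file contents as plain strings makes it easy to review them before anything is frozen into the image; writing them to disk and importing the .reg file would be a separate, deliberate step.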

SDL-Trados 2011 configuration issues

  1. This is on the teacher PC during a demo (student PCs may not display this behavior, but should also be checked):
    1. We do not need the demo on each login for each user (“do not show me this again”).
    2. We have a duplicate project?
    3. Screenshots: CAM04305, CAM04304, CAM04306

Protected: Windows 7 LRC image: The list

2013/06/21. This post is password protected.

How to resolve “WinAlign export files cannot be migrated because your User ID has not been specified”

  1. Steps to replicate:
    1. In the WinAlign window:
      1. menu: file / export file pair;
      2. dialogue “export file pair to file”: choose a name and the format “text”;
      3. wait for the “export” dialogue to disappear.
    2. In SDL Trados Studio 2009:
      1. left menu pane: “translation memories”; menu button: “open translation memories”; menu button: “open document”: choose the source text document;
      2. in the dialogue “open document”, dropdown “target language”: select yours;
      3. dropdown “add”: “file-based translation memory”;
      4. in the dialogue “file-based translation memory”: select the exported text file.
    3. The error pops up: “WinAlign export files cannot be migrated because your User ID has not been specified” [screenshot]
  2. Resolution: follow the instructions in the error dialogue.
    1. Bonus points for knowing that “translator’s workbench” is a legacy application and accessible only through SDL Trados Studio 2007: open this from the start menu.
    2. In the left pane, click to open “Translator’s workbench”: [screenshot]
    3. Give your username: [screenshot]
  3. Test: on your next try, you will get past the error and into the upgrade that converts your WinAlign-exported text file into TMX: [screenshot 20130328_163935]

Trados translation memory from TMX

  1. Open Trados Studio Professional 2011.
  2. Choose from the menu: file / new / translation memory.
  3. In the “create translation memory” dialogue:
    1. choose source and target language (the same as in the file names of the TMX files, which follow the source-target naming convention);
    2. select “Allow multiple translation for the same source segment”;
    3. choose as the name for this translation memory the file name of the TMX files you are going to import (some include a year);
    4. click the button “create”.
  4. Choose from the menu: file / import.
  5. In the “import” dialogue window:
    1. select “large import file”, “keep most recent”, “add to setup”;
    2. click the button “ok”.
  6. In the dialogue “open import file”:
    1. select as file type TMX 1.4b; if this does not work, 1.4; if this does not work, 1.1;
    2. browse to C:\Temp\Trados to select one file (after the other).
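
The “try 1.4b, then 1.4, then 1.1” step can possibly be shortened by looking into the file first: a TMX file declares its version in the version attribute of its root <tmx> element. A minimal Python sketch, assuming well-formed XML; the function name tmx_version is made up here, and whether the declared version always matches the file type the Trados import dialogue accepts is not guaranteed.

```python
import xml.etree.ElementTree as ET


def tmx_version(path):
    """Return the version attribute of the root <tmx> element, or None."""
    # iterparse streams the file, so even large TMX files are not read fully:
    # the first "start" event is always the root element.
    for _event, element in ET.iterparse(path, events=("start",)):
        if element.tag == "tmx":
            return element.get("version")
        return None  # root element is not <tmx>
    return None  # empty document
```

Run on each file in C:\Temp\Trados before importing, this tells you which file type to try first in the “open import file” dialogue.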

Setting up European Union translation memories and document corpora for SDL-Trados

  1. Our SDL-Trados installation allows the translation program to teach this industry-standard computer-aided translation application. So far, however, we have had no actual translation memory loaded into this translation software.
  2. The European Union is a powerhouse for translation and interpreting, at least for the wide range of its member languages, many of which are world languages. It makes some of its resources, which have been set up here before for translation and interpreting study use, available to the community free of charge, as reported at a variety of LRECs.
    1. This spring, the Language Technology Group at the Joint Research Centre of the European Union updated its translation memory offer. DGT-TM can fill that void, at least for the European languages that have a translation component at UNC-Charlotte.
      1. We download on demand (too big to store): http://langtech.jrc.ec.europa.eu/DGT-TM.html#Download
        1. Is the DGT-TM 2011 truly a superset of the 2007 release, or should both be merged? Probably too much work?
      2. and extract only the language pairs with English, and only for the languages marked “1” here: “G:\myfiles\doc\education\humanities\computer_linguistics\corpus\texts\multi\DGT-tm\DGT-tm_statistics.xlsx” (using “G:\myfiles\doc\education\humanities\computer_linguistics\corpus\texts\multi\DGT-tm\TMXtract.exe”)
      3. and convert:
        1. English is the source language by default, but should be the target language in our programs.
        2. The TMX format in which this translation memory is distributed should be “upgradeable to the SDL Trados Studio 2011/2011 SP1 format in the Upgrade Translation Memories wizard”.
          1. TBA: where is this component?
      4. and configure Trados to load the translation memory:
        1. How much computing resources does this use up?
        2. How do you load a TM?
        3. Can you load on demand instead of preloading all?
      5. Here are the statistics for the translation memories for “our” languages:
      6. Number of units in DGT, by release:

         uncc   Language     Code   Release 2007   Release 2011
         1      English      EN          2187504        2286514
         1      German       DE           532668        1922568
         1      Greek        EL           371039        1901490
         1      Spanish      ES           509054        1907649
         1      French       FR          1106442        1853773
         1      Italian      IT           542873        1926532
         1      Polish       PL          1052136        1879469
         1      Portuguese   PT           945203        1922585
         Total  8            8           7246919       15600580
    2. Would it be of interest to have the document-focused JRC-Acquis distribution of the materials underlying the translation memory available on student/teacher Trados computers? Sample texts could then be loaded for which reliable translation suggestions are available (this is not certain for texts from all domains), and the use of a translation memory could be practiced under realistic conditions.
      1. “The DGT Translation Memory is a collection of translation units, from which the full text cannot be reproduced. The JRC-Acquis is mostly a collection of full texts with additional information on which sentences are aligned with each other.”
      2. It remains to be seen how easily one can transfer documents from this distribution into Trados to work with the translation memory.
      3. Here is where to download:
      4. Files per language:

         uncc   lang   inc
         1      de     jrc-de.tgz
         1      en     jrc-en.tgz
         1      es     jrc-es.tgz
         1      fr     jrc-fr.tgz
         1      it     jrc-it.tgz
         1      pl     jrc-pl.tgz
         1      pt     jrc-pt.tgz

      5. The JRC-Acquis comes with these statistics (language ISO code, number of texts, total number of words, total number of characters, average number of words per text):

         uncc   Language     Texts       Words   Characters   Avg. words per text
         1      de           23541    32059892    232748675               1361.87
         1      en           23545    34588383    210692059               1469.03
         1      es           23573    38926161    238016756                1651.3
         1      fr           23627    39100499    234758290               1654.91
         1      it           23472    35764670    230677013               1523.72
         1      pl           23478    29713003    214464026               1265.57
         1      pt           23505    37221668    227499418               1583.56
         Total  7           164741   247374276   1588856237              10509.96

  3. What other multilingual corpora are there (for other domains and for non-European languages)?
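
The “extract only our language pairs” and “English should be the target language” steps above could also be scripted, instead of or after using TMXtract.exe. A minimal Python sketch, assuming each <tu> holds one <tuv> per language, with the language code in an xml:lang or lang attribute (for example “EN-GB”) and the text in a <seg> child. The name iter_pairs is made up, and this has not been run against the actual DGT-TM files.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_pairs(tmx_path, source_lang, target_lang="EN"):
    """Yield (source, target) segment texts, with English as target by default."""
    source_lang = source_lang.upper()[:2]
    target_lang = target_lang.upper()[:2]
    # Stream the file unit by unit, since the translation memories are large.
    for _event, tu in ET.iterparse(tmx_path, events=("end",)):
        if tu.tag != "tu":
            continue
        segments = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").upper()
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the two-letter language part, e.g. "EN" from "EN-GB".
                segments[lang[:2]] = "".join(seg.itertext())
        if source_lang in segments and target_lang in segments:
            yield segments[source_lang], segments[target_lang]
        tu.clear()  # drop the processed unit's content to save memory
```

Calling iter_pairs once per language marked “1” (for example with "DE") would give the pairs for one translation memory. Writing them back out as TMX for the Trados import is left open here.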