Archive
Phonetic transcription websites
Computerized language resource centers are supposed to work wonders for SLA students’ pronunciation: can’t computers analyze and visualize sound for us?
However, there turns out to be a considerable “impedance mismatch”, not only between the speech signal and the computer’s ability to analyze and understand it, but also between a computer voice graph and a language learner’s ability to process it and improve their pronunciation on its basis.
Voice graphs may have some use for tonal languages. But can you even tell from a voice graph of a letter which sound is being produced?
Enter the traditional phonetic transcription that pre-computer-era language learners remember from their paper dictionaries (provided you can teach your learners a phonetic symbol set such as the IPA). Good online dictionaries are perfectly capable of displaying phonetic symbol sets on the web (it is all in Unicode nowadays).
Moreover, there are now experimental programs that can automate the transcription of text into phonetic symbols for, e.g., English, Portuguese, or Spanish. The more advanced ones also come with text-to-speech.
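To illustrate why an orthographically regular language like Spanish lends itself to automated transcription, here is a minimal, purely illustrative Python sketch of a rule-based grapheme-to-phoneme converter for Castilian Spanish. It is not one of the programs mentioned above: real systems handle stress, dialect variation, and many more rules.

```python
# Toy rule-based grapheme-to-IPA converter for Castilian Spanish.
# Illustration only: ignores stress marks, allophones, and many rules.

DIGRAPHS = {"ch": "tʃ", "ll": "ʝ", "rr": "r", "qu": "k"}

def to_ipa(word: str) -> str:
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:            # two-letter rules take precedence
            out.append(DIGRAPHS[pair])
            i += 2
            continue
        c, nxt = word[i], word[i + 1:i + 2]
        if c == "c":
            out.append("θ" if nxt in ("e", "i") else "k")  # Castilian ce/ci
        elif c == "g" and nxt == "u" and word[i + 2:i + 3] in ("e", "i"):
            out.append("g")             # gue/gui: the u is silent
            i += 1
        elif c == "g" and nxt in ("e", "i"):
            out.append("x")             # ge/gi as in "gente"
        elif c == "z":
            out.append("θ")
        elif c == "j":
            out.append("x")
        elif c == "ñ":
            out.append("ɲ")
        elif c == "v":
            out.append("b")             # b and v merge in Spanish
        elif c == "h":
            pass                        # h is silent
        else:
            out.append(c)
        i += 1
    return "".join(out)

print(to_ipa("chico"))   # tʃiko
print(to_ipa("guerra"))  # gera
print(to_ipa("gente"))   # xente
```

Even this toy version shows why learners benefit: the mapping from spelling to sound can be made explicit and inspected rule by rule.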
You can provide your students with audio models (or, if text-to-speech is available, text models) and have them study the phonetic transcription, listen to the audio, and record their imitation of the model in the LRC. You may find that practice with recording, plus a phonetic transcription of the recorded text, does more for your students’ pronunciation than a fancy voice graph.
Using NLP tools to automate production and correction of interactive learning materials for blended-learning templates in the Language Resource Center. Presentation, CALICO 2012, University of Notre Dame.
View the screencast here.
Setting up European Union translation memories and document corpora for SDL-Trados
- Our SDL-Trados installation allows the translation program to teach this industry-standard computer-aided translation application. So far, however, we had no actual translation memory loaded into this translation software.
- The European Union is a powerhouse for translation and interpreting, at least for the wide range of its member languages, many of which are world languages. It makes some of its resources (which have been set up here before for translation and interpreting study) available to the community free of charge, as reported at a variety of LRECs.
- This spring, the Language Technology Group at the Joint Research Centre of the European Union updated their translation memory offering. DGT-TM can fill that void, at least for the European languages that have a translation component at UNC-Charlotte.
- We download on demand (the data is too big to store: http://langtech.jrc.ec.europa.eu/DGT-TM.html#Download).
- Is the DGT-TM 2011 release truly a superset of the 2007 release, or should both be merged? (Probably too much work?)
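Whether the 2011 release is a superset of 2007 could be checked mechanically once both releases are reduced to sets of (source, target) segment pairs. A minimal sketch, with toy data standing in for the extracted units:

```python
# Sketch: compare two DGT-TM releases represented as sets of
# (source segment, target segment) tuples. Toy data, not real DGT units.
units_2007 = {("Article 1", "Artikel 1"), ("Annex II", "Anhang II")}
units_2011 = {("Article 1", "Artikel 1"), ("Annex II", "Anhang II"),
              ("Regulation (EC)", "Verordnung (EG)")}

is_superset = units_2011 >= units_2007   # True if nothing was dropped
dropped = units_2007 - units_2011        # units only in the old release
merged = units_2007 | units_2011         # union, if a merge is needed

print(is_superset, len(dropped), len(merged))
```

If `dropped` comes back empty for all language pairs, the 2011 release can simply replace the 2007 one; otherwise the union would be the safe merge.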
- and extract only the language pairs with English, and only the languages marked “1” here: “G:\myfiles\doc\education\humanities\computer_linguistics\corpus\texts\multi\DGT-tm\DGT-tm_statistics.xlsx” (using “G:\myfiles\doc\education\humanities\computer_linguistics\corpus\texts\multi\DGT-tm\TMXtract.exe”)
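The idea behind a language-pair extraction tool like TMXtract can be sketched in a few lines of Python: TMX is plain XML, where each `tu` (translation unit) holds language-tagged `tuv` variants. This is a sketch under assumptions, not TMXtract itself; the language codes `EN-GB`/`DE-DE` are illustrative and should be checked against the actual files.

```python
import io
import xml.etree.ElementTree as ET

# xml:lang lives in the predeclared XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def extract_pairs(tmx_file, src="EN-GB", tgt="DE-DE"):
    """Collect (src, tgt) segment pairs from a TMX file object."""
    pairs = []
    for tu in ET.parse(tmx_file).iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                segs[(tuv.get(XML_LANG) or "").upper()] = seg.text or ""
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs

# Tiny inline sample standing in for a real DGT-TM file:
sample = """<tmx version="1.4"><header/><body>
  <tu><tuv xml:lang="EN-GB"><seg>Article 1</seg></tuv>
      <tuv xml:lang="DE-DE"><seg>Artikel 1</seg></tuv></tu>
</body></tmx>"""
pairs = extract_pairs(io.StringIO(sample))
print(pairs)  # [('Article 1', 'Artikel 1')]
```

For the real, multi-gigabyte DGT-TM zips, a streaming parser (`ET.iterparse`) would be preferable to loading whole files into memory.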
- and convert
- English is the source language by default, but should be the target language in our programs.
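At the level of extracted segment pairs, reversing the direction is just a swap. A trivial sketch, continuing from hypothetical (English, other-language) tuples:

```python
# Swap translation direction: (en, de) pairs become (de, en),
# so English is the target rather than the source. Toy data only.
en_de = [("Article 1", "Artikel 1"), ("Annex II", "Anhang II")]
de_en = [(tgt, src) for (src, tgt) in en_de]
print(de_en[0])  # ('Artikel 1', 'Article 1')
```

In Trados itself, direction is set when the translation memory is created or upgraded, so this swap matters mainly when preparing raw pair data outside the tool.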
- The TMX format this translation memory is distributed in should be “upgradeable” to the SDL Trados Studio 2011/2011 SP1 format via the “Upgrade Translation Memories” wizard.
- TBA: where is this component?
- Configure Trados to load the translation memory:
- How much computing resource does this use up?
- How do you load a TM?
- Can you load on demand instead of preloading all?
- Here are the statistics for the translation memories for “our” languages:
uncc   Language     Code   Units (release 2007)   Units (release 2011)
1      English      EN     2187504                2286514
1      German       DE     532668                 1922568
1      Greek        EL     371039                 1901490
1      Spanish      ES     509054                 1907649
1      French       FR     1106442                1853773
1      Italian      IT     542873                 1926532
1      Polish       PL     1052136                1879469
1      Portuguese   PT     945203                 1922585
Total  8 languages         7246919                15600580
- Would it be of interest to have the document-focused JRC-Acquis distribution of the materials underlying the translation memory available on student/teacher TRADOS computers? That way, sample texts can be loaded for which reliable translation suggestions will be available (this is not certain for texts from all domains), and the use of a translation memory can be trained under realistic conditions.
- “The DGT Translation Memory is a collection of translation units, from which the full text cannot be reproduced. The JRC-Acquis is mostly a collection of full texts with additional information on which sentences are aligned with each other.”
- It remains to be seen how easily one can transfer documents from this distribution into Trados to work with the translation memory.
- Here is where to download:
Languages included at UNCC (flagged “1”): de, en, es, fr, it, pl, pt
- The JRC-Acquis comes with these statistics:
uncc   Language (ISO)   Number of texts   Total words   Total characters   Avg words/text
1      de               23541             32059892      232748675          1361.87
1      en               23545             34588383      210692059          1469.03
1      es               23573             38926161      238016756          1651.3
1      fr               23627             39100499      234758290          1654.91
1      it               23472             35764670      230677013          1523.72
1      pl               23478             29713003      214464026          1265.57
1      pt               23505             37221668      227499418          1583.56
Total  7 languages      164741            247374276     1588856237         10509.96
- What other multilingual corpora are there (for other domains and other, non-European languages)?
Corpus del Español Actual (CEA)
- Link:

- Example of KWIC view result:

- Based on Europarl, Wikicorpus (2006!), and MultiUN. From their metadata page:
Metadata for Corpus del Español Actual
Corpus name: Corpus del Español Actual
CQPweb’s short handles for this corpus: cea / CEA
Total number of corpus texts: 73,010
Total words in all corpus texts: 539,367,886
Word types in the corpus: 1,680,309
Type:token ratio: 0 types per token
Text metadata and word-level annotation:
The database stores the following information for each text in the corpus: there is no text-level metadata for this corpus.
The primary classification of texts is based on: a primary classification scheme for texts has not been set.
Words in this corpus are annotated with: Lemma (Lemma), Part-Of-Speech (POS), WStart (WStart)
The primary tagging scheme is: Part-Of-Speech
Further information about this corpus is available on the web at:
- To use it, “consult the IMS’s brief description of the regular-expression syntax used by the CQP and their list of sample queries. If you wish to define your query in terms of grammatical and inflectional categories, you can use the part-of-speech tags listed on the CEA’s Corpus Tags page.”
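For orientation, a few typical CQP-style queries follow. The lemma and part-of-speech attribute values here are illustrative only; the CEA’s actual tag names are on its Corpus Tags page.

```
"perro"                          all occurrences of the word form
[lemma="comer"]                  any inflected form of the lemma
[lemma="poder"] [pos="V.*"]      a form of "poder" followed by a verbal tag
"de" []{0,2} "la"                "de" and "la" separated by up to two tokens
```

Queries like the third one are what make the part-of-speech annotation listed in the metadata above practically useful for learners and teachers.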
- It also provides frequency data (based on word forms or lemmas, among others; up to 1000 entries):

- Example of a frequency query result (click for the full-size image). Note that a lemmatized list was requested here, which links all inflected forms back to the lemma; clicking the lemma, in turn, displays a KWIC view containing all forms subsumed under that lemma (see picture above).

How a teacher can use Sanako voice insert to easily add spoken comments to students’ Sanako oral proficiency exams
- All other things being equal (given a limited amount of time), teachers can provide more and better corrective feedback on student oral proficiency recordings if, during grading, they can easily insert their own oral comments into the students’ recordings (delivered as MP3 files to teachers’ desktops after Sanako oral exams).
- Both the Sanako Tutor and the Student Player have a voice-insert mode that is much easier and quicker to use than editing the student audio in Audacity (albeit, unlike Audacity, not free). We still recommend Audacity for bare-bones viewing/listening because of its ability to load and display multiple tracks simultaneously.
- Fortunately, the Sanako Tutor and Student Player are available on the teacher and student station PCs in the LRC (the latter’s insert function is available when the PC is connected to a running Sanako Tutor on the teacher station).
- How easy and fast is this to use? As you can see in this demo screencast on how to use Sanako voice insert to add spoken comments into your students’ Sanako oral exams, voice insert only requires:
- a click on the voice insert button in the center, whenever a user wants to speak during listening,
- and, from the top left menu, a “File” / “Save as” at the end.
- In a next step (not only during the grading process), how easy is it to distribute student recordings made with Sanako to students? That is TBA: a different story.

