…has been accepted for inclusion in the program for CALICO 2014, May 9-10 at Ohio University (Athens, OH), and was presented on May 9. Here are the abstract and slide deck:
The LRC has availed itself of a free research distribution of a 55 GB collection of language corpora from the European Language Resources Association (http://www.elra.info/). This “big data” should be of interest to the translation program as well as the language learning programs, since it enables corpus-linguistic approaches to language learning and the automated production of learning material based on natural language processing.
Here is an overview of the materials included:
A list of files included can be found here:
You need to know a little bit about “yours truly” to find this spam comment interesting (if you follow its links, it eventually leads to a discount wristwatch site).
Covers a subset of the languages supported by the LRC. It is based on WordNet, which is a machine-readable semantic network rather than a dictionary for human consumption, but here is one of its machine-generated applications.
- Why I am coming to THATCamp Piedmont:
- I am looking for practitioners of NLP in a language and literature teaching context, since I am working on using NLP tools to automate the production and correction of interactive learning material (presented at CALICO 2012)
- for the Learning Exercise Creation Engines I developed (presented at EUROCALL 2007).
- A little about myself:
- My Ph.D. thesis expanded the close reading of textual variants practiced in the German editorial schools of Hans Zäch, and the use of computer-generated textual concordances in interpreting and selecting such variants, into a corpus-linguistics-inspired approach. Using regular-expression programming, it traced leitmotifs in the work of the foremost Swiss-German classic, treated as a digital corpus (part of which I digitized for the first time myself).
- I have since applied my corpus linguistic approach to