Corpus-linguistics | Thomas' Work Space

POS-Tagsets. A list.

2014/08/16 plagwitz

Categories: corpora, e-languages, mental-notes Tags: lists, nlp, pos-tagging

Learn and teach writing in your second language on Lang-8.com

2014/04/01 plagwitz

Improving language learning with technology seems to me to have two avenues: AI and human intelligence. Automated feedback on writing provided by proofing tools makes one wonder about the feasibility of the former, even if these tools have become smarter and more contextual (in MS-Word 2007 and up) at spotting common errors like your/you’re or their/there. But the fact that automated essay-scoring tools, which have been developed and deployed (at least for ESL), claim to score similarly to teachers makes one wonder about much more… Correcting writing remains expensive!

So maybe we should look into crowd-sourced writing correction, which needs no cutting-edge NLP, only well-understood WWW infrastructure technology to connect interested parties, but requires social engineering to attract and keep good contributors (and a viable business model to stay afloat: this site seems to be freemium).

Reading online comments and postings in your native language makes one wonder: can language teachers be replaced by crowdsourcing? I became aware of this language learning website, which offers peer correction of writing input by native speakers, through a language learner corpus. I have not thoroughly evaluated the site, but the fact that its data is being used by SLA researchers (http://cl.naist.jp/nldata/lang-8/) seems a strong indicator that the work done on the website is of value.

To judge by the numbers accompanying the corpus (it is a snapshot from 2010; a newer version is, however, available on request), these are the most-represented L2s on lang-8.com:

Categories: audience-is-students, audience-is-teachers, corpora, e-languages, Writing Tags: automated-essay-scoring, lang-8.com, proofing-tools

Corpora, Treebanks, Word-Lists. A List.

2013/12/12 plagwitz

Categories: all-languages, corpora, Learning-logs, mental-notes, Spreadsheets, table-of-contents, Vocabulary Tags: lists, nlp, word-lists

How to work around AntWordProfiler error “Cannot open the file”

2013/11/07 plagwitz

Seems a little bug in this otherwise great program. I started getting this on Windows 7 64-bit with
for all files, no matter which size.
It occurred to me to go to the menu: Settings / global settings / file settings / show full pathnames
Here is what you see: Note the duplicate path to the file.
How did I get there? It seems you cannot take my usual preferred shortcut and paste the full file path into the browse dialogue.
If I browse to the file and select it, the botched-up double path does not appear:
I can then process the file fine.
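A minimal sketch, in Python, of how one might check a pathname copied from the full-pathname display for this kind of duplication. The screenshot is not reproduced here, so the shape of the botched path is an assumption: a full Windows path with a second full path appended (e.g. C:\corpora\C:\corpora\sample.txt). The function names and example paths are hypothetical and not part of AntWordProfiler.

```python
import re

# A drive specifier such as "C:\" or "D:/", not preceded by another letter.
DRIVE_SPEC = re.compile(r"(?<![A-Za-z])[A-Za-z]:[\\/]")


def looks_doubled(path: str) -> bool:
    """Return True if the path contains more than one drive specifier."""
    return len(DRIVE_SPEC.findall(path)) > 1


def last_full_path(path: str) -> str:
    """Return the part of the path starting at the last drive specifier.

    A path without any drive specifier is returned unchanged.
    """
    matches = list(DRIVE_SPEC.finditer(path))
    return path[matches[-1].start():] if matches else path


if __name__ == "__main__":
    # Hypothetical example path, not taken from the original screenshot.
    shown = r"C:\corpora\C:\corpora\sample.txt"
    print(looks_doubled(shown))
    print(last_full_path(shown))
```

This only recognizes the assumed shape of the problem; the workaround itself remains browsing to the file instead of pasting the path.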

Categories: corpora, e-languages, English, Reading, service-is-learning-materials-creation, service-is-testing-troubleshooting-debugging, Vocabulary Tags: antwordprofiler, word-frequencies

Search Rhapsodie, a syntactic and prosodic Treebank of spoken French

2013/10/01 plagwitz

The Rhapsodie Treebank is made up of “57 short samples of spoken French (5 minutes long on average, amounting to 3 hours of speech and a 33 000 word corpus) endowed with an orthographical phoneme-aligned transcription”.
Rhapsodie can be searched at http://www.projet-rhapsodie.fr/queryql.html:
View the list, read (1) the text or (2) the phonetic transcription, and click (3) and (4) to listen to the found segment.
You can also search for text and download:
The best part is obviously the markup and query language, which, however, comes with a learning curve.

Categories: corpora, French, Listening, Speaking, websites Tags: audio

WordSmith Tools are once again not licensed

2013/08/28 plagwitz

Starting WordSmith from the desktop shortcut results in this:
Didn’t I test and describe last summer how to fix this?

Categories: audience-is-IT-staff, Corpus-linguistics, e-languages, Glitches&Errors, Institution-is-University-of-North-Carolina-Charlotte, service-is-configuring-learning-tools, service-is-testing-troubleshooting-debugging, software, Translation Tags: image2013, wordsmith-tools

ELRA language corpora available in the LRC for research

2013/07/05 plagwitz

The LRC has availed itself of a free research distribution of a 55 GB collection of language corpora from http://www.elra.info/, the European Language Resources Association. This “big data” should be of interest to the translation program, as well as to the language learning programs, since it enables corpus-linguistic approaches to language learning and automated production of learning materials based on natural language processing.

Here is an overview of the materials included:

A list of files included can be found here:

Categories: corpora, learning-materials, Spreadsheets, Translation Tags: elda, elra, nlp

Voyant-tools.org

2013/03/22 plagwitz

Neat encounter at the ThatCamp2013 Digital Humanities Unconference at UNCC today. Certainly a simplification over WordSmith Tools. That’s all the reviewing I have time for right now.

Categories: Corpus-linguistics, digital-humanities, Institution-is-University-of-North-Carolina-Charlotte, Translation, websites Tags: Voyant-tools.org
