To provide inductive empirical examples, SLA classes have benefitted from query interfaces to target-language text corpora. But corpora are usually POS-tagged, and queried by POS tag, at best, which creates a certain “impedance mismatch” with what SLA classes actually teach. The Fangorn very large treebank query language beta demonstration page already looks interesting for analyzing English in SLA (hover over tree elements to highlight the corresponding text). Because queries can be edited and refined graphically from the search results, it also lends itself to demonstrations during face-to-face classes. I wonder whether corpora other than the Penn Treebank and Wikipedia (5k and 5000k sentences) will be made available online, and whether languages other than English will be supported.
I wish my Latin teacher back home had had such a nice tool when he analyzed “Ante mare et terras et quod tegit omnia caelum / unus erat toto naturae vultus in orbe / quem dixere chaos”. He had only me:
- Now how could such exercise creation be made more automated by having it accept the output of NLP tools like TreeTagger?
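
As a first stab at that question, here is a minimal sketch in Python of what "accepting tagger output" could look like. It assumes the usual TreeTagger output layout of one token per line with three tab-separated columns (word form, POS tag, lemma). Everything else is my own invention for illustration: the names `Token`, `parse_tagger_output` and `make_cloze` are hypothetical, the sample sentence and its tags are made up, and the actual tag names depend on the parameter file in use. It produces a simple gap-fill exercise and does not touch Fangorn at all.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    form: str   # surface word as it appears in the text
    pos: str    # POS tag (tagset depends on the tagger's parameter file)
    lemma: str  # dictionary form


def parse_tagger_output(text: str) -> List[Token]:
    """Parse tab-separated 'form<TAB>tag<TAB>lemma' lines into tokens."""
    tokens = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank lines and anything not in three columns
        tokens.append(Token(*parts))
    return tokens


def make_cloze(tokens: List[Token], pos_prefix: str) -> Tuple[str, List[str]]:
    """Blank out every token whose tag starts with pos_prefix.

    The lemma is shown as a hint; the removed forms are returned
    separately as the answer key.
    """
    words, answers = [], []
    for tok in tokens:
        if tok.pos.startswith(pos_prefix):
            answers.append(tok.form)
            words.append(f"({len(answers)}) ____ [{tok.lemma}]")
        else:
            words.append(tok.form)
    return " ".join(words), answers


# Made-up sample in the assumed three-column format.
SAMPLE = "The\tDT\tthe\ncats\tNNS\tcat\nsleep\tVVP\tsleep\n.\tSENT\t."

if __name__ == "__main__":
    exercise, key = make_cloze(parse_tagger_output(SAMPLE), "V")
    print(exercise)
    print(key)
```

Selecting by tag prefix keeps the sketch independent of any one tagset, so the same function could blank out nouns or adjectives instead, but a real exercise generator would of course need more care with punctuation and spacing.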