Posts Tagged ‘news’

Scraping RSS of online actualités for language learning materials production

  1. Integrating RSS feeds of foreign-language news may be standard in most LMSs now, but it was not in 2002. Not even having an LMS was standard then: I had to build my own, and it took the university a few more years to adopt Blackboard, as I had recommended in 2000: cc-calico-news-glossing.2
  2. But RSS-feed display is only skin-deep and, even in extensive-reading pedagogies, not sufficient for integration into teaching and learning, which requires more post-processing.
  3. At a recent Digital Humanities Unconference, I was asked how I had “scraped” the feeds into a SQL Server database. (RSS scraping was chosen since it is easier than screen scraping, for RSS is devoid of most markup, as long as it validates.) Here are some code snippets to get you
    1. from the web glossing-rss-news--vs.net-c#
    2. into the database: sql-portal-csvs-code, cc-ms-sql-server2, cc-ms-sql-server3
    3. The scraped plain text in the database can form the foundation for post-processing for SLA purposes; see, e.g., glossing for reading comprehension facilitation or question generation with the trpQuizConverter for
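
The snippets linked above were written in C# (VS.NET) against MS SQL Server. As a rough illustration of the same two steps (from the web, into the database), here is a minimal Python sketch; it is not the original code. It assumes an RSS 2.0 feed and swaps in SQLite for SQL Server so that it runs without a server. The table name `news_item`, the function names and the sample feed are all made up for illustration.

```python
import html
import re
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical sample feed (RSS 2.0), standing in for a real news feed.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Exemple</title>
<item><title>Premier titre</title><link>http://example.org/1</link>
<pubDate>Mon, 01 Jul 2013 08:00:00 GMT</pubDate>
<description>&lt;p&gt;Un &lt;b&gt;court&lt;/b&gt; texte.&lt;/p&gt;</description></item>
</channel></rss>"""

TAG_RE = re.compile(r"<[^>]+>")


def plain_text(fragment):
    """Strip leftover markup and decode entities in an RSS text field."""
    return html.unescape(TAG_RE.sub("", fragment or "")).strip()


def fetch_feed(url):
    """Download the raw feed bytes (not exercised in the example below)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def parse_rss(xml_data):
    """Yield (title, link, pub_date, body) for each <item> in the feed."""
    root = ET.fromstring(xml_data)
    for item in root.iter("item"):
        yield (
            plain_text(item.findtext("title")),
            (item.findtext("link") or "").strip(),
            (item.findtext("pubDate") or "").strip(),
            plain_text(item.findtext("description")),
        )


def store_items(conn, items):
    """Insert items; the link is the key, so re-polling adds no duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news_item ("
        "link TEXT PRIMARY KEY, title TEXT, pub_date TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO news_item (title, link, pub_date, body) "
        "VALUES (?, ?, ?, ?)",
        items,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_items(conn, parse_rss(SAMPLE_FEED))
    for row in conn.execute("SELECT title, body FROM news_item"):
        print(row)
```

Using the item link as the primary key is just one simple way to keep repeated polls of the same feed from piling up duplicate rows; the plain text in the `body` column is what later glossing or question generation would work on.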

How to use archive.org’s US-English news collection as a language learning corpus with QUIK-like speaking samples

  1. Much of TV news nowadays seems to amount to little more than a constant stream of sound bites – however, exactly this brevity,
  2. the large archive and simple search interface: image
  3. the research/browsing capabilities visible on the left here, including the varied sources – of which Arabic, French and other European TV likely provide somewhat different perspectives on Edward Snowden –
  4. image
  5. and the caption-like transcription make it all the more accessible for intermediate learners of English.
  6. image
  7. Video clips of only 30 seconds' length are hardly enough for instruction. However, you can have students work with corpus-QUIK-like spoken samples and have them string a news history together if you design webquest-like research assignments, with the major added benefit that this corpus is spoken and trains listening.
  8. For more background info on archive.org’s transcribed TV news, consult this NYTimes article.