Scraping RSS feeds of online news (actualités) for producing language learning materials

  1. Integrating RSS feeds of foreign-language news may be a standard capability in most LMSs now, but it was not in 2002. (Not even having an LMS was standard then: I had to build my own, and it took the university a few more years to adopt Blackboard, as I had recommended in 2000.)
  2. But RSS feed display is only skin-deep and, even in extensive-reading pedagogies, not sufficient for integration into teaching and learning, which requires more post-processing.
  3. At a recent Digital Humanities Unconference, I was asked how I had “scraped” the feeds into a SQL Server database. (I chose RSS scraping because it is easier than screen scraping: RSS is devoid of most markup, as long as it validates.) Here are some code snippets to get you
    1. from the web
    2. into the database
    3. The scraped plain text in the database can form the foundation for post-processing for SLA purposes; see, e.g., glossing for reading comprehension facilitation or question generation with the trpQuizConverter.
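The two steps above (from the web, into the database) can be sketched roughly as follows. This is not the original portal code, only a minimal illustration in Python: SQLite stands in for MS SQL Server so the example runs anywhere, and the sample feed, the table name `news_item`, and the function names `fetch_feed`, `plain_text`, `parse_items` and `store_items` are all made up for this example.

```python
import re
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET
from html import unescape

# Invented sample feed, so the sketch runs without network access.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Actualités (sample feed)</title>
    <item>
      <title>Premier titre</title>
      <link>http://example.org/1</link>
      <description>&lt;p&gt;Un &lt;b&gt;court&lt;/b&gt; résumé.&lt;/p&gt;</description>
      <pubDate>Tue, 01 Jan 2002 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Deuxième titre</title>
      <link>http://example.org/2</link>
      <description>Texte sans balisage.</description>
      <pubDate>Wed, 02 Jan 2002 08:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

TAG_RE = re.compile(r"<[^>]+>")


def fetch_feed(url):
    """Step 1, from the web: download the raw feed (bytes)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def plain_text(fragment):
    """Drop embedded HTML tags, decode leftover entities, collapse whitespace."""
    text = unescape(TAG_RE.sub(" ", fragment or ""))
    return " ".join(text.split())


def parse_items(xml_text):
    """Turn the <item> elements of an RSS 2.0 feed into plain-text records."""
    root = ET.fromstring(xml_text)
    return [
        {
            "link": (item.findtext("link") or "").strip(),
            "title": plain_text(item.findtext("title")),
            "summary": plain_text(item.findtext("description")),
            "published": (item.findtext("pubDate") or "").strip(),
        }
        for item in root.iter("item")
    ]


def store_items(conn, items):
    """Step 2, into the database: insert new items, skip links already stored."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS news_item (
               link TEXT PRIMARY KEY,
               title TEXT NOT NULL,
               summary TEXT,
               published TEXT)"""
    )
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO news_item (link, title, summary, published) "
        "VALUES (:link, :title, :summary, :published)",
        items,
    )
    conn.commit()
    return conn.total_changes - before  # number of rows actually inserted


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    added = store_items(connection, parse_items(SAMPLE_RSS))
    print(f"{added} new item(s) stored")
```

For a live feed, `parse_items(fetch_feed(url))` would replace the sample string. Keying the table on the item link means the script can be run repeatedly on the same feed without piling up duplicates; the plain text stored this way is what later post-processing would read.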

Language Lab Web Portal, University of Michigan – Dearborn

For lack of even an LMS (which in post-secondary language lab environments in the US in the “noughties” commonly had to double as CMS and groupware), the lab web portal in the post title had to fulfill many functions.

While the technically most advanced feature was probably full-text search against both the database and the file system (uploaded documents), which I could implement relatively easily thanks to MS SQL Server and a limited number of database tables, what I liked best was the collaborative building of a bank of language learning exercises using authentic materials, i.e. interactive websites from the target culture.

Here you can see a few sample illustrations of its use in both the language lab and affiliated computerized classrooms:

The list below links to a series of screencasts of the Language Lab Web Portal that I made for training and demonstration purposes. They show the portal software in action:

