Posts Tagged ‘DkPro’

Paper @ CALICO 2014: Using NLP Platforms for Language Learning Material Production

…has been accepted for inclusion in the program of CALICO 2014 (May 9-10, Ohio University, Athens, OH) and was presented on May 9. Here are the abstract and slide deck:

Installing dkPro in 2014…

  • … proved easier than in 2012 (thanks, Richard!), but was still not for the faint of heart…
  • I got it to work, including maven-download-sources, despite an update release of version 1.6 landing in the middle of my installation travails, once again just as in 2012.
  • Read all about those travails here.

My DkPro settings.xml

<?xml version="1.0" encoding="utf-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <profiles>
    <profile>
      <id>ukp-oss-releases</id>
      <repositories>
        <repository>
          <id>ukp-oss-releases</id>
          <url>http://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/public-releases</url>
          <releases>
            <enabled>true</enabled>
            <updatePolicy>never</updatePolicy>
            <checksumPolicy>warn</checksumPolicy>
          </releases>
          <snapshots>
            <enabled>false</enabled>
          </snapshots>
        </repository>
      </repositories>
      <pluginRepositories>
        <pluginRepository>
          <id>ukp-oss-releases</id>
          <url>http://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/public-releases</url>
          <releases>
            <enabled>true</enabled>
            <updatePolicy>never</updatePolicy>
            <checksumPolicy>warn</checksumPolicy>
          </releases>
          <snapshots>
            <enabled>false</enabled>
          </snapshots>
        </pluginRepository>
      </pluginRepositories>
    </profile>
    <profile>
      <id>ukp-oss-snapshots</id>
      <repositories>
        <repository>
          <id>ukp-oss-snapshots</id>
          <url>http://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/public-snapshots</url>
          <releases>
            <enabled>false</enabled>
          </releases>
          <snapshots>
            <enabled>true</enabled>
          </snapshots>
        </repository>
      </repositories>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>ukp-oss-releases</activeProfile>
    <!-- the previous profile must not be commented out -->
    <!-- The following entry enables SNAPSHOT versions; comment it out if you do not need them. -->
    <activeProfile>ukp-oss-snapshots</activeProfile>
  </activeProfiles>
</settings>
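If you want to double-check outside Eclipse which profiles a settings.xml like the one above activates, and which repository URLs they point to, a few lines of Java using only the JDK's DOM parser will do. This is a minimal sketch under my own assumptions: the class and method names are made up, and it only reads the elements used above (it knows nothing about profile activation rules, mirrors, or how Maven merges the global and user settings files).

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical diagnostic helper (not part of Maven or m2eclipse): lists the active
// profiles of a settings.xml and the repository URLs each profile declares.
public final class SettingsInspector {

    // Parses without namespace awareness, so plain tag names such as "profile" match.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable settings.xml", e);
        }
    }

    // Trimmed text of the first direct child element called `name`, or null.
    // Looking only at direct children avoids picking up a nested <repository><id>.
    static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    // The ids listed under <activeProfiles>, in document order.
    static List<String> activeProfiles(String xml) {
        List<String> ids = new ArrayList<>();
        NodeList nodes = parse(xml).getElementsByTagName("activeProfile");
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(nodes.item(i).getTextContent().trim());
        }
        return ids;
    }

    // Maps each profile id to every <url> below it (repositories and pluginRepositories).
    static Map<String, List<String>> repositoryUrls(String xml) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList profiles = parse(xml).getElementsByTagName("profile");
        for (int i = 0; i < profiles.getLength(); i++) {
            Element profile = (Element) profiles.item(i);
            List<String> urls = new ArrayList<>();
            NodeList urlNodes = profile.getElementsByTagName("url");
            for (int j = 0; j < urlNodes.getLength(); j++) {
                urls.add(urlNodes.item(j).getTextContent().trim());
            }
            result.put(childText(profile, "id"), urls);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java SettingsInspector ~/.m2/settings.xml
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println("Active profiles: " + activeProfiles(xml));
        repositoryUrls(xml).forEach((id, urls) -> System.out.println(id + " -> " + urls));
    }
}
```

Run it against the file m2eclipse actually uses (shown under Window / Preferences / Maven / User Settings); with the settings.xml above, both ukp-oss profiles should show up as active.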

trp-learning-materials-starting-pom.xml

<?xml version="1.0" encoding="utf-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>de.tudarmstadt.ukp.experiments.trp</groupId>
  <artifactId>de.tudarmstadt.ukp.experiments.trp.learning-materials</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <parent>
    <artifactId>dkpro-parent-pom</artifactId>
    <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
    <version>2</version>
  </parent>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.10</version>
      <type>jar</type>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
      <!-- not de.tudarmstadt.ukp.dkpro.core.io -->
      <artifactId>de.tudarmstadt.ukp.dkpro.core.io.text-asl</artifactId>
      <!-- <version>1.3.0</version> -->
      <type>jar</type>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
      <artifactId>de.tudarmstadt.ukp.dkpro.core.tokit-asl</artifactId>
      <!--  <version>1.4.0-SNAPSHOT</version> -->
      <type>jar</type>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
      <artifactId>de.tudarmstadt.ukp.dkpro.core.opennlp-asl</artifactId>
      <type>jar</type>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <!-- Add de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl to the dependencies -->
      <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
      <artifactId>de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl</artifactId>
      <!-- tba: type pom and scope  -->
    </dependency>
    <dependency>
      <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
      <artifactId>de.tudarmstadt.ukp.dkpro.core.opennlp-model-tagger-en-maxent</artifactId>
      <version>1.5</version>
    </dependency>
  </dependencies>
  <dependencyManagement>
    <dependencies>
      <!-- Add de.tudarmstadt.ukp.dkpro.core-asl with type pom and scope import to dependency management -->
      <dependency>
        <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
        <artifactId>de.tudarmstadt.ukp.dkpro.core-asl</artifactId>
        <version>1.4.0-SNAPSHOT</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
      <!-- Add de.tudarmstadt.ukp.dkpro.core-gpl with type pom and scope import to dependency management -->
      <dependency>
        <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
        <artifactId>de.tudarmstadt.ukp.dkpro.core-gpl</artifactId>
        <version>1.4.0-SNAPSHOT</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>
</project>
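Since several artifactIds in a POM like this one end up wrapped across lines, and several dependencies deliberately omit <version> so that the imported dependencyManagement can supply it, it can help to list each declared dependency's coordinates and see which ones rely on management. The following Java sketch does that with the JDK's DOM parser; the names are hypothetical, and it does not resolve the parent POM or the imported core-asl/core-gpl POMs, which remains Maven's job.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: lists the direct <dependencies> of a pom.xml as
// groupId:artifactId:version, printing "(managed)" where <version> is omitted.
public final class PomDependencies {

    // First direct child element called `name`, or null (nested elements are not searched).
    static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        return null;
    }

    // Trimmed text of a direct child, or null. Trimming tolerates line-wrapped
    // <artifactId> values; comment nodes are not elements, so a commented-out
    // <version> is ignored, just as Maven ignores it.
    static String childText(Element parent, String name) {
        Element e = child(parent, name);
        return e == null ? null : e.getTextContent().trim();
    }

    static List<String> directDependencies(String pomXml) {
        Element project;
        try {
            project = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(pomXml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable pom.xml", e);
        }
        List<String> coordinates = new ArrayList<>();
        // Only <project>/<dependencies>; entries under <dependencyManagement> are not declared dependencies.
        Element dependencies = child(project, "dependencies");
        if (dependencies == null) {
            return coordinates;
        }
        for (Node n = dependencies.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE || !n.getNodeName().equals("dependency")) {
                continue; // skip whitespace text and comment nodes
            }
            Element dep = (Element) n;
            String version = childText(dep, "version");
            coordinates.add(childText(dep, "groupId") + ":" + childText(dep, "artifactId") + ":"
                    + (version == null ? "(managed)" : version));
        }
        return coordinates;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java PomDependencies pom.xml
        String pom = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        directDependencies(pom).forEach(System.out::println);
    }
}
```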

Java IDE for NLP with DkPro – A running log.

2012/05/19
        1. UPDATE: dkPro has since been updated; see the comment below by the dkPro project lead.
        2. Mylyn Web Connector 3.8 for Eclipse Indigo
      1. I next got an error (“Cannot complete the install because one or more required items could not be found.
        Software being installed: Mylyn Incubator SDK (Incubation) 3.8.0.I20120414-0402 (org.eclipse.mylyn.experimental_sdk_feature.feature.group 3.8.0.I20120414-0402)
        Missing requirement: Mylyn Tasks Connector: Web Templates (Advanced) (Incubation) 3.8.0.I20120414-0402 (org.eclipse.mylyn.web.tasks_feature.feature.group 3.8.0.I20120414-0402) requires ‘org.eclipse.mylyn_feature.feature.group [3.8.0,4.0.0)’ but it could not be found
        Cannot satisfy dependency:
        From: Mylyn Incubator SDK (Incubation) 3.8.0.I20120414-0402 (org.eclipse.mylyn.experimental_sdk_feature.feature.group 3.8.0.I20120414-0402)
        To: org.eclipse.mylyn.web.tasks_feature.feature.group [3.8.0,4.0.0)”), but starting over and updating my Mylyn installations from the menu Help / Install Updates fixed that.
      2. Show the Task Repositories window: [screenshot]
      3. Error: “Query Synchronization Failed _______ Q Failed to parse RSS feed: Invalid XML: Error on line 114: The element type "meta" must be terminated by the matching end-tag "</meta>".”
      4. Well, the Google Code integration is only for users who cannot run Maven anyway. Maybe I can instead install the m2e connector for Subclipse:
          • Window / Preferences -> Maven -> Discovery -> Open Catalog
          • search for “subclipse”
          • [screenshot]
      5. The install fails too (“Missing requirement” again, this time for “org.sonatype.m2e.subclipse.feature.feature.group 0.13.0.201107071330”), and here we are, up the creek without a paddle: you do not want to read a thread on the developer site that ends in “this must be a bad joke”.
      6. Then there are the heroes (as opposed to the process) who make it work nevertheless: to bring back the SVN SCM handler, extract this to your Eclipse dropins folder.
      7. [screenshot]
      8. Unless of course you suffer from extremely bad timing:
      9. Now on to actually getting started
          1. Wait: “First Programming Steps with DKPro Core: This page is currently outdated. We are working on a new DKPro Core release which makes several steps of this tutorial obsolete and changes others (Updated May 8, 2012)”. Can the help provided on the mailing list bridge that gap for you? Or is “Setting up Maven and Eclipse for DKPro Core development (Updated May 10, 2012)” currently the best instruction?
          2. “Go to the Package Explorer in Eclipse [Window->Show View->Other…->Java->Package Explorer] and create a new Maven Project”:
          3. [screenshots]
          4. [screenshots]
          5. Other potential sources of confusion:
          6. settings.xml: There are two, a global one in your Maven install directory (conf/settings.xml) and a user one in your .m2 directory (~/.m2/settings.xml). Maven merges both, with the user settings taking precedence, which is why the latter seems to be the one that counts.
          7. My .m2 directory recursed (think ~/.m2/.m2): did I cause this when trying to change its location (which supposedly you can)?
          8. The file nexus-maven-repository-index, in various forms of compression: what is it, and what prevents it from being downloaded?
          9. Maven repositories:
          10. The expansion option for ukp-oss-releases comes and goes. If I right-click / rebuild index, I even get an error “Unable to update index for ukp-oss-releases”, but afterwards the expansion option reappears.
          11. You are provided a settings.xml for Maven (m2eclipse) that points to the dkPro online Maven repository.
            1. It looks like it needs an update to include a pluginRepository for snapshots.
            2. Check that it is loaded correctly via Menu: Window / Preferences / Maven / User Settings: [screenshot]
              1. You are advised “to check if your Maven and Eclipse are configured correctly, try opening the “Maven Repositories” view in Eclipse, open “Global Repositories” and check if there is a “ukp-oss” folder in it with contents”, like so: [screenshot], or else to fix your ~/.m2/settings.xml or Eclipse:
              2. Open the Maven Repositories view via Menu: Window / Show View / Maven Repositories: [screenshot]
              3. Like so: [screenshot]
              4. You get an overview of updating
            3. [screenshot]
            4. And finally: [screenshot]
            5. TBA: what keeps the central Maven repository from being resolved?
          12. Build your own project, with guidance from a variety of documents (some need updating) and mailing lists
            1. My attempts to “browse” for the parent when creating my own project have remained unsuccessful: [screenshot]
            2. I could, however, use an existing pom.xml that loads a parent as a model: [screenshot]
            3. This seems to work, at least if you click “open parent pom”: [screenshot]
            4. It connects you to the dependency: [screenshot]
            5. Afterwards, the search feature started working when selecting dependencies: [screenshot]
            6. For the DkPro version updated 05-28, I also could not browse for parents or dependencies from m2eclipse, but first needed to add the following to my pom.xml manually (the syntax is explained here and here):
              1. the <parent> entry:
                [screenshot]
              2. Managing dependencies from within m2eclipse started to work for me only once I had manually added the <dependencyManagement> entries. These allow the snapshot version (currently 1.4.0) to be discovered automatically, whether you add a <dependency> to the pom.xml without a <version>, like here: [screenshot], or browse for and select the latest released version (1.3.0; I cannot browse for snapshots), like here:
            7. While I can still only browse for release versions (currently 1.3), the <dependencyManagement> updates
            8. Reminder: given the current transitional status of DkPro, you first need to enable snapshots, as in my settings.xml.
            9. HINTS
              1. You cannot use the GUI “remove” button to remove a dependency that you erroneously added empty: [screenshot]. Open the pom.xml in a text/XML editor, remove it there, then have the GUI reload the POM: [screenshot]
              2. See here for some tools that helped me debug my project setup in Eclipse.
            10. If you cannot use the built-in Javadoc help for stanford-corenlp and/or get “Can’t download JavaDoc for edu.stanford.nlp:stanford-corenlp:1.3.2:javadoc” during setup, this is a known issue that currently seems to have no resolution, though it may get one in the future: [screenshot]. Workaround: browse the source elsewhere…
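The behavior described in the log, where a <dependency> without <version> picks up the snapshot version from <dependencyManagement>, follows Maven's rule for direct dependencies: an explicit <version> wins; otherwise the managed version is used; with neither, Maven rejects the POM. A toy Java model of just that rule (hypothetical names; it ignores type and classifier, version ranges, and transitive dependencies) makes the precedence explicit:

```java
import java.util.Map;

// Toy model of one Maven rule, not Maven's implementation: how the version of a
// direct dependency is chosen. Ignores type/classifier, ranges and transitive deps.
public final class ManagedVersion {

    // key: "groupId:artifactId"; explicitVersion: the dependency's own <version> or null;
    // managed: versions contributed by <dependencyManagement> (including imported POMs).
    static String resolve(String key, String explicitVersion, Map<String, String> managed) {
        if (explicitVersion != null) {
            return explicitVersion; // an explicit <version> on a direct dependency wins
        }
        String fromManagement = managed.get(key);
        if (fromManagement == null) {
            // Maven reports a missing 'dependencies.dependency.version' as a build error here
            throw new IllegalStateException("No version for " + key);
        }
        return fromManagement;
    }

    public static void main(String[] args) {
        // Assumed example of what an imported core-asl 1.4.0-SNAPSHOT POM might manage.
        Map<String, String> managed = Map.of(
                "de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.tokit-asl", "1.4.0-SNAPSHOT");
        System.out.println(resolve(
                "de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.tokit-asl", null, managed));
        System.out.println(resolve("junit:junit", "4.10", managed));
    }
}
```

The managed entry in main is an assumed example, not taken from the actual DKPro POMs; the point is only the order in which the two sources of a version are consulted.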

Using home-brew NLP regular expressions to automate question generation for learning material creation

    1. The trpQuizGenerator, from which the screenshots below are taken,
      1. is an attempt to facilitate, speed up, and automate question generation for foreign language learning by collecting regular expressions that reflect typical patterns causing difficulties for learners of a number of L2s, inspired by common first- and second-year textbooks: [screenshot]
      2. German: differentiation between Dative and Accusative case personal pronouns
      3. Italian: contraction of article and preposition
      4. Spanish: demonstrative pronouns.
      5. Some more, rather arbitrary, but easily implementable examples for ESL:
        1. Numbers: the “much/many” dichotomy
        2. The “which/who” relative pronoun dichotomy: difficult for German students of English, since German has no such dichotomy for animate beings vs. things, and its (antiquated) relative pronoun “Welch” is a false friend of “which” that tends to make German speakers wrongly prefer “which” to “who”.
        3. Subordinate clauses/tenses: gap if-clauses up to the next period/comma, giving the number of words as a hint. This would require a delegate.
      6. Regular expressions in .NET have a number of advanced features that make the platform a good choice for this enterprise: [screenshot]
      7. The resulting texts can, for example, easily be delivered to students as formative assessment exercises using trpQuiz.dot.
    2. Update:
      1. In a much more recent approach to the same automation problem, I am trying to repurpose well-established existing NLP platforms for question generation.
      2. However, compared with the customized approach above, I have so far found that transforming the built-in, non-SLA-specific NLP recognition takes not only much more work to reformat the output for delivery, but also more creativity, or a willingness to put up with limitations, when it comes to homing in on typical learner problems.
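The regex-based approach described above can be sketched as a small cloze (gap-fill) generator. trpQuizGenerator itself is .NET-based; what follows is a hypothetical Java translation under my own assumptions: the patterns are illustrative rather than taken from the tool, the Italian one covers only a subset of the contracted forms (elided ones such as dell' are missing), and the German one only finds the pronouns, leaving the choice between dative and accusative to the learner. The if-clause rule uses Matcher.replaceAll with a function (Java 9+), which plays the role of the delegate mentioned above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of regex-based cloze (gap-fill) generation; not trpQuizGenerator's .NET code.
public final class ClozeGenerator {

    // Italian: an illustrative subset of prepositions contracted with the definite article.
    static final Pattern IT_PREP_ARTICLE = Pattern.compile(
            "\\b(?:del|dello|della|dei|degli|delle|al|allo|alla|ai|agli|alle|dal|dalla|nel|nella|sul|sulla)\\b",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // German: personal pronouns whose dative/accusative forms learners confuse (uns/euch serve both).
    static final Pattern DE_PRONOUN_DAT_ACC = Pattern.compile("\\b(?:mich|mir|dich|dir|ihn|ihm|uns|euch)\\b");

    // English: an if-clause up to (not including) the next comma or period.
    static final Pattern EN_IF_CLAUSE = Pattern.compile("\\b[Ii]f\\b[^,.]*");

    // Replaces each match with a numbered gap and appends the removed text to `answers`.
    static String gapAll(String text, Pattern pattern, List<String> answers) {
        Matcher m = pattern.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            answers.add(m.group());
            m.appendReplacement(out, "(" + answers.size() + ") ____");
        }
        m.appendTail(out);
        return out.toString();
    }

    // Computes each replacement with a function, the Java counterpart of a .NET MatchEvaluator
    // delegate: the gap shows the number of removed words as a hint.
    static String gapWithWordCount(String text, Pattern pattern, List<String> answers) {
        return pattern.matcher(text).replaceAll(match -> {
            answers.add(match.group());
            int words = match.group().trim().split("\\s+").length;
            return Matcher.quoteReplacement("[" + words + " words] ____");
        });
    }

    public static void main(String[] args) {
        List<String> answers = new ArrayList<>();
        System.out.println(gapAll("Il libro della ragazza sta sul tavolo.", IT_PREP_ARTICLE, answers));
        System.out.println(gapAll("Er gibt mir das Buch.", DE_PRONOUN_DAT_ACC, answers));
        System.out.println(gapWithWordCount("If it rains, we stay home.", EN_IF_CLAUSE, answers));
        System.out.println("Answers: " + answers);
    }
}
```

Sharing one answers list across several rules keeps the gap numbers and the answer key in step, which is the shape needed to deliver the result as an exercise.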