GATE Components
Minding the Source: Automatic Tagging of Reported Speech in Newspaper Articles
Submitted by rene on Sat, 2010-07-31 14:01
New GATE PR: The Predicate-Argument Extractor (PAX)
At the LREC workshop "New Challenges for NLP Frameworks", we released a new component for GATE: the Predicate-Argument Extractor (PAX).
The OwlExporter: Flexible Ontology Population from Text
This page describes the OwlExporter, an open-source (AGPL3) component that facilitates populating an OWL ontology from annotations created by an existing GATE application.
The Javadoc NLP Corpus Generation Doclet
This page describes the process of generating a corpus from source code and source code comments using Javadoc. The SSLDoclet is a custom doclet that is passed as a parameter to Javadoc in order to create an Abstract Syntax Tree (AST) that can be used as a corpus within NLP frameworks such as GATE.
The GATE Multi-Parser Predicate-Argument EXtractor Component (MultiPaX)
The GATE Multi-Parser Predicate-Argument EXtractor Component (MultiPaX) can extract predicate-argument structures (PAS) from the output of different parsers.
First Release of the Reported Speech Tagger
Coinciding with the presentation of our paper "Minding the Source: Automatic Tagging of Reported Speech in Newspaper Articles" at LREC 2008, we are happy to announce the first public release of our free/open-source Reported Speech Tagging Components.
Reported Speech Tagger
Reported speech, both direct and indirect, is an important indicator of evidentiality in traditional newspaper texts, and increasingly also in new media that rely heavily on citing and quoting previous postings, such as blogs or newsgroups. We developed an NLP component in the form of a GATE resource that automatically detects and tags reported speech constructs, in particular the source, the reporting verb, and the content. It is intended as a first module for more sophisticated representation of and reasoning with attributed information, such as belief reasoning based on nested belief structures.
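To make the three tagged parts concrete, here is a minimal Python sketch of what "source, reporting verb, content" could look like for a simple indirect-speech sentence. It is not the Reported Speech Tagger itself, which is a GATE resource: the verb list, the regular expression, and the function name `tag_reported_speech` are all assumptions made for illustration, and direct (quoted) speech is not handled.

```python
import re

# Hypothetical, deliberately tiny list of reporting verbs (an assumption;
# the actual tagger's resources are not described here).
REPORTING_VERBS = ["said", "says", "stated", "claimed", "announced", "reported"]

# Toy pattern: <source> <reporting verb> [that] <content>[.]
PATTERN = re.compile(
    r"^(?P<source>.+?)\s+"
    r"(?P<verb>" + "|".join(REPORTING_VERBS) + r")\s+"
    r"(?:that\s+)?"
    r"(?P<content>.+?)\.?$"
)


def tag_reported_speech(sentence):
    """Return source, reporting verb, and content, or None if no match."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    return {
        "source": match.group("source"),
        "verb": match.group("verb"),
        "content": match.group("content"),
    }


if __name__ == "__main__":
    print(tag_reported_speech("The minister said that taxes will fall."))
```

A single surface pattern like this breaks down quickly on real newspaper text (inverted order, quotations, nested attribution), which is why a dedicated component is needed.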
The Durm German Lemmatizer
The Durm German Lemmatization System consists of a number of GATE components and resources that perform morphological analysis and lemmatization for German nouns.
Multi-lingual Noun Phrase Extractor (MuNPEx)
The Multi-Lingual Noun Phrase Extractor (MuNPEx) is a fast, robust, customizable, and well-tested noun phrase (NP) chunker component developed for the GATE architecture, implemented in JAPE. It currently supports English, German, French, and Spanish (in beta). It provides detailed features for each NP annotation, with DET (determiner), MOD/MOD2 (pre/post-head modifiers), and HEAD noun slots, as well as (optional) text offset information.
MuNPEx requires a part-of-speech (POS) tagger to work and can additionally use detected named entities (NEs) to improve chunking performance. Please read the documentation (and source code) for more details.
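The slot structure described above can be illustrated with a small Python sketch that chunks POS-tagged tokens into noun phrases with DET, MOD, and HEAD slots. This is only an illustration under stated assumptions, not MuNPEx's JAPE grammar: it assumes Penn Treebank-style tags (`DT`, `JJ*`, `NN*`), ignores post-head modifiers (MOD2), named entities, and offsets, and the function name `chunk_noun_phrases` is hypothetical.

```python
def chunk_noun_phrases(tagged):
    """Chunk (token, pos) pairs into simple noun phrases.

    Assumes Penn Treebank-style tags. Each NP is an optional determiner,
    any number of adjectives, and one or more nouns; the last noun is
    treated as the head and earlier nouns as pre-head modifiers.
    """
    phrases = []
    i = 0
    n = len(tagged)
    while i < n:
        start = i
        det = None
        mods = []
        # Optional determiner
        if tagged[i][1] == "DT":
            det = tagged[i][0]
            i += 1
        # Adjectival pre-head modifiers
        while i < n and tagged[i][1].startswith("JJ"):
            mods.append(tagged[i][0])
            i += 1
        # One or more nouns
        nouns = []
        while i < n and tagged[i][1].startswith("NN"):
            nouns.append(tagged[i][0])
            i += 1
        if nouns:
            phrases.append(
                {"DET": det, "MOD": mods + nouns[:-1], "HEAD": nouns[-1]}
            )
        elif i == start:
            # Nothing consumed at this position: skip the token
            i += 1
    return phrases


if __name__ == "__main__":
    sentence = [
        ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
        ("jumps", "VBZ"), ("over", "IN"),
        ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
    ]
    for np in chunk_noun_phrases(sentence):
        print(np)
```

The sketch also shows why a POS tagger is a prerequisite: the chunker only ever looks at the tags, never at the words themselves.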