NLP Components
Generating an NLP Corpus from Java Source Code: The SSL Javadoc Doclet
Submitted by ninus on Wed, 2011-03-16 12:53
Reported Speech Tagger
Reported speech, both direct and indirect, is an important indicator of evidentiality in traditional newspaper texts, and increasingly in new media that rely heavily on citing and quoting previous postings, such as blogs and newsgroups. We developed an NLP component in the form of a GATE resource that automatically detects and tags reported speech constructs, in particular the source, the reporting verb, and the content. It is intended as a first module for more sophisticated representation of and reasoning with attributed information, such as belief reasoning based on nested belief structures.
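To make the three tagged parts concrete, here is a minimal Python sketch that splits a single sentence into source, reporting verb, and content. It is only an illustration under strong assumptions: the verb list, the two regular expressions, and the names `ReportedSpeech` and `tag_reported_speech` are hypothetical and are not taken from the GATE component, which is likely to rely on richer linguistic analysis.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical, very small verb list; a real tagger would need a larger lexicon.
REPORTING_VERBS = ("said", "says", "claimed", "stated", "reported", "announced")


class ReportedSpeech(NamedTuple):
    source: str    # who is being quoted
    verb: str      # the reporting verb
    content: str   # what was reported
    direct: bool   # True for quoted (direct) speech


_VERBS = "|".join(REPORTING_VERBS)

# Direct speech, e.g.: "We will appeal," the lawyer said.
_DIRECT = re.compile(
    r'"(?P<content>[^"]+?),?"\s+(?P<source>.+?)\s+(?P<verb>' + _VERBS + r")\b"
)

# Indirect speech, e.g.: The minister claimed that the budget is balanced.
_INDIRECT = re.compile(
    r"^(?P<source>.+?)\s+(?P<verb>" + _VERBS + r")\s+that\s+(?P<content>.+?)\.?$"
)


def tag_reported_speech(sentence: str) -> Optional[ReportedSpeech]:
    """Return the reported speech parts of one sentence, or None if no pattern fits."""
    sentence = sentence.strip()
    m = _DIRECT.search(sentence)
    if m:
        return ReportedSpeech(m["source"], m["verb"], m["content"], True)
    m = _INDIRECT.match(sentence)
    if m:
        return ReportedSpeech(m["source"], m["verb"], m["content"], False)
    return None


if __name__ == "__main__":
    print(tag_reported_speech('"We will appeal," the lawyer said.'))
    print(tag_reported_speech("The minister claimed that the budget is balanced."))
```

Surface patterns like these break down quickly on real text (nested quotes, verbs placed before the source, multi-sentence quotations), which is one reason a dedicated component is useful.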
Multi-lingual Noun Phrase Extractor (MuNPEx)
The Multi-Lingual Noun Phrase Extractor (MuNPEx) is a fast, robust, customizable, and well-tested noun phrase (NP) chunker component developed for the GATE architecture, implemented in JAPE. It currently supports English, German, French, and Spanish (in beta).
MuNPEx requires a part-of-speech (POS) tagger to work and can additionally use detected named entities (NEs) to improve chunking performance. Please read the documentation (or source code) for more details.
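The dependency on a POS tagger can be illustrated with a toy chunker that groups already tagged tokens into noun phrases. This is a rough Python sketch, not MuNPEx's JAPE grammar: the pattern (optional determiner, any adjectives, one or more nouns), the use of Penn Treebank tags, and the function name `chunk_noun_phrases` are assumptions made for illustration, and named entity input is not modeled.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag); Penn Treebank tags are assumed here

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ", "JJR", "JJS"}
DET_TAGS = {"DT", "PRP$"}


def chunk_noun_phrases(tokens: List[Token]) -> List[str]:
    """Greedily collect spans matching: optional determiner, adjectives, 1+ nouns."""
    phrases: List[str] = []
    i, n = 0, len(tokens)
    while i < n:
        start = i
        # Optional determiner
        if tokens[i][1] in DET_TAGS:
            i += 1
        # Any number of adjectives
        while i < n and tokens[i][1] in ADJ_TAGS:
            i += 1
        # One or more nouns form the head of the phrase
        head_start = i
        while i < n and tokens[i][1] in NOUN_TAGS:
            i += 1
        if i > head_start:
            phrases.append(" ".join(word for word, _ in tokens[start:i]))
        else:
            # No noun head found: skip one token and try again
            i = start + 1
    return phrases


if __name__ == "__main__":
    tagged = [
        ("The", "DT"), ("quick", "JJ"), ("fox", "NN"),
        ("chased", "VBD"), ("a", "DT"), ("rabbit", "NN"),
    ]
    print(chunk_noun_phrases(tagged))
```

Without the tags, the chunker has no way to tell where a phrase starts or ends, which is why the POS tagger must run earlier in the pipeline.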


