Noun Phrase Chunking
Multi-Lingual Noun Phrase Extractor (MuNPEx) v1.0 for GATE released
The noun phrase chunker MuNPEx (Multi-Lingual Noun Phrase Extractor) is now available in the new and improved release v1.0. MuNPEx is a base NP chunker for the GATE framework, implemented in JAPE. It is fast, robust, customizable, and well-tested, and it currently supports English, German, and French (with Spanish in beta).
Major changes in this release:
- Limited the number of pre- and post-head modifiers to make MuNPEx more robust on certain kinds of input (such as a long list of tags or menu entries when processing web pages)
- New optional grammars to add a HEAD_LEMMA slot to an NP annotation, with the lemma extracted from the GATE morphological analyser (for English), the Durm Lemmatizer (for German), or the TreeTagger (for German, Spanish, French)
- DET/MOD/HEAD/MOD2 slots are now stored as strings (rather than Content objects) to make them easier to export and compatible with the new Predicate-Argument Extractor (PAX) component
- Other code cleanup and improvements
- No longer labeled as "beta" -- five years of testing ought to be enough, we're not Google ;-)
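To see why a cap on modifiers helps with menu-like input, here is a minimal Python sketch. It is not MuNPEx's JAPE grammar: the coarse tag names (`DET`, `ADJ`, `NOUN`), the `chunk` function, and the `max_mods` parameter are all assumptions made for illustration, and the pattern is deliberately reduced to an optional determiner, a run of adjective or noun modifiers, and a head noun.

```python
import re
from typing import List, Optional, Tuple

# Map coarse POS tags to one-letter codes so the chunk pattern can be a plain regex.
# The tag names are illustrative, not the tag set of any particular GATE tagger.
TAG_CODES = {"DET": "D", "ADJ": "A", "NOUN": "N"}


def chunk(tokens: List[Tuple[str, str]], max_mods: Optional[int] = None) -> List[List[str]]:
    """Find base NPs of the form DET? (ADJ|NOUN)* NOUN.

    With max_mods=None the modifier run is unbounded; otherwise at most
    max_mods modifiers may precede the head noun.
    """
    codes = "".join(TAG_CODES.get(tag, "O") for _, tag in tokens)
    bound = "*" if max_mods is None else "{0,%d}" % max_mods
    pattern = re.compile("D?[AN]" + bound + "N")
    # Each regex match covers a token range, because one code letter equals one token.
    return [[word for word, _ in tokens[m.start():m.end()]]
            for m in pattern.finditer(codes)]


if __name__ == "__main__":
    # A navigation menu scraped from a web page: every entry is tagged as a noun.
    menu = [(w, "NOUN") for w in ["Home", "News", "Sports", "Weather", "Contact"]]
    print(chunk(menu))              # one NP swallowing the whole menu
    print(chunk(menu, max_mods=2))  # several short NPs instead
```

Without a bound, the whole menu becomes a single five-word "noun phrase"; with `max_mods=2` the same input yields two short chunks. The right limit depends on the language and the input, so treat the value here as a placeholder.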
For more details and the download, please visit the MuNPEx page.
Multi-Lingual Noun Phrase Extractor (MuNPEx)
The Multi-Lingual Noun Phrase Extractor (MuNPEx) is a fast, robust, customizable, and well-tested noun phrase (NP) chunker component developed for the GATE architecture, implemented in JAPE. It currently supports English, German, French, and Spanish (in beta). It provides detailed features for each NP annotation, with DET (determiner), MOD/MOD2 (pre/post-head modifiers), and HEAD noun slots, as well as (optional) text offset information.
MuNPEx requires a part-of-speech (POS) tagger to work and can additionally use detected named entities (NEs) to improve chunking performance. Please read the documentation (and source code) for more details.
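As a rough illustration of what an NP annotation with these slots could look like, the following Python sketch builds DET/MOD/HEAD/MOD2 strings plus character offsets from already POS-tagged tokens. It is a simplified stand-in, not the MuNPEx grammar or the GATE API: the `NounPhrase` class, the `extract_nps` function, the `start`/`end` field names, the `post_head_mods` switch, and the coarse tag names are all assumptions. Named entity input is left out entirely.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Tuple


@dataclass
class NounPhrase:
    DET: str    # determiner, empty string if absent
    MOD: str    # pre-head modifiers
    HEAD: str   # head noun
    MOD2: str   # post-head modifiers
    start: int  # character offset of the first token (hypothetical field name)
    end: int    # character offset just past the last token (hypothetical field name)


def extract_nps(text: str, tagged: List[Tuple[str, str]],
                post_head_mods: bool = False) -> List[NounPhrase]:
    """Match DET? ADJ* NOUN (ADJ*) over tagged tokens and fill the slots."""
    # Locate every token in the text so each NP can carry character offsets.
    spans, cursor = [], 0
    for word, tag in tagged:
        begin = text.index(word, cursor)
        cursor = begin + len(word)
        spans.append((word, tag, begin, cursor))

    nps, i, n = [], 0, len(spans)
    while i < n:
        j, det, pre, post = i, [], [], []
        if spans[j][1] == "DET":
            det.append(spans[j])
            j += 1
        while j < n and spans[j][1] == "ADJ":
            pre.append(spans[j])
            j += 1
        if j < n and spans[j][1] == "NOUN":
            head = spans[j]
            j += 1
            while post_head_mods and j < n and spans[j][1] == "ADJ":
                post.append(spans[j])
                j += 1
            members = det + pre + [head] + post
            nps.append(NounPhrase(
                DET=" ".join(t[0] for t in det),
                MOD=" ".join(t[0] for t in pre),
                HEAD=head[0],
                MOD2=" ".join(t[0] for t in post),
                start=members[0][2],
                end=members[-1][3],
            ))
            i = j          # continue after the chunk
        else:
            i += 1         # no head noun found here, try the next token
    return nps


if __name__ == "__main__":
    text = "la petite maison blanche est vide"
    tagged = [("la", "DET"), ("petite", "ADJ"), ("maison", "NOUN"),
              ("blanche", "ADJ"), ("est", "VERB"), ("vide", "ADJ")]
    nps = extract_nps(text, tagged, post_head_mods=True)
    # Plain string slots serialize directly, which keeps export simple.
    print(json.dumps([asdict(np) for np in nps], ensure_ascii=False))
```

The sketch makes the dependency on a POS tagger visible: without the tags there is nothing to match against. Keeping every slot a plain string means an annotation can be dumped to JSON or CSV without any conversion step.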