Multi-Lingual Noun Phrase Extractor (MuNPEx)
1. Overview
The Multi-Lingual Noun Phrase Extractor (MuNPEx) is a noun phrase (NP) chunker component developed for the GATE architecture and implemented in JAPE. It is fast, robust, customizable and well-tested, and currently supports English, German, and French (with Spanish in beta).
MuNPEx requires a part-of-speech (POS) tagger and can additionally use detected named entities (NEs) to improve chunking performance. For English, MuNPEx works with the Hepple tagger that ships as part of the ANNIE system in GATE; the French, German, and Spanish grammars rely on the TreeTagger instead. See the included documentation (or the source code) for more details. Also note that MuNPEx requires GATE version 5.x or later, as it makes use of the new Kleene operators in JAPE.
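MuNPEx itself consists of JAPE grammars running inside GATE. The following is not its grammar but a small Python analogue, assuming Penn Treebank POS tags such as those produced by the Hepple tagger, to illustrate the kind of pattern an NP chunker matches over tagged tokens and how a match can be split into determiner, modifier and head slots (named after the DET/MOD/HEAD slots MuNPEx records). The tag sets and the function name `chunk_nps` are hypothetical choices for this sketch; it ignores named entities, post-modifiers (MOD2) and all language-specific rules.

```python
from typing import List, Optional, Tuple, TypedDict

# Hypothetical tag classes over the Penn Treebank tagset (assumption for
# this sketch; MuNPEx's real grammars define their own categories).
DET_TAGS = {"DT", "PRP$", "WDT"}
MOD_TAGS = {"JJ", "JJR", "JJS", "CD"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


class NounPhrase(TypedDict):
    start: int          # index of the first token in the chunk
    end: int            # index one past the last token
    DET: Optional[str]  # determiner, if present
    MOD: str            # space-joined pre-modifiers (may be empty)
    HEAD: str           # head noun: the rightmost noun of the chunk


def chunk_nps(tagged: List[Tuple[str, str]]) -> List[NounPhrase]:
    """Greedily match DET? (MOD|NOUN)* NOUN from left to right."""
    chunks: List[NounPhrase] = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        det = None
        if tagged[j][1] in DET_TAGS:
            det = tagged[j][0]
            j += 1
        # Consume modifiers and nouns; remember the last noun as the head.
        k, last_noun = j, -1
        while k < n and (tagged[k][1] in MOD_TAGS or tagged[k][1] in NOUN_TAGS):
            if tagged[k][1] in NOUN_TAGS:
                last_noun = k
            k += 1
        if last_noun == -1:
            i += 1  # no head noun, so no NP starts at token i
            continue
        mods = " ".join(tok for tok, _ in tagged[j:last_noun])
        chunks.append({"start": i, "end": last_noun + 1, "DET": det,
                       "MOD": mods, "HEAD": tagged[last_noun][0]})
        i = last_noun + 1  # resume after the head noun
    return chunks


if __name__ == "__main__":
    sentence = [("The", "DT"), ("new", "JJ"), ("tagger", "NN"),
                ("annotates", "VBZ"), ("German", "JJ"), ("texts", "NNS"),
                (".", ".")]
    for np in chunk_nps(sentence):
        print(np["DET"], "|", np["MOD"], "|", np["HEAD"])
```

For the example sentence this finds two chunks, "The new tagger" and "German texts". Trailing modifiers after the last noun are not included in the chunk, so they are re-examined as a possible start of the next NP.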
Screenshot: MuNPEx-1.0 running on a German text in GATE Developer
2. Download
The latest version is v1.0, released on 16.08.2010:
- Download the gzipped tar archive MuNPEx-1.0.tgz. It includes the documentation, the main and supplementary JAPE grammars, and some additional information.
3. Version history
- 1.0: 16.08.2010. More robust on malformed input. Optional grammars for adding a HEAD_LEMMA slot. DET/MOD/HEAD/MOD2 now stored as strings instead of Content objects. Code cleanup and tweaks.
- 0.2: 03.03.2006. Preliminary Spanish support added. Renamed from "NPE" to "MuNPEx". Small cleanups. Number transducer added.
- 0.1: 21.11.2005. Initial public release.
4. License
MuNPEx is distributed as free/open source software under the GNU General Public License (GPL), version 2.
5. Feedback
For questions, comments, etc., please use the Forum.
6. Acknowledgements
Thanks to Michelle Khalifé for helping out in developing the French version.


