Semantic Software Lab
Concordia University
Montréal, Canada


The Durm Corpus

As part of the Durm project, we digitized a single volume from the historical German Handbuch der Architektur (Handbook on Architecture), namely:

[Figure: Scanned page fragment from Handbuch der Architektur]
E. Marx: Wände und Wandöffnungen (Walls and Wall Openings). In "Handbuch der Architektur", Part III, Volume 2, Number I, Second edition, Stuttgart, Germany, 1900.
The volume contains 506 pages and 956 figures.

The corpus developed in this project is available under a free document license in several formats: scanned page images, Tustep format, and XML format. An online version and tools for converting between these formats are also available.

Tools & Resources

Our lab has published a number of free/open-source tools, components, frameworks, and resources for natural language processing (NLP) and semantic computing.

The Durm German Lemmatizer

The Durm German Lemmatization System consists of a set of GATE components and resources that perform morphological analysis and lemmatization of German nouns.
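To illustrate the kind of processing involved, here is a deliberately simplified suffix-stripping lemmatizer for German nouns. It is not the Durm algorithm: the tiny lexicon, the suffix list, the umlaut reversal, and the function names `candidates` and `lemmatize_noun` are hypothetical, chosen only to show how stripping inflectional endings and checking the result against a lexicon can map a plural such as "Wände" back to "Wand".

```python
# Hypothetical toy lexicon of known noun lemmas (not the Durm resources).
LEXICON = {"Wand", "Öffnung", "Haus", "Fenster", "Tür"}

# Candidate inflectional endings. "" (the unchanged word) is tried first, and
# longer endings come before their shorter suffixes ("ern" before "er").
SUFFIXES = ("", "ern", "en", "er", "es", "e", "n", "s")

# Maps umlauted vowels back to their plain forms (e.g. "Wänd" -> "Wand").
UMLAUTS = str.maketrans({"ä": "a", "ö": "o", "ü": "u",
                         "Ä": "A", "Ö": "O", "Ü": "U"})


def candidates(word: str):
    """Yield possible lemmas for `word`, most conservative first."""
    for suffix in SUFFIXES:
        if not word.endswith(suffix):
            continue
        stem = word[: len(word) - len(suffix)]
        if not stem:
            continue
        yield stem
        # Plurals such as "Wände" or "Häuser" also umlaut the stem vowel.
        plain = stem.translate(UMLAUTS)
        if plain != stem:
            yield plain


def lemmatize_noun(word: str, lexicon=LEXICON) -> str:
    """Return the first candidate found in the lexicon, else the word itself."""
    for candidate in candidates(word):
        if candidate in lexicon:
            return candidate
    return word


for form in ["Wände", "Öffnungen", "Häuser", "Türen", "Fenster"]:
    print(form, "->", lemmatize_noun(form))
```

The lexicon lookup is what keeps the naive stripping rules from producing non-words; a realistic system would also need a much larger lexicon and handling of compounds and case endings.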

Multi-lingual Noun Phrase Extractor (MuNPEx)

The Multi-Lingual Noun Phrase Extractor (MuNPEx) is a fast, robust, customizable, and well-tested noun phrase (NP) chunker component for the GATE architecture, implemented in JAPE. It currently supports English, German, and French, with Spanish support in beta. For each NP annotation it provides detailed features: DET (determiner), MOD and MOD2 (pre- and post-head modifiers), and HEAD (head noun) slots, as well as optional text offset information.
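The slot structure can be pictured as a small record per NP. The sketch below is a hypothetical Python mirror of such an annotation, not the GATE or MuNPEx API: the class name `NPAnnotation`, the `*_START`/`*_END` offset feature names, and the French example phrase (where the adjective "blanche" follows the head) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Span = Tuple[int, int]  # character offsets [start, end) into the document text


@dataclass
class NPAnnotation:
    """Hypothetical record for one NP: overall span plus per-slot spans."""
    start: int
    end: int
    slots: Dict[str, Span] = field(default_factory=dict)  # "DET", "MOD", "HEAD", "MOD2"

    def feature(self, text: str, slot: str) -> Optional[str]:
        """Return the text covered by one slot, or None if the slot is empty."""
        span = self.slots.get(slot)
        return text[span[0]:span[1]] if span else None

    def features(self, text: str, with_offsets: bool = False) -> Dict[str, object]:
        """Build a feature map; offsets are added only when requested."""
        feats: Dict[str, object] = {}
        for slot, (s, e) in self.slots.items():
            feats[slot] = text[s:e]
            if with_offsets:
                feats[f"{slot}_START"] = s
                feats[f"{slot}_END"] = e
        return feats


text = "la grande maison blanche"
phrase = NPAnnotation(0, 24, {"DET": (0, 2), "MOD": (3, 9),
                              "HEAD": (10, 16), "MOD2": (17, 24)})
print(phrase.features(text))
```

Storing spans rather than strings keeps the annotation tied to the source text, so the optional offset features can be derived on demand.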

MuNPEx requires a part-of-speech (POS) tagger to work and can additionally use detected named entities (NEs) to improve chunking performance. Please read the documentation (and source code) for more details.
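Because the chunker works on POS tags rather than raw words, its core idea can be sketched as a greedy pattern match over tagged tokens: an optional determiner, then modifiers, then a run of nouns whose last member is the head. This is a toy Python approximation, not the MuNPEx JAPE grammar: it assumes Penn Treebank tags, ignores named entities and post-head modifiers, and the function `chunk_nps` and its tag sets are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Tagged = Tuple[str, str]  # (token, Penn Treebank POS tag), as a tagger might emit

# Hypothetical tag classes for the three slots handled here.
DET_TAGS = {"DT", "PRP$"}
MOD_TAGS = {"JJ", "JJR", "JJS", "CD", "VBN", "VBG"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def chunk_nps(tokens: List[Tagged]) -> List[Dict[str, Optional[str]]]:
    """Greedily match DET? MOD* NOUN+ and split it into DET/MOD/HEAD slots."""
    nps = []
    i, n = 0, len(tokens)
    while i < n:
        j = i
        det = None
        if tokens[j][1] in DET_TAGS:
            det = tokens[j][0]
            j += 1
        mod_start = j
        while j < n and tokens[j][1] in MOD_TAGS:
            j += 1
        noun_start = j
        while j < n and tokens[j][1] in NOUN_TAGS:
            j += 1
        if j > noun_start:
            # The last noun is the head; earlier nouns count as modifiers.
            mods = [tok for tok, _ in tokens[mod_start:j - 1]]
            nps.append({"DET": det, "MOD": " ".join(mods) or None,
                        "HEAD": tokens[j - 1][0]})
            i = j
        else:
            i += 1  # no noun found here; retry from the next token
    return nps


sentence = [("The", "DT"), ("old", "JJ"), ("stone", "NN"), ("wall", "NN"),
            ("has", "VBZ"), ("two", "CD"), ("openings", "NNS"), (".", ".")]
print(chunk_nps(sentence))
```

The sketch makes the dependency explicit: without reliable POS tags there is nothing to match, which is why a tagger must run first.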
