Tools & Resources
Our lab has been publishing a number of free and open source tools, components, frameworks, and resources for NLP and Semantic Computing.
Table of Contents
- 1. Frameworks and Architectures
- 2. Corpora
- 3. Text Mining Systems
- 4. NLP Components
- 5. Other Tools and Resources
- 6. Support
1. Frameworks and Architectures
1.1. Semantic Assistants
The Semantic Assistants architecture supports easy integration of NLP into (desktop) clients using W3C Web Services and ontologies. The server-side part integrates the GATE framework and allows any existing pipeline to be published as an NLP Web service. A plugin for the OpenOffice.org word processor lets users execute these semantic services from within their documents.
2. Corpora
2.1. Durm Corpus
A single book from a historic encyclopedia of architecture, written in German (in various formats).
3. Text Mining Systems
The Open Mutation Miner (OMM) system provides a number of advanced text mining components for mutation mining from full-text research papers, including the detection of various forms of mutation mentions, protein properties, organism detection, impact mentions, and the relations between them.
4. NLP Components
Rhetector is a GATE plugin for the automatic detection of Rhetorical Entities (REs) in scientific literature. Rhetorical Entities are spans of text (sentences, passages, sections, etc.) in a document where authors convey their findings, such as Claims or Arguments, to the readers.
The LODtagger is a GATE component that allows linking entities in a document to their corresponding resources on the Linked Open Data (LOD) cloud. LODtagger relies on external tools to perform the actual content tagging and hides the complexity of communicating with LOD taggers, such as DBpedia Spotlight, from the perspective of pipeline developers.
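As a rough illustration of the facade idea described above (not LODtagger's actual code), the following Python sketch converts a DBpedia Spotlight-style JSON response into uniform annotation records. The `LinkedEntity` type and the `parse_spotlight_response` function are hypothetical names introduced here. The field names (`Resources`, `@URI`, `@surfaceForm`, `@offset`, `@similarityScore`) follow the JSON returned by the public DBpedia Spotlight `annotate` service, but they should be checked against the service version in use. The sample response is hand-written.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LinkedEntity:
    """A backend-independent entity annotation (hypothetical type)."""
    start: int
    end: int
    surface_form: str
    uri: str
    score: float


def parse_spotlight_response(payload: str) -> List[LinkedEntity]:
    """Convert a Spotlight-style JSON string into LinkedEntity records."""
    data = json.loads(payload)
    entities = []
    # The "Resources" key may be absent when no entity was found.
    for res in data.get("Resources", []):
        start = int(res["@offset"])
        surface = res["@surfaceForm"]
        entities.append(LinkedEntity(
            start=start,
            end=start + len(surface),
            surface_form=surface,
            uri=res["@URI"],
            score=float(res.get("@similarityScore", 0.0)),
        ))
    return entities


# Hand-written sample response, for illustration only.
SAMPLE = json.dumps({
    "@text": "Berlin is the capital of Germany.",
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Berlin",
         "@surfaceForm": "Berlin", "@offset": "0",
         "@similarityScore": "0.99"},
        {"@URI": "http://dbpedia.org/resource/Germany",
         "@surfaceForm": "Germany", "@offset": "25",
         "@similarityScore": "0.98"},
    ],
})

for entity in parse_spotlight_response(SAMPLE):
    print(entity.start, entity.end, entity.uri)
```

A pipeline developer would then work only with `LinkedEntity` records, regardless of which tagging service produced them.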
The OrganismTagger is a hybrid rule-based/machine-learning system that extracts organism mentions from the biomedical literature, normalizes them to their scientific name, and provides grounding to the NCBI Taxonomy database.
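The normalization and grounding steps described above can be pictured with a small Python sketch. This is a purely dictionary-based toy, not the hybrid rule-based/machine-learning approach that OrganismTagger uses. The `OrganismMention` type, the `tag_organisms` function, and the tiny lexicon are assumptions made for illustration; only the three NCBI Taxonomy identifiers are real.

```python
import re
from typing import List, NamedTuple


class OrganismMention(NamedTuple):
    text: str             # mention as it appears in the document
    scientific_name: str  # normalized scientific name
    taxonomy_id: int      # NCBI Taxonomy identifier


# Tiny hand-made lexicon (illustrative only); a real system would
# draw on the full NCBI Taxonomy database.
LEXICON = {
    "human": ("Homo sapiens", 9606),
    "homo sapiens": ("Homo sapiens", 9606),
    "mouse": ("Mus musculus", 10090),
    "mus musculus": ("Mus musculus", 10090),
    "e. coli": ("Escherichia coli", 562),
    "escherichia coli": ("Escherichia coli", 562),
}

# Try longer names first and require non-word boundaries on both sides,
# so that "human" does not match inside "humans".
_PATTERN = re.compile(
    r"(?<!\w)("
    + "|".join(re.escape(k) for k in sorted(LEXICON, key=len, reverse=True))
    + r")(?!\w)",
    re.IGNORECASE,
)


def tag_organisms(text: str) -> List[OrganismMention]:
    """Find lexicon entries in text and ground them to taxonomy IDs."""
    mentions = []
    for match in _PATTERN.finditer(text):
        name, tax_id = LEXICON[match.group(1).lower()]
        mentions.append(OrganismMention(match.group(1), name, tax_id))
    return mentions


for mention in tag_organisms("Expression in E. coli and mouse cells."):
    print(mention)
```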
A GATE component for easy ontology population from text.
A GATE component that extracts predicate-argument structures (subject, predicate, object triples) in a common format from the output of different parsers (RASP, MiniPar, Stanford, SUPPLE).
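To make the idea of a common triple format concrete, here is a minimal Python sketch. It does not show the component's real implementation: the `Triple` type, the `extract_triples` function, the simplified `(label, head, dependent)` edge representation, and the per-parser label tables are all assumptions chosen for illustration. The sketch maps parser-specific relation labels onto shared subject and object roles and emits one triple per predicate that has both.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Assumed label mappings per parser; illustrative, not exhaustive.
ROLE_LABELS: Dict[str, Dict[str, str]] = {
    "stanford": {"nsubj": "subject", "dobj": "object"},
    "rasp": {"ncsubj": "subject", "dobj": "object"},
}

# (relation label, head word, dependent word)
Edge = Tuple[str, str, str]


def extract_triples(parser: str, edges: Iterable[Edge]) -> List[Triple]:
    """Map parser-specific dependency edges onto common triples."""
    roles = ROLE_LABELS[parser]
    by_head: Dict[str, Dict[str, str]] = defaultdict(dict)
    for label, head, dependent in edges:
        role = roles.get(label)
        if role is not None:  # ignore relations that carry no argument role
            by_head[head][role] = dependent
    return [
        Triple(args["subject"], head, args["object"])
        for head, args in by_head.items()
        if "subject" in args and "object" in args
    ]


stanford_edges = [
    ("det", "protein", "The"),
    ("nsubj", "binds", "protein"),
    ("dobj", "binds", "DNA"),
]
rasp_edges = [
    ("ncsubj", "binds", "protein"),
    ("dobj", "binds", "DNA"),
]

print(extract_triples("stanford", stanford_edges))
print(extract_triples("rasp", rasp_edges))
```

Both calls yield the same triple, which is the point of a common format: downstream components do not need to know which parser produced the analysis.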
The Durm self-learning, context-aware lemmatizer for German nouns.
Our reported speech tagger for English newspaper articles.
4.8. MuNPEx NP Chunker
The multi-lingual noun phrase (NP) chunker MuNPEx for GATE.
5. Other Tools and Resources
A doclet for Javadoc that generates a corpus from Java source code, optimized for NLP processing of source code comments.
6. Support
For questions, comments, etc., please visit the Tools & Resources Forum.