Corpora
Mutation Impact Analysis System: Automated Extraction of Protein Mutation Impacts from the Biomedical Literature
Submitted by nona on Thu, 2012-07-05 08:29
Automated extraction and semantic analysis of mutation impacts from the biomedical literature
Submitted by nona on Mon, 2012-06-18 07:40
Automated Extraction of Protein Mutation Impacts from the Biomedical Literature
Submitted by nona on Sun, 2011-09-18 08:45
New Javadoc Doclet for NLP Analysis on Java Source Code
For those interested in performing NLP on source code, in particular Javadoc comments, we just released a Doclet at the NLP Frameworks workshop last week.
Its main feature is that it creates an XML corpus from Java source code that is optimised for processing in an NLP Framework (GATE in our case, but it should work for any framework that takes XML as input).
The Javadoc NLP Corpus Generation Doclet
This page describes how to generate a corpus from source code and source code comments using Javadoc. The SSLDoclet is a custom doclet that is passed as a parameter to Javadoc. It produces an XML representation of the source code's Abstract Syntax Tree (AST), which can then be used as a corpus within NLP frameworks such as GATE.
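The SSLDoclet's own source is not reproduced here. As a rough illustration of how a custom doclet can turn Java source into XML, the following hypothetical `XmlCorpusDoclet` is a minimal sketch. It is written against the newer `jdk.javadoc.doclet` API (JDK 11+), not the older `com.sun.javadoc` API that a 2011 doclet would have used. The class name and the `<corpus>`, `<class>`, `<method>` and `<comment>` element names are assumptions, not the SSLDoclet's actual output format, and the sketch emits only classes, methods and their comments rather than a full AST.

```java
import java.util.Locale;
import java.util.Set;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.ElementKind;
import javax.lang.model.element.TypeElement;
import javax.lang.model.util.ElementFilter;
import jdk.javadoc.doclet.Doclet;
import jdk.javadoc.doclet.DocletEnvironment;
import jdk.javadoc.doclet.Reporter;

// Hypothetical sketch, NOT the SSLDoclet: a minimal doclet that prints one
// XML element per class and method, together with its Javadoc comment text.
public class XmlCorpusDoclet implements Doclet {

    @Override public void init(Locale locale, Reporter reporter) { }
    @Override public String getName() { return "XmlCorpusDoclet"; }
    @Override public Set<? extends Doclet.Option> getSupportedOptions() { return Set.of(); }
    @Override public SourceVersion getSupportedSourceVersion() { return SourceVersion.latest(); }

    // Called by Javadoc once the sources have been parsed.
    @Override
    public boolean run(DocletEnvironment env) {
        System.out.println("<corpus>");
        for (TypeElement type : ElementFilter.typesIn(env.getIncludedElements())) {
            System.out.printf("  <class name=\"%s\">%n", escape(type.getQualifiedName().toString()));
            writeComment(env, type, "    ");
            for (Element member : type.getEnclosedElements()) {
                if (member.getKind() == ElementKind.METHOD) {
                    System.out.printf("    <method name=\"%s\">%n",
                            escape(member.getSimpleName().toString()));
                    writeComment(env, member, "      ");
                    System.out.println("    </method>");
                }
            }
            System.out.println("  </class>");
        }
        System.out.println("</corpus>");
        return true; // true tells Javadoc the doclet succeeded
    }

    // Emits the raw Javadoc comment (if any) as escaped text for the NLP pipeline.
    private void writeComment(DocletEnvironment env, Element element, String indent) {
        String comment = env.getElementUtils().getDocComment(element);
        if (comment != null && !comment.isBlank()) {
            System.out.printf("%s<comment>%s</comment>%n", indent, escape(comment.strip()));
        }
    }

    // Escapes XML special characters; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

Assuming the class is compiled into the current directory, it can be run with Javadoc's standard `-doclet` and `-docletpath` options, e.g. `javadoc -doclet XmlCorpusDoclet -docletpath . Example.java` (where `Example.java` is any hypothetical source file), which prints the XML to standard output so it can be redirected into a corpus file.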
Durm XML Markup
The formal DTD used within the Durm Corpus is available for download. Here, we briefly describe the meaning of the various elements.
Durm TUSTEP Markup
TUSTEP in general is documented at http://www.zdv.uni-tuebingen.de/tustep/tustep_eng.html. Here, we only provide an informal overview for users of the TUSTEP version of our Durm Corpus.
The Durm Corpus
As part of the Durm project, we digitized a single volume from the historical German Handbuch der Architektur (Handbook on Architecture), namely:
E. Marx: Wände und Wandöffnungen (Walls and Wall Openings). In "Handbuch der Architektur", Part III, Volume 2, Number I, Second edition, Stuttgart, Germany, 1900.
The volume contains 506 pages and 956 figures.
The corpus developed in this project is made available under a free document license in several formats: scanned page images, TUSTEP format, and XML format. An online version and tools for converting between the various formats are also available.