Mutation Impact Analysis System: Automated Extraction of Protein Mutation Impacts from the Biomedical LiteratureSubmitted by nona on Thu, 2012-07-05 08:29
Its main feature is that it creates an XML corpus from Java source code that is optimised for processing in an NLP Framework (GATE in our case, but it should work for any framework that takes XML as input).
This page describes the process of generating a corpus from source code and source code comments using Javadoc. The SSLDoclet is a custom doclet that is passed as a parameter to Javadoc in order to create an Abstract Syntax Tree (AST) that can be used as a corpus within NLP frameworks such as GATE.
Tustep in general is documented at http://www.zdv.uni-tuebingen.de/tustep/tustep_eng.html. Here, we only provide an informal overview for users of the TUSTEP version of our Durm Corpus.
As part of the Durm project, we digitized a single volume from the historical German Handbuch der Architektur (Handbook on Architecture), namely:
E. Marx: Wände und Wandöffnungen (Walls and Wall Openings). In "Handbuch der Architektur", Part III, Volume 2, Number I, Second edition, Stuttgart, Germany, 1900.
Contains 506 pages with 956 figures.
The corpus developed in this project is made available under a free document license in several formats: scanned page images, Tustep format, and XML format. Additionally, an online version and tools for transforming the various formats are available as well.