Semantic representation of scientific literature: bringing claims, contributions and named entities onto the Linked Open Data cloud
Submitted by bahar on Thu, 2016-01-07 16:09
LODeXporter is a GATE component that exports NLP annotations directly to a triplestore, using configurable vocabularies, for use in LOD applications.
Rhetector is a GATE plugin for the automatic detection of Rhetorical Entities (REs) in scientific literature. Rhetorical Entities are spans of text (sentences, passages, sections, etc.) in a document where authors convey their findings, such as Claims or Arguments, to readers. Rhetector is a lightweight pipeline that currently detects two types of rhetorical entities: Claims and Contributions. The motivation for and applications of Rhetector are described in our publication, "Semantic representation of scientific literature: bringing claims, contributions and named entities onto the Linked Open Data cloud", PeerJ Computer Science, vol. 1, e37, PeerJ, 12/2015.
LODtagger is a GATE component that links entities in a document to their corresponding resources on the Linked Open Data (LOD) cloud. LODtagger relies on external tools, such as DBpedia Spotlight, to perform the actual content tagging, and hides the complexity of communicating with these LOD taggers from pipeline developers.
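As a rough illustration of the kind of communication such a component hides, the Python sketch below builds a request URL for a DBpedia Spotlight-style annotate service and converts a Spotlight-style JSON response into simple annotation records. The endpoint URL, the JSON field names (`Resources`, `@URI`, `@surfaceForm`, `@offset`), the sample payload and both helper names are assumptions made for illustration; this is not LODtagger's actual implementation, which is a GATE component.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of a Spotlight-style service; a real deployment may differ.
SPOTLIGHT_ENDPOINT = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request_url(text, confidence=0.5):
    """Build the GET URL for an annotate request (no network call is made)."""
    query = urlencode({"text": text, "confidence": confidence})
    return SPOTLIGHT_ENDPOINT + "?" + query


def parse_spotlight_response(payload):
    """Convert a Spotlight-style JSON payload into annotation records.

    Each record holds the character span of the mention, its surface
    text, and the URI of the linked LOD resource.
    """
    data = json.loads(payload)
    annotations = []
    for resource in data.get("Resources", []):
        start = int(resource["@offset"])  # offsets arrive as strings
        surface = resource["@surfaceForm"]
        annotations.append({
            "start": start,
            "end": start + len(surface),
            "text": surface,
            "uri": resource["@URI"],
        })
    return annotations


# Hand-written sample payload (hypothetical, not real service output).
TEXT = "Berlin is the capital of Germany."
SAMPLE_PAYLOAD = json.dumps({
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Berlin",
         "@surfaceForm": "Berlin",
         "@offset": str(TEXT.index("Berlin"))},
        {"@URI": "http://dbpedia.org/resource/Germany",
         "@surfaceForm": "Germany",
         "@offset": str(TEXT.index("Germany"))},
    ]
})

url = build_request_url(TEXT)
annotations = parse_spotlight_response(SAMPLE_PAYLOAD)
```

Each resulting record can be checked against the source text, since `TEXT[a["start"]:a["end"]]` should equal `a["text"]` for every annotation `a`.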
We present an automatic workflow that performs text segmentation and entity extraction from scientific literature, primarily to address Task 2 of the Semantic Publishing Challenge 2015. The proposed solution is composed of two subsystems: (i) a text mining pipeline, built on the GATE framework, which extracts structural and semantic entities, such as authors' information and citations, from text and produces semantic (typed) annotations; and (ii) a flexible exporting module that translates the document annotations into RDF triples according to a custom mapping file.
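The second subsystem, translating typed annotations into RDF triples through a mapping, can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the in-memory `MAPPING` dictionary stands in for the custom mapping file, the `http://example.org/doc/` base URI is a placeholder, and the FOAF, BIBO and Dublin Core terms are illustrative choices rather than the vocabulary used in the actual system.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE_URI = "http://example.org/doc/"  # placeholder base URI

# Hypothetical mapping: annotation type -> (RDF class, property for the text).
MAPPING = {
    "Author": ("http://xmlns.com/foaf/0.1/Person",
               "http://xmlns.com/foaf/0.1/name"),
    "Citation": ("http://purl.org/ontology/bibo/Document",
                 "http://purl.org/dc/terms/bibliographicCitation"),
}


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))


def annotations_to_ntriples(doc_id, annotations, mapping=MAPPING):
    """Translate typed annotations into N-Triples lines.

    Annotation types without a mapping rule are skipped, so the mapping
    alone decides what is exported.
    """
    lines = []
    for index, ann in enumerate(annotations):
        rule = mapping.get(ann["type"])
        if rule is None:
            continue
        rdf_class, text_property = rule
        subject = "<{}{}#{}_{}>".format(BASE_URI, quote(doc_id), ann["type"], index)
        literal = escape_literal(ann["text"])
        lines.append("{} <{}> <{}> .".format(subject, RDF_TYPE, rdf_class))
        lines.append('{} <{}> "{}" .'.format(subject, text_property, literal))
    return lines


triples = annotations_to_ntriples("paper-1", [
    {"type": "Author", "text": "Jane Doe"},
    {"type": "Token", "text": "literature"},  # no rule, so not exported
])
```

Keeping the vocabulary in a separate mapping rather than in the export code means the same annotations can be re-exported with different vocabularies by changing only the mapping.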