Semantic Software Lab
Concordia University
Montréal, Canada

Text Mining

LODeXporter: Transforming GATE Annotations to LOD Triples

The LODeXporter is a GATE component that exports NLP annotations directly to a triplestore, using configurable vocabularies, for use in LOD applications.
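To illustrate the general idea of turning annotations into triples under a configurable vocabulary, here is a minimal Python sketch. It is not the LODeXporter implementation or its configuration format: the `Annotation` class, the `MAPPING` dictionary, the `example.org` URIs, and the function names are all assumptions made for illustration, and the output is plain N-Triples text rather than a write to a real triplestore.

```python
from dataclasses import dataclass, field

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

@dataclass
class Annotation:
    """Hypothetical stand-in for an NLP annotation (not the GATE API)."""
    ann_id: int
    ann_type: str
    start: int
    end: int
    features: dict = field(default_factory=dict)

# Hypothetical vocabulary mapping: swapping these URIs changes the
# exported vocabulary without touching the export code.
MAPPING = {
    "base_uri": "http://example.org/doc1#",
    "types": {"Claim": "http://example.org/vocab#Claim"},
    "features": {"content": "http://example.org/vocab#content"},
}

def escape_literal(value):
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def annotation_to_ntriples(ann, mapping):
    """Return N-Triples lines for one annotation; unmapped types yield nothing."""
    type_uri = mapping["types"].get(ann.ann_type)
    if type_uri is None:
        return []
    subject = f"<{mapping['base_uri']}{ann.ann_type}_{ann.ann_id}>"
    lines = [f"{subject} <{RDF_TYPE}> <{type_uri}> ."]
    for name, value in sorted(ann.features.items()):
        predicate = mapping["features"].get(name)
        if predicate is not None:  # features without a mapping are skipped
            lines.append(f'{subject} <{predicate}> "{escape_literal(str(value))}" .')
    return lines

if __name__ == "__main__":
    claim = Annotation(7, "Claim", 0, 11, {"content": 'We "show" X'})
    print("\n".join(annotation_to_ntriples(claim, MAPPING)))
```

Keeping the vocabulary in a separate mapping is the design choice being illustrated: the same annotations can be re-exported against a different vocabulary by editing the mapping alone.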

Rhetector: Automatic Detection of Rhetorical Entities in Scientific Literature

Rhetector is a GATE plugin for the automatic detection of Rhetorical Entities (REs) in scientific literature. Rhetorical Entities are spans of text (sentences, passages, sections, etc.) in a document in which authors convey their findings, such as Claims or Arguments, to readers. We designed a lightweight pipeline to detect these entities automatically; it is currently limited to Claims and Contributions. The motivation for and applications of Rhetector are described in our publication, Sumner, T. (Eds.), Sateli, B., and R. Witte, "Semantic representation of scientific literature: bringing claims, contributions and named entities onto the Linked Open Data cloud", PeerJ Computer Science, vol. 1, no. e37, PeerJ, 12/2015.

The GATE LODtagger component

The LODtagger is a GATE component that links entities in a document to their corresponding resources on the Linked Open Data (LOD) cloud. LODtagger relies on external tools to perform the actual content tagging and hides the complexity of communicating with LOD taggers, such as DBpedia Spotlight, from pipeline developers.
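The wrapper pattern described above can be sketched in Python as follows. This is an illustration under stated assumptions, not LODtagger's code: `LinkedEntity`, `tag_document`, `stub_backend`, the response keys (`surface`, `offset`, `uri`), and the `example.org` URI are all hypothetical, and the stub backend replaces a real tagging service with a local dictionary lookup so that the example runs without network access.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class LinkedEntity:
    """Uniform result type that pipeline code sees, whatever the backend."""
    surface: str
    start: int
    end: int
    uri: str

# A backend takes raw text and returns service-specific hit dictionaries.
Backend = Callable[[str], List[dict]]

def stub_backend(text: str) -> List[dict]:
    """Hypothetical stand-in for an external LOD tagger (simple lookup)."""
    gazetteer = {"Montreal": "http://example.org/resource/Montreal"}
    hits = []
    for surface, uri in gazetteer.items():
        pos = text.find(surface)
        while pos != -1:
            hits.append({"surface": surface, "offset": pos, "uri": uri})
            pos = text.find(surface, pos + len(surface))
    return hits

def tag_document(text: str, backend: Backend) -> List[LinkedEntity]:
    """Call the backend and normalize its hits into LinkedEntity objects."""
    entities = []
    for hit in backend(text):
        start = hit["offset"]
        end = start + len(hit["surface"])
        entities.append(LinkedEntity(hit["surface"], start, end, hit["uri"]))
    return sorted(entities, key=lambda e: e.start)

if __name__ == "__main__":
    for entity in tag_document("Montreal hosts labs; Montreal is in Canada.", stub_backend):
        print(entity)
```

Because callers only depend on `tag_document` and `LinkedEntity`, a different tagging service could be swapped in by supplying another backend function with the same input and output shape.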

Semantic Publishing Challenge 2015: Supplementary Material

This page provides supplementary material for our submission to the Semantic Publishing Challenge 2015 co-located with the Extended Semantic Web Conference (ESWC 2015).

We present an automatic workflow that performs text segmentation and entity extraction from scientific literature, primarily to address Task 2 of the Semantic Publishing Challenge 2015. The proposed solution is composed of two subsystems: (i) a text mining pipeline, built on the GATE framework, which extracts structural and semantic entities, such as authors' information and citations, from text and produces semantic (typed) annotations; and (ii) a flexible exporting module that translates the document annotations into RDF triples according to a custom mapping file.
