Semantic Software Lab
Concordia University
Montréal, Canada

Automatic Construction of a Semantic Knowledge Base from CEUR Workshop Proceedings

Title: Automatic Construction of a Semantic Knowledge Base from CEUR Workshop Proceedings
Publication Type: Conference Paper
Year of Publication: 2015
Refereed Designation: Refereed
Authors: Sateli, B., and R. Witte
Conference Name: The 12th Extended Semantic Web Conference (The Semantic Publishing Challenge 2015)
Tertiary Title: Semantic Web Evaluation Challenges: SemWebEval 2015 at ESWC 2015, Portorož, Slovenia, May 31 – June 4, 2015, Revised Selected Papers
Date Published: 06/2015
Conference Location: Portorož, Slovenia
Type of Work: Paper
ISBN Number: 978-3-319-25518-7
Keywords: Digital Libraries, Knowledge Base, natural language processing, RDF, Scholarly Literature, Semantic Publishing, Semantic Web, text mining

We present an automatic workflow that performs text segmentation and entity extraction on scientific literature, primarily to address Task 2 of the Semantic Publishing Challenge 2015. The goal of Task 2 is to extract information from full-text papers that represents the context in which a document was written, such as the affiliations of its authors and the corresponding funding bodies. Our proposed solution is composed of two subsystems: (i) a text mining pipeline, built on the GATE framework, which extracts structural and semantic entities, such as authors' information and references, and produces semantic (typed) annotations; and (ii) a flexible exporting module, the LODeXporter, which translates the document annotations into RDF triples according to custom mapping rules. Additionally, we leverage existing Named Entity Recognition (NER) tools to extract named entities from text and ground them to their corresponding resources on the Linked Open Data cloud, thereby briefly covering the objectives of Task 3, which involves linking detected entities to resources in existing open datasets. The output of our system is an RDF graph stored in a scalable TDB-based storage with a public SPARQL endpoint for the task's queries.
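
To illustrate the idea of translating typed annotations into RDF triples through mapping rules, the following is a minimal Python sketch. It is not the LODeXporter implementation or its API: the `Annotation` class, the `MAPPING_RULES` table, the `example.org` base URI, and the choice of FOAF terms are all assumptions made for illustration, and the output is plain N-Triples text rather than a TDB store.

```python
from dataclasses import dataclass, field

# Hypothetical base URI for minted resources (assumption, not from the paper).
BASE = "http://example.org/kb/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Annotation:
    """A typed annotation as a text mining pipeline might produce it (illustrative)."""
    ann_id: str
    ann_type: str
    features: dict = field(default_factory=dict)


# Hypothetical mapping rules: annotation type -> (RDF class, {feature: predicate URI}).
MAPPING_RULES = {
    "Author": (
        "http://xmlns.com/foaf/0.1/Person",
        {"name": "http://xmlns.com/foaf/0.1/name"},
    ),
    "Affiliation": (
        "http://xmlns.com/foaf/0.1/Organization",
        {"name": "http://xmlns.com/foaf/0.1/name"},
    ),
}


def escape_literal(value):
    """Escape backslashes, double quotes, and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def annotations_to_ntriples(annotations, rules=MAPPING_RULES):
    """Translate annotations into N-Triples lines; types without a rule are skipped."""
    triples = []
    for ann in annotations:
        rule = rules.get(ann.ann_type)
        if rule is None:
            continue
        rdf_class, feature_map = rule
        subject = f"<{BASE}{ann.ann_type.lower()}/{ann.ann_id}>"
        # One triple asserting the resource's class.
        triples.append(f"{subject} <{RDF_TYPE}> <{rdf_class}> .")
        # One triple per mapped feature that is present on the annotation.
        for feature, predicate in feature_map.items():
            if feature in ann.features:
                literal = escape_literal(str(ann.features[feature]))
                triples.append(f'{subject} <{predicate}> "{literal}" .')
    return triples
```

Keeping the rules in a data table rather than in code mirrors the "custom mapping rules" idea: changing the target vocabulary would only require editing the table, not the export function.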


Copyright © Springer International Publishing Switzerland 2015. This is the author's version of the work. It is posted here by permission of Springer for your personal use. Not for redistribution.

sempub2015_poster.pdf (1.52 MB)
sempub_challenge2015.pdf (903.62 KB)