Semantic Software Lab
Concordia University
Montréal, Canada

Text Mining

Semantic Computing Course

The Semantic Computing course (SOEN 691B) is offered at Concordia University, providing graduate students with a unique opportunity to study research and development of novel semantic software systems. The course is taught by Prof. René Witte and supported by team members from the Semantic Software Lab. Students from other universities in Québec can register for this course through CREPUQ.

This course provides an introduction to selected topics in Semantic Computing, including text mining, tagging and tag analysis, recommender systems, RDF and linked data, semantic desktops, and semantic wikis.

Semantic Text Mining for Lignocellulose Research

Meurs, M.-J., C. Murphy, I. Morgenstern, N. Naderi, G. Butler, J. Powlowski, A. Tsang, and R. Witte, "Semantic Text Mining for Lignocellulose Research", The ACM Fifth International Workshop on Data and Text Mining in Biomedical Informatics in conjunction with CIKM, October 2011.

OwlExporter v3.0 Released


We just released a new version of the OwlExporter ontology population plugin for GATE. The OwlExporter processing resource (PR) can be added to any NLP pipeline to populate an existing OWL ontology with entities detected in the corpus. It supports populating separate NLP and domain ontologies and offers some advanced features, such as the export of coreference chains.

In this release, we included a pre-compiled binary and a complete example pipeline that transforms GATE's ANNIE information extraction example into an ontology population system. We also completely revamped the documentation and website to make it more accessible to ontology population novices.

The Organism Tagger System


Our open-source OrganismTagger is a hybrid rule-based/machine-learning system that extracts organism mentions from the biomedical literature, normalizes them to their scientific names, and grounds them to the NCBI Taxonomy database. The pipeline can be tailored to the species of particular interest to bio-engineers on different corpora by optionally detecting common names, acronyms, and strains. The OrganismTagger's performance has been evaluated on two manually annotated corpora, OT and Linnaeus-100. On the OT corpus, it achieves 95% precision, 94% recall, and a grounding accuracy of 97.5%. On Linnaeus-100, it achieves 99% precision, 97% recall, and a grounding accuracy of 97.4%. The system is described in detail in our publication: Naderi, N., T. Kappler, C. J. O. Baker, and R. Witte, "OrganismTagger: Detection, normalization, and grounding of organism entities in biomedical documents", Bioinformatics: Oxford University Press, August 9, 2011.
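
For readers new to these metrics, the sketch below shows one common way to compute precision, recall, and grounding accuracy from a gold-standard annotation set. It is not the OrganismTagger's evaluation code. The function names, the character offsets, and the toy annotations are hypothetical, and measuring grounding accuracy only over correctly detected mentions is an assumption made for this illustration.

```python
def precision_recall(gold, predicted):
    """Return (precision, recall) for two sets of mention spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def grounding_accuracy(gold_ids, predicted_ids):
    """Fraction of correctly detected mentions whose predicted ID equals the gold ID.

    Assumption: accuracy is measured only over spans present in both mappings.
    """
    shared = gold_ids.keys() & predicted_ids.keys()
    if not shared:
        return 0.0
    correct = sum(1 for span in shared if gold_ids[span] == predicted_ids[span])
    return correct / len(shared)


# Hypothetical toy data: (start, end) character offsets -> taxonomy identifier.
gold = {(0, 16): "562", (30, 42): "9606", (50, 74): "4932", (80, 90): "10090"}
predicted = {
    (0, 16): "562",    # detected and grounded correctly
    (30, 42): "9606",  # detected and grounded correctly
    (50, 74): "4931",  # detected, but deliberately grounded to a wrong ID
    (95, 99): "7227",  # spurious mention that is not in the gold set
}

p, r = precision_recall(set(gold), set(predicted))
acc = grounding_accuracy(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} grounding accuracy={acc:.2f}")
```

With these toy annotations, three of the four predicted spans match the gold spans, and two of those three carry the correct identifier.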

Text Mining: Wissensgewinnung aus natürlichsprachigen Dokumenten

Witte, R., and J. Mülle (Eds.), Text Mining: Wissensgewinnung aus natürlichsprachigen Dokumenten, Universität Karlsruhe, Fakultät für Informatik, Institut für Programmstrukturen und Datenorganisation (IPD), 2006.

Mutation Miner

Baker, C. J. O., R. Witte, A. B. Gurpur, and V. Ryzhikov, "Mutation Miner", 5th International Conference of the Canadian Proteomics Initiative (CPI 2005), Toronto, Ontario, Canada, May 13–14, 2005.

Mutation Miner

Baker, C. J. O., and R. Witte, "Mutation Miner", 13th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2005), Detroit, Michigan, USA, June 25–29, 2005.