Natural Language Processing for MediaWiki: First major release of the Semantic Assistants Wiki-NLP Integration
We are happy to announce the first major release of our Semantic Assistants Wiki-NLP integration. This is the first comprehensive open source solution for bringing Natural Language Processing (NLP) to wiki users, in particular for wikis based on the well-known MediaWiki engine and its Semantic MediaWiki (SMW) extension. It allows you to offer novel text mining assistants to wiki users, e.g., for automatically structuring wiki pages, answering questions in natural language, quality assurance, entity detection, and summarization. These assistants are deployed in the General Architecture for Text Engineering (GATE) and brokered as web services through the Semantic Assistants server.
Last week I introduced the very first release of the OpenTrace tool at this year's WCRE conference in the lovely city of Kingston, Ontario. This 4-day event was the 19th Working Conference on Reverse Engineering and hosted talks from research and industry on state-of-the-art techniques for program comprehension of software systems.
WikiSym is an international symposium on wikis and open collaboration techniques, mainly focused on wiki research and practice. Back in 2007, we coined the term "self-aware" wiki systems in our paper submitted to WikiSym '07, fostering the idea that integrating Natural Language Processing (NLP) techniques within wiki systems allows them to read, understand, transform, and even write their own content, as well as support their users in information analysis and content development. Now, a few years later, we have realized this idea through an open service-oriented architecture.
As part of the Semantic Assistants project, we developed the idea of a "self-aware" wiki system that can develop and organize its own content using state-of-the-art techniques from the Natural Language Processing (NLP) and Semantic Computing domains. This is achieved with our open source Wiki-NLP integration, a Semantic Assistants add-on that allows you to incorporate NLP services into the MediaWiki environment, thereby enabling wiki users to benefit from modern text mining techniques.
Here, we want to show how a seamless integration of NLP techniques into wiki systems helps to increase their acceptance and usability as a powerful, yet easy-to-use collaborative platform. We hope this will help you to identify new human-computer interaction patterns for other scenarios, allowing you to make the best possible use of this new technology.
We are happy to announce the first major public release of our protein mutation impact analysis system, Open Mutation Miner (OMM), together with a new open access publication: "Automated extraction and semantic analysis of mutation impacts from the biomedical literature", BMC Genomics, vol. 13, no. Suppl 4, pp. S10, 06/2012.
OMM is the first comprehensive, fully open source system for extracting and analysing mutation-related information from full-text research papers. Novel features not available in other systems include the detection of various forms of mutation mentions (in particular mutation series) and full mutation impact analysis, including linking impacts with the causative mutation and the affected protein properties, such as molecular functions, kinetic constants, kinetic values, units of measurement, and physical quantities. OMM provides output options in various formats, including populating an OWL ontology, Web service access, structured queries, and interactive use embedded in desktop clients. OMM is robust and scalable: we processed the entire PubMed Open Access Subset (nearly half a million full-text papers) on a standard desktop PC, and larger document sets can be easily processed and indexed on appropriate hardware.
OMM Query is our online search interface to an index of full-text research papers from the PMC Open Access Corpus (nearly half a million documents) that have been mined for mutation information with Open Mutation Miner (OMM) and OrganismTagger. It can be accessed using the Mímir query language, which combines entity annotations and their features with plain text (see below for some examples).
Note that if you want to mine a different set of documents for mutation impact information, you can index your own document set with OMM and install a local query server: all software used in this process is freely available under open source licenses. Besides the web interface, it is also possible to query the Mímir server through a RESTful API.
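As a rough sketch of such RESTful access, the following Python snippet builds a search URL for a Mímir server. Note that the server URL, the endpoint path (/search), and the parameter name (queryString) are illustrative assumptions, as is the example query; consult the Mímir documentation for the actual API of your server version.

```python
# Sketch: building a query URL for a (hypothetical) Mimir REST endpoint.
# Endpoint path and parameter name are assumptions for illustration only.
from urllib.parse import urlencode

def build_query_url(base_url: str, query: str) -> str:
    """Percent-encode a Mimir query and append it to the server's search endpoint."""
    return base_url.rstrip("/") + "/search?" + urlencode({"queryString": query})

# Illustrative Mimir-style query mixing an entity annotation with plain text:
query = '{Mutation} "impairs the activity"'
url = build_query_url("http://localhost:8080/mimir/index-1", query)
print(url)
```

The resulting URL could then be fetched with any HTTP client (e.g., urllib.request.urlopen) to retrieve the search results.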
The Semantic Computing course (SOEN 691B) is offered at Concordia University, providing graduate students with a unique opportunity to study research and development of novel semantic software systems. The course is taught by Prof. René Witte and supported by team members from the Semantic Software Lab. Students from other universities in Québec can register for this course through CREPUQ.
This course provides an introduction to selected topics from Semantic Computing, including text mining, tagging and tag analysis, recommender systems, RDF and linked data, semantic desktops, and semantic wikis.
Natural Language Processing (NLP) for Software Engineering: Our Eclipse plug-in integrates the Eclipse development environment into the Semantic Assistants architecture. It provides a user interface that makes various Natural Language Processing services available directly within the IDE. In particular, when using Eclipse as a software development environment, you can now offer novel semantic analysis services, such as named entity detection or quality analysis of source code comments, to software developers.
Last week I presented our Semantic Assistants Eclipse plug-in at CASCON, the IBM Centre for Advanced Studies conference, in Markham, Ontario. CASCON is hosted by IBM's Centre for Advanced Studies (CAS) in partnership with NSERC, with the goal of showcasing research projects in progress by individuals from academia, industry, and the general public.
We just released a new version of the OwlExporter ontology population plugin for GATE. The OwlExporter PR can be added to any NLP pipeline to facilitate the population of an existing OWL ontology with entities detected in the corpus. It supports the population of separate NLP and domain ontologies and offers some advanced features, such as the export of coreference chains.
In this release, we included a pre-compiled binary and a complete example pipeline that transforms GATE's ANNIE information extraction example into an ontology population system. We also completely revamped the documentation and website to make it more accessible to ontology population novices.