
Semantic Assistants for Wiki Systems


1. Introduction

Semantic Assistants for wikis is our novel architecture for the integration of Natural Language Processing (NLP) capabilities into wiki systems, based on the Semantic Assistants framework. The vision is that of a new generation of wikis that can help develop their own primary content and organize their structure, using state-of-the-art technologies from the NLP and Semantic Computing domains. The motivation for this integration is to enable wiki users – novice or expert – to benefit from modern text mining techniques directly within their wiki environment. Additional background can be found in our WikiSym 2012 publication [1]; the full details of this work are described in [2].

For a number of real-world application examples, have a look at our Wiki-NLP integration showcase. A hands-on tutorial was presented at SMWCon Spring 2014 [3].

2. Natural Language Processing in Wiki Systems: Use Cases

In the design and implementation of our Wiki-NLP integration, we took three major use cases into account.

2.1. Text Mining Assistants for Wiki Users

Wikis have become powerful knowledge management platforms, while remaining easy to use and offering high customizability, from personal wikis to enterprise solutions. Since the majority of their content is natural language, wikis can greatly benefit from natural language processing techniques. Rather than relying on external NLP applications, we aim to bring NLP as an integrated feature to wiki systems, thereby creating new human/AI collaboration patterns, where users work together with semantic assistants on developing, structuring, and improving wiki content.

2.2. Wikis as Corpora for NLP Researchers

Wikis have long been recognized as a useful source for natural language processing experiments. In this use case, the content of a wiki is accessed to develop or improve novel natural language processing tools or resources. Our integration facilitates this task through highly flexible access methods that allow extracting a single page, a set of pages, or a whole namespace from a wiki, with metadata distinguishing content and discussion pages, as well as their revision history. Unlike existing solutions, our Wiki-NLP integration works on live content, rather than a static database or page dump.
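As a minimal illustration of this kind of access, the following Java sketch retrieves the live wikitext of a content page and its associated discussion page through MediaWiki's standard action=raw endpoint. The wiki URL and page titles are placeholders; the actual integration offers richer access methods, including whole namespaces and page histories.

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;

    /** Minimal sketch: pulling live wiki content for use as an NLP corpus. */
    public class WikiCorpusFetcher {
        // Hypothetical wiki URL, used here only for illustration.
        private static final String WIKI = "https://wiki.example.org/index.php";

        /** Fetches the raw wikitext of a page via MediaWiki's action=raw. */
        public static String fetchRawWikitext(String title) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            String url = WIKI + "?action=raw&title="
                    + URLEncoder.encode(title, StandardCharsets.UTF_8);
            HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> resp =
                    client.send(req, HttpResponse.BodyHandlers.ofString());
            return resp.body();
        }

        public static void main(String[] args) throws Exception {
            // MediaWiki pairs a content page with its discussion page
            // via the "Talk:" namespace prefix.
            System.out.println(fetchRawWikitext("Main_Page"));
            System.out.println(fetchRawWikitext("Talk:Main_Page"));
        }
    }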

2.3. Wikis as new User Interfaces for Language Technology Experiments

Language technologies are rapidly entering modern applications. However, testing the applicability of novel natural language processing algorithms on real-world tasks has so far required a large amount of software engineering work. Our solution is based on a separation of concerns: new NLP pipelines can be developed and easily deployed by a language engineer, without having to worry about front-end coding or web service invocations. Any new pipeline developed in GATE (which also integrates solutions from UIMA, OpenNLP, and LingPipe, among others) can be automatically brokered to any connected wiki system. This greatly facilitates extrinsic experiments, where the impact of offering NLP to end users is measured on concrete tasks.
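To give an impression of the language engineer's side, here is a minimal GATE Embedded sketch that loads a saved pipeline and runs it over a document. The application file name annie.gapp is an assumption for this sketch; in the integration, such a pipeline is deployed once and then invoked through the Semantic Assistants server rather than directly.

    import gate.Corpus;
    import gate.CorpusController;
    import gate.Document;
    import gate.Factory;
    import gate.Gate;
    import gate.util.persistence.PersistenceManager;
    import java.io.File;

    /** Sketch: running a GATE pipeline over a document, no front-end code. */
    public class PipelineSketch {
        public static void main(String[] args) throws Exception {
            Gate.init(); // initialize GATE Embedded

            // Load a saved pipeline application; "annie.gapp" (GATE's stock
            // named-entity pipeline, saved as an application) is assumed here.
            CorpusController pipeline = (CorpusController)
                    PersistenceManager.loadObjectFromFile(new File("annie.gapp"));

            Corpus corpus = Factory.newCorpus("corpus");
            Document doc = Factory.newDocument("Mary won the first prize.");
            corpus.add(doc);
            pipeline.setCorpus(corpus);
            pipeline.execute();

            // Print the Person annotations the pipeline found.
            System.out.println(doc.getAnnotations().get("Person"));
        }
    }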

3. Architecture

The Wiki-NLP integration is a collaborative approach that combines the power of a lightweight MediaWiki extension with a server-side wiki component. While the extension is responsible for wiki-specific tasks, such as patrolling content changes, the wiki component acts as an intermediary between the user's browser, the wiki engine, and the Semantic Assistants framework [4] – our open source project that allows you to broker NLP pipelines as context-sensitive web services, or assistants.

High-level Design of the Semantic Assistants Wiki-NLP Integration

The wiki component shown above is essentially an HTTP proxy server that dynamically creates a wiki-independent interface for the Wiki-NLP integration and injects it into the user's browser, giving users the impression that they are still working with the wiki's native interface.
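The following heavily simplified Java sketch illustrates the proxy idea: relay the wiki page to the browser, injecting a script tag that loads the integration's interface. The wiki URL and the script path are placeholders, not the actual implementation; session handling, non-GET requests, and error handling are omitted.

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;

    /** Sketch of an HTTP proxy that injects a UI script into wiki pages. */
    public class WikiProxySketch {
        private static final String WIKI_BASE = "https://wiki.example.org"; // placeholder
        private static final String UI_SCRIPT =
                "<script src=\"/semanticAssistants/ui.js\"></script>";      // hypothetical

        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpServer proxy = HttpServer.create(new InetSocketAddress(8080), 0);
            proxy.createContext("/", exchange -> {
                try {
                    // Relay the browser's request to the actual wiki.
                    HttpRequest req = HttpRequest.newBuilder(
                            URI.create(WIKI_BASE + exchange.getRequestURI())).GET().build();
                    String page = client.send(req,
                            HttpResponse.BodyHandlers.ofString()).body();
                    // Inject the interface just before the closing body tag.
                    byte[] out = page.replace("</body>", UI_SCRIPT + "</body>")
                            .getBytes(StandardCharsets.UTF_8);
                    exchange.getResponseHeaders().set("Content-Type",
                            "text/html; charset=utf-8");
                    exchange.sendResponseHeaders(200, out.length);
                    try (OutputStream os = exchange.getResponseBody()) {
                        os.write(out);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            proxy.start();
        }
    }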

Our solution is designed from the ground up for scalability and robustness, and is built entirely on open source software.

4. Features

The central goal of our Wiki-NLP integration is to provide a general architecture for enhancing various wiki systems with NLP techniques. Our solution offers a dynamically generated user interface that is injected into the user's browser using JavaScript libraries. This approach makes the architecture's user interface portable across the multitude of existing wiki systems, providing a seamless integration of NLP capabilities by creating the impression that the user is still interacting with the wiki's native interface.

The Wiki-NLP integration user interface embedded in a wiki page

4.1. Lightweight MediaWiki Extension

The Wiki-NLP integration is introduced to an existing MediaWiki engine by installing a lightweight extension. Without requiring major modifications to the wiki engine, the extension adds a link to the wiki's toolbox menu through which users can load the Wiki-NLP interface. Through this dynamically generated interface, users can then inquire about and invoke NLP services directly within the wiki environment; no context switching is needed to use the NLP services.

4.2. NLP Pipeline Independent Architecture

The Wiki-NLP integration is backed by the Semantic Assistants server, which provides a service-oriented solution for offering NLP capabilities in a wiki system. Therefore, any NLP service available in a given Semantic Assistants server can be invoked on a wiki's content through the Wiki-NLP integration.

4.3. Flexible Wiki Input Handling

At times, a user's information need is scattered across multiple pages in the wiki. To address this, our Wiki-NLP integration allows wiki users to gather one or more wiki pages into a so-called "collection" and run an NLP service on all collected pages at once. This enables batch processing of wiki pages, as well as assembling multiple input pages for multi-document analysis pipelines.
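A minimal sketch of the collection idea, reusing the hypothetical WikiCorpusFetcher from the earlier sketch: several pages are gathered so that a single NLP service call can process them together. The page titles are placeholders.

    import java.util.List;

    /** Sketch: gathering several wiki pages as one multi-document input. */
    public class CollectionSketch {
        public static void main(String[] args) throws Exception {
            List<String> titles = List.of("Page_One", "Page_Two", "Page_Three");
            for (String t : titles) {
                String text = WikiCorpusFetcher.fetchRawWikitext(t);
                // In the real integration, the collected pages are sent to the
                // Semantic Assistants server in one request; here we just gather them.
                System.out.printf("Collected %s (%d characters)%n", t, text.length());
            }
        }
    }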

4.4. Flexible NLP Result Handling

The Wiki-NLP integration is also flexible in terms of where the NLP pipelines' output is written. Upon a user's request, the results can be appended to an existing page body or its associated discussion page, written to a newly created page, or written to a page in an external wiki, provided that wiki is supported by the Wiki-NLP integration architecture. Based on the type of results generated by the NLP pipeline, e.g., annotations or new files, the Wiki-NLP integration offers a simple template-based visualization capability that can be easily customized. Upon each successful NLP service execution, the Wiki-NLP integration automatically updates existing results on the specified wiki page, where applicable.
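As an illustration of the write-back path, the following Java sketch appends text to a wiki page through MediaWiki's edit API. The API URL is a placeholder; login and obtaining the CSRF token (via action=query&meta=tokens) are omitted, and the token is simply passed in.

    import java.net.CookieManager;
    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;

    /** Sketch: writing NLP results back to a wiki page via the edit API. */
    public class ResultWriterSketch {
        private static final String API = "https://wiki.example.org/api.php"; // placeholder

        /** Appends text to a page; targeting "Talk:<title>" instead would
         *  write to the associated discussion page. */
        public static void appendToPage(String title, String text, String csrfToken)
                throws Exception {
            HttpClient client = HttpClient.newBuilder()
                    .cookieHandler(new CookieManager()).build();
            String form = "action=edit&format=json"
                    + "&title=" + URLEncoder.encode(title, StandardCharsets.UTF_8)
                    + "&appendtext=" + URLEncoder.encode(text, StandardCharsets.UTF_8)
                    + "&token=" + URLEncoder.encode(csrfToken, StandardCharsets.UTF_8);
            HttpRequest req = HttpRequest.newBuilder(URI.create(API))
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    .POST(HttpRequest.BodyPublishers.ofString(form))
                    .build();
            HttpResponse<String> resp =
                    client.send(req, HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.body()); // JSON status returned by the wiki
        }
    }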

4.5. Semantic Markup Generation

Where semantic metadata is generated by an NLP pipeline, the Wiki-NLP integration takes care of representing it in a formal language using Semantic MediaWiki's special markup. The integration enriches the text with the markup equivalent of the generated metadata and persists it in the wiki repository. Therefore, for each generated result, both a user-friendly and a machine-processable representation are available in the page. This markup is, in turn, transformed into RDF triples by the Semantic MediaWiki parsing engine, making the results available for querying as well as for export to other applications.


Semantic enrichment of wiki text through the Wiki-NLP integration

For example, when the sentence "Mary won the first prize." is contained in a wiki page and processed by a named entity detection pipeline, the Semantic Assistants server generates an XML document marking "Mary" as an entity of type "Person" and returns it to the Wiki-NLP integration. Our integration then processes this XML document and transforms it into formal Semantic MediaWiki markup: in our example, the markup [[hasType::Person|Mary]] is generated and written into the wiki page.

The generated markup can then be queried using Semantic MediaWiki's inline queries. For example, a simple query like {{#ask: [[hasType::Person]]}} retrieves all entities of type "Person" in the wiki content.
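A minimal Java sketch of the annotation-to-markup step described above; the method names are illustrative, not the integration's actual API. Given an entity and its type, it produces the Semantic MediaWiki property markup and splices it into the page text.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /** Sketch: turning an NLP annotation into Semantic MediaWiki markup. */
    public class MarkupGenerator {

        /** ("Mary", "Person") -> [[hasType::Person|Mary]] */
        static String toSmwMarkup(String entity, String type) {
            return "[[hasType::" + type + "|" + entity + "]]";
        }

        /** Replaces the first occurrence of the entity in the wiki text
         *  with its semantically enriched form. */
        static String enrich(String wikiText, String entity, String type) {
            return wikiText.replaceFirst(Pattern.quote(entity),
                    Matcher.quoteReplacement(toSmwMarkup(entity, type)));
        }

        public static void main(String[] args) {
            String page = "Mary won the first prize.";
            System.out.println(enrich(page, "Mary", "Person"));
            // -> [[hasType::Person|Mary]] won the first prize.
        }
    }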

4.6. Wiki-independent Architecture

The Wiki-NLP integration was developed from the ground up with extensibility in mind. Although the provided examples show how it can be used within a MediaWiki instance, its extensible architecture allows support for other wiki engines to be added with a reasonable amount of effort. Both the Semantic Assistants server and the Wiki-NLP integration have a semantics-based architecture that allows adding new services and wiki engines without major modifications to their code base.

5. Download and Installation

The current implementation supports the MediaWiki engine, together with a number of extensions, in particular Semantic MediaWiki (SMW). Please visit the Semantic Assistants Architecture page for further documentation and to download our open source software.


References

  1. Sateli, B., and R. Witte, "Natural Language Processing for MediaWiki: The Semantic Assistants Approach", The 8th International Symposium on Wikis and Open Collaboration (WikiSym 2012), Linz, Austria: ACM, 08/2012.
  2. Sateli, B., "A General Architecture to Enhance Wiki Systems with Natural Language Processing Techniques", M.Sc. Software Engineering thesis, Department of Computer Science and Software Engineering, Montreal: Concordia University, 04/2012.
  3. Witte, R., and B. Sateli, "Adding Natural Language Processing Support to your (Semantic) MediaWiki", The 9th Semantic MediaWiki Conference (SMWCon Spring 2014), Montreal, Canada, 05/2014.
  4. Witte, R., and T. Gitzinger, "Semantic Assistants – User-Centric Natural Language Processing Services for Desktop Clients", 3rd Asian Semantic Web Conference (ASWC 2008), LNCS vol. 5367, Bangkok, Thailand: Springer, pp. 360–374, Feb. 2–5, 2009.