Text Mining Assistants in Wikis for Biocuration


| Title | Text Mining Assistants in Wikis for Biocuration |
| --- | --- |
| Publication Type | Conference Paper |
| Year of Publication | 2012 |
| Refereed Designation | Refereed |
| Authors | Sateli, B., C. Murphy, R. Witte, M.-J. Meurs, and A. Tsang |
| Conference Name | 5th International Biocuration Conference |
| Pagination | 126 |
| Date Published | 04/2012 |
| Publisher | International Society for Biocuration |
| Conference Location | Washington, DC, USA |
| Type of Work | Poster |
Abstract

Researchers need to extract critical knowledge from a massive amount of literature available in multiple, ever-growing repositories. The sheer volume of information makes the exhaustive analysis of literature a labor-intensive and time-consuming task, during which significant knowledge can easily be missed. We present our ongoing development of a generic architecture for collaborative literature curation through a user-friendly wiki interface. Our architecture seamlessly integrates NLP capabilities into a wiki environment, allowing users (curators) to benefit from text mining techniques to discover knowledge embodied in the wiki.

Content to be curated is first imported into the wiki system. In addition, domain-specific NLP pipelines, in our case developed with the General Architecture for Text Engineering (GATE), need to be deployed. The curator's wiki interface is then enhanced with automated "Semantic Assistants" that collaborate with the curator on locating important knowledge in the wiki, much like a human assistant would. For example, concrete NLP pipelines can locate biomedical concepts, their relations, or other entities.

Technically, our wiki-NLP integration adds a user interface that is dynamically injected into the user's browser, allowing them to invoke any NLP service made available through the Semantic Assistants architecture. Once a user requests help from a specific assistant, the selected wiki content is sent to the designated NLP pipeline for analysis. Following a successful service execution, the results are transformed by the architecture and added to the wiki's database. All updated pages thereby become immediately available to all curators for collaborative adjustment, modification, and refinement of the results. The wiki-based framework further facilitates the curation process by automatically versioning wiki pages and providing roll-back functionality in case of erroneous annotations.

In one concrete application example, we have integrated our architecture with MediaWiki, a widely used wiki engine best known from the Wikipedia project. The transformation of the NLP pipelines' results into RDF triples is realized through the Semantic MediaWiki (SMW) extension. This standard, formal representation of the extracted knowledge permits users to query the wiki content semantically, in addition to browsing it manually.

To evaluate our integration, we deployed it within the Genozymes project to support biomedical literature curation for lignocellulose research. The NLP service used in the experiment was mycoMINE, our pipeline that automatically extracts knowledge from the literature on fungal enzymes using semantic text mining approaches combined with ontological resources. The results of this experiment confirm the usability and effectiveness of our approach.
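To make the pipeline step concrete, the following is a minimal sketch of how a GATE-based pipeline such as mycoMINE might be run over selected wiki text using the GATE Embedded API. The application file name, the sample text, and the annotation handling are illustrative assumptions, not details taken from the poster.

```java
import gate.*;
import gate.util.persistence.PersistenceManager;
import java.io.File;

public class WikiCurationSketch {
    public static void main(String[] args) throws Exception {
        // Initialize GATE Embedded (assumes a configured GATE installation)
        Gate.init();

        // Load a saved pipeline application; "mycoMINE.gapp" is a
        // hypothetical file name used here for illustration only
        CorpusController pipeline = (CorpusController)
            PersistenceManager.loadObjectFromFile(new File("mycoMINE.gapp"));

        // Wrap the selected wiki content in a GATE corpus
        Corpus corpus = Factory.newCorpus("wikiSelection");
        Document doc = Factory.newDocument("Trichoderma reesei secretes cellulases ...");
        corpus.add(doc);

        // Run the pipeline and print the annotations it produced
        pipeline.setCorpus(corpus);
        pipeline.execute();
        for (Annotation a : doc.getAnnotations()) {
            System.out.println(a.getType() + " -> " + gate.Utils.stringFor(doc, a));
        }
    }
}
```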
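For the write-back step, SMW stores semantic properties as `[[Property::Value]]` markup inside the page text and exports them as RDF triples. Below is a hedged sketch of how an extracted entity might be serialized into such markup; the property name `hasEnzyme` and the helper itself are hypothetical, as the abstract does not specify the property vocabulary used.

```java
/** Hedged sketch: serializing an extracted entity as SMW markup. */
public class SmwWriteBack {
    // SMW renders [[Property::Value]] as a semantic annotation on the page
    // and exports it as an RDF triple (page, property, value)
    static String toSmwMarkup(String property, String value) {
        return "[[" + property + "::" + value + "]]";
    }

    public static void main(String[] args) {
        // "hasEnzyme" is a hypothetical property name, not from the paper
        System.out.println(toSmwMarkup("hasEnzyme", "cellulase"));
        // prints: [[hasEnzyme::cellulase]]
    }
}
```

Once stored this way, such properties can be retrieved with SMW's inline query syntax, for example `{{#ask: [[hasEnzyme::+]] | ?hasEnzyme }}`, which is what enables semantic querying of the wiki content on top of manual browsing.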
| Attachment | Size |
| --- | --- |
| biocuration2012poster.png | 1.22 MB |