Open Mutation Miner (OMM)
Table of Contents
- 1. Overview
- 2. Text Mining Components and Pipelines
- 3. System Output
- 4. Services
- 5. Resources
- 6. Download and Installation
- 7. License
- 8. Feedback
1. Overview
Mutations, as sources of evolution, have long been a focus of attention in the biomedical literature. Access to mutation information and to the impacts of mutations on protein properties facilitates research in various domains, such as enzymology and pharmacology. However, manually reading through the rich and fast-growing body of biomedical literature is expensive and time-consuming. Text mining methods can help by automatically analysing the literature and extracting mutation-related knowledge into a structured representation. As an example, consider the following text fragment (from PMID 14592457):
Several single mutants (Q15K, Q15R, W37K, and W37R), double mutants (Q15K-W37K, Q15K-W37R, Q15R-W37K, and Q15R-W37R), and triple mutants (Q15K-D36A-W37R and Q15K-D36S-W37R) were prepared and expressed as glutathione S-transferase (GST) fusion proteins in Escherichia coli and purified by GSH-agarose affinity chromatography. Mutant Q15K-W37R and mutant Q15R-W37R showed comparable activity for NAD and NADP with an increase in activity nearly 3fold over that of the wild type.
Here, we want to automatically extract "increase" as an impact that is caused "comparably" by two mutation series (each comprising two point mutations), Q15K-W37R and Q15R-W37R. In other words, the two mutation series have the same impact on the activity of an enzyme, glutathione S-transferase (GST), which resides in the host organism, Escherichia coli. We also want to capture that the activity, a kinetic property of the mutant enzyme, was measured as nearly 3-fold higher than the activity of the wild-type enzyme.
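To make the extraction target concrete, the following minimal Python sketch splits a mutation series such as Q15K-W37R into its point mutations and collects the facts from the example into one record. It is an illustration only and not part of OMM: the function name `parse_series` and the layout of the `impact` record are hypothetical.

```python
import re

# One-letter amino-acid codes. A point mutation is written as
# <wild-type residue><position><mutant residue>, e.g. Q15K.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
POINT_MUTATION = re.compile(rf"([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])")


def parse_series(mention):
    """Split a hyphen-separated mutation series into
    (wild type, position, mutant) tuples."""
    parts = []
    for token in mention.split("-"):
        match = POINT_MUTATION.fullmatch(token)
        if match is None:
            raise ValueError(f"not a point mutation: {token!r}")
        wild_type, position, mutant = match.groups()
        parts.append((wild_type, int(position), mutant))
    return parts


# Hypothetical record for the impact described in the example fragment.
impact = {
    "direction": "increase",
    "property": "activity",
    "fold_change": 3,  # the fragment says "nearly 3fold"
    "mutation_series": [parse_series("Q15K-W37R"), parse_series("Q15R-W37R")],
}
```

Calling `parse_series("Q15K-W37R")` yields two tuples, one per point mutation, which is the granularity at which an impact has to be linked back to the mutations that cause it.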
Our Open Mutation Miner (OMM) system provides a number of advanced text mining components for extracting this mutation information from full-text research papers, including the detection of various forms of mutation mentions, protein properties, organisms, impact mentions, and the relations between them. OMM provides output options in various formats, including populating an OWL ontology, Web service access, structured queries, and interactive use embedded in desktop clients. OMM is robust and scalable: we processed the entire PubMed Open Access Subset (nearly half a million full-text papers) on a standard desktop PC, and larger document sets can be easily processed and indexed on appropriate hardware. It is described and evaluated in detail in our paper: "Automated extraction and semantic analysis of mutation impacts from the biomedical literature", BMC Genomics, vol. 13, no. Suppl 4, pp. S10, 06/2012. Additional information is available in the thesis, which is now also available as a book.
2. Text Mining Components and Pipelines
Open Mutation Miner (OMM) is a component-based system that integrates multiple sub-systems. The following components are currently available.
2.1. OMM Mutations
- MutationTagger: our default integrated JAPE-based mutation tagging component
- MutationFinder: single-point mutations can be detected using the MutationFinder system, which is integrated into GATE
OMM Mutations includes additional rules to annotate and normalize mutation series, which is important for assigning impacts to mutations.
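As an illustration of what normalization means here, the sketch below converts three-letter mutation mentions into the one-letter form, so that Ala123Ser becomes A123S (the normalized value reported for that mention in the service response in Section 4.2). It is a simplified stand-in and not the JAPE rules used by OMM; the function names are hypothetical.

```python
import re

# Standard three-letter to one-letter amino-acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
_CODES = "|".join(THREE_TO_ONE)
THREE_LETTER_MUTATION = re.compile(rf"({_CODES})(\d+)({_CODES})")


def normalize(mention):
    """Return the one-letter form of a three-letter point mutation.
    Mentions that do not match the three-letter pattern are returned unchanged."""
    match = THREE_LETTER_MUTATION.fullmatch(mention)
    if match is None:
        return mention
    wild_type, position, mutant = match.groups()
    return f"{THREE_TO_ONE[wild_type]}{position}{THREE_TO_ONE[mutant]}"


def normalize_series(mention):
    """Normalize every member of a hyphen-separated mutation series."""
    return "-".join(normalize(part) for part in mention.split("-"))
```

With a single canonical form, two mentions of the same series written in different styles compare equal, which simplifies assigning one impact to one series.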
2.2. OMM Organisms
To detect host organisms and facilitate disambiguation of proteins, the Organism Tagger is integrated as a sub-system. It detects organism mentions, normalizes them to their scientific names, and grounds textual mentions to the NCBI Taxonomy database.
2.3. OMM Impacts
The OMM Impacts subsystem detects impact information for mutations, annotates it, and links each impact to its originating mutation.
In particular, OMM Impacts currently contains the following sub-components:
- ImpactTagger: detection of impact mentions
- ProteinPropertiesTagger: detection of protein properties, including molecular functions, kinetic constants, kinetic values, units of measurement, and physical quantities
- ImpactGrounding: detection of the relation between impacts and their originating mutation
OMM Impacts integrates the Gene Ontology (GO) for the analysis of protein properties. Detected molecular functions are annotated with their GO ID.
3. System Output
Open Mutation Miner is based on a flexible architecture that facilitates multiple use cases, including batch-processing of a large number of publications in order to build a knowledge base or interactive use for supporting literature analysis. At present, the following output formats are supported (in addition to plain XML result export).
3.1. OMM Ontology Export
Results of the analysis can be exported into the OMM Ontology (ontology population) through the OwlExporter component. The resulting ontology can then be queried to facilitate knowledge discovery. Note that exported entities such as mutations, impacts, protein properties, or physical quantities are all linked to the originating paper from which they were extracted, which enables query-based literature navigation.
3.2. OMM Impact Summarization
Automatic summarization of a single publication, or a corpus of papers, is a particular application example of the ontology export functionality in OMM: Using a custom SPARQL query, all detected mutations can be extracted for a publication and displayed in various formats.
4. Services
OMM results can be brokered through a number of different services, depending on the concrete use case: end users can browse results from batch processing through a query interface, send a publication to a web service for analysis, or invoke OMM from a connected client, like OpenOffice, when writing a new publication, in order to improve its content for automated analysis.
4.1. OMM Query
The structured information extracted by OMM can be queried together with the unstructured natural language text in a unified query language based on GATE Mímir. Indexed documents can be queried through a web interface or through a RESTful web service. A template for Mímir-based indexing is included with the OMM distribution. For an example, using documents from the PMC Open Access Corpus, please visit our OMM Query page.
4.2. OMM Web Service
OMM can be run as a Web service. To this end, an OWL service description for the Semantic Assistants (SA) framework is included in the release distribution. Follow these two steps and reload your SA server; it will then offer an "ImpactMining" service:
- Copy the OMM-Impacts.owl service description into the directory Resources/OwlServiceDescriptions/ inside your semantic-assist installation
- Copy the directory "gate" to Resources/GatePipelines and rename it to "OMM"
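The two copy steps can be scripted. The following Python sketch assumes that `OMM-Impacts.owl` and the `gate` directory sit at the top level of the unpacked OMM distribution; that layout, the function name `install_omm_service`, and its parameters are assumptions made for illustration, so adjust the paths to your installation.

```python
import shutil
from pathlib import Path


def install_omm_service(omm_dist, sa_home):
    """Copy the OMM service description and GATE pipeline into a
    Semantic Assistants installation rooted at sa_home."""
    omm_dist, sa_home = Path(omm_dist), Path(sa_home)
    resources = sa_home / "Resources"

    # Step 1: copy the OWL service description.
    descriptions = resources / "OwlServiceDescriptions"
    descriptions.mkdir(parents=True, exist_ok=True)
    shutil.copy2(omm_dist / "OMM-Impacts.owl", descriptions / "OMM-Impacts.owl")

    # Step 2: copy the "gate" directory and rename it to "OMM".
    # copytree raises FileExistsError if the target already exists,
    # so an earlier installation is never silently overwritten.
    pipeline = resources / "GatePipelines" / "OMM"
    shutil.copytree(omm_dist / "gate", pipeline)
    return pipeline
```

The SA server still has to be reloaded afterwards, as described above; the script only places the files.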
You can execute the service and process input documents using either the command-line client that ships with the Semantic Assistants architecture or any other Semantic Assistants-enabled client, such as OpenOffice. The server also accepts plain SOAP requests. Here is a (simplified) example of the XML service response returned by the OMM Mutations service when the text "Hello Ala123Ser." is sent for analysis:
    <?xml version="1.0"?>
    <annotation type="MutationMention" annotationSet="Annotation" isBoundless="">
      <document url="">
        <annotationInstance content="Ala123Ser" start="6" end="15">
          <feature name="mutationId" value="8"/>
          <feature name="docName" value="Ala123Ser"/>
          <feature name="type" value="single"/>
          <feature name="wildType" value="A"/>
          <feature name="base" value="A123"/>
          <feature name="mutant" value="S"/>
          <feature name="position" value="123"/>
          <feature name="wNm" value="A123S"/>
        </annotationInstance>
      </document>
    </annotation>
Our server minion.cs.concordia.ca (port 8879) usually offers the services "OMM Mutations" and "OMM Impacts", if you want to test them before installation. For more details, please refer to the OMM documentation.
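A client can read such a response with a standard XML parser. The Python sketch below parses a shortened copy of the simplified response and returns each annotation with its offsets and features. The closing tags in the embedded string are added so that the fragment parses, and a real response may be wrapped in further elements, so treat the structure as an assumption; the function name `read_mentions` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the simplified response for the input "Hello Ala123Ser.".
RESPONSE = """<?xml version="1.0"?>
<annotation type="MutationMention" annotationSet="Annotation" isBoundless="">
  <document url="">
    <annotationInstance content="Ala123Ser" start="6" end="15">
      <feature name="type" value="single"/>
      <feature name="wildType" value="A"/>
      <feature name="mutant" value="S"/>
      <feature name="position" value="123"/>
      <feature name="wNm" value="A123S"/>
    </annotationInstance>
  </document>
</annotation>"""


def read_mentions(xml_text):
    """Return one dict per annotationInstance with its text span,
    character offsets, and feature name/value pairs."""
    root = ET.fromstring(xml_text)
    mentions = []
    for instance in root.iter("annotationInstance"):
        features = {
            feature.get("name"): feature.get("value")
            for feature in instance.findall("feature")
        }
        mentions.append({
            "text": instance.get("content"),
            "start": int(instance.get("start")),
            "end": int(instance.get("end")),
            "features": features,
        })
    return mentions


mentions = read_mentions(RESPONSE)
```

The `start` and `end` attributes are character offsets into the submitted text: slicing "Hello Ala123Ser." from 6 to 15 gives back the annotated span "Ala123Ser".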
4.3. OMM Semantic Assistant
The published OMM web service can be accessed through any SA-enabled client in order to analyse existing publications or assist in writing new ones. Information extracted through the OMM modules is mapped to the input documents, depending on the client used.
5. Resources
Many resources included with the text mining components, such as gazetteering lists or RDF triples, can be easily reused in other applications. Additionally, we developed a number of general resources for the domain of mutation analysis:
5.1. OMM Ontology
The OMM ontology is implemented in the standard Web Ontology Language (OWL) format. It formally models concepts and their relations in the mutation analysis domain, providing semantic knowledge during the text mining process. For example, the restrictions placed on possible slot fillers for a specific protein property make it possible to filter out erroneous value assignments during the analysis process, thereby improving overall precision.
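The idea of filtering by slot restrictions can be sketched in a few lines of Python. The table below is a hypothetical stand-in for the restrictions expressed in the ontology: the property names, the allowed units, and the function name are all assumptions chosen for illustration and are not taken from the OMM ontology itself.

```python
# Hypothetical restrictions: which units are acceptable slot fillers
# for which protein property.
ALLOWED_UNITS = {
    "Km": {"M", "mM", "uM", "nM"},   # Michaelis constant: a concentration
    "kcat": {"s^-1", "min^-1"},      # turnover number: a rate
}


def filter_assignments(assignments):
    """Keep only (property, value, unit) candidates whose unit is an
    allowed filler for the property; unknown properties are dropped."""
    return [
        (prop, value, unit)
        for prop, value, unit in assignments
        if unit in ALLOWED_UNITS.get(prop, set())
    ]


candidates = [
    ("Km", 0.5, "mM"),     # plausible: a concentration for Km
    ("Km", 120.0, "s^-1"), # erroneous: a rate assigned to Km
    ("kcat", 120.0, "s^-1"),
]
accepted = filter_assignments(candidates)
```

Here the second candidate is rejected because a rate unit cannot fill the value slot of a Michaelis constant, which is the kind of erroneous assignment that such restrictions are meant to remove.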
Results of the text mining process are exported to the ontology (see above), linking entities in the impact domain ontology with our NLP ontology, which provides provenance information when querying the populated knowledge base (originating document, paragraph, sentence, etc.).
5.2. OMM Corpora
Two manually annotated corpora, containing full-text articles, are used for evaluation: (i) a corpus of 11 documents on protein engineering for evaluating the extraction of mutation series; and (ii) a corpus of biomedical documents on enzymes for evaluating impact extraction and grounding. Due to copyright restrictions, we currently only publicly distribute our manual annotations, which are included in the system download below.
6. Download and Installation
- Download, install and configure GATE (v7 or later).
- Start GATE, open the Plugin Manager.
- Go to the "Configuration" tab and enable the "Semantic Software Lab" repository. Click "Apply All". (You might also need to specify your "User Plugin Directory", where the downloads will be stored.)
- Go to the "Available to install" tab and select "Open Mutation Miner". Click "Apply All".
- Go to the "Installed Plugins" tab and either check "Load Now" or "Load Always". Click "Apply All".
- Load one of the example pipelines from "File" → "Ready Made Applications" as shown in the screenshot.
You can also download the install package manually (but the recommended way of installation is to use the GATE Plugin Manager through the GATE Developer GUI).
For more details, in particular regarding setting up web services, a query interface, etc., please refer to the included documentation.
Release v1.2 (27.07.2012):
- Service release; fixed a packaging issue that affected some users
First major public release v1.1 (22.07.2012):
- Added pre-packaged pipelines for GATE7 Plugin Manager
- OMM Query subsystem based on GATE Mímir added
Initial review release was v1.0 (4.12.2011).
7. License
The Open Mutation Miner (OMM) system, including its components and resources, is published under the GNU Affero General Public License v3 (AGPL3).
- "Automated extraction and semantic analysis of mutation impacts from the biomedical literature", BMC Genomics, vol. 13, no. Suppl 4, pp. S10, 06/2012.
- "Automated Extraction of Protein Mutation Impacts from the Biomedical Literature", Department of Computer Science and Software Engineering, M. Comp. Sc., Montreal: Concordia University, 09/2011.
- "Mutation Impact Analysis System: Automated Extraction of Protein Mutation Impacts from the Biomedical Literature", LAP LAMBERT Academic Publishing, pp. 1-224, 2012.
- "OrganismTagger: Detection, normalization, and grounding of organism entities in biomedical documents", Bioinformatics, vol. 27, no. 19, Oxford University Press, pp. 2721-2729, August 9, 2011.