The LODeXporter is a GATE component that exports NLP annotations directly to a triplestore, using configurable vocabularies, for use in LOD applications.
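To show what exporting annotations "with configurable vocabularies" can mean in practice, here is a minimal Python sketch. It is not the LODeXporter's actual code or mapping format. The `Annotation` class, the `MAPPING` dictionary, the `to_ntriples` function, and the `http://example.org/` namespace are all hypothetical. Each mapped annotation becomes an RDF resource typed with a vocabulary class, and its offsets and features become properties. The result is serialized as N-Triples that could then be loaded into a triplestore.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """Simplified, hypothetical stand-in for a GATE annotation."""
    ann_id: int
    ann_type: str
    start: int
    end: int
    features: dict = field(default_factory=dict)

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/"  # hypothetical base namespace

# Hypothetical vocabulary mapping: annotation type -> RDF class, and
# feature name -> RDF property. Editing this dict changes the output vocabulary.
MAPPING = {
    "Person": {
        "class": "http://xmlns.com/foaf/0.1/Person",
        "features": {"string": "http://www.w3.org/2000/01/rdf-schema#label"},
    },
}

def literal(value: str) -> str:
    """Quote a plain string literal, escaping characters as N-Triples requires."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(doc_id: str, annotations: list, mapping: dict) -> list:
    """Serialize mapped annotations of one document as N-Triples lines."""
    lines = []
    for ann in annotations:
        rule = mapping.get(ann.ann_type)
        if rule is None:
            continue  # annotation types without a mapping are not exported
        s = f"<{EX}{doc_id}/ann/{ann.ann_id}>"
        lines.append(f"{s} <{RDF_TYPE}> <{rule['class']}> .")
        lines.append(f"{s} <{EX}containedIn> <{EX}{doc_id}> .")
        lines.append(f'{s} <{EX}start> "{ann.start}"^^<{XSD_INT}> .')
        lines.append(f'{s} <{EX}end> "{ann.end}"^^<{XSD_INT}> .')
        for feature, prop in rule["features"].items():
            if feature in ann.features:
                lines.append(f"{s} <{prop}> {literal(str(ann.features[feature]))} .")
    return lines
```

Because the vocabulary lives in data rather than in code, switching from FOAF to another vocabulary would only require editing the mapping.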
The LODtagger is a GATE component that links entities in a document to their corresponding resources on the Linked Open Data (LOD) cloud. LODtagger relies on external tools, such as DBpedia Spotlight, to perform the actual tagging, and hides the complexity of communicating with these LOD taggers from pipeline developers.
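The following hypothetical Python example sketches the kind of communication the LODtagger hides. It builds a request for DBpedia Spotlight's `annotate` service and converts a JSON response into character-offset spans. The endpoint URL and the response keys (`Resources`, `@URI`, `@surfaceForm`, `@offset`, `@similarityScore`) reflect our understanding of the public Spotlight API and should be checked against the Spotlight version in use. `build_request` and `parse_spotlight` are illustrative names and are not part of LODtagger.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Spotlight endpoint; a local Spotlight server would work the same way.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"

def build_request(text: str, confidence: float = 0.5) -> urllib.request.Request:
    """Build (but do not send) a Spotlight annotate request that asks for JSON."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(f"{SPOTLIGHT_URL}?{query}",
                                  headers={"Accept": "application/json"})

def parse_spotlight(payload: str, min_score: float = 0.0) -> list:
    """Turn a Spotlight JSON response into (start, end, surface form, URI) tuples."""
    data = json.loads(payload)
    entities = []
    for res in data.get("Resources", []):  # the key is absent when nothing was found
        if float(res.get("@similarityScore", 1.0)) < min_score:
            continue  # drop low-scoring links
        start = int(res["@offset"])  # Spotlight returns numbers as strings
        surface = res["@surfaceForm"]
        entities.append((start, start + len(surface), surface, res["@URI"]))
    return entities
```

To actually query the service, one could pass the request to `urllib.request.urlopen` and feed the decoded response body to `parse_spotlight`. The sketch leaves out error handling and retries.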
We are happy to announce the first major public release of our protein mutation impact analysis system, Open Mutation Miner (OMM), together with a new open access publication: "Automated extraction and semantic analysis of mutation impacts from the biomedical literature", BMC Genomics, vol. 13, no. Suppl 4, pp. S10, 06/2012.
OMM is the first comprehensive, fully open-source system for extracting and analysing mutation-related information from full-text research papers. Novel features not available in other systems include the detection of various forms of mutation mentions, in particular mutation series, and full mutation impact analysis. This analysis links each impact with its causative mutation and with the affected protein properties, such as molecular functions, kinetic constants, kinetic values, units of measurement, and physical quantities. OMM provides output in various formats, including populating an OWL ontology, Web service access, structured queries, and interactive use embedded in desktop clients. OMM is robust and scalable: we processed the entire PubMed Open Access Subset (nearly half a million full-text papers) on a standard desktop PC, and larger document sets can easily be processed and indexed on appropriate hardware.
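Mutation series are a good example of why detecting mutation mentions needs more than one simple pattern. The following Python sketch is a deliberately naive illustration and does not reproduce OMM's actual grammar. It recognizes point mutations in one-letter amino acid notation (e.g. `A123G`) and expands slash-separated series such as `Y155A/F/W` into individual mutations. The `MUTATION` regular expression and the `extract_mutations` function are hypothetical.

```python
import re

AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter amino acid codes
# Wild-type residue, sequence position, then one mutant residue or a slash-separated series.
MUTATION = re.compile(rf"\b([{AA}])(\d+)([{AA}](?:/[{AA}])*)\b")

def extract_mutations(text: str) -> list:
    """Return normalized point mutations, expanding series like Y155A/F/W."""
    found = []
    for m in MUTATION.finditer(text):
        wild_type, position, mutants = m.groups()
        for mutant in mutants.split("/"):
            found.append(f"{wild_type}{position}{mutant}")
    return found
```

A real system would also have to handle three-letter codes (`Ala123Gly`), textual descriptions, and false positives such as cell line or plasmid names. This pattern does not attempt any of these.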
Mutations, as a source of evolution, have long been a focus of attention in the biomedical literature. Access to mutation information and to the impact of mutations on protein properties facilitates research in domains such as enzymology and pharmacology. However, manually reading through the rich and fast-growing body of biomedical literature is expensive and time-consuming. Text mining methods can help by automatically analysing the literature and extracting mutation-related knowledge into a structured representation.
Our Open Mutation Miner (OMM) system provides a number of advanced text mining components for mutation mining from full-text research papers, including the detection of various forms of mutation mentions, protein properties, organisms, impact mentions, and the relations between them. OMM provides output options in various formats, including populating an OWL ontology, Web service access, structured queries, and interactive use embedded in desktop clients. It is described and evaluated in detail in our paper, "Automated extraction and semantic analysis of mutation impacts from the biomedical literature", BMC Genomics, vol. 13, no. Suppl 4, pp. S10, 06/2012.
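Linking an impact to its causative mutation and to the affected protein property is a relation extraction problem. The Python sketch below gives a rough, hypothetical illustration and is not the method used in OMM. It links each impact cue in a sentence to the nearest mutation and the nearest protein property mentioned in that same sentence. The cue lexicon, the property list, the naive sentence splitter, and the `link_impacts` function are all simplifying assumptions.

```python
import re

# Single point mutations only (hypothetical simplification).
MUTATION = re.compile(r"\b[ACDEFGHIKLMNPQRSTVWY]\d+[ACDEFGHIKLMNPQRSTVWY]\b")
# Hypothetical cue lexicons; real resources would be far larger.
IMPACT_CUES = {"increased": "positive", "enhanced": "positive", "improved": "positive",
               "decreased": "negative", "reduced": "negative", "abolished": "negative"}
CUE = re.compile(r"\b(" + "|".join(IMPACT_CUES) + r")\b", re.IGNORECASE)
PROPERTY = re.compile(r"\b(thermostability|stability|activity|kcat|Km)\b")

def nearest(candidates, pos):
    """Return the text of the (offset, text) candidate closest to pos, or None."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: abs(c[0] - pos))[1]

def link_impacts(text: str) -> list:
    """Return (mutation, impact direction, property) triples, one per impact cue."""
    links = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):  # naive sentence splitter
        mutations = [(m.start(), m.group()) for m in MUTATION.finditer(sentence)]
        if not mutations:
            continue  # only link impacts that co-occur with a mutation
        properties = [(m.start(), m.group()) for m in PROPERTY.finditer(sentence)]
        for cue in CUE.finditer(sentence):
            links.append((nearest(mutations, cue.start()),
                          IMPACT_CUES[cue.group(1).lower()],
                          nearest(properties, cue.start())))
    return links
```

Nearest-mention heuristics like this one break down on sentences that mention several mutations or properties.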
We have just released a new version of the OwlExporter ontology population plugin for GATE. The OwlExporter PR (processing resource) can be added to any NLP pipeline to populate an existing OWL ontology with entities detected in the corpus. It supports populating separate NLP and domain ontologies, and it offers some advanced features, like the export of coreference chains.
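The idea behind exporting coreference chains can be sketched in a few lines. All mentions in one chain (e.g. "Ada Lovelace" and a later "she") should populate a single ontology individual rather than several. This hypothetical Python sketch returns triples as tuples of prefixed names. The input format, the `ex:` namespace, the `ex:hasMention` property, and the `populate_with_coref` function are assumptions and do not reflect OwlExporter's actual output schema.

```python
from collections import defaultdict

def populate_with_coref(mentions):
    """mentions: list of (surface form, domain class, coreference chain id or None).

    Mentions in the same chain become one individual; singletons get their own.
    Triples use prefixed names; 'ex:' is a hypothetical domain namespace.
    """
    groups = defaultdict(list)  # insertion-ordered, so individual numbering is stable
    for i, (surface, cls, chain) in enumerate(mentions):
        key = ("chain", chain) if chain is not None else ("single", i)
        groups[key].append((surface, cls))
    triples = set()
    for n, members in enumerate(groups.values()):
        individual = f"ex:individual_{n}"
        triples.add((individual, "rdf:type", "owl:NamedIndividual"))
        for surface, cls in members:
            triples.add((individual, "rdf:type", cls))  # duplicates collapse in the set
            triples.add((individual, "ex:hasMention", surface))
    return triples
```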
In this release, we included a pre-compiled binary and a complete example pipeline that transforms GATE's ANNIE information extraction example into an ontology population system. We also completely revamped the documentation and website to make it more accessible to ontology population novices.
Our open-source OrganismTagger is a hybrid rule-based/machine-learning system that extracts organism mentions from the biomedical literature, normalizes them to their scientific names, and grounds them in the NCBI Taxonomy database. The pipeline can be configured to annotate the species of particular interest to bio-engineers on different corpora, by optionally including the detection of common names, acronyms, and strains. The OrganismTagger was evaluated on two manually annotated corpora, OT and Linnaeus. On the OT corpus, it achieves a precision of 95%, a recall of 94%, and a grounding accuracy of 97.5%. On the manually annotated Linnaeus-100 corpus, it achieves a precision of 99%, a recall of 97%, and a grounding accuracy of 97.4%. It is described in detail in our publication, "OrganismTagger: Detection, normalization, and grounding of organism entities in biomedical documents", Bioinformatics, vol. 27, no. 19, Oxford University Press, pp. 2721–2729, August 9, 2011.
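Normalization and grounding can be pictured as two lookups: first from a surface form to a scientific name, then from the scientific name to an NCBI Taxonomy identifier. The Python sketch below shows only this dictionary-based core and assumes a tiny hand-written lexicon. The OrganismTagger itself combines rules with machine learning and draws on the full taxonomy. `LEXICON`, `NCBI_TAXON`, and `tag_organisms` are hypothetical names. The IDs shown are the NCBI Taxonomy identifiers for these three species.

```python
import re

# Hypothetical mini-lexicon: lowercased surface form -> scientific name.
LEXICON = {
    "human": "Homo sapiens", "humans": "Homo sapiens", "homo sapiens": "Homo sapiens",
    "mouse": "Mus musculus", "mice": "Mus musculus", "mus musculus": "Mus musculus",
    "e. coli": "Escherichia coli", "escherichia coli": "Escherichia coli",
}
# NCBI Taxonomy identifiers for the scientific names above.
NCBI_TAXON = {"Homo sapiens": 9606, "Mus musculus": 10090, "Escherichia coli": 562}

# Longest entries first, so a longer name wins over a shorter alternative.
ORGANISM = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in sorted(LEXICON, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def tag_organisms(text: str) -> list:
    """Return (start, end, surface form, scientific name, NCBI taxon ID) tuples."""
    tags = []
    for m in ORGANISM.finditer(text):
        name = LEXICON[m.group(1).lower()]  # normalization
        tags.append((m.start(), m.end(), m.group(1), name, NCBI_TAXON[name]))  # grounding
    return tags
```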
Assessing The Quality Factors Found in In-Line Documentation Written in Natural Language: The JavadocMiner
Submitted by ninus on Fri, 2011-04-15 19:14
The noun phrase chunker MuNPEx (Multi-Lingual Noun Phrase Extractor) is now available in a new and improved release, v1.0. MuNPEx is a base NP chunker for the GATE framework, implemented in JAPE. It is fast, robust, customizable, and well-tested, and it currently supports English, German, and French (with Spanish in beta).
Major changes in this release:
- Limited number of pre- and post-head modifiers to make MuNPEx more robust on certain kinds of input (like a long list of tags or menu entries when processing web pages)
- New optional grammars to add a HEAD_LEMMA slot to an NP annotation, with the lemma extracted from the GATE morphological analyser (for English), the Durm Lemmatizer (for German), or the TreeTagger (for German, Spanish, French)
- DET/MOD/HEAD/MOD2 slots are now stored as strings (rather than Content objects) to make them easier to export and compatible with the new Predicate-Argument Extractor (PAX) component
- Other code cleanup and improvements
- No longer labeled as "beta" -- five years of testing ought to be enough, we're not Google ;-)
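For readers new to base NP chunking, the Python sketch below illustrates the general idea over Penn Treebank part-of-speech tags. An NP consists of an optional determiner, a run of pre-head modifiers, and a noun head, and each slot is stored as a plain string. This is a hypothetical simplification and not MuNPEx's JAPE grammar. It handles English only and omits the MOD2 slot, and its modifier cap (`MAX_PRE_MODIFIERS`) only mimics the spirit of the limit introduced in this release.

```python
MAX_PRE_MODIFIERS = 5  # hypothetical cap on pre-head modifiers

def is_noun(tag: str) -> bool:
    return tag.startswith("NN")

def is_modifier(tag: str) -> bool:
    return tag.startswith("JJ") or tag.startswith("NN")

def chunk_nps(tagged: list) -> list:
    """Chunk (token, POS) pairs into base NPs with string slots DET, MOD, HEAD."""
    nps = []
    i, n = 0, len(tagged)
    while i < n:
        start = i
        det = ""
        if tagged[i][1] in ("DT", "PRP$"):
            det = tagged[i][0]
            i += 1
        # Collect the maximal run of adjectives and nouns after the optional determiner.
        j = i
        while j < n and is_modifier(tagged[j][1]):
            j += 1
        # The head is the last noun in that run.
        k = j - 1
        while k >= i and not is_noun(tagged[k][1]):
            k -= 1
        if k < i:
            i = start + 1  # no noun found, so no NP starts here
            continue
        mods = [tok for tok, _ in tagged[i:k]]
        if len(mods) > MAX_PRE_MODIFIERS:
            # Keep only the modifiers closest to the head; the dropped leading
            # tokens (and the determiner before them) are not part of any NP here.
            mods = mods[-MAX_PRE_MODIFIERS:]
            det = ""
        nps.append({"DET": det, "MOD": " ".join(mods), "HEAD": tagged[k][0]})
        i = k + 1
    return nps
```

Capping the modifiers keeps a long run of nouns, such as a menu scraped from a web page, from turning into one huge NP.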
For more details and the download, please visit the MuNPEx page.