Genozymes
1. Overview
In the Genozymes project, we investigated semantic technologies that support scientists in biology, biochemistry, and genomics in developing bioproducts and bioprocesses, in particular for second-generation biofuel production.
The Biofuel Process
Biofuels produced from biomass are considered promising sustainable alternatives to fossil fuels. Converting lignocellulose into fermentable sugars for biofuel production requires enzyme cocktails that can hydrolyze lignocellulosic biomass efficiently and economically. Because many fungi naturally break down lignocellulose, identifying and characterizing the enzymes involved is a key challenge in the research and development of biomass-derived products and fuels. One approach to this challenge is to mine the rapidly expanding repertoire of microbial genomes for enzymes with the appropriate catalytic properties.
Integrating semantic support in curation, analysis and retrieval
Semantic technologies, including natural language processing (NLP), ontologies, semantic Web services, and Web-based collaboration tools, promise to help users handle complex data and thereby facilitate knowledge-intensive tasks. To select the appropriate technologies and combine them into a coherent system that brings measurable improvements to its users, we are developing a semantic infrastructure in support of genomics-based lignocellulose research. Part of this effort is the automated curation of knowledge about fungal enzymes from the literature and from genome resources.
Text mining results from our OrganismTagger and mycoMINE systems displayed in Firefox through the Semantic Assistants plug-in
Working closely with fungal biology researchers who manually curate the existing literature, we develop ontology-based natural language processing pipelines, integrated into a Web-based interface, that assist them in two main tasks: mining the literature for relevant knowledge, and providing rich, semantically linked information.
2. Project Members
2.1. Project Supervision
2.2. Group Members
- Marie-Jean Meurs: development of the mycoMINE text mining pipeline
- Nona Naderi: OrganismTagger, a high-performance species name recognition system
- Bahar Sateli: semantic wikis with NLP support for curation through IntelliGenWiki
3. Further Information
The following publications are currently available:
- "Semantic Text Mining Support for Lignocellulose Research", BMC Medical Informatics and Decision Making, vol. 12, suppl. 1, p. S5, April 2012.
- "Text Mining Assistants in Wikis for Biocuration", 5th International Biocuration Conference, Washington, DC, USA: International Society for Biocuration, p. 126, April 2012.
- "Semantic Text Mining for Lignocellulose Research", ACM Fifth International Workshop on Data and Text Mining in Biomedical Informatics, in conjunction with CIKM, Glasgow, UK: ACM, New York, NY, USA, October 2011.
- "Towards Evaluating the Impact of Semantic Support for Curating the Fungus Scientific Literature", 3rd Canadian Semantic Web Symposium (CSWS2011), vol. 774, Vancouver, British Columbia, Canada, August 2011.
For information on the parent project, please visit the Genozymes Wiki.
4. Funding
The Genozymes project is funded by Genome Canada and Génome Québec. The Semantic Assistants project, the OrganismTagger system, and IntelliGenWiki are funded through an NSERC Discovery Grant.


