Semantic Software Lab
Concordia University
Montréal, Canada

Semantic Web Company

Open World Assumptions
Updated: 13 hours 6 min ago

Automatic text analytics using DBpedia and PoolParty – A Live Demo

Thu, 2012-02-02 06:22

Let me show you the steps needed to generate a high-quality text mining application that can annotate and categorize any kind of text or document covering nearly any domain. With our approach of thesaurus-based text mining, your documents can also be linked to the world of linked (open) data: enrich your documents with data from the LOD cloud!

Step 1. Generate a thesaurus by using a linked data source like DBpedia

As recently reported, SWC has developed a tool called SKOSsy, which can be used to extract seed thesauri from DBpedia. In our example I will generate a knowledge model describing the domain of “digital photography”. This step took around 15 minutes.

Step 2. Load the thesaurus into PoolParty and improve it to your needs

After the seed thesaurus has been loaded into PoolParty Thesaurus Manager, you have many ways to enhance the knowledge model further: add more categories, synonyms, relations, etc. In this example I use the seed thesaurus without any further improvements. This step took approximately 2 minutes.

Step 3. Generate an automatic text extractor on top of your thesaurus

This step took a couple of seconds and produced a fast and reliable text mining application on top of PoolParty Extractor, ready to be used to enrich your documents with data from the LOD cloud.
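
To illustrate the general idea of thesaurus-based annotation, here is a minimal Python sketch. It is not how PoolParty Extractor works internally (the post does not describe that); it only shows the basic principle of matching preferred and alternative labels in a text and returning the linked concept URIs. The tiny thesaurus, the function names and the sample sentence are all assumptions made for illustration.

```python
import re

# Hypothetical mini-thesaurus: concept URI -> (preferred label, alternative labels).
# The URIs follow DBpedia's naming pattern but serve only as example keys here.
THESAURUS = {
    "http://dbpedia.org/resource/Digital_camera": ("Digital camera", ["digicam"]),
    "http://dbpedia.org/resource/Image_sensor": ("Image sensor", ["imaging sensor"]),
}


def build_label_index(thesaurus):
    """Map every lower-cased label (preferred or alternative) to its concept URI."""
    index = {}
    for uri, (pref_label, alt_labels) in thesaurus.items():
        for label in [pref_label, *alt_labels]:
            index[label.lower()] = uri
    return index


def annotate(text, thesaurus):
    """Return sorted (start, end, matched text, concept URI) tuples found in text."""
    index = build_label_index(thesaurus)
    hits = []
    for label, uri in index.items():
        # Word boundaries avoid matching a label inside a longer word.
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            hits.append((match.start(), match.end(), match.group(0), uri))
    return sorted(hits)


if __name__ == "__main__":
    sample = "My new digicam has a larger image sensor."
    for hit in annotate(sample, THESAURUS):
        print(hit)
```

Because synonyms resolve to the same URI as the preferred label, a document mentioning "digicam" ends up linked to the same concept as one mentioning "digital camera", which is what makes the annotations usable as links into the LOD cloud.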

You can try it out here: PPX Live-Demo

To try the extractor on your own, please take a look at the image above, which shows a proper configuration. You have to insert the following UUID in the form: d35d4ddb-adc3-4ea5-b027-deacac03e391

Since our example is all about ‘digital photography’, we recommend using text samples (or fragments of them) like these to test the quality of PPX-based text analytics:

Let us know what you think about this straightforward approach and about the quality of the results. We believe that thesaurus-based text mining is in many cases an alternative to other approaches, especially if you want to enrich your content with information from the upcoming web of data.

Of course we would be happy to generate other demos in the areas of your interest! Just get in contact with us by using our contact form.

Categories: Blogroll

Linked Open Data: The Essentials – A quick start guide for decision makers

Fri, 2012-01-20 03:18

Together with REEEP (Renewable Energy and Energy Efficiency Partnership) the Semantic Web Company (SWC) has composed a fundamental publication on the topic of Linked Open Data.

Linked Open Data: The Essentials provides answers to the following key questions:

  • What do the terms Open Data, Open Government Data and Linked Open Data actually mean, and what are the differences between them?
  • What do I need to take into account in developing a LOD strategy?
  • What does my organisation need to do technically in order to open up and publish its datasets?
  • How can I make sure the data is accessible and digestible for others?
  • How can I add value to my own data sets by consuming LOD from others?
  • What can be learned from existing best practices?
  • What are the key potentials of sharing and consuming open datasets?

Read more about this publication and find out how to obtain a copy.

Categories: Blogroll

SKOSsy-Lottery: Free Pass to Semantic Tech & Business Conference, Berlin

Wed, 2012-01-18 09:15

As the PoolParty Team will be present at SemTechBiz Berlin 2012 (February 6-7), we want you to join us. This is why we are holding a small lottery to give away a full conference pass (€795) together with our unique PoolParty Cocktail Shaker as a set.

How to enter the SKOSsy-lottery:
  • Enter a comment on this post describing which type of thesaurus you are interested in. One comment per person.
  • All comments must be submitted before Jan 25, 2012.
  • The winners will be selected at random.

Together with our PoolParty Suite, we will present SKOSsy at our booth in the SemTechBiz Berlin 2012 exhibition area. SKOSsy is a handy tool that generates SKOS-based seed thesauri in German or in English by extracting data from DBpedia. See our finger exercise on a thesaurus describing the world of Alan Turing, done with SKOSsy.

Let us know which knowledge realm you are interested in and join the lottery now. Good luck, and see you in Berlin.

 

Categories: Blogroll

The ESA vocabulary site – Making Publishing and Reusing Vocabularies Easier

Mon, 2012-01-09 10:29

Reviewing the interview we conducted with Les Kneebone (project manager of the vocabulary projects at Education Services Australia) in November 2010, we can see that ESA has been one of the early adopters of SKOS as a standard for thesaurus development. Les said then: “We had already identified SKOS as an important standard for ScOT so it was natural to select PoolParty as our new thesaurus management tool”. Around a year later ESA's vocabulary site went online with PoolParty as its basis.

We asked Les to comment on his statement from last year and he confirmed that SKOS continues to be central to the ESA vocabulary business model and that it has also been important for ESA that PoolParty has been flexible enough to support continued publication of non-RDF formats, especially IMS VDEX.

In the course of this project it became more and more obvious that SKOS can be used not just as yet another format for publishing thesauri but as a unified model for building thesauri in general. This approach made several improvements to ESA's vocabulary development model and maintenance process possible. Since all data is stored as RDF in a triple store, and SKOS and RDF are flexible formats supporting interoperability and interchangeability of data, many manual transformations that were previously necessary are no longer needed, and all other systems using the vocabularies are fed dynamically by PoolParty, which offers the data in the formats they need (see image below).

Changes in ESA’s vocabulary development model

Les states that while some manual processes still exist to support legacy systems, PoolParty ensures the integrity and richness of ESA data. Support and customizations for legacy systems can be achieved in the confidence that the linked-data capabilities are centrally managed and stored in the PoolParty triple store.

From the publishing perspective, the previous vocabulary publishing site has been replaced by the PoolParty Linked Data Frontend (LD-Frontend), which has been customized especially for this project to offer more flexibility in the display and layout of the data. Similar to the frontend for the Austrian Geological Survey mentioned in a previous blog post, the LD-Frontend has been adapted to the ESA styleguide, and the display of the data in the HTML view of the frontend has been made more user-friendly (see screenshot below).

From ESA’s perspective, Les commented that for the vocabulary manager, edits to the frontend styles and templates are intuitive and can be tested in staging environments. He also stated that support is important for publishing, and that SWC was very responsive.

Example ESA linked data frontend

Of course we asked Les to give a preview of the next steps for ESA. He stated that they include language translation projects so that its vocabularies, especially Schools Online Thesaurus (ScOT), can be accessed by wider markets and by students of other languages. He also stated that PoolParty handles multi-lingual thesauri very well.

We here at SWC are glad to see PoolParty used in more and more applications and usage scenarios. We are looking forward to the next steps that will be taken in this project and to seeing how the data offered by the ESA vocabulary site is used in other applications.

Thanks to Les Kneebone from ESA for his contribution to this blog post.

Categories: Blogroll

Going to SEMTECHBIZ Berlin 2012

Thu, 2011-12-29 05:36

I went to London last September to represent the Semantic Web Company and PoolParty technologies in the exhibition area of the excellent SemTechBiz UK conference. I had tons of interesting conversations at our booth and – although I never found time to attend any talk – I once again learned a lot about customers' needs.

Compared to ISWC or ESWC, two other major conferences in the area of the semantic web, SemTechBiz is clearly the place to go if you're interested in semantic web applications. Especially in the last three years we have observed a continuous growth of acceptance and demand for semantic web technologies in various industries. For many information professionals and IT managers it has become clearer than ever before that semantic web applications can solve several well-known problems in the areas of enterprise search, data integration, business intelligence and knowledge management.

Thus it was great news for us to have another SemTechBiz conference in place – this time in Berlin, which is one of the most vibrant cities in the world when it comes to innovative web technologies like linked data or open data. And again we will “explore how semantic solutions and linked data are being embraced throughout companies across a diverse range of disciplines and business categories”.

We hope to meet you at SemTechBiz Berlin 2012 (February 6-7) – PoolParty Team is present as Gold Sponsor and is looking forward to meeting you in the exhibition area to talk with you about your semantic web applications.

Categories: Blogroll

I-Semantics: Get in touch with Europe´s Linked Data community!

Wed, 2011-12-28 08:43

In September 2012 I-Semantics will take place for the 8th time. With more than 400 participants every year, the conference is one of the largest in Europe in the field of semantic systems and the semantic web. It is held concurrently with the I-KNOW Conference on Knowledge Management and Knowledge Technologies.

I-Semantics is a conference aiming to bring together science and industry:

  • To address the needs and interests of industry, the iPraxis track presents enterprise solutions that deal with semantic processing of data and/or information in areas like Linked Data, Data Publishing, Semantic Search, Recommendation Services, Sentiment Detection, Search Engine Add-Ons, Thesaurus and/or Ontology Management, Text Mining, Data Mining and any related fields.
  • In the exhibition area I-SEMANTICS 2012 will offer its participants a unique platform either to present latest and leading edge developments or to catch up with the developments of most innovative IT technologies, content applications, knowledge management trends and emerging market opportunities.
  • For the first time in 2012 we will bring to you the I-CHALLENGE, consisting of the Best Paper Award, the Best Poster Award, the Best PhD Paper and the Linked Data Cup.
  • I-SEMANTICS 2012 proceedings will be published in the digital library of the ACM ICP Series and will contain all accepted papers from the Research & Application track and the I-CHALLENGE. The topics of interest for research and application papers include (but are not limited to): The Web of Data, Quality of Semantic Data on the Web, Corporate Semantic Web, Semantic Content Engineering, Semantic Multimedia and (Linked) Data Ecosystems & Markets.

Website: I-Semantics 2012

Categories: Blogroll

Experiences from teaching Linked Data

Sun, 2011-12-11 10:55

Dr. Bernhard Haslhofer works as an instructor on Web Information Systems at Cornell Information Science. He recently gave a course which examined technologies for building data-centric information systems on the World Wide Web. Semantic Web Company (SWC) had the opportunity to talk with Dr. Haslhofer to examine the question “How to teach Linked Data?”.

SWC: Bernhard, you have been working on the Semantic Web and Linked Data for years now. What is the first lesson you usually give when you try to explain the “Semantic Web”?

Maybe I should first clarify that the course I am co-teaching is not a Semantic Web course. The course is about data-centric Web information systems in general and we spent some classes talking about Linked Data and Semantic Web technologies. We start by explaining the origins and the fundamental architectural principles of the World Wide Web and then focus on the data-centric aspects of the Web.

“instead of building isolated repository-centric APIs we could also build a globally connected data graph”

After introducing various data exchange formats (XML, JSON & co.) we teach how Web APIs work, and discuss the design principles of RESTful Web Services. Then the conceptual transition to Linked Data is just a small step, because we can argue that instead of building isolated repository-centric APIs we could also build a globally connected data graph, which is based on a uniform data model and can be traversed and queried using SPARQL.

“DBpedia and all the other existing Linked Data projects and tools that came up in recent years really help in explaining and illustrating how things work”

So, I am somehow approaching the “Semantic Web” bottom-up and concentrate on the “visible” parts of the “Semantic Web” vision. DBpedia and all the other existing Linked Data projects and tools that came up in recent years really help in explaining and illustrating how things work. And last but not least, schema.org and the design of the Facebook Open Graph protocol also show the growing importance of having structured data on the Web.

SWC: At least for non-technicians “Linked Data” sounds very technical. Antoine de Saint-Exupery said: “If you want to build a ship, don’t drum up people to collect wood and don’t assign them tasks and work, but rather teach them to long for the endless immensity of the sea.” Is there an “endless immensity of the sea” you try to bring in as well?

If you can access and combine data from the Web you can answer interesting questions and discover previously unknown relationships between things. We thought the best way to learn about Linked Data is to implement simple demo applications. So we asked the students to think about use cases that bring some benefit for end users and require data from several Web sources to answer certain questions.

“I think it became clear what it means to work with easily accessible structured Web data opposed to working with unstructured data”

One group developed a service which connects safety records with public transport information. Users can now easily choose the “safest” bus connection between New York City and other cities. Another group combined public school district information with geographic data, which now allows parents to view statistical information about school districts in New York State by using apps like Google Earth. There are many more examples, but most importantly, I think it became clear what it means to work with easily accessible structured Web data opposed to working with unstructured data.

SWC: Teaching how to use the Semantic Web is not only a matter of slide decks. It is rather a question of concrete use cases in combination with tool skills. What kind of tool skills should students of information sciences acquire in your opinion?

Collecting and making sense of data is a common scholarly practice in many research areas and the Web is becoming, or already is, the primary medium for publishing and distributing results. I believe that making data accessible as part of some research activity will become increasingly important in the future, and the Web will probably be the infrastructure for doing this.

So I think that a student who is working with data should at least know (i) how to retrieve and (ii) how to publish data on the Web in a way that others can easily discover, access, and use their data. Linked Data is one possible technical approach for doing that.

SWC: As a European who is teaching and working in the U.S., how do you perceive the different approaches of those two systems when it comes to transferring complex fields of knowledge like the semantic web from universities to business environments?

From the experiences I have had in my previous and current working environments, I can only say that the relations between businesses and universities seem to be tighter in the US. I don’t necessarily mean “formal” bonds between institutions but rather informal relations between people who understand complex fields of knowledge, both in academia and in business.

“I assume transferring knowledge between two proxies who speak the same ‘language’ makes it a lot easier”

PhD students, for instance, often work in business over the summer and/or continue their career in the research department of some company. Some continue their cooperation with their former professors and academic colleagues and I assume transferring knowledge between two proxies who speak the same “language” makes it a lot easier.

SWC: What are the most important things which are still missing to make linked data technologies an integral part of enterprise information systems?

Quite often I hear the complaint that major database vendors still don’t provide satisfactory RDF support in their products. I don’t think this is a necessary precondition for implementing Linked Data but for some institutions this seems to be very important.

Many thanks!

Categories: Blogroll

WordPress plugin to make use of linked data

Sun, 2011-12-11 10:26

The PoolParty Team has recently published an improved version of their WordPress plugin, which enables linked data enrichment of blogs. To use it, a SKOS-based vocabulary has to be uploaded or retrieved from a SPARQL endpoint. Users and developers benefit from:

  • automatic annotation of all blog entries displayed as tooltips
  • a comfortable search facility with auto-complete over all concepts from the linked thesaurus including semantic search over the whole blog
  • an integrated thesaurus browser, plus
  • a corresponding linked data frontend including RDF/XML serialization of the underlying thesaurus + SPARQL endpoint
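
As a rough illustration of auto-complete over thesaurus concepts, the following Python sketch performs a prefix lookup over a sorted list of concept labels. It is not the plugin's code (the plugin is a WordPress component and its internals are not described here); the labels, the ex: identifiers and the function names are assumptions chosen for the example.

```python
from bisect import bisect_left


def build_index(labels_to_uri):
    """Return a sorted list of (lower-cased label, original label, URI) tuples."""
    return sorted((label.lower(), label, uri) for label, uri in labels_to_uri.items())


def autocomplete(index, prefix, limit=5):
    """Return up to `limit` (label, URI) pairs whose label starts with `prefix`."""
    key = prefix.lower()
    # Binary search for the first entry that is not smaller than the prefix.
    start = bisect_left(index, (key,))
    results = []
    for lowered, label, uri in index[start:]:
        if len(results) >= limit or not lowered.startswith(key):
            break
        results.append((label, uri))
    return results


if __name__ == "__main__":
    # Hypothetical concept labels and identifiers.
    concepts = {
        "Digital camera": "ex:digital-camera",
        "Digital zoom": "ex:digital-zoom",
        "Depth of field": "ex:depth-of-field",
    }
    index = build_index(concepts)
    print(autocomplete(index, "dig"))
```

Returning the concept identifier together with the label is what lets a search box suggest thesaurus concepts rather than plain strings.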

All details about the new version 2.2.3 can be read here.

Categories: Blogroll

Introducing SKOSsy – generate thesauri on the fly!

Tue, 2011-11-29 10:52

Imagine you could generate any thesaurus you would like for nearly any knowledge domain you can think of with quite a good quality! Sounds impossible? Reminds you of all the promises made by text mining software which generates “semantic nets” from scratch?

Let me introduce you to SKOSsy. I will explain what this web service can do for you:

SKOSsy generates SKOS-based thesauri in German or in English for a domain you are interested in. Not any domain but nearly any: SKOSsy extracts data from DBpedia, so it can cover anything which is in DBpedia. Thus, SKOSsy works well whenever a first seed thesaurus should be generated for a certain organisation or project. If you load the automatically generated thesaurus into an editor like PoolParty Thesaurus Manager (PPT) you can start to enrich the knowledge model with additional concepts, relations and links to other LOD sources. But you don't have to start your thesaurus project from scratch.

Let me give you an example: Imagine you are working for a company which is an international plant builder and you would like to index several thousands of documents the “semantic way”. You have to walk through the following steps:

  1. Identify proper categories in Wikipedia/DBpedia which describe best what your business or your domain is all about. Those categories should contain pages / resources which are related to the documents you would like to index. For example: http://dbpedia.org/resource/Category:Metalworking or http://dbpedia.org/resource/Category:Industrial_automation
  2. After you have selected proper categories, SKOSsy will traverse DBpedia for you, collect all resources, their hierarchical and non-hierarchical relations, alternative labels, definitions and other properties, and put them together as a valid SKOS thesaurus; this step takes a couple of minutes. (Find the resulting vocabulary here)
  3. Load the resulting thesaurus into PPT, explore it, improve it and enrich it with additional facts.
  4. After you're done you can generate a tailor-made text extractor by using PoolParty Extractor (PPX), which is the second component of the PoolParty product family.
  5. With PPX and its extraction model especially curated for your special use case you can extract named entities from your documents automatically and index your documents in a meaningful way.
  6. After a few seconds your semantic search engine is ready to be used. PoolParty Semantic Search (PPS) which is the third PoolParty component will offer some nice facilities like categorized auto-complete, faceted search, content recommendation (similarity search) and smart search suggestions to ease your life as a knowledge worker.
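
The first two steps above can be sketched roughly in Python. This is not SKOSsy's implementation, which is not published in this post; it only illustrates the idea under some assumptions: that category membership in DBpedia is expressed with dcterms:subject, that labels come from rdfs:label, and that the query results have already been fetched. The function names, the example scheme URI and the sample row are hypothetical.

```python
def category_members_query(category_uri, lang="en"):
    """Build a SPARQL query listing the resources filed under a DBpedia category."""
    return (
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?resource ?label WHERE {\n"
        f"  ?resource dcterms:subject <{category_uri}> ;\n"
        "            rdfs:label ?label .\n"
        f'  FILTER (lang(?label) = "{lang}")\n'
        "}"
    )


def to_skos_turtle(scheme_uri, rows, lang="en"):
    """Serialize (resource URI, label) rows as SKOS concepts in Turtle."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{scheme_uri}> a skos:ConceptScheme .",
    ]
    for resource_uri, label in rows:
        # Escape backslashes and double quotes so the Turtle literal stays valid.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append("")
        lines.append(f"<{resource_uri}> a skos:Concept ;")
        lines.append(f'    skos:prefLabel "{escaped}"@{lang} ;')
        lines.append(f"    skos:inScheme <{scheme_uri}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(category_members_query("http://dbpedia.org/resource/Category:Metalworking"))
    # Hypothetical row standing in for real query results.
    rows = [("http://dbpedia.org/resource/Welding", "Welding")]
    print(to_skos_turtle("http://example.org/thesaurus/metalworking", rows))
```

A real seed thesaurus would also need the hierarchical and non-hierarchical relations, alternative labels and definitions mentioned in step 2; the sketch leaves those out to stay short.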

Over the last years we have constantly discussed the application of thesauri and other knowledge models to improve search. Many people understood straight away why thesaurus-based search is most often much better than search algorithms based purely on statistics. Of course the big counterargument always was that the costs of establishing a “good-enough” thesaurus, or even a “high-quality” one, are too high.

With SKOSsy in place those kinds of arguments become weaker and weaker. To sum up,

  • SKOSsy makes heavy use of Linked Data sources, especially DBpedia
  • SKOSsy can generate SKOS thesauri for virtually any domain within a few minutes
  • Such thesauri can be improved, curated and extended to one's individual needs, but they usually serve as “good-enough” knowledge models for any semantic search application you like
  • SKOSsy-based semantic search usually outperforms search algorithms based on statistics, since the thesauri contain high-quality information about relations, labels and disambiguation
  • SKOSsy works perfectly together with the PoolParty product family

If you are interested in the results produced by SKOSsy, just send us a short note about your domain or your project and we will send you an invitation as beta-tester or prepare a demo for you.

Categories: Blogroll

Geological Survey Austria launches thesaurus project

Mon, 2011-10-17 10:54

Throughout the last year the Semantic Web Company team has supported the Geological Survey of Austria (GBA) in setting up their thesaurus project. It started with a workshop in summer 2010 where we discussed use cases for using semantic web technologies as a means to fulfill the INSPIRE directive. Now, in fall 2011, GBA has published their first thesauri as Linked Data using PoolParty’s new Linked Data front-end.

The Thesaurus Project of the GBA aims to create controlled vocabularies for the semantic harmonization of map-based geodata. The content-related realization of this project is governed by the Thesaurus Editorial Team, which consists of domain experts from the Geological Survey of Austria. With the development of semantically and technically interoperable geo-data the Geological Survey of Austria implements its legal obligation defined by the EU-Directive 2007/2/EC INSPIRE and the national “Geodateninfrastrukturgesetz” (GeoDIG), respectively.

Marcus Ebner, from the GBA Thesaurus Editorial Team

The thesauri were built with the PoolParty Thesaurus Manager, so they are all based on SKOS and fully compliant with the Linked Data principles. Apart from the standard implementation of SKOS, some additions were made to the data model: Dublin Core terms for extra metadata, and custom sub-properties of skos:related to put semantic constraints on related properties. In practice, a big effort was put into integrating bibliographic references for every concept in the data set using dcterms:source, which serves the requirements of reuse by the scientific community and incorporation into domain-specific data sets. In addition, rdfs:subPropertyOf was used to express how international geologic time scales map onto regional concepts.
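
To show why declaring a custom property as a sub-property of skos:related is useful, here is a small Python sketch of the standard RDFS sub-property entailment: every statement made with the custom property also counts as a skos:related statement, so generic SKOS tools still see the link. The ex: property and concept names and the placeholder reference are invented for illustration and are not taken from the GBA data model.

```python
SKOS_RELATED = "skos:related"
SUBPROPERTY_OF = "rdfs:subPropertyOf"

# Hypothetical triples; all ex: names are made up for this example.
TRIPLES = {
    ("ex:regionalTimeEquivalent", SUBPROPERTY_OF, SKOS_RELATED),
    ("ex:regionalStage", "ex:regionalTimeEquivalent", "ex:internationalStage"),
    ("ex:regionalStage", "dcterms:source", "Example reference (placeholder)"),
}


def infer_superproperties(triples):
    """Add (s, super, o) for every (s, p, o) where p is a sub-property of super."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat so that chains of sub-properties are followed
        changed = False
        parents = {(s, o) for s, p, o in inferred if p == SUBPROPERTY_OF}
        for s, p, o in list(inferred):
            for sub, sup in parents:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


if __name__ == "__main__":
    for triple in sorted(infer_superproperties(TRIPLES) - TRIPLES):
        print(triple)
```

The custom property carries the more specific meaning, while the inferred skos:related statement keeps the data usable by any consumer that only understands plain SKOS.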

Currently four thesauri have been published. All are available in English and German and can be used under the cc-by-sa license. Mappings to DBpedia have also been made:

With the new PoolParty Release (3.0) the Linked Data front-end has been redesigned and is now highly customizable and extendable. In the GBA Thesaurus Project it is used as a publishing interface for the controlled vocabularies, both for the machine-readable RDF version and for a custom HTML version for comfortable browsing and searching.

GBA Linked Data frontend

After all, it’s satisfying to see a project we’ve supported and worked on for some time come to life, and we are looking forward to the next steps that will be taken in this project.

P.S.: Thanks to Marcus Ebner from GBA for his contribution to this blog post.

Categories: Blogroll