One good thing about doing machine learning at present is that people actually use it! The back-ends of many systems we interact with on a daily basis are driven by machine learning. In most such systems, as users interact with the system, it is natural for the system designer to wish to optimize the models under the hood over time, in a way that improves the user experience. To ground the discussion a bit, let us consider the example of an online portal that is trying to present interesting news stories to its users. A user comes to the portal and, based on whatever information the portal has on the user, it recommends one (or more) news stories. The user chooses to read the story or not, and life goes on. Naturally the portal wants to better tailor the stories it displays to the users’ taste over time, which can be observed if users start to click on the displayed stories more often.
A natural idea would be to use the past logs and train a machine learning model which prefers the stories that users click on and discourages the stories which are avoided by the users. This sounds like a simple classification problem, for which we might use an off-the-shelf algorithm. This is indeed done reasonably often, and the offline logs suggest that the newly trained model will result in a lot more clicks than the old one. The new model is deployed, only to find that its performance is not as good as hoped, or even poorer than what was happening before! What went wrong? The natural reaction is typically that (a) the machine learning algorithm needs to be improved, or (b) we need better features, or (c) we need more data. Alas, in most of these cases, the right answer is (d) none of the above. Let us see why this is true through a simple example.
Imagine a simple world where some of our users are from New York and others are from Seattle. Some of our news stories pertain to finance, and others pertain to technology. Let us further imagine that the probability of a click (henceforth CTR for clickthrough rate) on a news article based on city and subject has the following distribution:

| City | Finance CTR | Tech CTR |
|---|---|---|
| New York | 1 | 0.6 |
| Seattle | 0.4 | 0.79 |

Table 1: True (unobserved) CTRs

Of course, we do not have this information ahead of time while designing the system, so our starting system recommends articles according to some heuristic rule. Imagine that we use the rule:
- New York users get Tech stories, Seattle users get Finance stories.
Now we collect the click data according to this system for a while. As we obtain more and more data, we obtain increasingly accurate estimates of the CTR for Tech stories and NY users, as well as Finance stories and Seattle users (0.6 and 0.4 resp.). However, we have no information on the other two combinations. So if we train a machine learning algorithm to minimize the squared loss between predicted CTR on an article and observed CTR, it is likely to predict the average of observed CTRs (that is 0.5) in the other two blocks. At this point, our guess looks like:
| City | Finance CTR | Tech CTR |
|---|---|---|
| New York | 1 / ? / 0.5 | 0.6 / 0.6 / 0.6 |
| Seattle | 0.4 / 0.4 / 0.4 | 0.79 / ? / 0.5 |

Table 2: True / observed / estimated CTRs

Note that this would be the case even with infinite data and an all-powerful learner, so machine learning is not to be faulted in any way here. Given these estimates, we naturally realize that showing Finance articles to Seattle users was a mistake, and switch to Tech. But Tech is also looking pretty good in NY, and we stick with it. Our new policy is:
- Both NY and Seattle users get Tech articles.
Running the new system for a while, we will fix the erroneous estimate for the Tech CTR in Seattle (that is, up from 0.5 to 0.79). But we still have no signal that makes us prefer Finance over Tech in NY. Indeed, even with infinite data, the system will be stuck with this suboptimal choice at this point, and our CTR estimates will look something like:

| City | Finance CTR | Tech CTR |
|---|---|---|
| New York | 1 / ? / 0.59 | 0.6 / 0.6 / 0.6 |
| Seattle | 0.4 / 0.4 / 0.4 | 0.79 / 0.79 / 0.79 |

Table 3: True / observed / estimated CTRs
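The two rounds above can be replayed in a few lines of Python. This is only a sketch under simplifying assumptions: we pretend to have infinite data (so the observed CTR equals the true CTR in every cell we log), and the "learner" simply fills unobserved cells with the average of the observed ones, as described in the text. The function names are made up for illustration.

```python
# True CTRs from Table 1 (unknown to the system designer).
TRUE_CTR = {
    ("New York", "Finance"): 1.0,
    ("New York", "Tech"): 0.6,
    ("Seattle", "Finance"): 0.4,
    ("Seattle", "Tech"): 0.79,
}
CITIES = ("New York", "Seattle")
TOPICS = ("Finance", "Tech")


def estimate_ctrs(observed):
    """Fill unobserved cells with the mean of the observed CTRs (squared-loss fit)."""
    fallback = sum(observed.values()) / len(observed)
    return {(c, t): observed.get((c, t), fallback) for c in CITIES for t in TOPICS}


def greedy_policy(estimates):
    """For each city, show the topic with the highest estimated CTR."""
    return {c: max(TOPICS, key=lambda t: estimates[(c, t)]) for c in CITIES}


# Starting heuristic from the text.
policy = {"New York": "Tech", "Seattle": "Finance"}
observed = {}
for round_no in range(1, 4):
    # Idealized "infinite data": logged cells are observed exactly.
    for city, topic in policy.items():
        observed[(city, topic)] = TRUE_CTR[(city, topic)]
    estimates = estimate_ctrs(observed)
    policy = greedy_policy(estimates)
    print(round_no, policy, round(estimates[("New York", "Finance")], 3))
```

However many rounds we run, the Finance cell for New York is never logged, so its estimate stays just below 0.6 and the greedy policy never tries it.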
We can now assess the earlier claims:
- More data does not help: Since Observed and True CTRs match wherever we are collecting data
- Better learning algorithm does not help: Since Predicted and Observed CTRs coincide wherever we are collecting data
- Better data does help! We should not have a blank cell in the observed column.
This seems simple enough to fix though. We should have really known better than to completely omit observations in one cell of our table. With good intentions, we decide to collect data in all cells. We choose to use the following rule:
- Seattle users get Tech stories during day and finance stories during night
- Similarly, NY users get Tech stories during day and finance stories during night
We are now collecting data on each cell, but we find that our estimates still lead us to a suboptimal policy. Further investigation might reveal that users are more likely to read finance stories during the day when the markets are open. So when we only display finance stories during night, we underestimate the finance CTR and end up with wrong estimates. Realizing the error of our ways, we might try to fix this again and then run into another problem and so on.
The issue we have discovered above is that of confounding variables. There is a lot of wonderful work and many techniques that can be used to circumvent confounding variables in experimentation. Here, I mention the simplest and perhaps the most versatile one of them: randomization. The idea is that instead of recommending stories to users according to a fixed deterministic rule, we allow for different articles to be presented to the user according to some distribution. This distribution does not have to be uniform. In fact, good randomization would likely focus on plausibly good articles so as to not degrade the user experience. However, as long as we add sufficient randomization, we can then obtain consistent counterfactual estimates of quantities from our experimental data. There is a growing literature on how to do this well. A nice paper which covers some of these techniques and provides an empirical evaluation is http://arxiv.org/abs/1103.4601. A more involved example in the context of computational advertising at Microsoft is discussed in http://leon.bottou.org/papers/bottou-jmlr-2013.
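To make the idea concrete, here is a minimal sketch of one standard way to use randomized logs: inverse propensity scoring, where each logged click is reweighted by one over the probability with which the shown article was chosen. The 80/20 logging distribution, the equal mix of cities, and all names below are assumptions made for illustration; they are not taken from the papers linked above.

```python
import random

# True CTRs from the running example (unknown to the system designer).
TRUE_CTR = {
    ("New York", "Finance"): 1.0,
    ("New York", "Tech"): 0.6,
    ("Seattle", "Finance"): 0.4,
    ("Seattle", "Tech"): 0.79,
}
# Assumed logging distribution: mostly Tech, but Finance is still explored.
LOGGING_PROBS = {"Finance": 0.2, "Tech": 0.8}


def collect_logs(n, rng):
    """Simulate n visits, recording the probability of the action we took."""
    logs = []
    for _ in range(n):
        city = rng.choice(["New York", "Seattle"])  # assumed equal traffic
        topic = "Tech" if rng.random() < LOGGING_PROBS["Tech"] else "Finance"
        click = 1 if rng.random() < TRUE_CTR[(city, topic)] else 0
        logs.append((city, topic, LOGGING_PROBS[topic], click))
    return logs


def ips_value(logs, target_policy):
    """Estimate the CTR a deterministic target policy would have achieved."""
    total = 0.0
    for city, topic, propensity, click in logs:
        if target_policy[city] == topic:
            total += click / propensity  # reweight matching events
    return total / len(logs)


rng = random.Random(0)
logs = collect_logs(100_000, rng)
all_tech = {"New York": "Tech", "Seattle": "Tech"}
finance_in_ny = {"New York": "Finance", "Seattle": "Tech"}
print("all Tech:", ips_value(logs, all_tech))
print("Finance in NY:", ips_value(logs, finance_in_ny))
```

Under these assumptions the true values are 0.695 for the all-Tech policy and 0.895 for showing Finance in New York, and the estimates should land close to them even though Finance was shown only a fifth of the time.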
Traditional KOSs include a broad range of system types from term lists to classification systems and thesauri. These organization systems vary in functional purpose and semantic expressivity. Most of these traditional KOSs were developed in a print and library environment. They have been used to control the vocabulary used when indexing and searching a specific product, such as a bibliographic database, or when organizing a physical collection such as a library (Hodge et al. 2000).
KOS in the era of the Web
With the proliferation of the World Wide Web, new knowledge organization principles emerged based on hypertextuality, modularity, decentralisation and protocol-based machine communication (Berners-Lee 1998). New forms of KOSs emerged, like folksonomies, topic maps and knowledge graphs, which are also commonly and broadly referred to as ontologies.
With reference to Gruber’s (1993/1993a) classic definition:
“a common ontology defines the vocabulary with which queries and assertions are exchanged among agents” based on “ontological commitments to use the shared vocabulary in a coherent and consistent manner.”
From a technological perspective, ontologies function as an integration layer for semantically interlinked concepts, with the purpose of improving the machine-readability of the underlying knowledge model. Ontologies raise interoperability from a syntactic to a semantic level for the purpose of knowledge sharing. According to Hodge et al. (2003):
“semantic tools emphasize the ability of the computer to process the KOS against a body of text, rather than support the human indexer or trained searcher. These tools are intended for use in the broader, more uncontrolled context of the Web to support information discovery by a larger community of interest or by Web users in general.” (Hodge et al. 2003)
In other words, ontologies are considered valuable for classifying web information in that they aid in enhancing interoperability – bringing together resources from multiple sources (Saumure & Shiri 2008, p. 657).
Which KOS serves your needs?
Schaffert et al. (2005) introduce a model to classify ontologies along their scope, acceptance and expressivity, as can be seen in the figure below.
According to this model the design of KOSs has to take account of the user group (acceptance model), the nature and abstraction level of knowledge to be represented (model scope) and the adequate formalism to represent knowledge for specific intellectual purposes (level of expressiveness). Although the proposed classification leaves room for discussion, it can help to distinguish various KOSs from each other and gain a better insight into the architecture of functionally and semantically intertwined KOSs. This is especially important under conditions of interoperability.
It must be critically noted that the inflationary usage of the term “ontology”, often in neglect of its philosophical roots, has not necessarily contributed to a clarification of the concept itself. A detailed discussion of this matter is beyond the scope of this post. In this post the author refers to Gruber’s (1993a) definition of ontology as “an explicit specification of a conceptualization”, which is commonly referred to in artificial intelligence research.
The next post will look at trends in knowledge organization before and after the emergence of the World Wide Web.
Go to the previous post: Thoughts on KOS (Part 1): Getting to grips with “semantic” interoperability
References:
Gruber, Thomas R. (1993). Toward Principles for the Design of Ontologies Used for Knowledge Sharing. In: International Journal Human-Computer Studies 43, pp. 907-928.
Gruber, Thomas R. (1993a). A translation approach to portable ontologies. In: Knowledge Acquisition, 5/2, pp. 199-220.
Hodge, Gail (2000). Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files. In: First Digital Library Federation electronic edition, September 2008. Originally published in trade paperback in the United States by the Digital Library Federation and the Council on Library and Information Resources, Washington, D.C., 2000.
Hodge, Gail M.; Zeng, Marcia Lei; Soergel, Dagobert (2003). Building a Meaningful Web: From Traditional Knowledge Organization Systems to New Semantic Tools. In: Proceedings of the 2003 Joint Conference on Digital Libraries (JCDL’03), IEEE.
Saumure, Kristie; Shiri, Ali (2008). Knowledge organization trends in library and information studies: a preliminary comparison of pre- and post-web eras. In: Journal of Information Science, 34/5, 2008, pp. 651–666.
Schaffert, Sebastian; Gruber, Andreas; Westenthaler, Rupert (2005). A Semantic Wiki for Collaborative Knowledge Formation. In: Reich, Siegfried; Güntner, Georg; Pellegrini, Tassilo; Wahler, Alexander (Eds.). Semantic Content Engineering. Linz: Trauner, pp. 188-202.
Enabling and managing interoperability at the data and the service level is one of the key strategic issues in networked knowledge organization systems (KOSs) and a growing issue in effective data management. But why do we need “semantic” interoperability and how can we achieve it?
Interoperability vs. Integration
The concept of (data) interoperability can best be understood in contrast to (data) integration. Integration refers to a process in which formerly distinct data sources and their representation models are merged into one newly consolidated data source. Interoperability, by contrast, keeps knowledge sources and their representation models structurally separate, but allows connectivity and interactivity between them through deliberately defined overlaps in the representation model. Under conditions of interoperability, data sources are designed to provide interfaces for connectivity so that data can be shared and integrated on top of a common data model, while the original principles of data and knowledge representation are left intact. Thus, interoperability is an efficient means to improve and ease the integration of data and knowledge sources.
Three levels of interoperability
When designing interoperable KOSs it is important to distinguish between structural, syntactic and semantic interoperability (Galinski 2006):
- Structural interoperability is achieved by representing metadata using a shared data model like the Dublin Core Abstraction Model or RDF (Resource Description Framework).
- Syntactic interoperability is achieved by serializing data in a shared mark-up language like XML, Turtle or N3.
- Semantic interoperability is achieved by using a shared terminology or controlled vocabulary to label and classify metadata terms and relations.
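The three levels can be illustrated with a toy Python sketch. It is not a real RDF toolkit: the example.org identifiers and titles are invented, the serializer only handles plain string literals, and N-Triples is used here simply because it is the easiest RDF syntax to write by hand.

```python
# Semantic level: both sources use the same Dublin Core term for "title".
DCTERMS_TITLE = "http://purl.org/dc/terms/title"

# Structural level: both sources describe their records as
# (subject, predicate, object) triples, i.e. they share a data model.
# The example.org identifiers and titles are made up for illustration.
source_a = [("http://example.org/a/doc1", DCTERMS_TITLE, "Annual Report")]
source_b = [("http://example.org/b/item9", DCTERMS_TITLE, "Jahresbericht")]


def to_ntriples(triples):
    """Syntactic level: write triples with plain literal objects as N-Triples."""
    lines = []
    for subject, predicate, literal in triples:
        escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{predicate}> "{escaped}" .')
    return "\n".join(lines)


def titles(triples):
    """One lookup works across sources because the predicate is shared."""
    return [obj for _, predicate, obj in triples if predicate == DCTERMS_TITLE]


merged = source_a + source_b
print(to_ntriples(merged))
print(titles(merged))
```

The two sources stay separate and keep their own identifiers; only the data model, the serialization and the vocabulary term are shared, which is what lets a single query run over both.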
Given that metadata standards carry a lot of intrinsic legacy, it is sometimes very difficult to achieve interoperability at all three levels mentioned above. Metadata formats and models have grown historically. Most of the time they are the result of community decision processes, are often highly formalized for specific functional purposes, and are deliberately rigid and difficult to change. Hence it is important to have a clear understanding and documentation of the application profile of a metadata format as a precondition for enabling interoperability at all three levels. Semantic Web standards do a really good job in this respect!
In the next post, we will take a look at various KOSs and how they differ with respect to expressivity, scope and target group.
For the Nolde project we were asked to build a knowledge graph containing detailed information about the Austrian music scene: artists, bands and their music releases. We decided to use PoolParty, since these entities should be accessible in an editorial workflow. More details about the implementation will be provided in a later blog post.
In this first round I want to share my experiences with the mapping of music data into SKOS. Obviously, LinkedBrainz was the perfect source to collect and transform such data, since it is available as RDF/NTriples dumps and even provides a SPARQL endpoint! LinkedBrainz data is modeled using the Music Ontology.
E.g. you can select all mo:MusicArtists with relation to Austria.
I imported the LinkedBrainz dump files into a triple store, together with DBpedia dumps.
With two CONSTRUCT queries, I was able to collect the required data and transform it into SKOS, in a PoolParty-compatible format:
Construct Artists
Every matching MusicArtist results in a SKOS concept. The foaf:name is mapped to skos:prefLabel (in German).
As you can see, I used Custom Schema features to provide self-describing metadata on top of pure SKOS features: a MusicBrainz link, a MusicBrainz Id, DBpedia link, homepage…
In addition, you can see in the query that data from DBpedia was also collected. If an owl:sameAs relationship to DBpedia exists, a possible abstract is retrieved. When a DBpedia abstract is available, it is mapped to skos:definition.
Construct Releases (mo:SignalGroups) with relations to Artists
Similar to the Artists, a matching SignalGroup results in a SKOS Concept. A skos:related relationship is defined between an Artist and his Releases.
Outcome
The SPARQL CONSTRUCT queries produced ttl files that could be imported directly into PoolParty, resulting in a project containing nearly 1,000 Artists and 10,000 Releases.
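The mapping rules described above can also be written down outside of SPARQL. The following Python sketch is not the CONSTRUCT query used in the project. It restates the rules from the text (name to a German skos:prefLabel, an optional DBpedia abstract to skos:definition, skos:related between artist and release) on plain dictionaries. The dictionary keys, the example URIs and the label given to releases are assumptions made for illustration.

```python
def artist_to_skos(artist):
    """Map one artist record to SKOS triples, following the rules in the text."""
    uri = artist["uri"]
    triples = [
        (uri, "rdf:type", "skos:Concept"),
        # foaf:name becomes the German preferred label.
        (uri, "skos:prefLabel", (artist["name"], "de")),
    ]
    # The abstract is only present when an owl:sameAs link to DBpedia exists.
    if artist.get("abstract"):
        triples.append((uri, "skos:definition", artist["abstract"]))
    return triples


def release_to_skos(release, artist_uri):
    """Map one release (mo:SignalGroup) and relate it to its artist."""
    uri = release["uri"]
    return [
        (uri, "rdf:type", "skos:Concept"),
        # Assumption: the text does not say how releases are labelled.
        (uri, "skos:prefLabel", (release["title"], "de")),
        (artist_uri, "skos:related", uri),
    ]


# Hypothetical sample records; the URIs are placeholders.
artist = {"uri": "http://example.org/artist/1", "name": "Example Band",
          "abstract": "An example abstract."}
release = {"uri": "http://example.org/release/7", "title": "Example Album"}

for triple in artist_to_skos(artist) + release_to_skos(release, artist["uri"]):
    print(triple)
```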
For this example we analyze 4018 Reviews of Consumers who bought Omega-3 Supplements. Keep in mind that in most cases each Product Review has an associated Rating (usually given as 1-5 stars) which signifies the overall satisfaction of each Consumer. Therefore, after collecting the Reviews and Ratings, we have a file with the following entries per row:
[Text of Review,Rating]
The fact that a Customer also gives a Score can be especially helpful because we can identify the words and Phrases that differentiate Positive experiences (i.e. those having 5 Star Ratings) from the Negative ones (we assume that any Review having a Rating of 4 stars or less is Negative). So for example, Positive Reviews may contain mostly words and phrases such as "Great", "Happy" and "Will buy again", whereas Negative Reviews may contain words and phrases such as "Never buying again", "not happy" or "damaged".
The tools used for this example are NLTK and Python. The code simply reads the review text and associated ratings and creates a Matrix with the same representation as the file it read.
Next, we want to identify which Insights we can extract from this representation. For example:
- Identify which words commonly occur in 5-star reviews
- Identify which words commonly occur in Reviews with a rating of 4 Stars or lower
- Identify potentially interesting Phrases and Words
- Extract term Co-Occurrences
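The first two items can be sketched with nothing but the Python standard library. This is not the NLTK-based code used for the analysis: the four reviews below are invented toy rows in the [Text of Review, Rating] format, and the regular-expression tokenizer is a deliberately crude stand-in. No stopwords are removed, so terms such as however survive.

```python
import re
from collections import Counter

# Invented toy rows in the [Text of Review, Rating] format.
reviews = [
    ("Great product, will buy again", 5),
    ("Excellent quality and great price", 5),
    ("Fishy taste and not sure it works", 3),
    ("Good capsules, however the fishy odor is strong", 4),
]


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def term_counts(rows, is_selected):
    """Count in how many selected reviews each term appears."""
    counts = Counter()
    for text, rating in rows:
        if is_selected(rating):
            counts.update(set(tokenize(text)))  # one vote per review
    return counts


positive = term_counts(reviews, lambda rating: rating == 5)
negative = term_counts(reviews, lambda rating: rating <= 4)
print("5 stars:", positive.most_common(3))
print("4 stars or lower:", negative.most_common(3))
```

On real data one would compare the two counters, for instance by ranking terms by how much more often they appear in one group than in the other.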
We start with terms occurring more frequently in Negative Reviews for Omega-3 Supplements. Here is what we've found:
So it appears that people tend to give negative Reviews when the Taste (and possibly After-Taste) is not quite right. A lot of people complain about a Fishy odor. Notice also that the 3rd term is sure, which we can assume originates from customers saying that they are not sure whether the Product works or not (notice also that the 4th term is yet). Some more terms to consider:
krill (a type of Oil which is an alternative Product to Omega-3 Supplementation)
Now let's look at the Terms associated with Positive Reviews:
great and excellent are terms that were expected to be found in Positive Reviews. Some terms to consider are:
We move on to identifying potentially interesting terms and Phrases. Here is a Screenshot from the Software that I used:
I added a Red Rectangle wherever sensitive information (such as Company Names) appears which for the purpose of this post is not relevant (but it certainly is relevant in a different setting).
We immediately see some interesting mentions, for example: Heavy Metal poisoning, Upset Stomach incidences, Cognitive Function, Joint Pains, Panic Attacks, Reasonably Priced Items, Postpartum Depression, Allergic Reactions, Speedy Delivery and Soft Gels that Stick together.
Recall that in a previous example we found that the term however occurs frequently within Negative Reviews. Some analysts may have chosen to treat this term as a stopword, which in this case would be a serious mistake. The reason is that the term however very often shows us why a product or service is not receiving a perfect rating, and vice versa. Therefore, if a Data Scientist had chosen to exclude this term from the Analysis (stopwords are typically removed from the text), potentially interesting insights would never have surfaced.
Ideally, we would like to know the context that occurs after the term however whenever this term occurs within a negative review. That will help us to focus on all occurrences of however with negative sentiment. To do this, we only take into account reviews containing the term however and having a Rating of 3 stars or less. It appears that the most common terms occurring after the term however were Fishy odor and After-taste. In other words, fishy odor is the cause that keeps Customers from giving a 5-star Rating.
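A rough sketch of this "what follows however" step is shown below, again with the standard library only and with invented toy reviews. The window of three tokens and the function name are arbitrary choices made for illustration, not the settings used in the analysis.

```python
import re
from collections import Counter

# Invented toy rows in the [Text of Review, Rating] format.
reviews = [
    ("Works well, however the fishy odor is awful", 3),
    ("Nice price, however fishy aftertaste lingers", 2),
    ("Love it, however shipping was slow", 5),
]


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def context_after(rows, term, max_rating, window=3):
    """Count the tokens that follow `term` in reviews rated max_rating or lower."""
    counts = Counter()
    for text, rating in rows:
        if rating > max_rating:
            continue  # keep only the low-rated reviews
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == term:
                counts.update(tokens[i + 1:i + 1 + window])
    return counts


after_however = context_after(reviews, "however", max_rating=3)
print(after_however.most_common(5))
```

The 5-star review is skipped even though it contains however, so only complaints from low-rated reviews end up in the counts.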
On the other hand, phrases such as highly recommend are interesting because we may use co-occurrence analysis to see which terms co-occur with a highly recommended product.
Of course this is by no means the end of what we can do. To extract even better insights we have to spend significantly more time on proper Pre-processing, use Information Extraction and apply several other techniques to analyze Text Data in novel and potentially interesting ways.
SEMANTiCS2015: Calls for Research & Innovation Papers, Industry Presentations and Poster/Demos are now open!
The SEMANTiCS2015 conference returns this year, in its 11th edition, to where it all started in 2005: Vienna, Austria!
The conference takes place from 15-17 September 2015 (the main conference on 16-17 September, with several back-to-back workshops and events on 15 September) at the University of Economics – see all information: http://semantics.cc/.
We are happy to announce the SEMANTiCS Open Calls as follows. All infos on the Calls can also be found on the SEMANTiCS2015 website here: http://semantics.cc/open-calls
Call for Research & Innovation Papers
The Research & Innovation track at SEMANTiCS welcomes the submission of papers on novel scientific research and/or innovations relevant to the topics of the conference. Submissions must be original and must not have been submitted for publication elsewhere. Papers should follow the ACM ICPS guidelines for formatting (http://www.acm.org/sigs/publications/proceedings-templates) and must not exceed 8 pages in length for full papers and 4 pages for short papers, including references and optional appendices.
Abstract Submission Deadline: May 22, 2015
Paper Submission Deadline: May 29, 2015
Notification of Acceptance: July 10, 2015
Camera-Ready Paper: July 24, 2015
Call for Industry & Use Case Presentations
To address the needs and interests of industry, SEMANTiCS presents enterprise solutions that deal with semantic processing of data and/or information in areas like Linked Data, Data Publishing, Semantic Search, Recommendation Services, Sentiment Detection, Search Engine Add-Ons, Thesaurus and/or Ontology Management, Text Mining, Data Mining and any related fields. All submissions should have a strong focus on real-world applications beyond the prototypical status and demonstrate the power of semantic systems!
Submission Deadline: July 1, 2015
Notification of Acceptance: July 20, 2015
Presentation Ready: August 15, 2015
Call for Posters and Demos
The Posters & Demonstrations Track invites innovative work in progress, late-breaking research and innovation results, and smaller contributions (including pieces of code) in all fields related to the broadly understood Semantic Web. The informal setting of the Posters & Demonstrations Track encourages participants to present innovations to business users and find new partners or clients. In addition to the business stream, SEMANTiCS 2015 welcomes developer-oriented posters and demos to the new technical stream.
Submission Deadline: June 17, 2015
Notification of Acceptance: July 10, 2015
Camera-Ready Paper: August 01, 2015
We are looking forward to receiving your submissions for SEMANTiCS2015 and to seeing you in Vienna in autumn!
Data to Value & Semantic Web Company agree partnership to bring cutting edge Semantic Management to Financial Services clients
The partnership aims to change the way organisations, particularly within Financial Services, manage the semantics embedded in their data landscapes. This will offer several core benefits to existing and prospective clients including locating, contextualising and understanding the meaning and content of Information faster and at a considerably lower cost. The partnership will achieve this through combining the latest Information Management and Semantic techniques including:
- Text Mining, Tagging, Entity Definition & Extraction.
- Business Glossary, Data Dictionary & Data Governance techniques.
- Taxonomy, Data Model and Ontology development.
- Linked Data & Semantic Web analyses.
- Data Profiling, Mining & Discovery.
This includes improving regulatory compliance in areas such as BCBS, enabling new investment research and client reporting techniques as well as general efficiency drivers such as faster integration of mergers and acquisitions. As part of the partnership, Data to Value Ltd. will offer solution services and training in PoolParty product offerings, including ontology development and data modeling services.
Nigel Higgs, Managing Director of Data to Value, notes: “This is an exciting collaboration between two firms which are pushing the boundaries in the way Data, Information and Semantics are managed by business stakeholders. We spend a great deal of time helping organisations at a grass roots level pragmatically adopt the latest Information Management techniques. We see this partnership as an excellent way for us to help organisations take realistic steps to adopting the latest semantic techniques.”
Andreas Blumauer, CEO of Semantic Web Company adds, “The consortium of our two companies offers a unique bundle, which consists of a world-class semantic platform and a team of experts who know exactly how Semantics can help to increase the efficiency and reliability of knowledge intensive business processes in the financial industry.”
Corinna Cortes and Neil Lawrence ran the NIPS experiment where 1/10th of papers submitted to NIPS went through the NIPS review process twice, and then the accept/reject decision was compared. This was a great experiment, so kudos to NIPS for being willing to do it and to Corinna & Neil for doing it.
The 26% disagreement rate presented at the conference understates the meaning in my opinion, given the 22% acceptance rate. The immediate implication is that between 1/2 and 2/3 of papers accepted at NIPS would have been rejected if reviewed a second time. For analysis details and discussion about that, see here.
Let’s give P(reject in 2nd review | accept 1st review) a name: arbitrariness. For NIPS 2014, arbitrariness was ~60%. Given such a stark number, the primary question is “what does it mean?”
Does it mean there is no signal in the accept/reject decision? Clearly not—a purely random decision would have arbitrariness of ~78%. It is however quite notable that 60% is much closer to 78% than 0%.
Does it mean that the NIPS accept/reject decision is unfair? Not necessarily. If a pure random number generator made the accept/reject decision, it would be ‘fair’ in the same sense that a lottery is fair, and have an arbitrariness of ~78%.
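The two numbers quoted above can be reproduced with a back-of-the-envelope calculation. The sketch below assumes that both reviewing passes accept at the same 22% rate, so that "accepted then rejected" and "rejected then accepted" are equally likely and each accounts for half of the 26% disagreement. The linked analysis may proceed differently; this is only one simple way to arrive at a figure near 60%.

```python
ACCEPT_RATE = 0.22        # acceptance rate quoted in the post
DISAGREEMENT_RATE = 0.26  # fraction of duplicated papers with differing decisions


def arbitrariness_from_disagreement(disagreement, accept_rate):
    """P(reject in 2nd review | accept in 1st), assuming symmetric committees.

    disagreement = P(accept, reject) + P(reject, accept)
                 = 2 * accept_rate * arbitrariness
    """
    return disagreement / (2 * accept_rate)


def arbitrariness_of_coin_flip(accept_rate):
    """A purely random second decision rejects with probability 1 - accept_rate."""
    return 1 - accept_rate


observed = arbitrariness_from_disagreement(DISAGREEMENT_RATE, ACCEPT_RATE)
random_baseline = arbitrariness_of_coin_flip(ACCEPT_RATE)
print(f"observed arbitrariness: {observed:.2f}")
print(f"random baseline:        {random_baseline:.2f}")
```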
Does it mean that the NIPS accept/reject decision could be unfair? The numbers give no judgement here. It is however a natural fallacy to imagine that random judgements derived from people implies unfairness, so I would encourage people to withhold judgement on this question for now.
Is an arbitrariness of 0% the goal? Achieving 0% arbitrariness is easy: just choose all papers with an md5sum that ends in 00 (in binary). Clearly, there is something more to be desired from a reviewing process.
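The md5sum rule is easy to try out. In the sketch below the "papers" are placeholder byte strings and the function names are made up. The point is only that a decision which depends on nothing but the file contents gives the same answer on every re-review.

```python
import hashlib


def accept(paper_bytes):
    """Accept a paper iff the last two bits of its MD5 digest are 00."""
    digest = hashlib.md5(paper_bytes).digest()
    return digest[-1] & 0b11 == 0


def arbitrariness(first_round, second_round):
    """Fraction of first-round accepts that are rejected in the second round."""
    accepted = [i for i, ok in enumerate(first_round) if ok]
    if not accepted:
        return 0.0
    flipped = sum(1 for i in accepted if not second_round[i])
    return flipped / len(accepted)


# Placeholder submissions; real papers would be the PDF bytes.
papers = [f"paper-{i}".encode() for i in range(1000)]
first_round = [accept(p) for p in papers]
second_round = [accept(p) for p in papers]  # an independent "re-review"

print("accepted:", sum(first_round))
print("arbitrariness:", arbitrariness(first_round, second_round))
```

Roughly a quarter of the papers are accepted and none of them flips on re-review, yet nobody would call this a good reviewing process.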
Perhaps this means we should decrease the acceptance rate? Maybe, but this makes sense only if you believe that arbitrariness is good, as it will almost surely increase the arbitrariness. In the extreme case where only one paper is accepted, the odds of it being rejected on re-review are near 100%.
Perhaps this means we should increase the acceptance rate? If all papers submitted were accepted, the arbitrariness would be 0, but as mentioned above arbitrariness 0 is not the goal.
Perhaps this means that NIPS is a very broad conference with substantial disagreement by reviewers (and attendees) about what is important? Maybe. This even seems plausible to me, given anecdotal personal experience. Perhaps small highly-focused conferences have a smaller arbitrariness?
Perhaps this means that researchers submit themselves to an arbitrary process for historical reasons? The arbitrariness is clear, but the reason less so. A mostly-arbitrary review process may be helpful in the sense that it gives authors a painful-but-useful opportunity to debug the easy ways to misinterpret their work. It may also be helpful in that it perfectly rejects the bottom 20% of papers which are actively wrong, and hence harmful to the process of developing knowledge. None of these reasons are confirmed of course.
Is it possible to do better? I believe the answer is “yes”, but it should be understood as a fundamentally difficult problem. Every program chair who cares tries to tweak the reviewing process to be better, and there have been many smart program chairs that tried hard. Why isn’t it better? There are strong nonvisible constraints on the reviewers’ time and attention.
What does it mean? In the end, I think it means two things of real importance.
- The result of the process is mostly arbitrary. As an author, I found rejects of good papers very hard to swallow, especially when the reviews were nonsensical. Learning to accept that the process has a strong element of arbitrariness helped me deal with that. Now there is proof, so new authors need not be so discouraged.
- CMT now has a tool for measuring arbitrariness that can be widely used by other conferences. Joelle and I changed ICML 2012 in various ways. Many of these appeared beneficial and some stuck, but others did not. In the long run, it’s the things which stick that matter. Being able to measure the review process in a more powerful way might be beneficial in getting good review practices to stick.
Edit: Cross-posted on CACM.