Semantic Software Lab
Concordia University
Montréal, Canada


Reinforcement Learning Platforms

Machine Learning Blog - Mon, 2018-04-16 16:17

If you are interested in building an industrial Reinforcement Learning platform, we are hiring a data scientist and multiple developers as a follow-up to last year’s hiring. Please apply if interested, as this is a real chance to be a part of building the future.

Categories: Blogroll

Discerning Truth in the Age of Ubiquitous Disinformation (5): Impact of Russia-linked Misinformation vs Impact of False Claims Made By Politicians During the Referendum Campaign

Kalina Bontcheva (@kbontcheva)

My previous post focused mainly on the impact of misinformation from Russian Twitter accounts. However, it is also important to acknowledge the impact of false claims made by politicians, which were shared and distributed through social media.

A House of Commons Treasury Committee report published in May 2016 states that: “The public debate is being poorly served by inconsistent, unqualified and, in some cases, misleading claims and counter-claims. Members of both the ‘leave’ and ‘remain’ camps are making such claims. Another aim of this report is to assess the accuracy of some of these claims.”

In our research, we analysed the number of Twitter posts around some of these disputed claims, firstly to understand their resonance with voters, and secondly to compare this to the volume of Russia-related tweets discussed above.

A study of the news coverage of the EU Referendum campaign established that the economy was the most covered issue - in particular, the Remain campaign’s claim that Brexit would cost households £4,300 per year by 2030 and the Leave campaign’s claim that the EU cost the UK £350 million each week. We therefore focused on these two key claims and analysed tweets about them.

With respect to the disputed £4,300 claim (made by the Chancellor of the Exchequer), we identified 2,404 posts (tweets, retweets, and replies) in our dataset referring to this claim.

For the disputed £350 million a week claim, there are 32,755 pre-referendum posts (tweets, retweets, and replies) in our dataset. This is 4.6 times the 7,103 posts related to Russia Today and Sputnik, and 10.2 times the 3,200 tweets by the Russia-linked accounts suspended by Twitter.
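These ratios are simple to verify from the figures quoted above; a quick sanity check in Python:

```python
# Post counts quoted in the text above.
claim_posts = 32_755      # pre-referendum posts about the £350 million claim
rt_sputnik_posts = 7_103  # posts related to Russia Today and Sputnik
suspended_tweets = 3_200  # tweets by the Russia-linked accounts suspended by Twitter

print(round(claim_posts / rt_sputnik_posts, 1))  # 4.6
print(round(claim_posts / suspended_tweets, 1))  # 10.2
```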

In particular, there are more than 1,500 tweets from different voters, with one of these wordings:

I am with @Vote_leave because we should stop sending £350 million per week to Brussels, and spend our money on our NHS instead.

I just voted to leave the EU by postal vote! Stop sending our tax money to Europe, spend it on the NHS instead! #VoteLeave #EUreferendum

Many of those tweets have themselves received over a hundred likes and retweets each.

The media widely regard this false claim as one of the key drivers behind the success of VoteLeave.

So, returning to Q27 on the likely impact of misinformation on voting behaviour - it was not possible for us to quantify this from such tweets alone. A potentially useful indicator comes from an Ipsos MORI poll published on 22 June 2016, which showed that for 9% of respondents the NHS was the most important issue in the campaign.

In conclusion, while it is important to quantify the potential impact of Russian misinformation, we should also consider the much wider range of misinformation that was posted on Twitter and Facebook during the referendum and its likely overall impact.

We should also study not just fake news sites and the social platforms that were used to disseminate misinformation, but also the role and impact of Facebook-based algorithms for micro-targeting adverts that have been developed by private third parties.

A related question is the role played by hyperpartisan and mainstream media sites during the referendum campaign. This is the subject of our latest study, with key findings available here.
High Automation Accounts in Our Brexit Tweet Dataset

While it is hard to quantify all the different kinds of fake accounts, we already know that a study by City University identified 13,493 suspected bot accounts, of which Twitter found only 1% to be linked to Russia. In our referendum tweet dataset there are tweets by 1,808,031 users in total, which makes the City bot accounts only 0.74% of the total.

If we consider, in particular, Twitter accounts that posted more than 50 times a day (considered high-automation accounts by researchers), then there are only 457 such users in the month leading up to the referendum on 23 June 2016.
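As an illustration of how such accounts can be flagged (a sketch on hypothetical data; only the more-than-50-posts-per-day threshold comes from the text):

```python
from collections import Counter

# Hypothetical (account, day) pairs, one per post.
posts = (
    [("botlike", "2016-06-01")] * 72
    + [("casual", "2016-06-01")] * 3
    + [("newsy", "2016-06-02")] * 55
)

# Count posts per account per day.
per_account_day = Counter(posts)

# Flag accounts that posted more than 50 times on any single day.
THRESHOLD = 50
high_automation = {account for (account, day), n in per_account_day.items()
                   if n > THRESHOLD}

print(sorted(high_automation))  # ['botlike', 'newsy']
```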

The most prolific were "ivoteleave" and "ivotestay", both since suspended, which showed similar usage patterns. There were also many accounts that did not really post much about Brexit but used the hashtags to get attention for commercial reasons.

We also analysed the leaning of these 457 high-automation accounts and identified 361 as pro-leave (with 1,048,919 tweets), 39 as pro-remain (156,331 tweets), and the remaining 57 as undecided.

I covered how we can address the “fake news” problem in my previous blog post (link), but in summary: we need to promote fact-checking efforts and fund open-source research on automatic methods for disinformation detection.

Disclaimer: All views are my own.

Categories: Blogroll

Discerning Truth in the Age of Ubiquitous Disinformation (4): Russian Involvement in the Referendum and the Impact of Social Media Misinformation on Voting Behaviour

Kalina Bontcheva (@kbontcheva)

In my previous blog posts I wrote about the 4Ps of the modern disinformation age: post-truth politics, online propaganda, polarised crowds, and partisan media; and how we can combat online disinformation.

The news is currently full of reports of Russian involvement in the referendum and the potential impact of social media misinformation on voting behaviour.

A small-scale experiment by the Guardian exposed 10 US voters (five on each side) to alternative Facebook news feeds. Only one participant changed their mind as to how they would vote. Some found their confirmation bias too hard to overcome, while others became acutely aware of being the target of abuse, racism, and misogyny. A few started empathising with voters holding opposing views. They also became aware that opposing views abound on Facebook, but that the platform filters them out.

Russian Involvement in the Referendum
We analysed the accounts that Twitter identified to the US Congress in the fall of 2017 as being associated with Russia, and we also examined the other 45 accounts that we found together with BuzzFeed. We looked at tweets posted by these accounts in the month before the referendum, and we did not find much activity compared to the overall number of tweets on the referendum, i.e. neither the Russia-linked ads nor the Twitter accounts appear to have had a major influence.

There were 3,200 tweets in our datasets coming from those accounts, and 830 of those (about 26%) came from the 45 new accounts that we identified. However, one important aspect that has to be mentioned is that those 45 new accounts were tweeting in German, so the impact of those 830 tweets on British voters is unlikely to have been significant.

The accounts that tweeted on 23 June were quite different from those that tweeted before or after, with virtually all tweets posted in German. Their behaviour was also very different - mostly retweets on referendum day by a tight network of anti-Merkel accounts, often within seconds of each other. The findings are in line with those of Prof. Cram from the University of Edinburgh, as reported in the Guardian.

Journalists from BuzzFeed UK and our Sheffield team used the retweet network to identify another 45 suspicious accounts, subsequently suspended by Twitter. Amongst the 3,200 total tweets, 830 came from the 45 newly identified accounts (26%). Similar to those identified by Twitter, the newly discovered accounts were largely ineffective in skewing public debate. They attracted very few likes and retweets – the most successful message in the sample got just 15 retweets.

An important distinction that needs to be made is between Russia-influenced accounts that used advertising on one hand, and the Russia-related bots found by Twitter and other researchers on the other. 

The Twitter sockpuppet/bot accounts generally pretended to be authentic people (mostly American, some German) and did not resort to advertising, but instead tried to go viral or gain prominence through interactions. An example of one such successful account/cyborg is Jenn_Abrams. Here are some details on how the account duped mainstream media:

The account “illustrates how Russian talking points can seep into American mainstream media without even a single dollar spent on advertising.”

A related question is the influence of Russia-sponsored media and its Twitter posts. Here we consider Russia Today’s promoted tweets: the 3 pre-referendum ones attracted just 53 likes and 52 retweets between them.

We analysed all tweets posted one month before 23 June 2016, which are either authored by Russia Today or Sputnik, or are retweets of these. This gives an indication of how much activity and engagement there was around these accounts. To put these numbers in context, we also included the equivalent statistics for the two main pro-leave and pro-remain Twitter accounts:

Account         Original tweets  Retweeted by others  Retweets by account  Replies by account  Total tweets
@RT_com         39               2,080                62                   0                   2,181
@RTUKnews       78               2,547                28                   1                   2,654
@SputnikInt     148              1,810                3                    2                   1,963
@SputnikNewsUK  87               206                  8                    4                   305
TOTAL           352              6,643                101                  7                   7,103

@Vote_leave     2,313            231,243              1,399                11                  234,966
@StrongerIn     2,462            132,201              910                  7                   135,580

(@RT_com is the general Russia Today account.)

We also analysed which accounts retweeted RT_com and RTUKnews the most in our dataset. The top one, with 75 retweets of Russia Today tweets, was a self-declared US-based account that retweets Alex Jones from Infowars, RT_com, China Xinhua News, Al Jazeera, and an Iranian news account. This account (still live) joined in Feb 2009 and, as of 15 December 2017, has 1.09 million tweets - an average of more than 300 tweets per day, indicating a highly automated account. It has more than 4k followers but follows only 33 accounts. The next most active retweeters include a deleted account and a suspended account, as well as two accounts that both stopped tweeting on 18 Sep 2016.

For the two Sputnik accounts, the top retweeter made 65 retweets. It declares itself as Ireland-based; has 63.7k tweets and 19.6k likes; posts many self-authored tweets; was last active on 2 May 2017; was created in May 2015; and averages 87 tweets a day (which possibly indicates an automated account). It also retweeted Russia Today 15 times. The next two Sputnik retweeters (61 and 59 retweets respectively) are accounts with high average posts-per-day rates (350 and 1,000 respectively) and over 11k and 2k followers respectively. Lastly, four of the top 10 accounts have been suspended or deleted.

Disclaimer: All views are my own.
Categories: Blogroll

Discerning Truth in the Age of Ubiquitous Disinformation (3): The Role of News Media

Kalina Bontcheva (@kbontcheva)

Post coming soon

Categories: Blogroll


Discerning Truth in the Age of Ubiquitous Disinformation (2): How Can We Combat Online Disinformation

Kalina Bontcheva (@kbontcheva)

In my previous blog post I wrote about the 4Ps of the modern disinformation age: post-truth politics, online propaganda, polarised crowds,  and partisan media. 
Now, let me reflect some more on the question of what we can do about it. Please note that this is not an exhaustive list!

Promote Collaborative Fact-Checking Efforts

In order to counter subjectivity, post-truth politics, disinformation, and propaganda, many media and non-partisan institutions worldwide have started fact-checking initiatives – 114 in total, according to Poynter. These mostly focus on exposing disinformation in political discourse, but generally aim at encouraging people to pursue accuracy and veracity of information (e.g. Politifact, Snopes). A study by the American Press Institute has shown that even politically literate consumers benefit from fact-checking, as they increase their knowledge of the subject.

Professional fact-checking is a time-consuming process that cannot cover a significant proportion of the claims being propagated via social media channels. To date, most projects have been limited to one or two steps of the fact-checking process, or are specialised in certain subject domains: Claimbuster, ContentCheck, and the ongoing Fake News Challenge are a few examples. There are two ways to lower the overheads, and I believe both are worth pursuing: 1) create a coordinated fact-checking initiative that promotes collaboration between different media organisations, journalists, and NGOs; 2) fund the creation of automation tools for analysing disinformation, to help the human effort. I discuss the latter in more detail next.
Fund Open-Source Research on Automatic Methods for Disinformation Detection

In the PHEME research project we focused specifically on studying rumours associated with different types of events (some were breaking events like shootings, others were hoax stories like “Prince is going to have a concert in Toronto”) and how those stories were disseminated via Twitter or Reddit. We looked at how reliably we can identify such rumours: one of the hardest tasks is to group all the different social media posts, such as tweets or Reddit posts, around the same rumour together. In Reddit it is a bit easier thanks to threads. Twitter is harder because often there are multiple originating tweets that refer to the same rumour.
That is the real challenge: to piece together all these stories, because the ability to identify whether something is correct depends a lot on evidence, and also on the discussions around that rumour that the public are carrying out on social media platforms. From seeing one or two tweets, sometimes even journalists cannot be certain whether a rumour is true or false, but as we see the discussion around the rumour and the accumulating evidence over time, the judgment becomes more reliable.
Consequently, it becomes easier to predict the veracity of a rumour, but the main challenge is reliably identifying all the different tweets that talk about the same rumour. If sufficient evidence can be gathered across different posts, it becomes possible to determine the veracity of that rumour with around 85% accuracy.
In the wider context, there is emerging technology for veracity checking and verification of social media content (going beyond image/video forensics). This includes tools developed in several European projects (e.g. PHEME, REVEAL, and InVID), tools assisting crowdsourced verification (e.g. CheckDesk), citizen journalism (e.g. Citizen Desk), and repositories of checked facts/rumours (e.g. Emergent, FactCheck). However, many of these tools are language-specific and would thus need adaptation and enhancement for new languages. Moreover, further improvements are needed to the algorithms themselves, in order to achieve accuracy comparable to that of email spam filter technology.
It is also important to invest in establishing ethical protocols and research methodologies, since social media content raises a number of privacy, ethical, and legal challenges. 
Dangers and Pitfalls of Relying Purely on Automated Tools for Disinformation Detection

Many researchers (myself included) are working on automated methods based on machine learning algorithms, in order to identify disinformation on social media platforms automatically. Given the extremely large volume of social media posts, the key questions are: can disinformation be identified in real time, and should such methods be adopted by the social media platforms themselves?

The very short answer is: yes, in principle, but we are still far from solving many key socio-technical issues. So, when it comes to containing the spread of disinformation, we should be mindful of the problems which such technology could introduce:

  • Non-trivial scalability: While some of our algorithms work in near real time on specific datasets, such as tweets about the Brexit referendum, applying them across all posts on all topics, as Twitter would need to do, is very far from trivial. To give a sense of the scale: prior to 23 June 2016 (referendum day) we had to process fewer than 50 Brexit-related tweets per second, which was doable. Twitter, however, would need to process more than 6,000 tweets per second, which is a serious software engineering, computational, and algorithmic challenge.

  • Algorithms make mistakes: While 90 per cent accuracy intuitively sounds very promising, we must not forget the errors - 10 per cent in this case, or double that for an algorithm with 80 per cent accuracy. On 6,000 tweets per second, this 10 per cent amounts to 600 wrongly labelled tweets per second, rising to 1,200 for the lower-accuracy algorithm. To make matters worse, automatic disinformation analysis often combines more than one algorithm - first to determine which story a post refers to, and second to decide whether it is likely true, false, or uncertain. Unfortunately, when algorithms are executed in a sequence, errors have a cumulative effect.

  • These mistakes can be very costly: Broadly speaking, algorithms make two kinds of errors - false negatives, in which disinformation is wrongly labelled as true or bot accounts are wrongly identified as human, and false positives, in which correct information is wrongly labelled as disinformation or genuine users are wrongly identified as bots. False negatives are a problem on social platforms because the high volume and velocity of social posts (e.g. 6,000 tweets per second on average) still leaves us with a lot of disinformation “in the wild”. If we draw an analogy with email spam - even though most of it is filtered out automatically, we still receive a significant proportion of spam messages. False positives pose an even more significant problem, as falsely removing genuine messages is effectively censorship through artificial intelligence. Facebook, for example, has a growing problem with some users having their accounts wrongly suspended.

Therefore, I strongly believe that the best way forward is to implement human-in-the-loop solutions, where people are assisted by machine learning and AI methods but not replaced entirely, both because accuracy is still not high enough and, primarily, because of the censorship danger.
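The error arithmetic above, including the cumulative effect of running two algorithms in sequence (assuming, for simplicity, that their errors are independent), can be sketched as:

```python
# Figures from the text: 6,000 tweets per second, 90% or 80% accuracy.
tweets_per_second = 6_000

for accuracy in (0.90, 0.80):
    mislabelled = tweets_per_second * (1 - accuracy)
    print(f"{accuracy:.0%} accuracy -> {mislabelled:.0f} mislabelled tweets/sec")

# Chaining a story-matching step and a veracity step, each 90% accurate,
# compounds the errors (under the independence assumption):
pipeline_accuracy = 0.90 * 0.90
print(f"two-stage pipeline: {pipeline_accuracy:.0%} accuracy")  # 81%
```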
Establishing Cooperation and Data Exchange between Social Platforms and Scientists

Our latest work on analysing misinformation in tweets about the UK referendum [1] [2] showed yet again a very important issue: when it comes to social media and furthering our ability to understand its misuse and impact on society and democracy, the only way forward is for data scientists, political and social scientists, and journalists to work together alongside the big social media platforms and policy makers. I believe data scientists and journalists need to be given open access to the full set of public social media posts on key political events for research purposes (without compromising privacy and data protection laws), and be able to work in collaboration with the platforms through grants and shared funding (such as the Google Digital News Initiative).
There are still many outstanding questions that need to be researched - most notably the dynamics of the interaction between all these Twitter accounts over time - for which we need the complete archive of public tweets, images, and URL content shared, as well as profile data and friend/follower networks. This would help us quantify better (amongst other things) what kinds of tweets and messages resulted in misinformation spreading accounts gaining followers and re-tweets, how human-like was the behaviour of the successful ones, and also were they connected to the alternative media ecosystem and how.
The intersection of automated accounts, political propaganda, and misinformation is a key area in need of further investigation, but for which, scientists often lack the much needed data, while the data keepers lack the necessary transparency, motivation to investigate these issues, and willingness to create open and unbiased algorithms.
Policy Decisions around Preserving Important Social Media Content for Future Studies

Governments and policy makers are in a position to help establish this much-needed cooperation between social platforms and scientists, promote the definition of policies for ethical, privacy-preserving research and data analytics over social media data, and also ensure the archiving and preservation of social media content of key historical value. For instance, given the ongoing debate on the scale and influence of Russian propaganda on election and referendum outcomes, it would have been invaluable to have Twitter archives made available to researchers under strict access and code-of-practice criteria, so that these questions could be studied in more depth. Unfortunately, this is not currently possible, with Twitter having suspended all Russia-linked accounts and bots, together with all their content and social network information. Similar issues arise when trying to study online abuse of and from politicians, as posts and accounts are again suspended or deleted at a very high rate. Related to this is the challenge of open and repeatable science on social media data: many of the posts in current datasets available for training and evaluating machine learning algorithms have been deleted or are no longer available. As a result, algorithms do not have sufficient data to improve, and scientists cannot easily determine whether a new method really outperforms the state-of-the-art.

Promoting Media Literacy and Critical Thinking for Citizens
According to the Media Literacy project: “Media literacy is the ability to access, analyze, evaluate, and create media. Media literate youth and adults are better able to understand the complex messages we receive from television, radio, Internet, newspapers, magazines, books, billboards, video games, music, and all other forms of media.”
Training citizens in the ability to recognise spin, bias, and mis- and disinformation is a key element. Due to the extensive online and social media exposure of children, there are also initiatives aimed specifically at school children, starting from as young as 11 years old. There are also online educational resources on media literacy and fake news [3], [4] that could act as a useful starting point for national media literacy initiatives.
Increasingly, media literacy and critical thinking are seen as key tools in fighting the effects of online disinformation and propaganda techniques [5], [6]. Many of the existing programmes today are delivered by NGOs in a face-to-face group setting. The next challenge is how to roll these out at scale, and also online, in order to reach a wide audience across all social and age groups.
Establish/Revise and Enforce National Codes of Practice for Politicians and Media Outlets
Disinformation and biased content reporting are not just the preserve of fake news sites, state-driven propaganda sites, and social accounts. A significant amount also comes from partisan media and factually incorrect statements by prominent politicians.
In the case of the UK EU membership referendum, for example, a false claim regarding immigrants from Turkey was made on the front pages of a major UK newspaper [7], [8]. Another widely known and influential example was VoteLeave’s false claim that the EU costs £350 million a week [9]. Even though the UK Office of National Statistics disputed the accuracy of this claim on 21 April 2016 (2 months prior to the referendum), it continued to be used throughout the campaign.
Therefore, an effective way to combat deliberate online falsehoods must address such cases as well. Governments and policy makers could help again through establishing new or updating existing codes of practice of political parties and press standards, as well as ensuring that they are adhered to. 
These need to be supplemented with transparency in political advertising on social platforms, in order to eliminate or significantly reduce the promotion of misinformation through advertising. These measures would also help reduce the impact of all the other kinds of disinformation discussed above.
Disclaimer: All views are my own.
Categories: Blogroll

ICML Board and Reviewer profiles

Machine Learning Blog - Mon, 2018-03-05 17:34

The outcome of the election for the IMLS (which runs ICML) adds Emma Brunskill, Kamalika Chaudhuri, and Hugo Larochelle to the board. The current members of the board (and the reason for board membership) are:

President Elect is a 2-year position with little responsibility, but I decided to look into two things. One is the website, which seems relatively difficult to navigate. Ideas for how to improve it are welcome.

The other is creating a longitudinal reviewer profile. I keenly remember the day after reviews were due when I was program chair (in 2012) which left a panic-inducing number of unfinished reviews. To help with this, I’m planning to create a profile of reviewers which program chairs can refer to in making decisions about who to ask to review. There are a number of ways to do this wrong which I’m avoiding with the following procedure:

  1. After reviews are assigned, capture the reviewer/paper assignment. Call this set A.
  2. After reviews are due, capture the completed & incomplete reviews for papers. Call these sets B & C respectively.
  3. Strip the paper ids from B (completed reviews), turning it into a multiset D of reviewers’ completed reviews.
  4. Compute C∩A (the incomplete reviews among the original assignments), then turn it into a multiset E of reviewers’ incomplete reviews.
  5. Store D & E for long term reference.
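The procedure might look as follows in Python (a sketch; the reviewer names and the use of `collections.Counter` to represent multisets are illustrative assumptions):

```python
from collections import Counter

# Step 1: reviewer/paper assignments captured when reviews are assigned.
A = {("alice", "p1"), ("bob", "p1"), ("alice", "p2"), ("carol", "p3")}

# Step 2: completed (B) and incomplete (C) reviews captured at the deadline.
B = {("alice", "p1"), ("carol", "p3")}
C = {("bob", "p1"), ("alice", "p2"),
     ("dave", "p4")}  # dave was assigned p4 late, after A was captured

# Step 3: strip paper ids from B -> multiset D of completed reviews.
D = Counter(reviewer for reviewer, paper in B)

# Step 4: keep only incomplete reviews that were in the original assignment
# (so late assignments are not counted), then strip paper ids -> multiset E.
E = Counter(reviewer for reviewer, paper in C & A)

# Step 5: D and E are what gets stored for long-term reference.
print(sorted(D.items()))  # [('alice', 1), ('carol', 1)]
print(sorted(E.items()))  # [('alice', 1), ('bob', 1)] -- dave is not penalised
```

Note how the conversion to a multiset drops the paper ids, preserving the anonymity of paper/reviewer assignments as described below.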

This approach:

  • Is objectively defined. Approaches based on subjective measurements seem both fraught with judgment issues and inconsistent. Consider for example the impressive variation we all see in review quality.
  • Does not record a review as late for reviewers who are assigned a paper late in the process, via steps (1) and (4). We want to encourage reviewers to take on the unusual but important late tasks that arrive.
  • Does not record a review as late for reviewers who discover they are inappropriate after assignment and ask for reassignment. We want to encourage reviewers to look at their papers early and, if necessary, ask for a paper to be reassigned early.
  • Preserves anonymity of paper/reviewer assignments for authors who later become program chairs. The conversion into a multiset removes the paper id entirely.

Overall, my hope is that several years of this will provide a good and useful tool enabling program chairs and good (or at least not-bad) reviewers to recognize each other.

Categories: Blogroll

Students use GATE and Twitter to drive Lego robots

At the university's Headstart Summer School in July 2017, 42 students (aged 16 and 17) from all over the UK were taught to write Java programs to control Lego robots, using input from the robots (such as the sensor for detecting coloured marks on the floor) as well as operating the motors to move and turn. (The university provided a custom Java library for this.)

On 11 and 12 July we ran a practical session on "Controlling Robots with Tweets".  We presented a quick introduction to natural language processing (using computer programs to analyse human languages such as English) and provided them with a bundle of software containing a version of the GATE Cloud Twitter Collector modified to run a special GATE application with a custom plugin to let it use the Java robot library.

The bundle came with a simple "gazetteer" containing two lists of classified keywords:

and a basic JAPE grammar to make use of it. JAPE is a specialized language used in GATE to match regular expressions over annotations in documents. (The annotations are similar to XML tags, except that GATE applications can create them as well as read them, and they can overlap each other without restriction. Technically, they form an annotation graph.)

The grammar we provided would match any keyword from the "turn" list followed by any keyword from the "left" list (with zero or more unmatched words in between, e.g., "turn to port", "take a left", "turn left") and then run the code to turn the robot's right motor (making it turn left in place).
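The real system expresses this as a JAPE rule over GATE annotations, but the matching logic can be approximated in a few lines of Python (a sketch; the keyword lists here are hypothetical stand-ins for the gazetteer lists):

```python
import re

# Hypothetical stand-ins for the two gazetteer lists.
TURN_WORDS = ["turn", "take", "go", "steer"]
LEFT_WORDS = ["left", "port"]

# A "turn" keyword, zero or more unmatched words, then a "left" keyword.
pattern = re.compile(
    r"\b(?:%s)\b(?:\s+\w+)*?\s+(?:%s)\b"
    % ("|".join(TURN_WORDS), "|".join(LEFT_WORDS)),
    re.IGNORECASE,
)

def is_turn_left(tweet: str) -> bool:
    """True if the tweet contains a turn-left command."""
    return pattern.search(tweet) is not None

for cmd in ["turn to port", "take a left", "turn left", "go right"]:
    print(cmd, "->", is_turn_left(cmd))
```

In the workshop the keywords and rules lived in the gazetteer and JAPE grammar, where students could extend them without touching code; this snippet only mimics the "keyword, gap, keyword" pattern the rule encodes.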

We showed them how to configure the Twitter Collector, authenticate with their Twitter accounts, follow themselves, and then run the collector with this application.  Getting the system set up and working was a bit laborious, but once the first group got their robot to move in response to a tweet and cheered, everyone got a lot more interested very quickly.  They were very interested in extending the word lists and JAPE rules to cover a wider range of tweeted commands.

Some of the students had also developed interesting and complicated manoeuvres in Java the previous day, which they wanted to incorporate into the Twitter-controlled system.  We helped these students add their code to their own copies of the GATE plugin and re-load it so the JAPE rules could call their procedures.

This project was fun and interesting for the staff as well as the students, and we will include it in Headstart 2018.

The Headstart 2017 video includes these activities.  The instructions (presentation and handout) and software are available on-line.

This work is supported by the European Union's Horizon 2020 project SoBigData (grant agreement no. 654024).

Categories: Blogroll

Pervasive Simulator Misuse with Reinforcement Learning

Machine Learning Blog - Wed, 2018-02-14 17:25

The surge of interest in reinforcement learning is great fun, but I often see confused choices in applying RL algorithms to solve problems. There are two purposes for which you might use a world simulator in reinforcement learning:

  1. Reinforcement Learning Research: You might be interested in creating reinforcement learning algorithms for the real world and use the simulator as a cheap alternative to actual real-world application.
  2. Problem Solving: You want to find a good policy solving a problem for which you have a good simulator.

In the first instance I have no problem, but in the second instance, I’m seeing many head-scratcher choices.

A reinforcement learning algorithm engaging in policy improvement from a continuous stream of experience needs to solve an opportunity-cost problem. (The RL lingo for opportunity-cost is “advantage”.) Thinking about this in the context of a 2-person game, at a given state, with your existing rollout policy, is taking the first action leading to a win 1/2 the time good or bad? It could be good since the player is well behind and every other action is worse. Or it could be bad since the player is well ahead and every other action is better. Understanding one action’s long term value relative to another’s is the essence of the opportunity cost trade-off at the core of many reinforcement learning algorithms.

If you have a choice between an algorithm that estimates the opportunity cost and one which observes the opportunity cost, which works better? Using observed opportunity cost is an almost pure winner because it cuts out the effect of estimation error. In the real world you can’t observe the opportunity cost directly, Groundhog Day style. How many times have you left a conversation and thought to yourself: I wish I had said something else? A simulator is different though: you can reset a simulator, and when you do, you can directly observe the opportunity cost of an action, which can then directly drive learning updates.

If you are coming from viewpoint 1, using a “reset cheat” is unappealing since it doesn’t work in the real world and the goal is making algorithms which work in the real world. On the other hand, if you are operating from viewpoint 2, the “reset cheat” is a gigantic opportunity to dramatically improve learning algorithms. So, why are many people with goal 2 using goal 1 designed algorithms? I don’t know, but here are some hypotheses.

  1. Maybe people just aren’t aware that goal 2 style algorithms exist? They are out there. The most prominent examples of goal 2 style algorithms are from Learning to search and AlphaGo Zero.
  2. Maybe people are worried about the additional sample complexity of doing multiple rollouts from reset points? But these algorithms typically require little additional sample complexity in the worst case and can provide gigantic wins. People commonly use a discount factor d, which values future rewards t timesteps ahead with a discount of d^t. Alternatively, you can terminate rollouts with probability 1 - d and value future rewards with no discount while preserving the expected value. With this approach a rollout terminates after an expected 1/(1-d) timesteps, bounding the cost of a reset and rollout. Since it is common to use very heavy discounting (e.g. d=0.9), the worst-case additional sample complexity is only a small factor larger. On the upside, eliminating estimation error can radically reduce sample complexity in theory and practice.
  3. Maybe the implementation overhead for a second family of algorithms is too difficult? But the choice of whether or not you use resets is far more important than “oh, we’ll just run things for 10x longer”. It can easily make or break the outcome.
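The claim that terminating each step with probability 1 - d yields rollouts of expected length 1/(1-d) is easy to check empirically (a sketch, using d=0.9 as in the text):

```python
import random

random.seed(0)
d = 0.9            # discount factor, as in the text
n_rollouts = 200_000

# Each step continues with probability d, so rollout length is geometric
# with expectation 1 / (1 - d) = 10 steps.
total_steps = 0
for _ in range(n_rollouts):
    steps = 1
    while random.random() < d:
        steps += 1
    total_steps += steps

avg_length = total_steps / n_rollouts
print(avg_length)  # close to 1 / (1 - 0.9) = 10
```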

Maybe there is some other reason? As I said above, this is a head-scratcher that I find myself trying to address regularly.

Categories: Blogroll