Semantic Software Lab
Concordia University
Montréal, Canada


GitHut: the Universe of Programming Languages across GitHub

Information aesthetics - Fri, 2014-09-12 11:37

GitHut, developed by Microsoft data visualization designer Carlo Zapponi, is an interactive small-multiples visualization revealing the complexity of the wide range of programming languages used across the repositories hosted on GitHub.

GitHub is a web-based repository service which offers the distributed revision control and source code management (SCM) functionality of Git, enjoying more than 3 million users.

Accordingly, by representing the distribution and frequency of programming languages, one can observe the continuous quest for better ways to solve problems, to facilitate collaboration between people and to reuse the effort of others.

Programming languages are ranked by various parameters, ranging from the number of active repositories to new pushes, forks or issues. The data can be filtered over discrete moments in time, while evolutions can be explored by a collection of timelines.

Pi Visualized as a Public Urban Art Mural

Information aesthetics - Wed, 2014-09-10 16:23

Visualize Pi is a mural project that aimed to use popular mathematics to connect Brooklyn students to the community with a visualization of Pi. It was funded by a successful Kickstarter project proposed by visual artist Ellie Balk, The Green School students, staff and Assistant Principal Nathan Affield.

The mural seems to consist of different parts. A reflective line graph, reminiscent of a sound wave, represents the number Pi (3.14159...) by way of colors that are coded by the sequence of the prime numbers found in Pi (2,3,5,7), as well as height.

Additionally, a golden spiral was drawn based on the Fibonacci Sequence, as an exploration of the relationship between the golden ratio and Pi. The number Pi was represented in a color-coded graph within the golden spiral. In this, the numbers are seen as color blocks that vary in size proportionately within the shrinking space of the spiral, representing the 'shape' of Pi.

"By focusing on the single, transcendental concept of Pi across courses, the mathematics department plans to not only deepen student understanding of shape and irrational number, but more importantly, connect these foundational mental schema for students while dealing with the concrete issues of neighborhood beautification and how proportion can inform aesthetic which can in turn improve quality of life."

A few more similar urban / public visualization projects can be found at Balk's project page, e.g. showing weather patterns, emotion histograms or sound waves.

Via @mariuswatz .

The Key Players in the Middle East and their Relationships

Information aesthetics - Wed, 2014-09-10 15:48

Whom Likes Whom in the Middle-East? by David McCandless and UniversLab is a force-directed network visualisation of key players and notable relationships in the Middle East.

Next to its expressive aesthetic, the interactive features allow users to highlight individual nodes and their direct connections to others, as well as to filter by the kind of relationship, such as "hate", "strained", "good" or "love".

Reminds me a bit of Mapping the Relationships between the Artists who Invented Abstraction.

The problem with personalized education

Greg Linden's Blog - Mon, 2014-09-08 17:20
Personalized education has had some spectacular failures lately, in large part due to how tone-deaf the backers have been to the needs of teachers, parents, and students.

The right way to do personalization is to prove you're useful first. Personalization is just a tool. If a new tool doesn't work better than the old tool, it's useless. There's no reason to use personalized education unless it works better than unpersonalized education. A tool needs to be useful.

Teachers are already overworked and, after having been burned too many times on supposedly exciting new technologies that fail to help, correctly are cynical about tech startups coming in and demanding something of them. If some tech startup isn't helping a teacher get something done they need to get done, it's a bad tool and it's useless.

Parents are leery of companies that say they only want to help, and of what corporations are doing with the data they have on their children, correctly so given all the marketing abuses that have happened in the past.

Kids don't want more boring busywork to do -- they get enough of that already -- and don't see why anything this company is talking about helps them or is useful to them.

If a company wants to succeed in personalized education, it should:
  1. Be useful, noticeably raise test scores
  2. Not require additional busy work
  3. Be optional
  4. Have no marketing whatsoever, only use data to help
I think there are plenty of examples of how this might work. I would like to see a company offer a free Duolingo-like pre-algebra and algebra app that jumps students ahead rapidly as they answer questions correctly and spends more time on similar problems after a question is wrong. The app would be completely optional for students to use, but, when students use it, their test scores increase.
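The jump-ahead rule in that app example could look roughly like the Java sketch below. Everything in it is an assumption made for illustration: the class name `AdaptivePractice`, the idea of numbered difficulty levels, the growing jump size, and the choice of two review problems after a miss. It is one way such a rule might work, not a description of Duolingo or any real product.

```java
/**
 * Hypothetical sketch: jump ahead on correct answers, linger after a miss.
 * Level numbers, jump sizes and the review count are illustrative assumptions.
 */
class AdaptivePractice {
    private static final int REVIEW_PROBLEMS = 2; // assumed: similar problems owed after a miss

    private final int maxLevel;
    private int level = 0;       // current concept/difficulty level
    private int streak = 0;      // consecutive correct answers outside review
    private int reviewLeft = 0;  // similar problems still owed at this level

    AdaptivePractice(int maxLevel) {
        this.maxLevel = maxLevel;
    }

    /** Records one answer and updates the level the next problem is drawn from. */
    void record(boolean correct) {
        if (!correct) {
            // A miss resets the streak and schedules extra practice at this level.
            streak = 0;
            reviewLeft = REVIEW_PROBLEMS;
        } else if (reviewLeft > 0) {
            // Still reviewing: stay on the same level until the review is done.
            reviewLeft--;
        } else {
            // Growing streaks produce growing jumps: +1, +2, +3, ...
            streak++;
            level = Math.min(maxLevel, level + streak);
        }
    }

    int level() {
        return level;
    }

    int reviewLeft() {
        return reviewLeft;
    }

    public static void main(String[] args) {
        AdaptivePractice student = new AdaptivePractice(10);
        boolean[] answers = {true, true, false, true, true, true};
        for (boolean answer : answers) {
            student.record(answer);
            System.out.println("answer=" + answer + " -> level " + student.level());
        }
    }
}
```

The point of keeping the rule this small is that it is easy to run experiments on: the jump size and review count are exactly the kind of parameters a startup would vary across students to see what raises test scores.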

I would like to see a company use the existing standardized tests required by several states, analyze the incorrect answers to identify concepts a student is not understanding, and then print short worksheets targeting only those missed concepts for teachers to hand out to each student. The worksheets would be free and arrive in teachers' mailboxes. If the teacher doesn't want to hand them out, that's not a problem, but test scores go up for the classrooms where the teachers do hand them out. So, even if most teachers don't hand them out at first and most students throw them away at first, over time, more and more teachers will start handing them out and more and more students will do them, as it only helps those who do.

In both of these examples, a startup could set up from the beginning to run large scale experiments, showing different problems to different students, and learning what raises test scores, what designs and lesson lengths cause students to stop, what concepts are important and which matter less, what can be taught easily through this and what cannot, what people enjoy, and what works.

When a company comes in and says, "Give us your data, teachers, parents, and kids, and do all this work. Maybe we'll boost your test scores for you later," they're being arrogant and tone-deaf. Everyone responds, "I don't believe you. How about you prove you're useful first? I'm busy. Do something for me or go away." And they're right to do so.

There likely is a way to do personalized education that everyone would embrace. But that way probably requires proving you're useful first. After all, personalization is just a tool.

SEMANTiCS – the emergence of a European Marketplace for the Semantic Web

Semantic Web Company - Mon, 2014-09-08 07:34

The SEMANTiCS conference celebrated its 10th anniversary this September in Leipzig. This year’s edition opened a new age for the Semantic Web in Europe: a marketplace for the next generation of semantic technologies was born.

As Phil Archer stated in his keynote, the Semantic Web is now mature, and academia and industry can be proud of the achievements so far. Exactly that fact provided the common thread for the conference: real-world use cases demonstrated by industry representatives, new and already running applied projects presented by the leading consortia in the field, and a vivid academia showing the next ideas and developments in the field. So this year’s SEMANTiCS conference brought together the European community in Semantic Web technology, from both academia and industry.

  • Papers and Presentations: 45 (50% of them industry talks)
  • Posters: 10 (out of 22)
  • A marketplace with 11 permanent booths
  • Presented Vocabularies at the 1st Vocabulary Carnival: 24
  • Attendance: 225
  • Geographic Coverage: 21 countries

This year’s SEMANTiCS was co-located and connected with a couple of other related events, like the German ISKO, the Multilingual Linked Open Data for Enterprises (MLODE 2014) and the 2nd DBpedia Community Meeting 2014. These wisely connected gatherings brought people together and allowed transdisciplinary exchange.

To recapitulate: this SEMANTiCS has opened up new perspectives on Semantic Technologies when it comes to

  • industry use
  • problem solving capacity
  • next generation development
  • knowledge about top companies, institutes and people in the sector

Visits: Mapping the Places you Have Visited

Information aesthetics - Thu, 2014-09-04 09:02

Visits automatically visualizes personal location histories, trips and travels by aggregating one's geotagged Flickr collection with a Google Maps history. Developed by Alice Thudt, Dominikus Baur and Prof. Sheelagh Carpendale, the map runs locally in the browser, so no sensitive data is uploaded to external servers.

The timeline visualization goes beyond the classical pin representation, in which pins tend to overlap and are relatively hard to read. Instead, the data is shown as 'map-timelines', a combination of maps with a timeline that conveys location histories as sequences of maps: the bigger the map, the longer the stay. This way, the temporal sequence is clear, as the trip starts with the map on the left and continues towards the right.

A place slider allows adjusting the map granularity, ranging from street-level to country-level.

Read the academic research here [PDF], or watch an explanatory video below.

More quick links

Greg Linden's Blog - Wed, 2014-09-03 18:59
More of what caught my attention lately:
  • The overwhelming majority of smartphone users set up their phone once, then barely ever download a new app again ([1] [2])

  • Cool and successful use of speculative execution in cloud computing for games, trading off extra CPU and bandwidth for the ability to hide network latency ([1])

  • Infrared vision on your phone ([1] [2])

  • How easy is it to get people to memorize hard-to-crack random 56-bit passwords, equivalent to about 12 random letters or 6 words? ([1] [2])

  • Desalination needs warm water, data centers need to be cooled, why not put them together? Clever idea. ([1])

  • It's easy to overhype this, but it's still pretty cool, transmitting data (0 and 1 bits) directly brain-to-brain without implants (using magnetic stimulation of the brain and EEG reading of the brain, both from the surface of the scalp) with relatively low error rates (5-15%). Data rates are extremely low at 2-3 bits/minute, but it's still interesting that it's possible at all. ([1])

  • Xiaomi's remarkable iPhone clone ([1])

  • Has Amazon sold less than 35k Fire phones? ([1] [2])

  • Facebook publishes a paper which details how its ad targeting works and suggests they will be doing more personalization in the future ([1] [2])

  • "Having a multiyear project with no checks along the way and the promise of one big outcome is not a highly successful approach, in or outside government" ([1] [2])

  • More evidence patent trolls cause real harm. Trolled firms "dramatically reduce R&D spending". ([1])

  • "Using nothing more than a laptop ... [they could] alter the normal timing pattern of the [traffic] lights, turning all the lights along a given route green, for instance, or freezing an intersection with all reds" ([1])

  • Interesting data visualization showing how CD took over in music sales, then got replaced by downloads, all over the last two decades or so ([1])

  • Neat charts on how the strike zone expands on 3 ball counts and contracts on 2 strike counts ([1])

  • Cute SMBC comic on "What is the fastest animal?" ([1])

  • Great SMBC comic on job interviews ([1])

Culturegraphy: the Cultural Influences and References between Movies

Information aesthetics - Mon, 2014-09-01 14:52

Culturegraphy, developed by "Information Model Maker" Kim Albrecht, reveals the complex relationships within over 100 years of movie references.

Movies are shown as unique nodes, while their influences are depicted as directed edges. The color gradients from blue to red that originate in the 1980s denote the era of postmodern cinema, the era in which movies tend to adapt and combine references from other movies.

Although the visualizations look rather minimalistic at first sight, their interactive features are quite sophisticated and the resulting insights are naturally interesting. Therefore, do not miss the explanatory movie below.

Via @albertocairo .

A World of Terror: the Impact of Terror in the World

Information aesthetics - Thu, 2014-08-28 13:11

A World of Terror by Periscopic shows the reach, frequency and impact of about 25 terrorist groups around the world.

The visualization consists of 25 smartly organized pixel plots that are displayed as ordered small multiples. Ranging from Al-Qa'ida and the Taliban to lesser-known organizations like Boko Haram, the plots reveal which ones are more deadly, are more recently active, or have been historically more active. In addition, all data can be filtered over time.

The data is based on the Global Terrorism Database (GTD), the most comprehensive and open-source collection of terrorism data available.

Metrico: a Puzzle Action Game based on Infographics

Information aesthetics - Wed, 2014-08-27 06:57

Metrico, designed by Dutch game design studio Digital Dreams, is a recently released video game for the PlayStation Vita.

Described as an "atmospheric puzzle action game with a mindset of its own", its visual style has been completely based on the world of infographics. In essence, the concept of infographics seems to work as a gameplay environment not just because of its pretty aesthetics, but also because of its natural interaction with (visual) data.

Consequently, in Metrico, each action is quantified and explicitly shown, such as the number of times an avatar needs to jump up and down or shoot a projectile. Metrico's goal is thus similar to that of most infographics: enticing users to make sense of a complex system.

Via Wired. Watch the gameplay trailer below.

The Many Factors Influencing Breast Cancer Incidence

Information aesthetics - Mon, 2014-08-25 12:47

A Model of Breast Cancer Causation, designed by 'do good with data' visualization studio Periscopic, illustrates many of the factors that can lead to breast cancer and how they may interact with others.

The interactive circos graph is meant to demonstrate the complexity of breast cancer causation, both to educate the general public and possibly to stimulate new scientific research in this direction. Users can explore the different influencing factors by domain, by predicted correlation strength, and by the quality of the evidence behind them.

See also:
- Health InfoScape: Illustrating the Relationships between Disease Conditions
- Visualizing the Major Health Issues Facing Americans Today

How we Sleep (and How we Awake after an Earthquake)

Information aesthetics - Mon, 2014-08-25 12:31

Since we already know at what angle people hold their face when taking a selfie in different cities, we now also know how they sleep differently: Which Cities Get the Most Sleep? by interactive graphics editor Stuart A. Thompson of the Wall Street Journal compares the sleeping habits of citizens of different cities.

On the topic of sleep, Jawbone also just released an interesting graph revealing how the recent Napa earthquake affected the sleep of local residents. Indeed, the distance to the epicenter seems to correlate with the number of people who awoke, and the time it took for them to get back to sleep.

As the visualizations are based on a vast dataset released by Jawbone, the makers of a digitized wristband that tracks motion and sleep behavior, the data is not necessarily representative of the general population.

In Out, In Out, Shake It All About

Code from an English Coffee Drinker - Sat, 2014-08-23 05:04
In the very abstract sense text analysis can be divided into three main tasks: load some text, process it, export the result. Out of the box GATE (both the GUI and the API) provides excellent support for both loading documents and processing them, but until now we haven't provided many options when it comes to exporting processed documents.

Traditionally GATE has provided two methods of exporting processed documents: a lossless XML format that can be reloaded into GATE but is rather verbose, or the "save preserving format" option which essentially outputs XML representing the original document (i.e. the annotations in the Original markups set) plus the annotations generated by your application. Neither of these options was particularly useful if you wanted to pass the output on to some other process and, without a standard export API, this left people having to write custom processing resources just to export their results.

To try and improve the support for exporting documents, recent nightly builds of GATE now include a common export API in the gate.DocumentExporter class. Before we go any further it is worth mentioning that this code is in a nightly build so is subject to change before the next release of GATE. Having said that, I have now used it to implement exporters for a number of different formats so I don't expect the API to change drastically.

If you are a GATE user, rather than a software developer, then all you need to know is that an exporter is very similar to the existing idea of document formats. This means that they are CREOLE resources and so new exporters are made available by loading a plugin. Once an exporter has been loaded it will be added to the "Save as..." menu of both documents and corpora, and by default exporters for GATE XML and Inline XML (i.e. the old "Save preserving format") are provided even when no plugins have been loaded.

If you are a developer and want to make use of an existing exporter, then hopefully the API should be easy to use. For example, to get hold of the exporter for GATE XML and to write a document to a file the following two lines will suffice:
DocumentExporter exporter =

exporter.export(document, file);

There is also a three-argument form of the export method that takes a FeatureMap that can be used to configure an exporter. For example, the annotation types the Inline XML exporter saves are configured this way. The possible configuration options for an exporter should be contained in its documentation, but possibly the easiest way to see how it can be configured is to try it from the GUI.

If you are a developer and want to add a new export format to GATE, then this is fairly straightforward; if you already know how to produce other GATE resources then it should be really easy. Essentially you need to extend gate.DocumentExporter to provide an implementation of its one abstract method. A simple example showing an exporter for GATE XML is given below:
@CreoleResource(name = "GATE XML Exporter",
    tool = true, autoinstances = @AutoInstance, icon = "GATEXML")
public class GateXMLExporter extends DocumentExporter {

  public GateXMLExporter() {
    // format name, default file extension, MIME type
    super("GATE XML", "xml", "text/xml");
  }

  @Override
  public void export(Document doc, OutputStream out, FeatureMap options)
      throws IOException {
    try {
      DocumentStaxUtils.writeDocument(doc, out, "");
    } catch(XMLStreamException e) {
      // surface XML streaming problems as ordinary I/O failures
      throw new IOException(e);
    }
  }
}

As I said earlier this API is still a work in progress and won't be frozen until the next release of GATE, but the current nightly build now contains export support for Fast Infoset compressed XML (I've talked about this before), JSON inspired by the format Twitter uses, and HTML5 Microdata (an updated version of the code I discussed before). A number of other exporters are also under development and will hopefully be made available shortly.
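
If you want to play with the shape of this design without GATE on the classpath, the sketch below mimics it using only the Java standard library. To be clear, none of these classes are part of GATE: SketchExporter, PlainTextExporter, ExporterDemo and the "upperCase" option are all hypothetical stand-ins, and a plain String and Map take the place of a GATE Document and FeatureMap. It only illustrates the general idea of a base class with one abstract method, a convenience overload without options, and an options map used to configure the output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for an exporter base class; this is NOT gate.DocumentExporter. */
abstract class SketchExporter {
    private final String formatName;
    private final String defaultExtension;
    private final String mimeType;

    protected SketchExporter(String formatName, String defaultExtension, String mimeType) {
        this.formatName = formatName;
        this.defaultExtension = defaultExtension;
        this.mimeType = mimeType;
    }

    String getFormatName() { return formatName; }
    String getDefaultExtension() { return defaultExtension; }
    String getMimeType() { return mimeType; }

    /** Convenience form: export with no options set. */
    void export(String text, OutputStream out) throws IOException {
        export(text, out, Collections.<String, Object>emptyMap());
    }

    /** The single abstract method a new format has to implement. */
    abstract void export(String text, OutputStream out, Map<String, Object> options)
            throws IOException;
}

/** Toy format: writes the text, upper-cased if the made-up "upperCase" option is true. */
class PlainTextExporter extends SketchExporter {
    PlainTextExporter() {
        super("Plain Text", "txt", "text/plain");
    }

    @Override
    void export(String text, OutputStream out, Map<String, Object> options)
            throws IOException {
        boolean upper = Boolean.TRUE.equals(options.get("upperCase"));
        String result = upper ? text.toUpperCase(Locale.ROOT) : text;
        out.write(result.getBytes(StandardCharsets.UTF_8));
    }
}

class ExporterDemo {
    /** Runs an exporter against an in-memory stream and returns what it wrote. */
    static String exportToString(SketchExporter exporter, String text,
                                 Map<String, Object> options) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            exporter.export(text, out, options);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SketchExporter exporter = new PlainTextExporter();
        Map<String, Object> none = Collections.<String, Object>emptyMap();
        Map<String, Object> upper =
                Collections.<String, Object>singletonMap("upperCase", Boolean.TRUE);
        System.out.println(exportToString(exporter, "hello", none));
        System.out.println(exportToString(exporter, "hello", upper));
    }
}
```

Writing to an OutputStream rather than a file is what makes the same exporter usable from a "Save as..." menu, a web service or a test, which is presumably part of why the real API is built around a stream as well.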

Hopefully if you use GATE you will find this new support useful and please do let us have any feedback you might have so we can improve the support before the next release when the API will be frozen.

The Feltron Annual Report of 2013 on Communication

Information aesthetics - Thu, 2014-08-21 10:29

Each year, Nicholas Felton releases a personal annual report, and the one for 2013 was just released. These reports always stand out because of their immense sense of data-centric detail and an always original infographic style.

This year, the report focuses on communication data, as it aspires to uncover patterns and insights within a large collection of tracked conversations, SMS, telephone calls, email, Facebook messages and even physical mail.

See also the annual reports of:
- 2012
- 2010 and 2011
- 2010 (about his father's life)
- 2009
- 2008
- 2007
- 2006
- 2005

US Domestic Migration Charted as Ordered Stacked Area Graphs

Information aesthetics - Mon, 2014-08-18 16:06

The interactive infographic Where We Came From, State by State by Gregor Aisch, Robert Gebeloff and Kevin Quealy reveals how US citizens have moved between different US states since the year 1900.

The migration data is based on Census data, which was used to compare the state of residence versus the state of birth of a representative sample of Census forms. The visualization technique resembles that of organically shaped, stacked area graphs, also known as stream graphs or ThemeRiver.

See also:
- Ebb and Flow of Movies
- lastgraph
- 2008 Movie Box Revenue
- What People in Tokyo are Doing on a Tuesday
- Memetracker: Tracking News Phrases over the Web
- DailyRadar TrendMap: Interactive Stacked Line Graph of Popular Trends
- Twitter Activity during the 2012 European Football Tournament

oneSecond: Printing Every Tweet Created During a Single Second

Information aesthetics - Wed, 2014-08-13 12:36

#oneSecond by graphic design student Philipp Adrian aggregates all the tweets sent at exactly 14:47:36 GMT on 9 November 2012.

The 5522 Twitter messages are categorized and ordered in 4 different books. Every user is part of each book but, depending on the categorization, their position within the book changes.

Accordingly, the book "My Message is..." contains the content of each message, ordered by its language. The size and order of the tweet is derived from the number of followers (recipients).

The book "My Color is..." shows each user's Twitter account color, ordered by the timezone the tweet was sent in.

The book "My Description is..." shows how each user describes himself on his profile, of which the size and order is derived from the Klout score.

Finally, the book "My Name is..." lists the avatar that each user chose to represent him or herself, ordered by the number of tweets the user sent.

Charting Culture: 2000 Years of Cultural History in 5 Minutes

Information aesthetics - Tue, 2014-08-12 14:59

Charting Culture shows the geographical movements of over 120,000 individuals who were notable enough in their lifetimes that the dates and locations of their births and deaths were recorded.

The animation commences around 600 BC and ends in 2012, and tracks the lives of people like Leonardo da Vinci or Jett Travolta, son of the actor John Travolta. It presents each person's birthplace as a blue dot and their place of death as a red dot. Developed by Mauro Martino, research manager of the Cognitive Visualization Lab in IBM's Watson Group, the animated map is based on data retrieved from the Google-owned knowledge base, Freebase, a community-curated database of well-known people, places, and things.

More scientific information can be found in the Science paper "A network framework of cultural history", which was spearheaded by Maximilian Schich and his team.

Watch the movie below.

Amsterdam City Dashboard: a City as Urban Statistics

Information aesthetics - Mon, 2014-08-11 11:37

Amsterdam City Dashboard presents the city of Amsterdam through the lens of data, including demographic statistics, traffic reports, noise readings or political messages.

The small collection of information graphics is divided into distinct domains, such as transport, environment, statistics, economy, social, cultural and security. All data is shown in near real-time, based on blocks of 24 hours. Larger dots and darker colors symbolize higher values, whereas an interactive map provides a geographic reference.

Based on the Linked Data API from the CitySDK project, this dashboard should be easily transferable to the data repositories from other cities.

See also City Dashboard: Aggregating All Spatial Data for Cities in the UK.

Quick links

Greg Linden's Blog - Tue, 2014-08-05 12:29
What caught my attention lately:
  • Great idea for walking directions: "At times, we do not [want] the fastest route ... When walking, we generally prefer tiny streets with trees over large avenues with cars ... [We] suggest routes that are not only short but also emotionally pleasant." ([1] [2] [3])

  • Cool idea for a drone that autonomously flies a small distance above and behind you while filming in HD ([1] [2])

  • "OkCupid doesn’t really know what it’s doing. Neither does any other website. It’s not like people have been building these things for very long, or you can go look up a blueprint or something. Most ideas are bad. Even good ideas could be better. Experiments are how you sort all this out." ([1] [2])

  • "Amazon’s cloud revenue now runs almost on par with VMware (VMW), which posted revenue of $5.2 billion last year" ([1])

  • Walmart is getting more aggressive about competing with Amazon on personalization and recommendations ([1])

  • It's important to realize that Amazon could have been a small bookstore on the Web ([1])

  • A lot of us thought the Amazon logo was phallic when it was introduced (worse, it was animated and actually grew from left-to-right). Remarkably, it's lived on for 14 years now. ([1])

  • A big problem with layoffs is not only do you lose some of the people you intended to layoff, but also some of your best employees will pick that time to leave. People with good options won't wait around to experience the chaos and fear; they'll just leave. ([1])

  • "A brand-name USB stick [claims to be] a computer keyboard [device] ... [and then] opens a command window on an attached computer and enters commands that cause it to download and install malicious software." ([1])

  • Financial services and poor computer security: "Our assumption was that, generally speaking, the financial sector had its act together much more" ([1] [2])

  • "NSA employees [were] passing around nude photos that were intercepted in the course of their daily work" ([1] [2])

  • Google Cloud googler says, "It should always be cheaper to run in the cloud no matter what your workload" but that the pricing isn't there yet ([1])

  • Details on Google's remarkably large and fast data warehouse ([1] [2])

  • Cool augmented reality game intended to be played as a passenger in a moving car that creates the terrain and enemies you see in the game based on the stores and buildings around you in the real world ([1])

  • "Astronomers of the 2020s will be swimming in petabytes of data streaming from space and the ground ... [such as] a 3,200-megapixel camera, which will produce an image of the entire sky every few days and over 10 years will produce a movie of the universe, swamping astronomers with data that will enable them to spot everything that moves or blinks in the heavens, including asteroids and supernova explosions." ([1])

  • Data are or data is: "'datum' isn't a word we ever use. So it makes no sense to use the plural when the singular doesn't exist." ([1])

  • The "If Google was a guy" series from CollegeHumor is hilarious (but probably NSFW) ([1] [2] [3])

  • Funny Dilbert comics on a Turing test for management ([1] [2])

  • Cathartic Xkcd comic on defending your thesis ([1])

Open Machine Learning Workshop, August 22

Machine Learning Blog - Sat, 2014-07-26 09:14

On August 22, we are planning to have an Open Machine Learning Workshop at MSR, New York City taking advantage of CJ Lin and others in town for KDD.

If you are interested, please email msrnycrsvp at and say “I want to come” so we can get a count of attendees for refreshments.

