Semantic Software Lab
Concordia University
Montréal, Canada

Blogroll

Becky’s and My Annotation Paper in TACL

LingPipe Blog - Wed, 2014-10-29 17:05
Finally something published after all these years working on the problem: Rebecca J. Passonneau and Bob Carpenter. 2014. The Benefits of a Model of Annotation. Transactions of the Association for Computational Linguistics (TACL) 2(Oct):311−326. [pdf] (Presented at EMNLP 2014.) Becky just presented it at EMNLP this week. I love the TACL concept and lobbied Michael Collins […]
Categories: Blogroll

At what point is an over-the-air TV antenna too long to be legal?

Greg Linden's Blog - Sat, 2014-10-25 10:40
You can get over-the-air HDTV signals using an antenna. This antenna gets a better, stronger signal with less interference if it is direct line-of-sight and as near as possible to the broadcast towers. So, you might want an antenna that is up high or even some distance away to get the best signal.

But if you try to do this, you immediately run into a question: At what point does that antenna become too long to be legal, or its signal get transmitted in a way that is no longer legal?

Let's say I put an antenna behind my TV hooked up with a wire. That's obviously legal and what many people currently do.

Let's say I put an antenna outside on top of a tree or my garage and run a wire inside. Still seems obviously legal.

Let's say I put an antenna on top of my roof. Still clearly fine.

Let's say I put it on my neighbor's roof and run a wire to my TV. Still ok?

Let's say I put the antenna on my neighbor's roof, but have the antenna connect to my WiFi network and transmit the signal using my local area network instead of using a direct wired cable connection. Still ok?

Let's say I put the antenna on my neighbor's roof, but have the antenna connect to my neighbor's WiFi network and transmit the signal over their WiFi, over the internet, then to my WiFi, instead of using a direct wired cable connection. Still ok?

Let's say I put my antenna on my neighbor's roof, but my neighbor won't do this for free. I have to pay a small amount of rent to my neighbor for the space on his roof used by my antenna. I also have the antenna connect to my neighbor's WiFi network and transmit its signal over their WiFi, over the internet, then to my WiFi, instead of using a direct wired cable connection. Still ok?

Let's say, like before, I put my antenna on my neighbor's roof, pay the neighbor rent for the space on his roof, use the internet to transmit the antenna's signal. But, this time, I buy the antenna from my neighbor at the beginning (and, like before, I own it now). Is that okay?

Let's say I put my antenna on my neighbor's roof, pay the neighbor rent for the space on his roof, use the internet to transmit the antenna's signal, but now I rent or lease the antenna from my neighbor. Still ok? If this is not ok, which part is not ok? Is it suddenly ok if I replace the internet connection with a direct microwave relay or hardwired connection?

Let's say I do all of the last one, but use a neighbor's roof three houses away. Still ok?

Let's say I do all of the last one, but use a roof on a building five blocks away. Still ok?

Let's say I rent an antenna on top of a skyscraper in downtown Seattle and have the signal sent to me over the internet. Not ok?

The Supreme Court recently ruled Aereo is illegal. Aereo put small antennas in a building and rented them to people. The only thing they did beyond the last thing above is time-shifting, so they would not necessarily send the signal from the antenna immediately, but instead store it, and only transmit it when demanded.

You might think it's the time shifting that's the problem, but that didn't seem to be what the Supreme Court said. Rather, they said the intent of the 1976 amendments to US copyright law prohibits community antennas (where one antenna sends its signal to multiple homes), labelling those a "public performance". They said Aereo's system was similar in function to a community antenna, despite actually having multiple antennas, and violated the intent of the 1976 law.

So, the question is, where is the line? Where does my antenna become too distant, transmit using the wrong methods, or involve too many payments to third parties in the operation of the antenna that it becomes illegal? Can it not be longer than X meters? Not transmit its signal in particular ways? Not require rent for the equipment or space on which the antenna sits? Not store the signal at the antenna and transmit it only on demand? What is the line?

I think this question is interesting for two reasons. First, as an individual, I would love to have a personal-use over-the-air HDTV antenna that gets a much better reception than the obstructed and inefficient placement behind my TV, but I don't know at what point it becomes illegal for me to place an antenna far away from the TV. Second, I suspect many others would like a better signal from their HDTV antenna too, and I'd love to see a startup (or any group) that helped people set up these antennas, but it is very unclear what it might be legal for a startup to do.

Thoughts?
Categories: Blogroll

Why can't I buy a solar panel somewhere else in the US and get a credit for the electricity from it?

Greg Linden's Blog - Fri, 2014-10-24 09:48
Seattle City Light has a clever project where, instead of installing solar panels on your house where they might be obscured by trees or buildings, you can buy into a solar panel installation on top of a building in a more efficient location and get a credit for the electricity generated on your electric bill.

Why stop there? Why can't I buy a solar panel in a very different location and get the electricity from it?

Phoenix, Arizona has about twice the solar energy efficiency of Seattle. Why can't I buy a solar panel and enjoy the electricity credit from that solar panel when it is installed in a nice sunny spot in the Southwest?

This doesn't require shipping the actual electricity to your home. Instead, you fund an installation of solar panels on top of a building in an area of the US with high solar energy efficiency, then get a credit for that electricity on your monthly electricity bill.

I suppose, at some boring financing level, this starts to resemble a corporate bond, with an initial payment yielding a stream of payments over time, but people wouldn't see it that way. The attraction would be installing solar panels and getting a credit on your energy bill without installing solar panels on your own home. Perhaps the firm arranging the installations and working out the deals with local utilities could be treating the entire thing as the equivalent of marketing bonds to people who like solar energy, but the attraction to people is that visceral appeal of a near $0 electricity bill they see every month from the solar panels they feel like they own and installed.

Even with the overhead pulled out by the company selling this and arranging deals with local utilities so this all appears on your local electricity bill, the credit on your electricity bill still should be much higher than you could possibly get installing panels on your own home with all its obstructions and cloudy weather. Solar generation in an ideal location in the US easily can generate twice as much power as what is available locally, on your rooftop.

So, why hasn't someone done this? Why can't I buy solar panels and have them installed not on my own home, but in some much better spot?
Categories: Blogroll

Sequence Data Mining for Health Applications

Life Analytics Blog - Thu, 2014-10-16 07:48
An often overlooked type of Analysis is Sequence Data Mining (or Sequential Pattern Mining).

Sequence Data Mining is a type of Analysis which aims to extract patterns from sequences of Events. We can also see Sequence Data Mining as an Associations Discovery Analysis with a Temporal Element.

Sequence Data Mining has many potential applications (Web Page Analytics, Complaint Events, Business Processes) but here today we will show an application for Health. I believe that this type of Analysis will become even more important as wearable technology will be used even more and therefore more Data of this kind will be generated.

Consider the following hypothetical scenario : 
A 30-year-old Male patient complains of several symptoms which, for simplicity, we will name Symptom1, Symptom2, Symptom3, etc.

His Doctor tries to identify what is going on, but after the patient completes all necessary Blood work, no problems are found. After thorough evaluation the Doctor believes that his patient suffers from Chronic Fatigue Syndrome. Under the Doctor's supervision the patient will record his symptoms along with different supplements to understand more about his condition. Several events (e.g. a Visit to the Gym, a stressful Event) will also be taken into consideration to see if any patterns emerge.
- How can we easily record Data for the scenario above?
- Can we extract sequences of events that occur more frequently than mere chance?
- Can we identify which sequences of Events / Food / Medication may potentially lead to specific Symptoms or to a lack of Symptoms?

Looking at the problem through the eyes of a Data Scientist, we have:
A series of Events that happen during a day : A Stressful event, A sedentary day, Cardio workouts, Weight Lifting, Abrupt Weather Deterioration, etc
A Number of Symptoms : Headaches, "Brain Fog", Mood problems, Insomnia, Arthralgia, etc.

Let's begin with Data Collection. We first suggest that the patient use an Android app called MyLogsPro (or some other equivalent application) to easily input information as it happens:

So if the patient feels a specific Symptom he will press the relevant Symptom button on his mobile device. The same applies for any events that have happened and any Food or Medication taken. As the day passes we have the following data collected:


The snapshot shows what happened starting on the 20th of August 2014: our patient logged the intake of Medication and/or Supplements upon waking up (at 08:22 AM), then added a Food entry at 08:47. At 11:06 the patient had a Symptom and immediately reached for his phone and pressed the relevant Symptom button (Symptom No 4).
After many days of Data Collection we decide that it's time to analyze this information. We export the data from the application as a csv file which looks as follows:


We will use KNIME to read the csv file, change the contents of the entries accordingly so that an Algorithm can read the events and then perform Sequence Data Mining. We have the following layout :


The File Reader reads the .csv file. Then, in the Pre-processing block (shown in yellow), a String Manipulation node removes the colon (:) from the time field (e.g. 12:10 becomes 1210), the Sorter sorts the data by date and then by time, and a Java Snippet uses the replaceAll() function to remove all leading zeros from the time field (e.g. 0010 becomes 10).
The R Snippet loads the CSPADE Algorithm and then uses it to extract sequence patterns.
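
For readers without KNIME, the pre-processing block can be approximated in a few lines of Python. This is only a sketch, not the workflow used here: it assumes the export has three comma-separated columns (date, time, event) as in the entries quoted further down, that dates are written day/month/year, and the function name `preprocess` is made up for illustration. It also sorts on the numeric time after stripping zeros, rather than before.

```python
import csv
import io
from datetime import datetime

def preprocess(csv_text):
    """Return (date, time_as_int, event) tuples sorted by date, then time."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines in the export
        date_str, time_str, event = row
        # String Manipulation step: drop the colon, e.g. "12:10" -> "1210".
        compact = time_str.replace(":", "")
        # Java Snippet step: strip leading zeros, e.g. "0010" -> "10".
        # Midnight ("0000") would strip to "", so fall back to "0".
        compact = compact.lstrip("0") or "0"
        # Assumption: dates are day/month/two-digit year.
        date = datetime.strptime(date_str, "%d/%m/%y").date()
        rows.append((date, int(compact), event))
    # Sorter step: order by date first, then by time.
    rows.sort(key=lambda r: (r[0], r[1]))
    return rows
```

The sorted rows can then be grouped per day to form the sequences that a sequence mining algorithm expects.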

After executing the stream we get the following output :

The information consists of two outputs : The first one is a list of sequences along with their support and the second one contains the output from rule induction which gives us two more useful metrics (namely the lift and the confidence for each rule).

We immediately notice an interesting entry on the first output :

Medication1->Symptom2

and on the second output we see that this particular rule has a lift of 1.4 and 0.8 confidence.
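
To make these two metrics concrete, here is a small Python sketch using the usual association-rule definitions: confidence is the share of sequences containing the left-hand side that also show the right-hand side later, and lift is that confidence divided by the overall share of sequences containing the right-hand side. That the R output uses exactly these definitions is an assumption, and the function names and the toy data are invented for illustration (they are not the patient's log).

```python
def occurs_in_order(sequence, first, second):
    """True if `first` appears in the sequence with a `second` after it."""
    try:
        start = sequence.index(first)
    except ValueError:
        return False
    return second in sequence[start + 1:]

def rule_metrics(sequences, lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs -> rhs."""
    n = len(sequences)
    support_lhs = sum(lhs in s for s in sequences) / n
    support_rhs = sum(rhs in s for s in sequences) / n
    if support_lhs == 0 or support_rhs == 0:
        raise ValueError("both sides of the rule must occur at least once")
    support_rule = sum(occurs_in_order(s, lhs, rhs) for s in sequences) / n
    confidence = support_rule / support_lhs
    lift = confidence / support_rhs
    return support_rule, confidence, lift

# Hypothetical toy data: one list of logged events per day.
days = [
    ["Medication1", "Food1", "Symptom2"],
    ["Medication1", "Symptom2"],
    ["Food1", "Symptom2"],
    ["Medication1", "Gym"],
    ["Gym", "Food1"],
]
support, confidence, lift = rule_metrics(days, "Medication1", "Symptom2")
```

A lift above 1 means the symptom follows the medication more often than its overall frequency would suggest, while a lift near 1 means the ordering tells us little.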

However, as Data Scientists we should always double-check the extracted knowledge and must be aware of pitfalls. Let's see some examples (list not exhaustive) :

1) The algorithm does not account for time as it should : As an example, consider the following entries :

10/09/14,08:00,Medication1
10/09/14,08:05,Symptom2

We assume that Medication1 is taken by mouth and needs 60 minutes to dissolve properly, and that these entries occur frequently enough in that order in our data set. Even though the algorithm might show a statistically significant pattern, it is not logical to hypothesize that Medication1 could be related to Symptom2 when the symptom appears only 5 minutes later. The Analyst should first examine these entries to see what proportion of the records have a time difference of, say, at least 60 minutes.
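
A quick way to run that check is sketched below in Python. It assumes the records are already in chronological order, that dates are day/month/year, and that only a medication and a symptom logged on the same day should be paired. The function names and the 60-minute default are illustrative choices, not part of the KNIME workflow.

```python
from datetime import datetime

def gaps_in_minutes(records, cause, effect):
    """Minutes between each `effect` and the latest earlier `cause` that day."""
    last_cause = None
    gaps = []
    for date_str, time_str, event in records:
        # Assumption: day/month/two-digit year and 24-hour times.
        ts = datetime.strptime(f"{date_str} {time_str}", "%d/%m/%y %H:%M")
        if event == cause:
            last_cause = ts
        elif (event == effect and last_cause is not None
              and last_cause.date() == ts.date()):
            gaps.append((ts - last_cause).total_seconds() / 60)
    return gaps

def share_plausible(gaps, min_minutes=60):
    """Fraction of gaps that are at least `min_minutes` long."""
    if not gaps:
        return 0.0
    return sum(g >= min_minutes for g in gaps) / len(gaps)

# Hypothetical records in the same date,time,event layout as above.
records = [
    ("10/09/14", "08:00", "Medication1"),
    ("10/09/14", "08:05", "Symptom2"),
    ("11/09/14", "08:00", "Medication1"),
    ("11/09/14", "09:30", "Symptom2"),
]
gaps = gaps_in_minutes(records, "Medication1", "Symptom2")
plausible = share_plausible(gaps)
```

If only a small share of the Medication1->Symptom2 pairs survive the minimum-gap filter, the pattern is probably an artifact of logging order rather than a lead worth pursuing.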

Apart from the example shown above we must consider the opposite effect. Consider this entry :

10/09/14,08:00,Medication1
...
...
...
10/09/14,21:05,Symptom2

In other words: Is it possible for a Medication taken in the morning to generate a Symptom 12 hours later?


2) The algorithm is not able to account for the compounding effect of a Medication. For example, the patient might have low levels of Taurine and for this level to be replenished, an x amount of days of Taurine supplementation is needed. The algorithm cannot account for this possibility.


3) The patient should also input entries of "No Symptoms". It is not clear however when this should be done (e.g. at the end of each day? assess every 6 hours and add 2 entries accordingly?)


However, this does not mean that a Sequence Mining algorithm should not be used under these circumstances. This technique can generate several potentially interesting hypotheses which Doctors and/or Researchers may wish to pursue further.
 




Categories: Blogroll

Conference on Digital Experimentation

Machine Learning Blog - Sat, 2014-10-11 16:30

I just attended CODE. The set of people interested in digital experimentation have very diverse backgrounds encompassing theory, machine learning, social science, economics, and industry so this seems like a good subject for a new conference. I hope it continues.

I found several talks interesting.

  • Eytan Bakshy talked about PlanOut which is a language/platform for flexibly specifying experiments.
  • Ron Kohavi talked about EXP which is a heavily used A/B testing platform.
  • Susan Athey talked about long term vs short term metrics which seems both important to address, a constant problem, and not yet systematically solved.

There was a panel about the ongoing Facebook experimentation controversy. The issue here is complex. My understanding is that Facebook users have some expected ownership of the content they create, and hence aren’t comfortable with the content being used in unexpected ways. On the other hand, experimentation is so necessary to the functioning of all large modern internet sites that banning it or slowing down the process by a factor of a million (as some advocated) would badly degrade the future of these sites in practice.

My belief is that what’s lacking is education and trust. W.r.t. education, people need to understand that experimentation is unavoidable when trying to figure out how to optimize an enormously complex system, as there is just no other way to systematically make 1000 right decisions as is necessary for basic things like choosing the best homepage/search result/etc… W.r.t. trust, companies are not particularly good at creating trust in general, but finding the right mechanism for doing so seems critical. I would point out Vanguard as a company that managed to successfully create trust by design.

Categories: Blogroll

Quick links

Greg Linden's Blog - Wed, 2014-10-01 21:08
What caught my attention lately:
  • 12% of Harvard is enrolled in CS 50: "In pretty much every area of study, computational methods and computational thinking are going to be important to the future" ([1])

  • Excellent "What If?" nicely shows the value of back-of-the-envelope calculations and re-thinking what exactly it is you want to do ([1])

  • The US has almost no competition, only local monopolies, for high speed internet ([1] [2])

  • You can't take two large, dysfunctional, underperforming organizations, mash them together, and somehow make diamonds. When you take two big messes and put them together, you just get a bigger mess. ([1])

  • "Yahoo was started nearly 20 years ago as a directory of websites ... At the end of 2014, we will retire the Yahoo Directory." ([1] [2])

  • Investors think that Yahoo is essentially worthless ([1])

  • "At a moment when excitement about the future of robotics seems to have reached an all-time high (just ask Google and Amazon), Microsoft has given up on robots" ([1])

  • "Firing a bunch of tremendously smart and creative people seems misguided. But hey—at least they own Minecraft!" ([1])

  • "Macs still work basically the same way they did a decade ago, but iPhones and iPads have an interface that's specifically designed for multi-touch screens" ([1] [2])

  • On the difficulty of doing startups ([1] [2])

  • "Be glad some other sucker is fueling the venture capital fire" ([1])

  • "Just how antiquated the U.S. payments system has become" ([1])

  • Is everyone grabbing money from online donations to charities? Visa's charge fee on charities is only 1.35%, but the lowest online payment system for charities charges 2.2% and most charge much more than that. ([1])

  • "For most people, the risk of data loss is greater than the risk of data theft" ([1])

  • Password recovery "security questions should go away altogether. They're so dangerous that many security experts recommend filling in random gibberish instead of real answers" ([1])

  • Brilliantly done, free, open source, web-based puzzle game with wonderfully dark humor about ubiquitous surveillance ([1])

  • How Udacity does those cool transparent hands in its videos ([1])

  • There's just a bit of interference when you move your hand above the phone, just enough interference to detect gestures without using any additional power or sensors ([1] [2])

  • Small, low power wireless devices powered by very small fluctuations in temperature ([1] [2])

  • Cute intuitive interface for transferring data between PC and mobile ([1] [2])

  • "Federal funding for biomedical research [down 20%] ... forcing some people out of science altogether" ([1])

  • Another fun example of virtual tourism ([1])

  • Ig Nobel Prizes: "Dogs prefer to align themselves to the Earth's north-south magnetic field while urinating and defecating" ([1])

  • Xkcd: "In CS, it can be hard to explain the difference between the easy and the virtually impossible" ([1] [2])

  • Dilbert: "That process sounds like a steaming pile of stupidity that will beat itself to death in a few years" ([1])

  • Dilbert on one way to do job interviews ([1])

  • The Onion: "Startup Very Casual About Dress Code, Benefits" ([1])

  • Hilarious South Park episode, "Go Fund Yourself", makes fun of startups ([1])
Categories: Blogroll

The Longform Manifesto

Data Mining Blog - Fri, 2014-09-26 00:37

Sometimes a title for a blog post suggests itself to me which seems so self contained that it takes real effort to actually write the post ('Machine Intelligence, not Machine Learning is the Next Big Thing' is another in this line). The idea behind the (or a) Longform Manifesto is as follows. I have become aware of late of the sense of deterioration that is associated with the mobile 'revolution' and the info snacking, casual gaming and interrupt-driven lifestyle that it has entailed. The behaviours are perfectly illustrated in this scene from Portlandia:

 

With a daughter who has now come of technological age (she has a cell phone) it has become important to me to remind myself what content consumption was like before this mobile mess appeared.

We read books, we watched movies, we listened to music. But, of course, we haven't stopped doing that. Rather, we have started all this other stuff, and the problem is that this is influencing how we approach longform content. I find myself watching bits of movies, or listening to bits of music or reading parts of essays.

The Longform Manifesto, through the definition of longform content and the discipline and commitment needed to consume it as it was meant to be consumed, helps to dilute and remove the behaviour-degrading influence of mobile technology. Someone should write it.

Categories: Blogroll

No more MSR Silicon Valley

Machine Learning Blog - Fri, 2014-09-19 19:44

This news report is correct, the Microsoft Research Silicon Valley center has been cut. The New York lab has not been directly affected although obviously cross-lab collaborations are impacted, and we sympathize deeply with those involved. Most of the rest of MSR is not directly affected.

I’m not privy to the rationale behind the decision, but in my opinion there are some very strong people in the various groups (Algorithms, Architecture, Distributed Systems, Privacy, Software tools, Web Search), and I expect offers have started raining on them. In my experience, this is harrowing in the short term, yet I know that most of my previous colleagues ended up happier after the troubles hit Yahoo! Research 2 1/2 years ago.

Categories: Blogroll

Visualizing Publicly Available US Government Data Online

Information aesthetics - Fri, 2014-09-19 03:11


Brightpoint Consulting recently released a small collection of interactive visualizations based on open, publicly available data from the US government. Characterized by a rather organic graphic design style and color palette, each visualization makes a socially and politically relevant dataset easily accessible.

The custom chord diagram titled Political Influence [brightpointinc.com] highlights the monetary contributions made by the top Political Action Committees (PAC) for the 2012 congressional election cycle, for the House of Representatives and the Senate.

The hierarchical browser 2013 Federal Budget [brightpointinc.com] reveals the major flows of spending in the US government, at the federal, state, and local level, such as the relationship of spending between education and defense.

The circular flow chart United States Trade Deficit [brightpointinc.com] shows the US Trade Deficit over the last 11 years by month. The United States sells goods to the countries at the top, while vice versa, the countries at the bottom sell goods to the US. The dollar amount in the middle represents the cumulative deficit over this period of time.

Categories: Blogroll

The Disappearing Planet: Comparing the Extinction Rates of Animals

Information aesthetics - Thu, 2014-09-18 15:05


The subtly designed A Disappearing Planet [propublica.org] by freelance data journalist Anna Flagg reveals the extinction rates of animals, caused by a variety of human-caused effects, including climate change, habitat destruction and species displacement.

Divided into mammals, reptiles, amphibians and birds, the interactive bar graph allows users to browse horizontally through the vast amount of species by order and family, and vertically by genus.

Species at risk are highlighted in red, so that dense clusters denote related families (e.g. bears, parrots, turtles) that are especially threatened over the next 100 years.

Categories: Blogroll

Scottish Independence : Bing Predicts 'No'

Data Mining Blog - Thu, 2014-09-18 11:58

Bing's prediction team has a feature live on the site right now that predicts Scotland will not become an independent nation as a result of today's referendum.

Categories: Blogroll

GitHut: the Universe of Programming Languages across GitHub

Information aesthetics - Fri, 2014-09-12 10:37


GitHut [githut.info], developed by Microsoft data visualization designer Carlo Zapponi, is an interactive small multiples visualization revealing the complexity of the wide range of programming languages used across the repositories hosted on GitHub.

GitHub is a web-based repository service which offers the distributed revision control and source code management (SCM) functionality of Git, enjoying more than 3 million users.

Accordingly, by representing the distribution and frequency of programming languages, one can observe the continuous quest for better ways to solve problems, to facilitate collaboration between people and to reuse the effort of others.

Programming languages are ranked by various parameters, ranging from the number of active repositories to new pushes, forks or issues. The data can be filtered over discrete moments in time, while evolutions can be explored by a collection of timelines.

Categories: Blogroll

Pi Visualized as a Public Urban Art Mural

Information aesthetics - Wed, 2014-09-10 15:23


Visualize Pi [tumblr.com] is a mural project that aimed to use popular mathematics to connect Brooklyn students to the community with a visualization of Pi. It was funded by a successful KickStarter project as proposed by visual artist Ellie Balk, The Green School Students, staff and Assistant Principal Nathan Affield.

The mural seems to consist of different parts. A reflective line graph, reminiscent of a sound wave, represents the number Pi (3.14159...) by way of colors that are coded by the sequence of the prime numbers found in Pi (2,3,5,7), as well as height.

Additionally, a golden spiral was drawn based on the Fibonacci Sequence, as an exploration of the relationship between the golden ratio and Pi. The number Pi was represented in a color-coded graph within the golden spiral. In this, the numbers are seen as color blocks that vary in size proportionately within the shrinking space of the spiral, representing the 'shape' of Pi.

"By focusing on the single, transcendental concept of Pi across courses, the mathematics department plans to not only deepen student understanding of shape and irrational number, but more importantly, connect these foundational mental schema for students while dealing with the concrete issues of neighborhood beautification and how proportion can inform aesthetic which can in turn improve quality of life."

A few more similar urban / public visualization projects can be found at Balk's project page, e.g. showing weather patterns, emotion histograms or sound waves.

Via @mariuswatz .

Categories: Blogroll

The Key Players in the Middle East and their Relationships

Information aesthetics - Wed, 2014-09-10 14:48


Whom Likes Whom in the Middle-East? [informationisbeautiful.net] by David McCandless and UniversLab is a forced-network visualisation of key players & notable relationships in the Middle East.

Next to its expressive aesthetic, the interactive features allow users to highlight individual nodes and their direct connections to others, as well as filter between the kind of possible relationships, such as "hate", "strained", "good" or "love".

Reminds me a bit of Mapping the Relationships between the Artists who Invented Abstraction.

Categories: Blogroll

The problem with personalized education

Greg Linden's Blog - Mon, 2014-09-08 16:20
Personalized education has had some spectacular failures lately, in large part due to how tone-deaf the backers have been to the needs of teachers, parents, and students.

The right way to do personalization is to prove you're useful first. Personalization is just a tool. If a new tool doesn't work better than the old tool, it's useless. There's no reason to use personalized education unless it works better than unpersonalized education. A tool needs to be useful.

Teachers are already overworked and, after having been burned too many times on supposedly exciting new technologies that fail to help, correctly are cynical about tech startups coming in and demanding something of them. If some tech startup isn't helping a teacher get something done they need to get done, it's a bad tool and it's useless.

Parents are leery of companies who say they only want to help, and of what corporations are doing with the data they have on their children, correctly so given all the marketing abuses that have happened in the past.

Kids don't want more boring busywork to do -- they get enough of that already -- and don't see why anything this company is talking about helps them or is useful to them.

If a company wants to succeed in personalized education, it should:
  1. Be useful, noticeably raise test scores
  2. Not require additional busy work
  3. Be optional
  4. Have no marketing whatsoever, only use data to help
I think there are plenty of examples of how this might work. I would like to see a company offer a free Duolingo-like pre-algebra and algebra app that jumps students ahead rapidly as they answer questions correctly and spends more time on similar problems after a question is wrong. The app would be completely optional for students to use, but, when students use it, their test scores increase.

I would like to see a company use the existing standardized tests required by several states, analyze the incorrect answers to identify concepts a student is not understanding, and then print short worksheets targeting only those missed concepts for teachers to hand out to each student. The worksheets would be free and arrive in teachers' mailboxes. If the teacher doesn't want to hand them out, that's not a problem, but test scores go up for the classrooms where the teachers do hand them out. So, even if most teachers don't hand them out at first and most students throw them away at first, over time, more and more teachers will start handing them out and more and more students will do them, as it only helps those who do.

In both of these examples, a startup could set up from the beginning to run large scale experiments, showing different problems to different students, and learning what raises test scores, what designs and lesson lengths cause students to stop, what concepts are important and which matter less, what can be taught easily through this and what cannot, what people enjoy, and what works.

When a company comes in and says, "Give us your data, teachers, parents, and kids, and do all this work. Maybe we'll boost your test scores for you later," they're being arrogant and tone-deaf. Everyone responds, "I don't believe you. How about you prove you're useful first? I'm busy. Do something for me or go away." And they're right to do so.

There likely is a way to do personalized education that everyone would embrace. But that way probably requires proving you're useful first. After all, personalization is just a tool.
Categories: Blogroll

SEMANTiCS – the emergence of a European Marketplace for the Semantic Web

Semantic Web Company - Mon, 2014-09-08 06:34

SEMANTiCS conference celebrated its 10th anniversary this September in Leipzig. And this year’s venue has been capable of opening a new age for the Semantic Web in Europe – a marketplace for the next generation of semantic technologies was born.

As Phil Archer stated in his keynote, the Semantic Web is now mature, and academia and industry can be proud of the achievements so far. That fact provided the common thread for the conference: real world use cases demonstrated by industry representatives, new and already running applied projects presented by the leading consortia in the field, and a vivid academia showing the next ideas and developments in the field. So this year's SEMANTiCS conference brought together the European Community in Semantic Web Technology – both from academia and industry.

  • Papers and Presentations: 45 (50% of them industry talks)
  • Posters: 10 (out of 22)
  • A marketplace with 11 permanent booths
  • Presented Vocabularies at the 1st Vocabulary Carnival: 24
  • Attendance: 225
  • Geographic Coverage: 21 countries

This year’s SEMANTiCS was co-located and connected with a couple of other related events, like the German ISKO, the Multilingual Linked Open Data for Enterprises (MLODE 2014) and the 2nd DBpedia Community Meeting 2014. These wisely connected gatherings brought people together and allowed transdisciplinary exchange.

To recapitulate: this SEMANTiCS has opened up new perspectives on Semantic Technologies, when it comes to

  • industry use
  • problem solving capacity
  • next generation development
  • knowledge about top companies, institutes and people in the sector
Categories: Blogroll

Visits: Mapping the Places you Have Visited

Information aesthetics - Thu, 2014-09-04 08:02


Visits [v.isits.in] automatically visualizes personal location histories, trips and travels by aggregating one's geotagged Flickr collection with a Google Maps history. Developed by Alice Thudt, Dominikus Baur and Prof. Sheelagh Carpendale, the map runs locally in the browser, so no sensitive data is uploaded to external servers.

The timeline visualization goes beyond the classic pin representation, in which pins tend to overlap and are relatively hard to read. Instead, the data is shown as 'map-timelines', a combination of maps and a timeline that conveys location histories as sequences of maps: the bigger the map, the longer the stay. This way, the temporal sequence is clear, as the trip starts with the map on the left and continues towards the right.

A place slider adjusts the map granularity, ranging from street level to country level.

Read the academic research here [PDF], or watch an explanatory video below.

Categories: Blogroll

More quick links

Greg Linden's Blog - Wed, 2014-09-03 17:59
More of what caught my attention lately:
  • The overwhelming majority of smartphone users set up their phone once, then barely ever download a new app again ([1] [2])

  • Cool and successful use of speculative execution in cloud computing for games, trading off extra CPU and bandwidth for the ability to hide network latency ([1])

  • Infrared vision on your phone ([1] [2])

  • How easy is it to get people to memorize hard-to-crack random 56-bit passwords, equivalent to about 12 random letters or 6 words? ([1] [2])

  • Desalination needs warm water, data centers need to be cooled, why not put them together? Clever idea. ([1])

  • It's easy to overhype this, but it's still pretty cool, transmitting data (0 and 1 bits) directly brain-to-brain without implants (using magnetic stimulation of the brain and EEG reading of the brain, both from the surface of the scalp) with relatively low error rates (5-15%). Data rates are extremely low at 2-3 bits/minute, but it's still interesting that it's possible at all. ([1])

  • Xiaomi's remarkable iPhone clone ([1])

  • Has Amazon sold fewer than 35k Fire phones? ([1] [2])

  • Facebook publishes a paper which details how its ad targeting works and suggests they will be doing more personalization in the future ([1] [2])

  • "Having a multiyear project with no checks along the way and the promise of one big outcome is not a highly successful approach, in or outside government" ([1] [2])

  • More evidence patent trolls cause real harm. Trolled firms "dramatically reduce R&D spending". ([1])

  • "Using nothing more than a laptop ... [they could] alter the normal timing pattern of the [traffic] lights, turning all the lights along a given route green, for instance, or freezing an intersection with all reds" ([1])

  • Interesting data visualization showing how CD took over in music sales, then got replaced by downloads, all over the last two decades or so ([1])

  • Neat charts on how the strike zone expands on 3 ball counts and contracts on 2 strike counts ([1])

  • Cute SMBC comic on "What is the fastest animal?" ([1])

  • Great SMBC comic on job interviews ([1])
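
As a rough check on the password item above, the arithmetic behind "56 bits is about 12 random letters or 6 words" can be sketched in Python. This is a back-of-the-envelope illustration, not something taken from the linked articles: the helper name `entropy_bits` is hypothetical, and it assumes "random letters" means the 26 lowercase letters, each chosen uniformly at random.

```python
import math


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Bits of entropy in `length` symbols drawn uniformly from an alphabet."""
    return length * math.log2(alphabet_size)


# Assumption: "random letters" means the 26 lowercase letters a-z.
letters_bits = entropy_bits(26, 12)

# Word-list size at which 6 uniformly chosen words carry 56 bits in total.
words_needed = 2 ** (56 / 6)

print(f"12 random letters: {letters_bits:.1f} bits")
print(f"word list size for 6 words at 56 bits: {words_needed:.0f}")
```

Under these assumptions, 12 letters land just above 56 bits, and 6 words reach the same strength only if each word is picked at random from a list of several hundred words.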
Categories: Blogroll

Culturegraphy: the Cultural Influences and References between Movies

Information aesthetics - Mon, 2014-09-01 13:52


Culturegraphy [culturegraphy.com], developed by "Information Model Maker" Kim Albrecht, reveals the complex relationships among over 100 years of movie references.

Movies are shown as unique nodes, while their influences are depicted as directed edges. The color gradients from blue to red that originate in the 1980s denote the era of postmodern cinema, the era in which movies tend to adapt and combine references from other movies.

Although the visualizations look rather minimalistic at first sight, their interactive features are quite sophisticated and the resulting insights are naturally interesting. So do not miss the explanatory movie below.

Via @albertocairo.

Categories: Blogroll

A World of Terror: the Impact of Terror in the World

Information aesthetics - Thu, 2014-08-28 12:11


A World of Terror [periscopic.com] by Periscopic shows the reach, frequency and impact of about 25 terrorist groups around the world.

The visualization consists of 25 smartly organized pixel plots that are displayed as ordered small multiples. Ranging from Al-Qa'ida and the Taliban to lesser-known organizations like Boko Haram, the plots reveal which ones are more deadly, are more recently active, or have been historically more active. In addition, all data can be filtered over time.

The data is based on the Global Terrorism Database (GTD), the most comprehensive and open-source collection of terrorism data available.

Categories: Blogroll