Here Here [herehere.co], developed by Future Social Experiences (FuSE) Labs at Microsoft Research, expresses neighborhood-specific public data by mapping it as text labels and cartoon-like iconography.
The data is based on New York City's 311 non-emergency data stream, consisting of the concerns and issues as reported by New York residents via email, phone calls, or text messages. Each day, HereHere pulls this 311 data for each neighborhood and identifies the most compelling, important 311 request types, after which the system generates appropriate cartoons and text that represent a neighborhood's typical reactions.
The iconographic communication approach, dubbed 'characterization', is hypothesized to bring immediacy and a human scale to an otherwise overwhelming amount of abstract information. Besides developing an intriguing, publicly available map, FuSE Labs wants to understand how this characterization can be a tool for data engagement, and aims to measure how people's relationship to their community changes when they can interact with data in this way.
There are quite a few visualizations of sorting algorithms out there, such as at sorting-algorithms.com and sortvis.org. "Sorting" [sorting.at], developed by Nokia data visualization designer Carlo Zapponi, brings some innovation to this field by tackling the issue educationally (explaining each algorithm step by step) as well as artistically.
The project was initiated to create visual representations of sorting algorithms with the hope of discovering patterns in their visual footprints. It provides an interactive walk-through that guides the reader step by step through the process of ordering a list of integers for a selection of sorting algorithms.
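To make the idea of a "visual footprint" concrete, here is a minimal Python sketch that records the state of a list after every swap, which is the kind of raw material such a visualization could draw from. The choice of bubble sort and the function names `bubble_sort_trace` and `render` are assumptions made for illustration; they are not taken from the Sorting project itself.

```python
def bubble_sort_trace(values):
    """Return every intermediate state of a bubble sort, starting with the input."""
    items = list(values)          # work on a copy so the caller's list is untouched
    trace = [tuple(items)]        # the first row of the footprint is the unsorted input
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if items[i] > items[i + 1]:
                # Swap the out-of-order neighbours and record the new state.
                items[i], items[i + 1] = items[i + 1], items[i]
                trace.append(tuple(items))
                swapped = True
        if not swapped:
            break                 # a pass without swaps means the list is sorted
    return trace


def render(trace):
    """Format the trace as one text row per recorded state."""
    return "\n".join(" ".join(str(v) for v in state) for state in trace)


if __name__ == "__main__":
    print(render(bubble_sort_trace([3, 1, 2])))
```

Each row of the printed output is one step of the walk-through; plotting the rows as lines that follow each value over time would give a footprint for this particular algorithm.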
- Rayid Ghani (Chief Scientist at Obama 2012)
- Brian Kingsbury (Speech Recognition @ IBM)
- Jorge Nocedal (who did LBFGS)
We’ve been somewhat disorganized in advertising this. As a consequence, anyone who has not submitted an abstract but would like to do so may send one directly to me (email@example.com title NYASMLS) by Friday March 14. I will forward them to the rest of the committee for consideration.
"Game on!" by Fathom Information Design is an exploratory visualization prototype that allows users to parse through a basketball game's data, to investigate the behaviors and patterns in terms of the statistics and locations of players.
Based on a vast collection of performance statistics as well as real-time tracked ball and player positions, the tool allows one to explore some potentially interesting patterns, such as each player's recurrent locations, standings, and alignments according to their team position, the concentration of movement around the 3-point mark, any personally preferred shooting spots, or the fact that players tend to transition into offence along the sides of the court.
The actual data was acquired by linking noteworthy game event markings to a smart computer-vision algorithm that analyzes top-down video footage, which results in a large set of X, Y, Z positions for each player and the ball for every video frame.
. The 3D Trajectories of the Tennis Ball during the Final ATP Matches
. The NYTimes Visualization of Live World Cup Football Statistics
. VisualSport: Social Visualization of (Live) World Cup Football Statistics
. Adidas Match Tracker: Experience Soccer Games Like a Data Geek
. Guardian Interactive Chalkboards: Map and Share Soccer Game Events
- Cool new tech, especially for mobile, detecting gesture movements from the changes they make to ambient wireless signals, uses a fraction of the power of other techniques
- Also for mobile: "The big trick here is ... two [camera] lenses with two different focal lengths. One lens is wide-angle, while the other is at 3x zoom ... magnify more distant subjects ... improved low-light performance ... noise is reduced ... just as we would if we had one big imaging sensor instead of two little ones ... [and] depth analysis allows ... [auto] blurring out of backgrounds in portrait shots, quicker autofocus, and augmented reality."
- "These are not the first artificial muscles to have been created, but they are among the first that are inexpensive and store large amounts of energy"
- "Tesla is a glimpse into a future where cars and computers coexist in seamless harmony"
- "Fields from anthropology to zoology are becoming information fields. Those who can bend the power of the computer to their will – computational thinking but computer science in greater depth – will be positioned for greater success than those who can’t."
- The CEOs of Amazon, Facebook, Google, Microsoft, Twitter, Netflix, and Yahoo have CS degrees
- Details on fixing healthcare.gov. What's so impressive is how much they changed the culture in such a short time, from a hierarchical structure where no one would take any responsibility to an egalitarian one where everyone was focused on solving problems.
- Clever idea, advertise to find experts on the Web and then get them to answer questions for free by enticing them into playing a little quiz game
- "A key to Google’s epic success was the discipline the company maintained around its hiring ... During his first seven years, the executive team met every week to review every single hiring candidate."
- "Peter Norvig, Google's research director, said recently that the company employs 'less than 50% but certainly more than 5%' of the world's leading experts on machine learning"
- Yahoo is trying to rebuild its research group, which was destroyed by its previous CEO
- Software increasingly needs to be aware of its power consumption, the cost of power, and the availability of power, and be able to reduce its power consumption when necessary
- "Viewers with a buffer-free experience watch 226% more and viewers receiving better picture quality watch 25% longer"
- Gaming the most popular lists in the app stores: "Total estimated cost to reach the top ten list: $96,000"
- "The Rapiscan 522 B x-ray system used to scan carry-on baggage in airports worldwide ... runs on the outdated Windows 98 operating system, stores user credentials in plain text, and includes a feature called Threat Image Projection used to train screeners by injecting .bmp images of contraband ... [that] could allow a bad guy to project phony images on the X-ray display."
- "It would appear that a surprising number of people use webcam conversations to show intimate parts of their body to the other person."
- "Ohhh there's not another cable company, is there? Oh that's right we're the only one in town."
- It "sounds like it's straight out of a sci-fi horror flick: they thawed some 30,000-year-old permafrost and allowed any viruses present to infect some cells"
- Very funny if you (or your kids) are a fan of Portal, educational too, and done by NASA
- NPR's "Wait Wait" did a segment on Amazon's "Customers who bought this", very funny
Google recently launched a dedicated Maps Gallery [google.com] to showcase a collection of hand-picked maps from several preferred organizations, such as the National Geographic, the U.S. Geological Survey or the City of Edmonton. It is the goal that in the future, people will find most maps not through the gallery, but via the standard search results.
The included maps range from the somewhat unappealing population statistics map based on data from the World Bank, through an intriguing overview map of all fast-food locations in the US, to the beautifully rendered Dominican Republic AdventureMap by the National Geographic.
Participants who apply for the program and are selected by Google receive free access to the enterprise version of Google Maps Engine, which includes specific connectors that facilitate easy importation of public data.
In a new exhibition titled Beautiful Science: Picturing Data, Inspiring Insight [bl.uk], the British Library pays homage to the important role data visualization plays in the scientific process.
The exhibition can be visited from 20 February until 26 May 2014, and contains works ranging from John Snow's plotting of the 1854 London cholera infections on a map to colourful depictions of the Tree of Life.
"Science is beautiful... but we can also bring an aesthetic to it with makes it so much more impactful and can allow to have your ideas a much greater reach"
SEMANTiCS 2014 will take place in Leipzig (Germany) this year on September 4-5. The International Conference on Semantic Systems will be co-located with several workshops and other meetings, e.g. the 2nd DBpedia community meeting.
The SEMANTiCS conference (formerly ‘I-Semantics’) focuses on transfer and industry-related applications of semantic systems and linked data.
Here are some of the options for end-users, vendors and experts to get involved (besides participating as a regular attendee and the option to submit a paper):
- Submit an Industry Presentation: http://www.semantics.cc/open-calls/industry-presentations/
- Sponsoring / Marketplace / Exhibition: http://www.semantics.cc/sponsoring
- Become a reviewer: http://www.semantics.cc/open-calls/call-for-participation/call-for-reviewers/
The organizing committee would be happy to have you on board for SEMANTiCS 2014 in Leipzig.
Visualising Mill Road [visualisingmillroad.com] by Lisa Koeman, Vaiva Kalnikaite and Yvonne Rogers from ICRI Cities was a community project that combined citizen participation and public data visualization to inform a community on what other members of that community think of specific local issues.
The subjective opinions from local inhabitants were gathered by voting devices that were installed within several local shops. The results from this survey were visualized on the pavement in front of the shops, with the help of local artists. As more questions were asked, the infographic visualization became a bit bigger every other day.
. "Infovis Graffiti: Spray Painting Infographics in the Wild"
. "Broadsides: Showing Infographics... in the Street"
. a recent project of our own in the realm of "Street Infographics" (PDF).
AIBRA, short for American Intercity Bus Riders Association, has recently released a detailed map [kfhgroup.com] containing all the intercity bus lines currently in operation within the U.S. Not surprisingly, the resulting transportation grid correlates closely with population density, which in itself varies widely across the country.
While its first goal is to foster a positive perception of the current intercity bus system, the map should, at least in theory, also be able to help you get from anywhere to anywhere.
Via The Huffington Post.
Do you know what correlation, variance, frequency distributions, sampling and standard errors are? If not, you now have the chance to learn each of these statistical concepts via the medium of... modern dance.
Initiated by Lucy Irving (Middlesex University) and Andy Field (University of Sussex), the project "Communicating Psychology to the Public through Dance" was funded by BPS Public Engagement, with additional funding from IdeasTap.
It consists of 4 YouTube movies that present several rather complicated psychological constructs and statistical procedures through a series of graceful and well-coordinated dance gestures (together with some inevitable PowerPoint-based textual explanations). The expectation is that, as well as being fun and educational, these films will demystify and take some of the fear out of statistics, by demonstrating that thinking about them in new ways may make them easier to comprehend.
Watch the 4 movies below.
The concept is original, yet simple. Assistant Professor of Arts Technology Rick Valentin and his partner created a life-size physical visualization of all the lint that they collected from their clothes dryer during the last year.
The work thus consists of the actual lint, which was needle-felted onto a 26 foot (8m) long canvas banner as a sort of dot plot that references the metaphor of DNA analysis.
Watch the video below explaining the art work.
I’ve had several people ask me what the numbers in ACL reviews mean — and I can’t find anywhere online where they’re described. (Can anyone point this out if it is somewhere?)
So here’s the review form, below. They all go from 1 to 5, with 5 the best. I think the review emails to authors only include a subset of the below — for example, “Overall Recommendation” is not included?
The CFP said that they have different types of review forms for different types of papers. I think this one is for a standard full paper. I guess what people really want to know is what scores tend to correspond to acceptances. I really have no idea and I get the impression this can change year to year. I have no involvement with the ACL conference besides being one of many, many reviewers.

APPROPRIATENESS (1-5)
Does the paper fit in ACL 2014? (Please answer this question in light of the desire to broaden the scope of the research areas represented at ACL.)
5: Certainly.
4: Probably.
3: Unsure.
2: Probably not.
1: Certainly not.

CLARITY (1-5)
For the reasonably well-prepared reader, is it clear what was done and why? Is the paper well-written and well-structured?
5 = Very clear.
4 = Understandable by most readers.
3 = Mostly understandable to me with some effort.
2 = Important questions were hard to resolve even with effort.
1 = Much of the paper is confusing.

ORIGINALITY (1-5)
Is there novelty in the developed application or tool? Does it address a new problem or one that has received little attention? Alternatively, does it present a system that has significant benefits over other systems, either in terms of its usability, coverage, or success?
5 = Surprising: Significant new problem, or a major advance over other applications or tools that attack this problem.
4 = Noteworthy: An interesting new problem, with clear benefits over other applications or tools that attack this problem.
3 = Respectable: A nice research contribution that represents a notable extension of prior approaches.
2 = Marginal: Minor improvements on existing applications or tools in this area.
1 = The system does not represent any advance in the area of natural language processing.

IMPLEMENTATION AND SOUNDNESS (1-5)
Has the application or tool been fully implemented or do certain parts of the system remain to be implemented? Does it achieve its claims? Is enough detail provided that one might be able to replicate the application or tool with some effort? Are working examples provided and do they adequately illustrate the claims made?
5 = The application or tool is fully implemented, and the claims are convincingly supported. Other researchers should be able to replicate the work.
4 = Generally solid work, although there are some aspects of the application or tool that still need work, and/or some claims that should be better illustrated and supported.
3 = Fairly reasonable work. The main claims are illustrated to some extent with examples, but I am not entirely ready to accept that the application or tool can do everything that it should (based on the material in the paper).
2 = Troublesome. There are some aspects that might be good, but the application or tool has several deficiencies and/or limitations that make it premature.
1 = Fatally flawed.

SUBSTANCE (1-5)
Does this paper have enough substance, or would it benefit from more ideas or results? Note that this question mainly concerns the amount of work; its quality is evaluated in other categories.
5 = Contains more ideas or results than most publications in this conference; goes the extra mile.
4 = Represents an appropriate amount of work for a publication in this conference. (most submissions)
3 = Leaves open one or two natural questions that should have been pursued within the paper.
2 = Work in progress. There are enough good ideas, but perhaps not enough in terms of outcome.
1 = Seems thin. Not enough ideas here for a full-length paper.

EVALUATION (1-5)
To what extent has the application or tool been tested and evaluated? Have there been any user studies?
5 = The application or tool has been thoroughly tested. Rigorous evaluation on a large corpus or via formal user studies support the claims made for the system. Critical analysis of the results yields many insights into the limitations (if any).
4 = The application or tool has been tested and evaluated on a reasonable corpus or with a small set of users. The results support the claims made. Critical analysis of the results yields some insights into the limitations (if any).
3 = The application or tool has been tested and evaluated to a limited extent. The results have been critically analyzed to gain insight into the system's performance.
2 = A few test cases have been run on the application or tool but no significant evaluation or user study has been performed.
1 = The application or tool has not been tested or evaluated.

MEANINGFUL COMPARISON (1-5)
Do the authors make clear where the presented system sits with respect to existing literature? Are the references adequate? Are the benefits of the system/application well-supported and are the limitations identified?
5 = Precise and complete comparison with related work. Benefits and limitations are fully described and supported.
4 = Mostly solid bibliography and comparison, but there are a few additional references that should be included. Discussion of benefits and limitations is acceptable but not enlightening.
3 = Bibliography and comparison are somewhat helpful, but it could be hard for a reader to determine exactly how this work relates to previous work or what its benefits and limitations are.
2 = Only partial awareness and understanding of related work, or a flawed comparison or deficient comparison with other work.
1 = Little awareness of related work, or insufficient justification of benefits and discussion of limitations.

IMPACT OF IDEAS OR RESULTS (1-5)
How significant is the work described? Will novel aspects of the system result in other researchers adopting the approach in their own work? Does the system represent a significant and important advance in implemented and tested human language technology?
5 = A major advance in the state-of-the-art in human language technology that will have a major impact on the field.
4 = Some important advances over previous systems, and likely to impact development work of other research groups.
3 = Interesting but not too influential. The work will be cited, but mainly for comparison or as a source of minor contributions.
2 = Marginally interesting. May or may not be cited.
1 = Will have no impact on the field.

IMPACT OF ACCOMPANYING SOFTWARE (1-5)
If software was submitted or released along with the paper, what is the expected impact of the software package? Will this software be valuable to others? Does it fill an unmet need? Is it at least sufficient to replicate or better understand the research in the paper?
5 = Enabling: The newly released software should affect other people's choice of research or development projects to undertake.
4 = Useful: I would recommend the new software to other researchers or developers for their ongoing work.
3 = Potentially useful: Someone might find the new software useful for their work.
2 = Documentary: The new software useful to study or replicate the reported research, although for other purposes they may have limited interest or limited usability. (Still a positive rating)
1 = No usable software released.

IMPACT OF ACCOMPANYING DATASET (1-5)
If a dataset was submitted or released along with the paper, what is the expected impact of the dataset? Will this dataset be valuable to others in the form in which it is released? Does it fill an unmet need?
5 = Enabling: The newly released datasets should affect other people's choice of research or development projects to undertake.
4 = Useful: I would recommend the new datasets to other researchers or developers for their ongoing work.
3 = Potentially useful: Someone might find the new datasets useful for their work.
2 = Documentary: The new datasets are useful to study or replicate the reported research, although for other purposes they may have limited interest or limited usability. (Still a positive rating)
1 = No usable datasets submitted.

RECOMMENDATION (1-5)
There are many good submissions competing for slots at ACL 2014; how important is it to feature this one? Will people learn a lot by reading this paper or seeing it presented? In deciding on your ultimate recommendation, please think over all your scores above. But remember that no paper is perfect, and remember that we want a conference full of interesting, diverse, and timely work. If a paper has some weaknesses, but you really got a lot out of it, feel free to fight for it. If a paper is solid but you could live without it, let us know that you're ambivalent. Remember also that the authors have a few weeks to address reviewer comments before the camera-ready deadline. Should the paper be accepted or rejected?
5 = This paper changed my thinking on this topic and I'd fight to get it accepted;
4 = I learned a lot from this paper and would like to see it accepted.
3 = Borderline: I'm ambivalent about this one.
2 = Leaning against: I'd rather not see it in the conference.
1 = Poor: I'd fight to have it rejected.

REVIEWER CONFIDENCE (1-5)
5 = Positive that my evaluation is correct. I read the paper very carefully and am familiar with related work.
4 = Quite sure. I tried to check the important points carefully. It's unlikely, though conceivable, that I missed something that should affect my ratings.
3 = Pretty sure, but there's a chance I missed something. Although I have a good feel for this area in general, I did not carefully check the paper's details, e.g., the math, experimental design, or novelty.
2 = Willing to defend my evaluation, but it is fairly likely that I missed some details, didn't understand some central points, or can't be sure about the novelty of the work.
1 = Not my area, or paper is very hard to understand. My evaluation is just an educated guess.

PRESENTATION FORMAT
Papers at ACL 2014 can be presented either as poster or as oral presentations. If this paper were accepted, which form of presentation would you find more appropriate? Note that the decisions as to which papers will be presented orally and which as poster presentations will be based on the nature rather than on the quality of the work. There will be no distinction in the proceedings between papers presented orally and those presented as poster presentations.

RECOMMENDATION FOR BEST LONG PAPER AWARD (1-3)
3 = Definitely.
2 = Maybe.
1 = Definitely not.
Selfie City [selfiecity.net], developed by Lev Manovich, Moritz Stefaner, Mehrdad Yazdani, Dominikus Baur and Alise Tifentale, investigates the socio-popular phenomenon of self-portraits (or selfies) by using a mix of theoretic, artistic and quantitative methods.
The project is based on a wide, sophisticated analysis of tens of thousands of selfies originating from 5 different world cities (New York, Sao Paulo, Berlin, Bangkok, Moscow), with statistical data derived from both automatic image analysis and crowd-sourced human judgements (i.e. Amazon Mechanical Turk). Its analysis process and its main findings are presented through various interactive data visualizations, such as via image plots, bar graphs, an interactive dashboard and other data graphics.
Accordingly, Selfie City is able to provide quick access to all female, heavily left-leaning, widely smiling, glasses-wearing selfies from New York, should you be interested in this particular demographic. It also offers some preliminary indications that most people featured in selfies are relatively young, that significantly more women take selfies than men, and that women tend to strike more extreme poses.
Several research groups around the world in the area of mobility and transportation optimization are exploring the use of a particular slime mould, Physarum polycephalum (the "many-headed slime"), to establish the most efficient routes around congested cities and countries.
The amoeba-like creature forages for food by sending out branches (plasmodia) from a central location, at a speed of approximately 1cm per hour in optimum conditions. Even though it forms long, sprawling networks, it biologically still remains a single cell. As the creature uses its tentacles to explore for nearby food sources, and then thins out those parts that do not contribute, it is able to find the most effective way of linking together scattered sources of food, or even find the shortest path through a maze.
As a by-product of this biomimicry-inspired optimization, it also creates some intriguing physical maps.
Several more detailed news articles on this research method are available at The Guardian, New Scientist and Discover Magazine. Alternatively, the academic paper "Rules for Biologically Inspired Adaptive Network Design" can be found here.
Each country on the low-polygon count, interactive 3D globe can be selected, in order to reveal how much aid money is directed from and to that specific country. A separate stacked flow chart can be further explored to investigate how popular destinations and origins trend over time.
I should make a blog where all I do is scatterplot results tables from papers. I do this once in a while to make them easier to understand…
I think the following results are from Yee Whye Teh’s paper on hierarchical Pitman-Yor language models, and in particular comparing them to Kneser-Ney and hierarchical Dirichlets. They’re specifically from these slides by Yee Whye Teh (page 25), which show model perplexities. Every dot is for one experimental condition, which has four different results, one from each of the models. So a pair of models can be compared in one scatterplot.
- ikn = interpolated kneser-ney
- mkn = modified kneser-ney
- hdlm = hierarchical dirichlet
- hpylm = hierarchical pitman-yor
My reading: the KN’s and HPYLM are incredibly similar (as Teh argues should be the case on theoretical grounds). MKN and HPYLM edge out IKN. HDLM is markedly worse (this is perplexity, so lower is better). While HDLM is a lot worse, it does best, relatively speaking, on shorter contexts — that’s the green dot, the only bigram model that was tested, where there’s only one previous word of context. The other models have longer contexts, so I guess the hierarchical summing of pseudocounts screws up the Dirichlet more than the PYP, maybe.
The scatterplot matrix is from this table (colored by N-1, meaning the n-gram size):
In recent years, there’s been an explosion of free educational resources that make high-level knowledge and skills accessible to an ever-wider group of people. In your own field, you probably have a good idea of where to look for the answer to any particular question. But outside your areas of expertise, sifting through textbooks, Wikipedia articles, research papers, and online lectures can be bewildering (unless you’re fortunate enough to have a knowledgeable colleague to consult). What are the key concepts in the field, how do they relate to each other, which ones should you learn, and where should you learn them?
Courses are a major vehicle for packaging educational materials for a broad audience. The trouble is that they’re typically meant to be consumed linearly, regardless of your specific background or goals. Also, unless thousands of other people have had the same background and learning goals, there may not even be a course that fits your needs. Recently, we (Roger Grosse and Colorado Reed) have been working on Metacademy, an open-source project to make the structure of a field more explicit and help students formulate personal learning plans.
Metacademy is built around an interconnected web of concepts, each one annotated with a short description, a set of learning goals, a (very rough) time estimate, and pointers to learning resources. The concepts are arranged in a prerequisite graph, which is used to generate a learning plan for a concept. In this way, Metacademy serves as a sort of “package manager for knowledge.”
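As a rough illustration of how a prerequisite graph can yield a learning plan, here is a minimal Python sketch that walks the graph depth-first and lists every prerequisite before the concept that needs it. This is not Metacademy's actual code: the function name `learning_plan`, the dictionary format, and the example concepts and edges in `GRAPH` are all assumptions invented for this example.

```python
def learning_plan(prerequisites, target):
    """Return concepts ordered so that each prerequisite precedes its dependents."""
    plan = []
    done = set()          # concepts already placed in the plan
    in_progress = set()   # concepts on the current path, used to detect cycles

    def visit(concept):
        if concept in done:
            return
        if concept in in_progress:
            raise ValueError(f"prerequisite cycle involving {concept!r}")
        in_progress.add(concept)
        for prereq in prerequisites.get(concept, []):
            visit(prereq)
        in_progress.discard(concept)
        done.add(concept)
        plan.append(concept)  # appended only after all of its prerequisites

    visit(target)
    return plan


# Hypothetical graph: each concept maps to the concepts it depends on.
GRAPH = {
    "gibbs sampling": ["markov chains", "conditional distributions"],
    "markov chains": ["probability"],
    "conditional distributions": ["probability"],
    "probability": [],
}

if __name__ == "__main__":
    for step, concept in enumerate(learning_plan(GRAPH, "gibbs sampling"), start=1):
        print(step, concept)
```

Only the concepts reachable from the target end up in the plan, which mirrors how a package manager installs just the dependencies a requested package needs.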
Metacademy also has wiki-like documents called roadmaps, which briefly overview key concepts in a field and explain why you might want to learn about them; here’s one we wrote for Bayesian machine learning.
Many ingredients of Metacademy are drawn from pre-existing systems, including Khan Academy, saylor.org, Connexions, and many intelligent tutoring systems. We’re not trying to be the first to do any particular thing; rather, we’re trying to build a tool that we personally wanted to exist, and we hope others will find it useful as well.
Granted, if you’re reading this blog, you probably have a decent grasp of most of the concepts we’ve annotated. So how can Metacademy help you? If you’re teaching an applied course and don’t want to re-explain Gibbs sampling, you can simply point your students to the concept on Metacademy. Or, if you’re writing a textbook or teaching a MOOC, Metacademy can help potential students find their way there. Don’t worry about self-promotion: if you’ve written something you think people will find useful, feel free to add a pointer!
We are hoping to expand the content beyond machine learning, and we welcome contributions. You can create a roadmap to help people find their way around a field. We are currently working on a GUI for editing the concepts and the graph connecting them (our current system is based on Github pull requests), and we’ll send an email to our registered users once this system is online. If you find Metacademy useful or want to contribute, let us know at feedback _at_ metacademy _dot_ org.
The full-window infographic consists of a collection of interactive line graphs that smoothly animate between different arguments. Overall, the piece explores the question of how the U.S. population spends money, and how that has changed over time.
Consequently, U.S. spending data is analyzed in terms of its correlation with GDP and its overall composition (e.g. food, clothing, gasoline). Apparent trends, like the impact of the housing crisis, the increasing popularity of online shops, and the importance of cars and gasoline are highlighted and succinctly discussed.