Semantic Software Lab
Concordia University
Montréal, Canada

Blogroll

Room Sharing for ICML (and COLT, and ACL, and IJCAI)

Machine Learning Blog - Mon, 2016-05-02 13:32

My greatest concern about the many machine learning conferences in New York this year is the relatively high cost they imply, particularly for hotel rooms in Manhattan. Keeping the conference affordable for graduate students seems critical to what ICML is really about.

The price becomes much more reasonable if you can find roommates to share the cost. For example, the conference hotel offers rooms with three beds.

This still leaves a coordination problem: How do you find plausible roommates? If only there were a website where the participants in a conference could look for roommates. Oh wait, there is. Conferenceshare.co is a new site that might measurably address the cost problem. Obviously, you’ll want to consider roommate possibilities carefully, but now at least there is a place to meet.

Note that the early registration deadline for ICML is May 7th.

Categories: Blogroll

Quora session

Machine Learning Blog - Tue, 2016-04-19 08:46

I’m doing a Quora Session today that may be of interest. I’m impressed with both the quality and quantity of questions.

Categories: Blogroll

ICML registration is live

Machine Learning Blog - Fri, 2016-04-08 15:30

Here. I would recommend registering early because there is a difficult-to-estimate(*) chance that you will not be able to register later.

The program is shaping up and should be of interest. The 9 Tutorials(**), 4 Invited Speakers, and 23 Workshops are all chosen, with paper decisions due out in a couple weeks.

           Early   Full (after May 7)
Student    510     640
Regular    840     1050

These numbers are as aggressively low as the local chairs and I can sleep with at night. The prices are higher than I’d like (New York is expensive), but a bit lower than last year, particularly for students(***).

(*) Relevant facts:

  1. ICML 2016: submissions up 30% to 1300.
  2. NIPS 2015 in Montreal: 3900 registrations (way up from last year).
  3. NIPS 2016 is in Barcelona.
  4. ICML 2015 in Lille: 1670 registrations.
  5. KDD 2014 in NYC: closed at 3,000 registrations one week before the conference.

I tried to figure out how to setup a prediction market to estimate what will happen this year, but didn’t find an easy-enough way to do that.

(**) I kind of wish we could make up the titles. How about: “Go is Too Easy” and “My Neural Network is Deeper than Yours”?

(***) Sponsors are very generous and are mostly giving to defray student costs. Approximately every dollar of the difference between Regular and Student registration is due to company donations. For students, also note that there will be some scholarship opportunities to defray costs coming out soon.

Categories: Blogroll

Insights into Nature’s Data Publishing Portal

Semantic Web Company - Wed, 2016-03-30 05:05

In recent years, Nature has adopted linked data technologies on a broader scale. Andreas Blumauer was intrigued to discover more about the strategy and technologies behind this move, and had the opportunity to talk with Michele Pasin and Tony Hammond, the architects of Nature’s data publishing portal.

 

Semantic Puzzle: Nature’s data publishing portal is one of the most renowned ones in the linked data community. Could you talk a bit about its history? Why was this project initiated and who have been the brains behind it since then?

Michele Pasin: We have been involved with semantic technologies at Macmillan since 2010. At the time it was primarily my colleague Tony Hammond who saw the potential of these technologies for metadata management and data sharing. Tony set up the data.nature.com portal in April 2012 (and expanded in July 2012), in the context of a broader company initiative aimed at moving towards a ‘digital first’ publication workflow.

The data.nature.com platform was essentially a public RDF output of some of the metadata embedded in our XML articles archive. This included a SPARQL endpoint for data about articles published by NPG from 1845 through to the present day. The datasets also included NPG product and subject ontologies, and all of them are available under a Creative Commons Zero waiver.

The data.nature.com platform was only for external use though, so it was essentially detached from the products end users would see on nature.com. Still, it allowed us to develop a better understanding of how to make use of these tools within our existing technology stack. It is important to remember that over the years the company has invested a considerable amount of resources in an XML-centered architecture, so finding a solution that could leverage the legacy infrastructure with these new technologies has always been a fundamental requirement for us.

More recently, in 2013 we started working on a new hybrid linked data platform, this time with a much stronger focus on supporting our internal applications. That’s pretty much around the time I joined the company. In essence, we made the point that in order to achieve stronger interoperability levels within our systems we had to create an architecture where RDF is core to the publishing workflow as much as XML is. (By the way if you are interested in the details of this, we presented a paper about this at ISWC 2014.) As part of this phase, we also built a more sophisticated set of ontologies used for encoding the semantics of our data, together with improved versions of the datasets previously released.

The nature.com ontologies portal came out in early 2015 as the result of this second phase of work. On the portal one can find extensive documentation about all of our models, as well as periodic downloads in various RDF formats. The idea is to make it easier for people – both within the enterprise and externally – to access, understand and reuse our linked data.

At the same time, since user engagement on data.nature.com was not as good as expected, we decided to terminate that service. We plan to keep releasing periodic snapshots of the datasets and the ontologies we are using, but we do not plan to offer a public endpoint in the immediate future.

Semantic Puzzle: As one of your visions you’re stating that your “primary reason for adopting linked data technologies is quite simply better metadata management”. How did you deal with metadata before you started with this transition? What has changed since then, also from a business point of view?

Michele Pasin: Our pre-linked-data approach to dealing with metadata and enterprise taxonomies is probably not unheard of, especially within similar-sized companies: a vast array of custom-made solutions, ranging from simple Word documents sitting on someone’s computer, to Excel spreadsheets or, in the best of cases, database tables in one of our production systems. Of course, there were also a number of ad-hoc applications and scripts responsible for reading and updating these metadata sources, as they would often be critical to one or more systems in the publishing workflow (e.g. think of the journal’s master list, or the list of approved article types).

It is worth stressing that the lack of a unified technical infrastructure was a key problem, of course, but not the only one. In fact I would argue that addressing the lack of a centralized data governance approach was even more crucial. For example, most often you would not know who or which department was in charge of a particular controlled vocabulary or metadata specification. In some cases, no single source of truth was available at all, because different people or groups were in charge of specific aspects of a single specification (due to their differing interests).

Hence you need a certain amount of management buy-in to implement such a wide-ranging approach to metadata; moving to a single platform and technical solution based on linked data was fundamental, but an equally fundamental organizational change was also needed. Even more so, if one considers that this is not a time-boxed project but rather an ongoing process, an approach which pays off only as much as you can guarantee that as new products and services get launched, they all subscribe to the same metadata management ‘philosophy’.

Semantic Puzzle: One of the promises of Linked Data is that by “using a common data model and a common naming architecture, users can begin to realize the benefits and efficiencies of web scaling.” Could you describe a bit more in detail into which eco-system your content workflows and publishing processes are embedded (internally and externally) and why the use of standards is important for this?

Tony Hammond: We operate with an XML-based workflow for documents where we receive XML from our suppliers and store that within an XML database (MarkLogic). Increasingly we are beginning to move towards a dynamic publishing solution from that database. We are also using the database to provide a full-text search across all our content. In the past we had various workflows and a small number of different DTDs to reconcile, although we are currently converging on a single DTD. To facilitate search across this mixed XML content we abstracted certain key metadata elements into a common header. This was managed organically and was somewhat unpredictable both in terms of content model and naming.

By moving to a linked data solution for managing our metadata which is based on a single, core ontology we bypass our normalized metadata header and start to build on a new simpler data model (triples) with a common naming architecture. In effect, we have moved from a nominally normalized metadata to a super-normalized metadata which uses web standards for data (URI, RDF, OWL).
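
To make “a new simpler data model (triples) with a common naming architecture” slightly more concrete, here is a minimal sketch in Python using rdflib. The namespaces, property names and article identifier are invented for illustration; they are not Nature’s actual vocabulary.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, DCTERMS

# Hypothetical namespaces standing in for a publisher's core ontology.
NPG = Namespace("http://ns.example.org/terms/")
ART = Namespace("http://www.example.org/articles/")

g = Graph()
article = ART["nature12345"]  # one global name (URI) for the article

# Each statement is a plain subject-predicate-object triple.
g.add((article, RDF.type, NPG.Article))
g.add((article, DCTERMS.title, Literal("A hypothetical article title")))
g.add((article, NPG.hasJournal, URIRef("http://ns.example.org/journals/nature")))

print(g.serialize(format="turtle"))
```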

Semantic Puzzle: Your content is also multimedia (images, videos, …). How do you embed this non-textual content into your linked data ecosystem? Which gateways, tools and connectors are used to bridge your linked data environment with multimedia?

Tony Hammond: Some years ago we embarked on a new initiative internally to streamline our production workflows. Our brief was to support a distributed content warehouse where digital assets would be stored in various locations. The idea was to abstract out our storage concerns and to maintain pointers to the various storage subsystems along with other physical characteristics required for accessing that storage.

In practice our main content was housed as XML documents within a MarkLogic XML database and associated media assets (e.g. images) were primarily stored on the filesystem with some secondary asset types (e.g. videos) being sourced from cloud services.

To relate a physical asset (e.g. an XML document, or a JPEG file) to the underlying concept (e.g. an article, or an image) we made use of XMP packets (a technology developed by Adobe Systems and standardized through ISO) which, as simple RDF/XML descriptions, allowed us to capture metadata about physical characteristics and to relate those properties to our data model. An XMP packet is a description of one physical resource and could be simply linked to the related conceptual resource.

We started this project with an RDF triplestore for maintaining and querying our metadata, but over time we moved towards a hybrid technology where our semantic descriptions were buried within XML documents as RDF/XML descriptions and could be queried within an XML context using XQuery to deliver a highly performant JSON API. These semantic descriptions enclosed minimal XMP documents which described the storage entities.
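
As a rough, hedged sketch of the XMP idea described above: an RDF/XML packet describing one physical asset can be parsed and queried with rdflib in Python. The element names, namespaces and storage paths below are invented for illustration and do not reflect the actual packet layout used at Nature.

```python
from rdflib import Graph, Namespace

# A toy RDF/XML description of one physical asset (hypothetical vocabulary).
xmp_packet = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://ns.example.org/storage/">
  <rdf:Description rdf:about="http://www.example.org/assets/fig1.jpeg">
    <ex:storageLocation>s3://example-bucket/figures/fig1.jpeg</ex:storageLocation>
    <ex:mediaType>image/jpeg</ex:mediaType>
    <ex:describes rdf:resource="http://www.example.org/images/fig1"/>
  </rdf:Description>
</rdf:RDF>"""

EX = Namespace("http://ns.example.org/storage/")
g = Graph()
g.parse(data=xmp_packet, format="xml")  # RDF/XML parser bundled with rdflib

# Look up where the asset lives and which conceptual resource it depicts.
for asset, location in g.subject_objects(EX.storageLocation):
    concept = g.value(asset, EX.describes)
    print(asset, "->", location, "(describes", concept, ")")
```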

Semantic Puzzle: Nature links its datasets to external ones, e.g. to DBpedia or MeSH. Who exactly is benefiting from this and how?

Michele Pasin:  I would say that there are at least two reasons why we did this. First, we wanted to maximize the potential reuse of our datasets and models within the semantic web. Building owl:sameAs relationships to other vocabularies, or marking up our ontology classes and properties with subclass/subproperty relationships pointing to external vocabularies is a way to be good ‘linked data citizens’. Moreover, this is a deliberate attempt to counterbalance one of our key design principles: minimal commitment to external vocabularies. This approach to data modeling means that we tend to create our own models and define them within our own namespaces, rather than building production-level software against third party ontologies. It is worth pointing out that this is not because we think our ontologies are better – but because we want our data architecture to reflect as closely as possible the ontological commitment of a publishing enterprise with decades of established business practices, naming conventions etc. In other words, we aimed at creating a very cohesive and robust domain model, one which is resilient to external changes but that also supports semantic interoperability by providing a number of links and mappings to other semantic web standards.
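
As a small sketch of what such mappings can look like in practice (the URIs below are invented; they are not the actual nature.com identifiers), owl:sameAs and rdfs:subClassOf links can be asserted with rdflib:

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import OWL, RDFS

# Hypothetical internal namespace; not the actual nature.com one.
SUBJ = Namespace("http://ns.example.org/subjects/")

g = Graph()

# Map an internal subject term to an equivalent DBpedia resource ...
g.add((SUBJ.photosynthesis, OWL.sameAs,
       URIRef("http://dbpedia.org/resource/Photosynthesis")))

# ... and mark an internal class as a specialization of an external one.
g.add((SUBJ.ResearchArticle, RDFS.subClassOf,
       URIRef("http://purl.org/ontology/bibo/Article")))

print(g.serialize(format="turtle"))
```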

Pointing to external vocabularies is a way to be good ‘linked data citizens’

The second reason for creating these links is to enable more innovative discovery services. For example, a nature.com subject page about photosynthesis could surface encyclopedic materials automatically retrieved from DBpedia; or it could provide links to highly cited articles retrieved from PubMed using MeSH mappings. This just scratches the surface of what one could do. The real difficulty is how to do it in such a way that the overall user experience improves, rather than adding to the information overload the majority of internet users already have to deal with. So at the moment, while the data people (us) are focusing on building a rich network of entities for our knowledge graph, the UX and front end teams are exploring design and interaction models that truly take advantage of these functionalities. Hopefully we will see these activities continue to converge!

Semantic Puzzle: How do you deal with data quality management in general, and how can linked data technologies help to improve it?

Tony Hammond: We can distinguish between two main types of data: documents and ontologies. (And by ontologies we also include thesauri and taxonomies.) Our documents are created by our suppliers using XML and are amenable to some data validations. We use automated DTD validation in our new workflow and manual DTD validation in the older workflows. We also use Schematron rulesets to validate certain data points, but these address only certain elements. We have a couple of hundred Schematron rules which implement various business rules and are also synchronized with our ontologies.

Our ontologies, on the other hand, are by their nature more curated datasets. They are mastered as RDF Turtle files and stored in GitHub, and are currently maintained by hand, although we are beginning now to transition some of our taxonomies to the PoolParty taxonomy manager. We have a build process for deploying the ontologies to our XML database where they are combined with our XML documents. During this build process we both validate the RDF and run SPIN rules over the datasets, which can validate data elements as well as expand the dataset with new triples from rules-based inferencing.
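
Purely as an illustration of the same idea – validating an ontology and inferring additional triples during a build step – here is a minimal Python sketch. It uses SHACL via the pyshacl library as a stand-in, not the SPIN tooling described above, and the file names are hypothetical.

```python
from rdflib import Graph
from pyshacl import validate  # SHACL used here as a stand-in for SPIN

# Hypothetical files: an ontology mastered as Turtle plus a shapes/rules file.
data_graph = Graph().parse("ontology.ttl", format="turtle")
shapes_graph = Graph().parse("shapes.ttl", format="turtle")

conforms, report_graph, report_text = validate(
    data_graph,
    shacl_graph=shapes_graph,
    inference="rdfs",   # expand the data with simple inferencing first
    advanced=True,      # enables SHACL rules, which can add new triples
)

print("Valid:" if conforms else "Violations found:")
print(report_text)
```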

Semantic Puzzle: For a publisher like Nature it is somehow “natural” that Linked Data is used. How could other industries make use of these principles for information management?

Tony Hammond: The main reason for using linked data is not to do with publishing the data (and indeed many other data models are generally used for data publishing), but with the desire to join one dataset with other datasets – or rather, the data within a dataset to the data within other datasets. It is for this reason that we make use of URIs as common (global) names for data points. Linking data is not just a goal in publishing data but applies equally when consuming data from various sources and integrating over those data sources within an internal environment. Indeed, arguably, the biggest use case for linked data is within private enterprises rather than surfaced on the open web. Once that point is appreciated, no industry is any more or less disposed to using linked data than another; it is simply a means to maximize the data surface that a company operates over.

The biggest use case for linked data is within private enterprises rather than surfaced on the open web

Semantic Puzzle: Where are the limits of Linked Data from your perspective, and do you believe they will ever be exceeded?

Tony Hammond: The limits to using linked data are more to do with top-down vs bottom-up approaches in dealing with data, i.e. linked data vs big data, or data curation vs data crunching. Linked data makes use of global names (URIs), schemas, ontologies. It is highly structured, organized data.

Now, whether it is feasible to bring this level of organization to data at large or whether data crunching will provide the appropriate insights over the data is an open question. Our expectation is that we will still need to use ontologies – and hence linked data – as an organizing principle, or reference, to guide us in processing large datasets and for sharing those data organizations. The question may be how much human curation is required in assembling these ontologies.

Michele Pasin: On a more practical level, I’d say that the biggest problem with linked data is still its rather limited adoption on a large scale. I’m referring in particular to the data publishing and reuse aspect. On this front, we really struggled to get the levels of uptake the business was expecting from us. Consider this: we have been publishing metadata for our entire archive since 2012 (approx. 1.2m documents, resulting in almost half a billion triples). However very few people made use of these data, either in the form of bulk downloads or via the SPARQL API we once hosted (and that was then retired due to low usage). This is in stark contrast with other – arguably less flexible – services we make available, e.g. the OpenSearch APIs, or a JSON REST service, which often see significant traffic.

Last year we gave a paper at the Linked Science workshop (affiliated with ISWC 2015) with the specific intent to address the problem within that community. What seemed to emerge is that possibly this has to do with the same reason why this technology has been so useful to us. RDF is an extremely flexible and powerful model, however, when it comes to data consumption and access, the average user cares more about simplicity than flexibility. Also, outside linked data circles we all know that the standard tech for APIs is JSON and REST, rather than RDF and SPARQL.

Lowering the bar to the adoption of semantic tech

The good news though is that we are seeing more initiatives aimed at bridging these two worlds. One that we are keeping an eye on, for example, is JSON-LD. The way this format hides various RDF complexities behind a familiar JSON structure makes it an ideal candidate for a linked data publishing product with a much wider user base. Which is exactly what we are looking for: lowering the bar to the adoption of semantic tech.
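
As a rough illustration of how JSON-LD hides RDF machinery behind ordinary-looking JSON (invented URIs; assumes rdflib 6+, where the JSON-LD serializer is bundled), the same triples can be emitted with a context that maps compact keys onto full URIs:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS

EX = Namespace("http://ns.example.org/terms/")
g = Graph()
article = EX.article1
g.add((article, DCTERMS.title, Literal("A hypothetical article")))
g.add((article, EX.subject, Literal("photosynthesis")))

# The context maps compact JSON keys onto full URIs.
context = {
    "title": str(DCTERMS.title),
    "subject": str(EX.subject),
}

# Requires rdflib 6+ (JSON-LD support bundled); older versions need rdflib-jsonld.
print(g.serialize(format="json-ld", context=context, indent=2))
```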

 

About Michele Pasin

Michele Pasin is an information architect and product manager with a focus on enterprise metadata management and semantic technologies.

Michele currently works for Springer Nature, a publishing company resulting from the May 2015 merger of Springer Science+Business Media and Holtzbrinck Publishing Group’s Nature Publishing Group, Palgrave Macmillan, and Macmillan Education.

He has recently taken up the role of product manager for the knowledge graph project, an initiative whose goal is to bring together various preexisting linked data repositories, plus a number of other structured and unstructured data sources, into a unified, highly integrated knowledge discovery platform. Before that, he worked on projects like nature.com’s subject pages (a dynamic section of the website that allows users to navigate content by topic) and the nature.com ontologies portal (a public repository of linked open data).

He holds a PhD in semantic web technologies from the Knowledge Media Institute (The Open University, UK) and advanced degrees in logic and philosophy of language from the University of Venice (Italy). Previously, he was a research associate at King’s College London’s Department of Digital Humanities, where he worked on a number of cultural informatics projects such as the People of Medieval Scotland and the Art of Making in Antiquity. Online portfolio: http://www.michelepasin.org/projects/

Michele Pasin will give a keynote at this year’s SEMANTiCS conference.

About Tony Hammond

Tony Hammond is a data architect with a primary focus in the general area of machine-readable description technologies. He has been actively involved in developing industry standards for network identifiers and metadata frameworks. He has had experience working on both sides of the scientific publishing information chain, from international research centres to leading publishing houses. His background is in physics with astrophysics.

Tony currently works for Springer Nature, a publishing company resulting from the May 2015 merger of Springer Science+Business Media and Holtzbrinck Publishing Group’s Nature Publishing Group, Palgrave Macmillan, and Macmillan Education.

Categories: Blogroll

AI's not AI

Data Mining Blog - Sun, 2016-03-27 14:04

There has been a lot of commentary recently on issues relating to an experimental chat bot, named Tay (after, perhaps, a river in Scotland), that Microsoft has (or had) launched. After a brief existence online, the bot was removed because it was persuaded to engage in behaviours perceived as offensive. Peter Lee of MSR has this to say about it. While there is much to learn from what transpired, the thing that irks me the most is the continued use of the term Artificial Intelligence to describe these systems - Lee actually calls it an 'artificial intelligence application'. Experimenting with these interactive agents is, no doubt, a useful activity that will teach us much about how humans will interact with actual AI entities in the future, but calling a chat bot of this nature an artificial intelligence application is like calling the icing on a cake, a cake. Communicating with humans is essential to artificial intelligence; communicating as a peer in human language with not much else going on 'upstairs' is not, however, a demonstration of artificial intelligence.

Related articles:
  • How the Tech Media Keeps Artificial Intelligence at a Distance
  • The Economist gets in on the AI Fluff
  • AI, Artificial Birds and Aeroplanes

Categories: Blogroll

International Semantic Web Community meets in Leipzig, Sept. 12-15, 2016

Semantic Web Company - Wed, 2016-03-23 05:27

At the annual SEMANTiCS Conference, experts from academia and industry meet to discuss semantic computing, its benefits and its future business implications. Since 2005, SEMANTiCS has been attracting the opinion leaders in semantic web and big data technology – from information managers and software engineers to commerce experts, business developers, researchers and IT architects – when it comes to defining the future of information technology.

SEMANTiCS 2016 takes place from September 12th to 15th at the second-oldest university in Germany, Leipzig University. Leipzig University hosts several groups focused on Linked Data and the Semantic Web, in particular AKSW, and is therefore THE European hotspot when it comes to graph-based technologies and knowledge engineering.

Do you want to be a part of the SEMANTiCS Conference and get in touch with the following audiences?

  • IT professionals & IT architects
  • Software developers
  • Knowledge Management Executives
  • Innovation Executives
  • R&D Executives

Calls are open now. Industrial presentations offer a platform to reach a huge network of practitioners and users and to get feedback, and academic submissions are published in the well-known ACM-ICPS series (deadline 21st April, 23% acceptance rate). To submit your contribution, please visit the Calls section on our website. To attend the workshops, the tutorials or to enjoy the talks in one of the offered sessions, please visit our registration site.

You want to partner with SEMANTiCS 2016? Then get a sponsor package or become an exhibitor! For more details, please click here.

To stay up to date, follow us on Facebook or Twitter (@SemanticsConf), or visit our website for the latest news.

Categories: Blogroll

AlphaGo is not the solution to AI

Machine Learning Blog - Sun, 2016-03-13 18:46

Congratulations are in order for the folks at Google Deepmind who have mastered Go.

However, some of the discussion around this seems like giddy overstatement. Wired says Machines have conquered the last games and Slashdot says We know now that we don’t need any big new breakthroughs to get to true AI. The truth is nowhere close.

For Go itself, it’s been well-known for a decade that Monte Carlo tree search (i.e. valuation by assuming randomized playout) is unusually effective in Go. Given this, it’s unclear that the AlphaGo algorithm extends to other board games where MCTS does not work so well. Maybe? It will be interesting to see.

Delving into existing computer games, the Atari results (see figure 3) are very fun but obviously unimpressive on about ¼ of the games. My hypothesis for why is that their solution does only local (epsilon-greedy style) exploration rather than global exploration, so they can only learn policies addressing either very short credit assignment problems or with greedily accessible policies. Global exploration strategies are known to result in exponentially more efficient strategies in general for deterministic decision processes (1993), Markov Decision Processes (1998), and for MDPs without modeling (2006).
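
For readers unfamiliar with the jargon, “epsilon-greedy style” exploration amounts to something like the sketch below (illustrative only, not the code used in the Atari work): with small probability the agent picks a random action, otherwise it acts greedily, so it rarely strings together the long, deliberate sequences of deviations that global exploration requires.

```python
import random

def epsilon_greedy_action(q_values, epsilon=0.05):
    """Pick a random action with probability epsilon, else the greedy one.

    q_values: list of estimated action values for the current state.
    This only perturbs behaviour one step at a time (local exploration),
    which is the limitation discussed above.
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Example: the greedy action (index 2) is chosen most of the time,
# with an occasional random alternative.
print(epsilon_greedy_action([0.1, 0.5, 0.9]))
```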

The reason these strategies are not used is because they are based on tabular learning rather than function fitting. That’s why I shifted to Contextual Bandit research after the 2006 paper. We’ve learned quite a bit there, enough to start tackling a Contextual Deterministic Decision Process, but that solution is still far from practical. Addressing global exploration effectively is only one of the significant challenges between what is well known now and what needs to be addressed for what I would consider a real AI.

This is generally understood by people working on these techniques but seems to be getting lost in translation to public news reports. That’s dangerous because it leads to disappointment. The field will be better off without an overpromise/bust cycle so I would encourage people to keep and inform a balanced view of successes and their extent. Mastering Go is a great accomplishment, but it is quite far from everything.

Edit: Further discussion here, CACM, here, and KDNuggets.

Categories: Blogroll

How PoolParty and ISO 25964 fit together

Semantic Web Company - Fri, 2016-03-04 04:36

The release of the ISO standard for thesauri, “ISO 25964 Part 1: Thesauri for information retrieval”, in 2011 was a huge step, as it replaced standards that dated back to 1986 (ISO 2788) and 1985 (ISO 5964). In doing so, it modernized methodologies from a pre-Web era, when thesauri were developed mainly to be published on paper. The new standard also brought a shift from a term-based model to a concept-based model, stating: “Each term included in a thesaurus should represent a single concept (or unit of thought)” – from: ISO 25964 Part 1, page 15. That brings it close to Semantic Web based data models like SKOS and also shows that formerly disconnected communities are now working together.

Term vs. Concept based

We are frequently asked whether PoolParty is compatible with ISO 25964. Our basic answer always is “Yes, of course”, as the data model defined in the standard can be mapped to SKOS + SKOS-XL (see: http://www.niso.org/schemas/iso25964/#skos). On the other hand, we also have to point out that the ISO standard defines a very comprehensive model for managing all sorts of thesauri. In contrast, SKOS focuses on a simpler data model that allows one to manage all kinds of KOS (incl. classification schemes) and that can be extended if more complexity is needed. In my view, this difference also reflects the two principal ways of approaching thesaurus projects: the “top down” approach (ISO 25964) vs. the “bottom up” approach (SKOS). Since we at SWC have always been following the principle “start simple and add complexity as you go/need it”, it’s quite clear where we reside: with PoolParty’s ontology management and custom schema management, taxonomists can go far beyond SKOS’s expressivity.
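
As a small, hedged illustration of the shared concept-based model (invented URIs; this is not PoolParty output), a single concept with a reified SKOS-XL preferred label can be written with rdflib in Python:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

# SKOS-XL is not bundled as a constant in older rdflib versions, so define it.
SKOSXL = Namespace("http://www.w3.org/2008/05/skos-xl#")
EX = Namespace("http://ns.example.org/thesaurus/")

g = Graph()
g.bind("skos", SKOS)
g.bind("skosxl", SKOSXL)

concept = EX.Concept_42   # one concept ("unit of thought"), per ISO 25964
label = EX.Label_42_en    # the term itself, reified as an SKOS-XL label

g.add((concept, RDF.type, SKOS.Concept))
g.add((label, RDF.type, SKOSXL.Label))
g.add((label, SKOSXL.literalForm, Literal("information retrieval", lang="en")))
g.add((concept, SKOSXL.prefLabel, label))

print(g.serialize(format="turtle"))
```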

ISO 25964 also includes a chapter about “Guidelines for thesaurus management software”, so I tried to figure out to what degree this is covered by PoolParty. The results can now be found in the PoolParty documentation.

So if you’re asked next time “Is PoolParty compatible with ISO 25964?”, you will hopefully answer: “Yes, of course – just take a look at the documentation”.

 

Categories: Blogroll

Semantic Web Company Named to KMWorld’s 2016 ‘100 Companies That Matter in Knowledge Management’

Semantic Web Company - Tue, 2016-03-01 13:13

Semantic Web Company, the leading provider of graph-based metadata, search, and analytic solutions, today announced that it has been named to KMWorld’s 2016 list of the ‘100 Companies That Matter in Knowledge Management’. This award is another important milestone for the broad acceptance of Semantic Web standards in enterprises.

“Only last year, our standards-based platform PoolParty Semantic Suite got acknowledged as Trend-Setting Product by KMWorld. We are delighted to be recognized now as an industry leader in innovation and service from KMWorld. Semantic Web technologies have an ever-increasing impact on the management of data and information of many knowledge-intensive organizations,” says Andreas Blumauer, co-founder and CEO of the Semantic Web Company.

 

KMWorld Editor-in-Chief Sandra Haimila agrees, “Being named to our list of 100 Companies That Matter in Knowledge Management is a prestigious designation because it represents the best in innovation, creativity and functionality. The 100 Companies offer solutions designed to help users and customers find what they need whenever and wherever they need it … and what they need is the ability to access, analyze and share crucial knowledge.”

More information can be found in the March print issue of KMWorld Magazine and online at www.kmworld.com.

About KMWorld

KMWorld is the leading information provider serving the Knowledge Management systems market and covers the latest in content, document and knowledge management, informing more than 30,000 subscribers about the components and processes – and subsequent success stories – that together offer solutions for improving business performance. KMWorld is a publishing unit of Information Today, Inc. www.kmworld.com

About Semantic Web Company

The Semantic Web Company was founded in 2004 and is acknowledged as a global leader in Semantic Web technologies. The company is the vendor of PoolParty Semantic Suite and is involved in R&D projects with a volume of more than 16 million EUR.

A team of Linked Data experts provides consulting and integration services for semantic data and knowledge portals. Boehringer Ingelheim, Credit Suisse, the European Commission, Roche, Red Bull, and The World Bank are among the many customers that have successfully adopted Semantic Web solutions.

More information can be found online at www.semantic-web.at.

Categories: Blogroll

Learning to avoid not making an AI

Machine Learning Blog - Wed, 2016-02-24 16:08

Building an AI is one of the most subtle things people have ever attempted, with strong evidence provided by the durable nature of the problem despite attempts by many intelligent people. In comparison, putting a man on the moon was a relatively straightforward technical problem with little confusion about the nature of the solution.

Building an AI is almost surely a software problem since the outer limit for the amount of computation in the human brain is only 10^17 ops/second (10^11 neurons with 10^4 connections operating at 10^2 Hz) which is within reach of known systems.

People tend to mysticize the complexity of unknown things, so the “real” amount of computation required for a human scale AI is likely far less—perhaps even within reach of a 10^13 flop GPU.

Since building an AI is a software problem, the problem is complexity in a much stronger sense than for most problems. The effective approach for dealing with complexity is to use modularity. But which modularity? A sprawl of proposed kinds of often incompatible and obviously incomplete modularity exists. The moment when you try to decompose into smaller problems is when the difficulty of solution is confronted.

For guidance, we can consider what works and what does not. This is tricky, because the definition of AI is less than clear. I qualify AI by degrees of intelligence in my mind—a human-level AI is one which can accomplish the range of tasks which a human can. This includes learning complex things (language, reasoning, etc…) from a much more basic state.

The definition seems natural, but it is not easily tested via the famous Turing Test. For example, you could imagine a Cyc-backed system passing a Turing Test. Would that be a human-level AI? I’d argue ‘no’, because the reliance on a human-crafted ontology indicates an inability to discover and use new things effectively. There is a good science fiction story to write here where a Cyc-based system takes over civilization but then gradually falls apart as new relevant concepts simply cannot be grasped.

Instead of AI facsimiles, learning approaches seem to be the key to success. If a system learned from basic primitives how to pass the Turing Test, I would naturally consider it much closer to human-level AI.

We have seen the facsimile design vs. learn tension in approaches to AI activities play out many times with the facsimile design approach winning first, but not always last. Consider Game Playing, Driving, Vision, Speech, and Chat-bots. At this point the facsimile approach has been overwhelmed by learning in Vision and Speech while in Game Playing, Driving, and Chat-bots the situation is less clear.

I expect facsimile approaches are one of the greater sources of misplaced effort in AI and that will continue to be an issue, because it’s such a natural effort trap: Why not simply make the system do what you want it to do? Making a system that works by learning to do things seems a rather indirect route that surely takes longer and requires more effort. The answer of course is that the system which learns what might otherwise be designed can learn other things as needed, making it inherently more robust.

Categories: Blogroll

New York Machine Learning Deadlines

Machine Learning Blog - Tue, 2016-01-26 09:27

There are a number of different Machine Learning related paper deadlines that may be of interest.

  • January 29 (abstract) for March 4 New York ML Symposium. Register early because NYAS can only fit 300.
  • January 27 (abstract) / February 2 (paper) for July 9-15 IJCAI. The biggest AI conference.
  • February 5 (paper) for June 19-24 ICML. Nina and Kilian have 850 well-vetted reviewers. Marek and Peder have increased space to allow 3K people.
  • February 12 (paper) for June 23-26 COLT. Vitaly and Sasha are program chairs.
  • February 12 (proposal) for June 23-24 ICML workshops. Fei and Ruslan are the workshop chairs. I really like workshops.
  • February 19 (proposal) for June 19 ICML tutorials. Bernhard and Alina have invited a few tutorials already but are saving space for good proposals as well.
  • March 1 (paper) for June 25-29 UAI. Jersey City isn’t quite New York, but it’s close enough.
  • May ~2 for June 23-24 ICML workshops. Varies with the workshop.

Categories: Blogroll

Web 2: But Wait, There's More (And More....) - Best Program Ever. Period.

Searchblog - Thu, 2011-10-13 13:20
I appreciate all you Searchblog readers out there who are getting tired of my relentless Web 2 Summit postings. And I know I said my post about Reid Hoffman was the last of its kind. And it was, sort of. Truth is, there are a number of other interviews happening... (Go to Searchblog Main)
Categories: Blogroll

Help Me Interview Reid Hoffman, Founder, LinkedIn (And Win Free Tix to Web 2)

Searchblog - Wed, 2011-10-12 12:22
Our final interview at Web 2 is Reid Hoffman, co-founder of LinkedIn and legendary Valley investor. Hoffman is now at Greylock Partners, but his investment roots go way back. A founding board member of PayPal, Hoffman has invested in Facebook, Flickr, Ning, Zynga, and many more. As he wears (at... (Go to Searchblog Main)
Categories: Blogroll

Help Me Interview the Founders of Quora (And Win Free Tix to Web 2)

Searchblog - Tue, 2011-10-11 13:54
Next up on the list of interesting folks I'm speaking with at Web 2 are Charlie Cheever and Adam D'Angelo, the founders of Quora. Cheever and D'Angelo enjoy (or suffer from) Facebook alumni pixie dust - they left the social giant to create Quora in 2009. It grew quickly after... (Go to Searchblog Main)
Categories: Blogroll

Help Me Interview Ross Levinsohn, EVP, Yahoo (And Win Free Tix to Web 2)

Searchblog - Tue, 2011-10-11 12:46
Perhaps no man is braver than Ross Levinsohn, at least at Web 2. First of all, he's the top North American executive at a long-besieged and currently leaderless company, and second because he has not backed out of our conversation on Day One (this coming Monday). I spoke to Ross... (Go to Searchblog Main)
Categories: Blogroll

I Just Made a City...

Searchblog - Mon, 2011-10-10 14:41
...on the Web 2 Summit "Data Frame" map. It's kind of fun to think about your company (or any company) as a compendium of various data assets. We've added a "build your own city" feature to the map, and while there are a couple bugs to fix (I'd like... (Go to Searchblog Main)
Categories: Blogroll

Help Me Interview Vic Gundotra, SVP, Google (And Win Free Tix to Web 2)

Searchblog - Mon, 2011-10-10 14:03
Next up on Day 3 of Web 2 is Vic Gundotra, the man responsible for what Google CEO Larry Page calls the most exciting and important project at this company: Google+. It's been a long, long time since I've heard as varied a set of responses to any Google project... (Go to Searchblog Main)
Categories: Blogroll

Help Me Interview James Gleick, Author, The Information (And Win Free Tix to Web 2)

Searchblog - Sat, 2011-10-08 21:16
Day Three kicks off with James Gleick, the man who has written the book of the year, at least if you are a fan of our conference theme. As I wrote in my review of "The Information," Gleick's book tells the story of how, over the past five thousand or... (Go to Searchblog Main)
Categories: Blogroll

I Wish "Tapestry" Existed

Searchblog - Fri, 2011-10-07 15:34
(image) Early this year I wrote File Under: Metaservices, The Rise Of, in which I described a problem that has burdened the web forever, but to my mind is getting worse and worse. The crux: "...heavy users of the web depend on scores - sometimes hundreds - of services,... (Go to Searchblog Main)
Categories: Blogroll

Help Me Interview Steve Ballmer, CEO of Microsoft (And Win Free Tix to Web 2)

Searchblog - Fri, 2011-10-07 13:17
Day Two at Web 2 Summit ends with my interview of Steve Ballmer. Now, the last one, some four years ago, had quite a funny moment. I asked Steve about how he intends to compete with Google on search. It's worth watching. He kind of turns purple. And not... (Go to Searchblog Main)
Categories: Blogroll