(Note: for a good introduction to the history and current state of AI, see my colleague Frank Chen’s presentation here.)
In the last couple of years, magic started happening in AI. Techniques started working, or started working much better, and new techniques have appeared, especially around machine learning (‘ML’), and when those were applied to some long-standing and important use cases we started getting dramatically better results. For example, the error rates for image recognition, speech recognition and natural language processing have collapsed to close to human rates, at least on some measurements.
So you can say to your phone: ‘show me pictures of my dog at the beach’ and a speech recognition system turns the audio into text, natural language processing takes the text, works out that this is a photo query and hands it off to your photo app, and your photo app, which has used ML systems to tag your photos with ‘dog’ and ‘beach’, runs a database query and shows you the tagged images. Magic.
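Here is a minimal sketch of only the last step of that chain: the photo app querying photos that have already been tagged. It assumes the ML tagging has already happened and that the tags sit in a local SQLite table. The speech and language stages are left out. The schema, the function names `build_index` and `photos_with_all_tags`, and the file names are all hypothetical. This is not how any particular photo app stores its data.

```python
import sqlite3


def build_index(photo_tags: dict[str, list[str]]) -> sqlite3.Connection:
    """Store (photo, tag) pairs in an in-memory table. In the flow above the
    tags would come from an image classifier; here they are supplied by hand."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tags (photo TEXT, tag TEXT)")
    conn.executemany(
        "INSERT INTO tags VALUES (?, ?)",
        [(photo, tag) for photo, tags in photo_tags.items() for tag in tags],
    )
    return conn


def photos_with_all_tags(conn: sqlite3.Connection, wanted: list[str]) -> list[str]:
    """Return the photos that carry every tag in `wanted` (e.g. 'dog' AND 'beach')."""
    placeholders = ", ".join("?" for _ in wanted)
    rows = conn.execute(
        f"SELECT photo FROM tags WHERE tag IN ({placeholders}) "
        "GROUP BY photo HAVING COUNT(DISTINCT tag) = ? ORDER BY photo",
        [*wanted, len(wanted)],
    )
    return [photo for (photo,) in rows]


# Hypothetical library of already-tagged photos.
index = build_index({
    "IMG_001.jpg": ["dog", "beach"],
    "IMG_002.jpg": ["dog", "park"],
    "IMG_003.jpg": ["beach", "sunset"],
    "IMG_004.jpg": ["beach", "dog", "sunset"],
})
print(photos_with_all_tags(index, ["dog", "beach"]))  # ['IMG_001.jpg', 'IMG_004.jpg']
```

The query itself is ordinary database work. Everything that makes it "magic" happened earlier, when something filled in the `tag` column without a person doing it.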
There are really two things going on here – you’re using voice to fill in a dialogue box for a query, and that dialogue box can run queries that might not have been possible before. Both of these are enabled by machine learning, but they’re built quite separately, and indeed the most interesting part is not the voice but the query. In fact, the important structural change behind being able to ask for ‘Pictures with dogs at the beach’ is not that the computer can find it but that the computer has worked out, itself, how to find it. You give it a million pictures labelled ‘this has a dog in it’ and a million labelled ‘this doesn’t have a dog’ and it works out how to work out what a dog looks like. Now, try that with ‘customers in this data set who were about to churn’, or ‘this network had a security breach’, or ‘stories that people read and shared a lot’. Then try it without labels (‘unsupervised’ rather than ‘supervised’ learning).
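Here is a toy sketch of the two modes. It assumes each picture has already been reduced to a made-up pair of numbers; real systems learn their features from the pixels and use far larger models. The supervised half learns one prototype per label from labelled examples (a nearest-centroid classifier). The unsupervised half uses k-means to find groups with no labels at all. All function names and data are hypothetical.

```python
import math
import random

# Hypothetical toy data: each "picture" is reduced to two made-up numbers.
Point = tuple[float, float]


def centroid(points: list[Point]) -> Point:
    """Average position of a group of points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def train_supervised(labelled: dict[str, list[Point]]) -> dict[str, Point]:
    """Supervised: learn one prototype per label from labelled examples."""
    return {label: centroid(pts) for label, pts in labelled.items()}


def classify(model: dict[str, Point], x: Point) -> str:
    """Predict the label whose learned prototype is closest to x."""
    return min(model, key=lambda label: math.dist(model[label], x))


def kmeans(points: list[Point], k: int, iters: int = 20, seed: int = 0) -> list[Point]:
    """Unsupervised: find k group centres with no labels (Lloyd's algorithm)."""
    centres = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(centres[i], p))
            groups[nearest].append(p)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [centroid(g) if g else centres[i] for i, g in enumerate(groups)]
    return centres


labelled = {
    "dog": [(0.9, 0.8), (0.8, 0.9), (0.85, 0.75)],
    "no dog": [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25)],
}
model = train_supervised(labelled)
print(classify(model, (0.7, 0.7)))  # dog

# The same pictures with the labels thrown away: k-means still finds two groups.
points = [p for pts in labelled.values() for p in pts]
print(kmeans(points, k=2))
```

In this toy case k-means recovers the same two groups. It has no idea which group is "dog", though; naming a cluster is still left to a person.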
Today you would spend hours or weeks in data analysis tools looking for the right criteria to find these, and you’d need people doing that work – sorting and resorting that Excel table and eyeballing for the weird result, metaphorically speaking, but with a million rows and a thousand columns. Machine learning offers the promise that a lot of very large and very boring analyses of data can be automated – not just running the search, but working out what the search should be to find the result you want.
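The step of "working out what the search should be" can be illustrated with the simplest possible learner, a decision stump. Given rows labelled churned or not, it tries every column and every threshold and keeps the single rule that best matches the labels. It is a deliberately tiny stand-in for real ML systems, and the column names and data below are invented for illustration.

```python
from dataclasses import dataclass

Row = dict[str, float]


@dataclass
class Rule:
    """A learned query of the form `row[feature] > threshold`."""
    feature: str
    threshold: float
    accuracy: float

    def matches(self, row: Row) -> bool:
        return row[self.feature] > self.threshold


def learn_rule(rows: list[Row], labels: list[bool]) -> Rule:
    """Decision stump: try every column and every observed value as a
    threshold, and keep the rule that agrees with the labels most often."""
    best = Rule(feature="", threshold=0.0, accuracy=-1.0)
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            correct = sum((row[feature] > threshold) == label
                          for row, label in zip(rows, labels))
            accuracy = correct / len(rows)
            if accuracy > best.accuracy:
                best = Rule(feature, threshold, accuracy)
    return best


# Hypothetical customer data: did each customer go on to churn?
rows = [
    {"support_calls": 0, "months_inactive": 0},
    {"support_calls": 1, "months_inactive": 1},
    {"support_calls": 5, "months_inactive": 1},
    {"support_calls": 0, "months_inactive": 4},
    {"support_calls": 2, "months_inactive": 5},
    {"support_calls": 1, "months_inactive": 6},
]
churned = [False, False, False, True, True, True]

rule = learn_rule(rows, churned)
print(rule)  # Rule(feature='months_inactive', threshold=1, accuracy=1.0)
```

Nobody wrote `months_inactive > 1` by hand. The rule came out of the labelled data, scaled down here from a million rows and a thousand columns to six rows and two.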
That is, the eye-catching demos of speech interfaces or image recognition are just the most visible demos of the underlying techniques, but those have much broader applications – you can also apply them to a keyboard, a music recommendation system, a network security model or a self-driving car. Maybe.
This is clearly a fundamental change for Google. Narrowly, image and speech recognition mean that it will be able to understand questions better and index audio, images and video better. But more importantly, it will answer questions better, and answer questions that it could never really answer before at all. Hence, as we saw at Google I/O, the company is being recentred on these techniques. And of course, all of these techniques will be used in different ways, to varying degrees, for different use cases, just as AlphaGo uses a range of different techniques. The thing that gets the attention is ‘Google Assistant’, a front-end that uses voice and analysis of your behaviour to try both to capture questions better and to address some questions before they’re asked. But that’s just the tip of the spear. The real change is in the quality of understanding of the corpus of data that Google has gathered, and in the kinds of queries that Google will be able to answer in all sorts of different products. That is only just beginning.
The same applies in different ways to Microsoft, which (having missed mobile entirely) is creating cloud-based tools to allow developers to build their own applications on these techniques, to Facebook (what is the newsfeed if not a machine learning application?), and indeed to IBM. Anyone who handles lots of data for money, or helps other people do it, will change, and there will be a whole bunch of new companies created around this.
On the other hand, while we have magic we do not have HAL 9000 – we do not have a system that is close to human intelligence (so-called ‘general AI’). Nor do we really have a good theory of what that would mean: whether human intelligence is the sum of techniques and ideas we already have, but more, or whether there is something else. Rather, we have a bunch of tools that need to be built and linked together. I can ask Google or Siri to show me pictures of my dog on a beach because Google and Apple have linked together tools to do that, but I can’t ask it to book me a restaurant unless they’ve added an API integration with OpenTable. This is the fundamental challenge for Siri, Google Assistant or any chat bot (as I discussed here): what can you ask?
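The point about integrations can be sketched as a simple dispatcher. The assistant may work out correctly what you are asking for, but it can only act if someone has registered a handler for that intent. The keyword matcher stands in for a real language-understanding model. The class, intent names and handler are all hypothetical and do not describe the actual architecture of Siri or Google Assistant.

```python
from typing import Callable

Handler = Callable[[str], str]


def search_photos(utterance: str) -> str:
    """Hypothetical photo-search integration."""
    return f"showing photos matching {utterance!r}"


class Assistant:
    """Routes each request to whichever integration has been registered for it."""

    def __init__(self) -> None:
        self.handlers: dict[str, Handler] = {}

    def register(self, intent: str, handler: Handler) -> None:
        self.handlers[intent] = handler

    def detect_intent(self, utterance: str) -> str:
        # Crude keyword matching standing in for a language-understanding model.
        text = utterance.lower()
        if "picture" in text or "photo" in text:
            return "photo_search"
        if "book" in text and "restaurant" in text:
            return "restaurant_booking"
        return "unknown"

    def handle(self, utterance: str) -> str:
        handler = self.handlers.get(self.detect_intent(utterance))
        if handler is None:
            # The request may be understood, but there is nothing to hand it to.
            return "Sorry, I can't help with that yet."
        return handler(utterance)


assistant = Assistant()
assistant.register("photo_search", search_photos)
print(assistant.handle("Show me pictures of my dog at the beach"))
print(assistant.handle("Book me a restaurant for tonight"))  # Sorry, I can't help with that yet.
```

Recognising the restaurant intent is not enough on its own. Until a `restaurant_booking` handler backed by something like OpenTable is registered, the request goes nowhere.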
This takes us to a whole class of jokes often made about what does and does not count as AI in the first place:
- “Is that AI or just a bunch of IF statements?”
- “Every time we figure out a piece of it [AI], it stops being magical; we say, ‘Oh, that’s just a computation.’”
- “AI is whatever hasn’t been done yet.”
These jokes reflect two issues. The first is that it’s not totally apparent that human intelligence itself is actually more than ‘a bunch of IF statements’, of a few different kinds and at very large scale, at least at a conceptual level. But the second is that this movement from magic to banality is a feature of all technology and all computing, and doesn’t mean that it’s not working but that it is. That is, technology is in a sense anything that hasn’t been working for very long. We don’t call electricity technology, nor a washing machine a robot, and you could replace “is that AI or just computation?” with “is that technology or just engineering?”
I think a foundational point here is Eric Raymond’s rule that a computer should ‘never ask the user for any information that it can autodetect, copy, or deduce’ – especially, here, deduce. One way to see the whole development of computing over the past 50 years is as removing questions that a computer needed to ask, and adding new questions that it could ask. Lots of those things didn’t necessarily look like questions as they were presented to the user, but they were, and computers don’t ask them anymore:
- Where do you want to save this file?
- Do you want to defragment your hard disk?
- What interrupt should your sound card use?
- Do you want to quit this application?
- Which photos do you want to delete to save space?
- Which of these 10 search criteria do you want to fill in to run a web search?
- What’s the PIN for your phone?
- What kind of memory do you want to run this program in?
- What’s the right way to spell that word?
- What number is this page?
- Which of your friends’ updates do you want to see?
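Take one item from that list, ‘What’s the right way to spell that word?’. Here is a minimal sketch of a computer deducing the answer instead of asking. It picks the known word with the smallest edit (Levenshtein) distance and breaks ties by how common the word is. The vocabulary and counts are made up, and real spell checkers rely on much richer language models.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row of the DP table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def autocorrect(word: str, vocabulary: dict[str, int], max_distance: int = 2) -> str:
    """Deduce the intended word instead of asking: return the closest known
    word (ties go to the more frequent one), or `word` unchanged if it is
    already known or nothing is close enough."""
    if word in vocabulary:
        return word
    distance, _, best = min(
        (edit_distance(word, known), -count, known)
        for known, count in vocabulary.items()
    )
    return best if distance <= max_distance else word


# Hypothetical word counts.
VOCABULARY = {"beach": 120, "bench": 40, "dog": 300, "photo": 90}
print(autocorrect("beahc", VOCABULARY))  # beach
print(autocorrect("phto", VOCABULARY))   # photo
```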
It strikes me sometimes, as a reader of very old science fiction, that sci-fi did indeed mostly miss computing, but it talked a lot about ‘automatic’. If you look at that list, none of the items really look like ‘AI’ (though some might well use it in future), but a lot of them are ‘automatic’. And that’s what any ‘AI’ short of HAL 9000 really is – the automatic pilot, the automatic spell checker, the automatic hardware configuration, the automatic image search or voice recogniser, the automatic restaurant-booker or cab-caller… They’re all clerical work your computer doesn’t make you do anymore, because it gained the intelligence, artificially, to do them for you.
This takes me to Apple.
Apple has been making computers that ask you fewer questions since 1984, and people have been complaining about that for just as long – one user’s question is another user’s free choice (something you can see clearly in the contrasts between iOS and Android today). Steve Jobs once said that the interface for iDVD should just have one button: ‘BURN’. Apple launched Data Detectors in 1997, a framework that tried to look at text and extract structured data, such as appointments, phone numbers or addresses, in a helpful way. Today you’d use AI techniques to get there, so was that AI? Or a ‘bunch of IF statements’? Is there a canonical list of algorithms that count as AI? Does it matter? To a user who can tap on a number to dial it instead of copying and pasting, is that a meaningful question?
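For comparison, here is roughly what the ‘bunch of IF statements’ end of the spectrum might look like: a few hand-written regular expressions that pull times, phone numbers and e-mail addresses out of free text. This is an illustrative sketch and not how Apple’s Data Detectors were actually implemented. The patterns are simplified assumptions (US-style phone numbers, 12-hour times).

```python
import re

# Simplified, hypothetical patterns: US-style phone numbers, 12-hour times
# and e-mail addresses. Real detectors handle far more formats and locales.
PATTERNS = {
    "phone": re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b"),
    "time": re.compile(r"\b\d{1,2}:\d{2}\s?(?:am|pm)\b", re.IGNORECASE),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def detect(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched text) pairs in the order they appear in `text`."""
    found = [
        (match.start(), kind, match.group())
        for kind, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return [(kind, value) for _, kind, value in sorted(found)]


message = "Dinner at 7:30pm? Call me on 415-555-0123 or email dana@example.com"
print(detect(message))
# [('time', '7:30pm'), ('phone', '415-555-0123'), ('email', 'dana@example.com')]
```

Once a span is tagged as a phone number, the interface can make it tappable. From the user’s side, it makes no difference whether the tag came from rules like these or from a learned model.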
This certainly seems to be one way that Apple is looking at AI on the device. In iOS 10, Apple is sprinkling AI through the interface. Sometimes this is an obviously new thing, such as image search, but more often it’s an old feature that works better or a small new feature added to an existing application. Apple really does seem to see ‘AI’ as ‘just computation’.
Meanwhile (and this is what gets a lot of attention) Apple has been very vocal that companies should not collect and analyse user data, and has been explicit that it is not doing so to provide any of these services. Quite what that means varies a lot. Part of the point of neural networks is that training them is distinct from running them. You can train a neural network in the cloud with a vast image set at leisure, and then load the trained system onto a phone and run it on local data without anything leaving the device. This, for example, is how Google Translate works on mobile: the training is done in advance in the cloud but the analysis is local. Apple says it’s doing the same for Apple Photos: ‘it turns out we don’t need your photos of mountains to train a system to recognize mountains. We can get our own pictures of mountains’. It also has APIs that let developers run pre-trained neural networks locally with access to the GPU. For other services it’s using ‘differential privacy’, which adds statistical noise to the data so that, although it is collected by Apple and analysed at scale, you can’t (in theory) work out which users it relates to.
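The split between training and running a model can be sketched in a few lines. A ‘server’ fits a model and exports only its parameters. A ‘device’ then loads those parameters and classifies local data without sending anything back. To keep it short, a few assumptions stand in for the real thing: a logistic-regression model (in effect a single artificial neuron) replaces a neural network, JSON replaces a real model format, and the features and labels are invented.

```python
import json
import math

Example = tuple[list[float], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_in_cloud(examples: list[Example], epochs: int = 200, lr: float = 0.5) -> str:
    """Server side: fit a logistic-regression model (one artificial neuron)
    with stochastic gradient descent, then export only the learned numbers."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in examples:
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - label
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return json.dumps({"weights": weights, "bias": bias})


class OnDeviceModel:
    """Device side: load the exported parameters and classify local data.
    Nothing about the local inputs is sent anywhere."""

    def __init__(self, exported: str) -> None:
        params = json.loads(exported)
        self.weights: list[float] = params["weights"]
        self.bias: float = params["bias"]

    def predict(self, x: list[float]) -> bool:
        score = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return score > 0  # equivalent to sigmoid(score) > 0.5


# Hypothetical training set held by the server: features of pictures the
# provider sourced itself, labelled 1 for "mountain" and 0 otherwise.
training_set = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.1, 0.2], 0), ([0.2, 0.1], 0)]
exported = train_in_cloud(training_set)

device_model = OnDeviceModel(exported)     # ship only the parameters to the phone
print(device_model.predict([0.85, 0.85]))  # True: a local photo, classified locally
```

The only thing that crosses from server to device is the small `exported` string. The user’s own photos never have to travel the other way.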
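Differential privacy works by adding calibrated random noise to the data. Its textbook local form, randomized response, is sketched below. Each device randomly flips its true answer with a known probability before reporting it, so no single report can be trusted. The server can still estimate the overall rate across many users. This only shows the general idea: Apple’s deployed mechanisms are more elaborate, and the function names, the population and the `epsilon` value here are hypothetical.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability that a device reports its true answer."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return math.exp(epsilon) / (1 + math.exp(epsilon))


def randomized_response(truth: bool, epsilon: float, rng: random.Random) -> bool:
    """Device side: report the truth with probability e^eps / (1 + e^eps),
    otherwise report the opposite. Any single report is deniable."""
    return truth if rng.random() < keep_probability(epsilon) else not truth


def estimate_rate(reports: list[bool], epsilon: float) -> float:
    """Server side: undo the known noise to estimate the true 'yes' rate.
    observed = p * rate + (1 - p) * (1 - rate), solved for rate."""
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)


rng = random.Random(42)
epsilon = 1.0
# Hypothetical population: 30% of 100,000 users truly answer "yes".
truths = [i < 30_000 for i in range(100_000)]
reports = [randomized_response(t, epsilon, rng) for t in truths]
estimate = estimate_rate(reports, epsilon)
print(f"estimated rate: {estimate:.3f}")  # should be close to 0.30
```

Smaller values of `epsilon` push the keep probability towards one half. That gives each user more deniability, but the server then needs more reports to reach the same accuracy.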