Raphaël Bastide, Handmade Deep Dream (2015). If this were a real Deep Dream image, these would be dogs probably.
Anthony Antonellis, A-ha Deep Dream (2015).
By way of recap: Deep Dream takes a machine vision system normally used to classify images and tweaks it so that it over-analyzes images until it sees objects that aren't "really there." The project was developed by researchers at Google who were interested in the question: how do machines see? Thanks to Deep Dream, we now know that machines see things through a kind of fractal prism that puts doggy faces everywhere.
It seems strange that Google researchers would even need to ask this question, but that's the nature of image classification systems, which generally "learn" through a process of trial and error. As the researchers described it,
"we train networks by simply showing them many examples of what we want them to learn, hoping they extract the essence of the matter at hand (e.g., a fork needs a handle and 2-4 tines), and learn to ignore what doesn't matter (a fork can be any shape, size, color or orientation). But how do you check that the network has correctly learned the right features? It can help to visualize the network's representation of a fork."
Strangely, the same question was actually posed during Rhizome's Seven on Seven conference this year, by Adam Harvey, Mike Krieger, and Trevor Paglen. The group was able to reverse engineer a machine vision system to generate what could be described as its archetypal image of a "goldfish." It looked like a kind of orange blur against a blank background. Their goal, though, was not to check the accuracy of the system exactly, but to begin to better understand how such systems might be altering visual culture and society.
The Google researchers, by contrast, didn't simply try to isolate these archetypal images. They tweaked parts of existing images until the machine recognized them as particular objects.
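To make that tweaking a little more concrete, here is a minimal sketch of the general idea, usually called gradient ascent on the input: nudge each pixel in whichever direction raises the detector's score, and repeat. This is not Google's code. The toy `score` function, the four-pixel "image," and the weights are all assumptions made up for illustration; a real system would use a deep network with millions of parameters in place of this single weighted sum.

```python
import math

def score(weights, pixels):
    """Toy 'detector' (an assumption, not a real network):
    a logistic score of a weighted sum of pixel values."""
    z = sum(w * p for w, p in zip(weights, pixels))
    return 1.0 / (1.0 + math.exp(-z))

def dream_step(weights, pixels, step_size=0.5):
    """Move every pixel a little in the direction that raises the score."""
    s = score(weights, pixels)
    # For a logistic score, d(score)/d(pixel_i) = s * (1 - s) * w_i.
    grad = [s * (1.0 - s) * w for w in weights]
    # Keep pixel values inside the valid range [0, 1].
    return [min(1.0, max(0.0, p + step_size * g)) for p, g in zip(pixels, grad)]

def dream(weights, pixels, steps=200, step_size=0.5):
    """Repeat the nudge many times; the image drifts toward what the detector 'wants'."""
    for _ in range(steps):
        pixels = dream_step(weights, pixels, step_size)
    return pixels

# Hypothetical detector weights and a hypothetical four-pixel image.
weights = [1.5, -2.0, 0.5, 3.0]
image = [0.2, 0.8, 0.5, 0.1]
dreamed = dream(weights, image)

print("score before:", round(score(weights, image), 3))
print("score after: ", round(score(weights, dreamed), 3))
```

The detector itself never changes here; only the image does, which is why the result says more about the detector than about the picture it started from.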
Johan Nordberg, Inside an Artificial Brain (2015). (H/T Kari Altmann.)
I was curious about why all the Deep Dream images seemed to have eyes in them, and dog faces. This is part of the appeal, I think; the promise of Deep Dream is that it allows people to "see" algorithms, which are often invoked in the modern press as a kind of all-powerful sorcery. Now that these hidden forces are finally visible, we know that they are actually eyes hidden in everything, watching us. So on one level, these visualizations are deeply satisfying representations of digital wizarding.
This Reddit thread, though, offered some more useful insight. An image classification system must be "trained" on an image set: before it can identify a fork, it must be fed a number of pictures of forks so that it can analyze their key characteristics. The dataset used to train the Deep Dream system, according to knowledgeable Reddit user emptv, is called ImageNet, which contains many sets of dog images. As other Reddit users quickly proposed, Deep Dream could alternatively be trained on the dick algorithm, and then it would see dicks everywhere.
According to the New York Times, ImageNet was initiated by computer scientists at Stanford and Princeton in 2007 after running up against the limits of image captions supplied by internet users. They wanted to train image classification systems to recognize images based on clearly captioned photos, not the kind of trollish, inane labels slapped onto images by most internet users. They built a database of 14 million human-labeled images. "Each year, ImageNet employs 20,000 to 30,000 people," says the Times, "who are automatically presented with images to label, receiving a tiny payment for each one." (About those tiny payments...)
More specifically, Google used a subset of the ImageNet data: a smaller group of images released in 2012 as part of an important annual image-recognition competition and conference. The contest works like this:
Presented with an image of some kind, the first task is to decide whether it contains a particular type of object or not. For example, a contestant might decide that there are cars in this image but no tigers. The second task is to find a particular object and draw a box around it. For example, a contestant might decide that there is a screwdriver at a certain position with a width of 50 pixels and a height of 30 pixels.
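The two tasks described above can be sketched as data, under some loud assumptions. The `Box` type, the pixel positions, and the label sets are hypothetical; only the 50-by-30 screwdriver box comes from the description. Scoring a guessed box by "intersection over union" (overlap area divided by combined area) is a common convention in computer vision, but whether the contest scored boxes exactly this way is an assumption here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    """A labeled rectangle: top-left corner plus width and height, in pixels."""
    label: str
    x: int
    y: int
    width: int
    height: int

def iou(a, b):
    """Intersection over union: 1.0 for identical boxes, 0.0 for no overlap."""
    left = max(a.x, b.x)
    top = max(a.y, b.y)
    right = min(a.x + a.width, b.x + b.width)
    bottom = min(a.y + a.height, b.y + b.height)
    intersection = max(0, right - left) * max(0, bottom - top)
    union = a.width * a.height + b.width * b.height - intersection
    return intersection / union if union else 0.0

# Task 1: which kinds of object does the image contain? (hypothetical answer)
labels_found = {"car"}
has_car = "car" in labels_found
has_tiger = "tiger" in labels_found

# Task 2: where is the object? The positions below are invented for illustration.
guess = Box("screwdriver", x=100, y=40, width=50, height=30)
truth = Box("screwdriver", x=110, y=40, width=50, height=30)
overlap = iou(guess, truth)

print(has_car, has_tiger, round(overlap, 3))
```

A guess that is ten pixels off, as in this example, still overlaps the true box by about two thirds.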
And in 2012, as it happens, there was also a bonus task: "Fine-grained classification on 120 dog sub-classes!"
This is a papillon?
And of course, as one of the teams from 2012 put it, "since bodies of dogs are highly deformable, the parts being most reliably detectable are their heads....Therefore, we use a simple head detector by applying a hough circle transform to find eyes and noses." (I mean, I get that deformable is a word that computer vision people use a lot, but still, are you listening to yourselves?) There seem to be no humans in the subset of images used in the Deep Dream release, perhaps because of qualms about what it might mean to "classify" humans. No such qualms apply to other species: the subset includes many other animals as well, probably all of which have "highly deformable" bodies and more easily detectable eyes.
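For the curious, a Hough circle transform is a voting scheme: every edge pixel votes for all the points that could be the center of a circle passing through it, and the point with the most votes wins. The sketch below is not the team's implementation. It assumes the radius is already known, runs on a synthetic circle rather than a photograph, and uses plain Python; the function names and the sample "eye" at (20, 15) are invented for illustration.

```python
import math
from collections import Counter

def circle_points(cx, cy, radius, n_angles=72):
    """Synthetic edge pixels: integer points on a circle (stand-in for an eye outline)."""
    points = set()
    for k in range(n_angles):
        t = 2 * math.pi * k / n_angles
        points.add((round(cx + radius * math.cos(t)), round(cy + radius * math.sin(t))))
    return points

def hough_circle_center(edge_points, radius, n_angles=72):
    """Return (center, votes) for the best circle center of a known radius."""
    votes = Counter()
    for x, y in edge_points:
        # Each edge pixel votes once for every candidate center at distance `radius`.
        candidates = set()
        for k in range(n_angles):
            t = 2 * math.pi * k / n_angles
            candidates.add((round(x - radius * math.cos(t)), round(y - radius * math.sin(t))))
        votes.update(candidates)
    return votes.most_common(1)[0]

# A hypothetical "eye": a circle of radius 6 centered at (20, 15).
edges = circle_points(20, 15, 6)
center, count = hough_circle_center(edges, 6)
print(center, count)
```

Production libraries also search over the radius and tolerate noisy edges, which this sketch does not attempt.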
The fact that dog recognition was identified as an additional task might have been a way of making ImageNet more appealing to the general public. In the Times article cited above, a Google researcher is quoted saying that "Most people are more interested in Lady Gaga or the iPod Mini than in this rare kind of diplodocus." Dog breeds are a kind of happy medium. They appeal to the classificatory mania that machine vision researchers seem to have inherited from the 19th-century natural history museum. And they are popular on the internet.
So this is what we get. We clicked on doggy pictures so much, and now everything is turning into weird half-doggy monsters. And the dick pics we clicked on are coming for us next.