The Hidden Labor of Teaching Machines to See

Sebastian Schmieg's Segmentation.Network, now showing on the front page of Rhizome.org, plays back over 600,000 segmentations manually created by Mechanical Turk workers for Microsoft's COCO image recognition dataset. The COCO dataset, short for Common Objects in Context, is derived from photos on Flickr and is used to train and test machine learning models.

To teach a machine to see objects, one must show it a large number of images of objects. But first, one must answer the question: what is an object?

The creators of Microsoft COCO, a new image set for machine learning, started from the premise that image recognition has been held back by an overreliance on iconic views of objects.

For example, in a web-based image search for the object category “bike,” the top-ranked results show the bike in profile and unobstructed, near the center of a neatly composed photo.

In such images, a clear boundary has been drawn between an object and its context. Amid the chaos and clutter of everyday life, such boundaries are often harder to draw. The subject gives birth to the object.

The subjects who gave birth to Microsoft COCO included "several children ranging in ages from 4 to 8 [who] were asked to name every object they see in indoor and outdoor environments." After a list of categories was compiled, the researchers used keywords to collect 328,000 images of complex everyday scenes from Flickr ("which tends to have fewer iconic images"). These were then given to Mechanical Turk workers, who had to determine which kinds of objects were present in each image, label each object, and draw an outline around it. In all, as The Creators Project notes, 70,000 worker hours went into the creation of the image set.

The drawings produced by this labor force are the basis of Sebastian Schmieg's recent web-based work, Segmentation.Network, now on the front page. The work makes visible some of the hidden labor that goes into the black box of machine vision, the many subjectivities that contribute to a machine's ability to "give birth to the object."

As Schmieg writes,

the piece addresses machine vision as an act of conscious selection: what can and should be seen by machines and what will remain unrecognised or deemed irrelevant is separated by distinct lines.

Hence, neural networks and artificial intelligence in general can be considered a collective and rather introspective endeavor and achievement.