Unbound: The Politics of Scanning

There's a great scene in the first episode of House of Cards where the ambitious young journalist Zoe Barnes is sitting on the floor of her rented apartment's living room scanning the half-shredded documents of an education bill that was forwarded to her by her source/lover Frank Underwood, the Majority Whip. She's drinking wine, taking notes on her laptop, and scanning on her small all-in-one desktop printer/scanner. The next day she shows up at the office of the newspaper where she works with a 3,000-word text and the 300-page document scanned, telling her editors that "We should get this online right away."

Barnes's character is young and ambitious. Later in the season she moves on to work for a site called "Slugline," an early-Politico-like newswire, where "journalists post news directly from their phones." Her obsession with technology is used as a narrative device in the series to set her apart from her older, more conservative editors at the newspaper. And her ambition to upload information to the newspaper's site as soon as possible, to give the public the raw data before it can be filtered or analyzed, stands for her idealism.

The romanticized image of the scanner is based on the assumption that by scanning and uploading we make information available, and that that is somehow an invariably democratic act. Scanning has become synonymous with transparency and access. But does the document dump generate meaningful analysis, or make it seem insignificant? Does the internet enable widespread distribution, or does it more commonly facilitate centralized access? And does the scanner make things transparent, or does it transform them? The contemporary political imaginary links the scanner with democracy, and so we should explore further the political possibilities, values, and limitations associated with the process of scanning documents to be uploaded to the internet.

What are the political possibilities of making information available? A thing that is scanned was already downloaded, in a sense. It circulated on paper, as widely as newspapers or as little as classified documents. And interfering with its further circulation is a time-honored method of keeping a population in check. Documents are kept private; printing presses are shut down. Scanning printed material for internet circulation has the potential to circumvent some of these issues. Scanning means turning the document into an image, one that is marked by glitches and bears the traces of editorial choices on the part of the scanner. Although certain services, such as the DNS addressing system, remain centralized and vulnerable to political manipulation, and government monitoring of online behavior is commonplace, there is still political possibility in the aggregate, geographically dispersed nature of the internet. If the same document is scanned, uploaded, and then shared across a number of different hosts, it becomes much more difficult to suppress. And it gains traction by circulation.

Collect and disperse

Grounds of the former estate of Viktor Yanukovych, Mezhyhirya. Via Flickr.

When Viktor Yanukovych, who was President of Ukraine from 2010 until he was ousted following violent mass demonstrations all over the country, fled to Russia in February 2014, hundreds of folders were spotted in the reservoir on the ex-President's estate. A group of journalists and activists arrived at the site and thus began the project to save and make available the papers documenting Yanukovych's corrupt, violent regime. They did not leave the mansion for days, drying and scanning the documents, a scenario that raises the bar considerably on Zoe Barnes. In an interview with Mashable, one of the members of the team (the technologist who built the website onto which they uploaded content) explained that their "first priority was to get the information out. The situation has turned out very well so far, but at the start we didn't have any idea whether everybody would be removed. So we wanted to get documents secured, and get them out in public to show that this is about transparency and accountability."

There are 23,456 documents on the group's website. Earlier this summer, they organized a public event in the ex-President's compound dedicated to the connection between investigative journalism, digital activism, and leaks. And they won the Reporters Without Borders Award at The Bobs, the best of online activism awards. Currently, the Ukrainian government is moving ahead with extensive anti-corruption legislation, designed to address public concerns raised in the aftermath of these disclosures. The importance of the Yanukovych leaks project lies precisely in diving deep into that which was kept secret (journalists were never allowed within a safe distance of the President's estate) and making it available. And the enormous impact of this project is still being analyzed and reported. It is a pointed example of the tremendous democratizing effects that can be sparked by the act of scanning. The leaks group provides the first step in analyzing the time and impact of Yanukovych's rule by making the raw information available. But can that be called reportage? When we use terms like "democratizing," we should also bring up the question of responsibility: in this case, responsibility not only for making something available, but for generating analysis and public discourse around it.

We are living in an age where activism is marked by the information economy, where it is undeniable that Wikileaks is one of the organizations most influential in shaping the international political reality, and where making information available does, in fact, oftentimes mean making a difference. Because the release of documents is viewed as a positive, even heroic gesture, the analysis thereof may be lackluster. The visual image of the scanned documents provides the cachet of accuracy and transparency, even in the absence of necessary mediation. Scans are raw material, not journalism. They offer support to a story and give the impression of truthfulness. Wikileaks, for example, benefits enormously from the expanse of the internet, which allows it to dump all of the information it makes available on its website, thus shifting the role of newspapers from publishing information to organizing it. As Julian Assange said, "It's too much; it's impossible to read it all, or get the full overview of all the revelations."

Scribd aesthetics

Partly as a result of examples like the Yanukovych document dump, scanned pages embedded in any news story have become an incredibly strong tool in reporting, a badge of transparency and credibility. By sharing a PDF, journalists are telling readers that their analysis is rooted in uncompromised information.

One of the visual tropes of the journalistic scan is the embed, in which a PDF is displayed in an iframe embedded within another page. In most cases, this is made possible by Scribd's branded reader, which it offers to media partners ranging from the New York Times and the Chicago Tribune to the Huffington Post, TechCrunch, or MediaBistro. In a story analyzing a certain document, embedding said document within a Scribd reader gives the story credibility; the reader's advantages are that the document is not downloadable and that readers can look at the content without needing to install a PDF reader in their browser. No one expects readers to go through the entire document attached to a story: that is why we read the analysis. But the image of reliability in reporting, especially post-Wikileaks, includes scanned documents, preferably with somewhat blurry text or at least with information crossed out in order to protect confidential information.

Scribd developed an HTML5 technology specifically to allow a way to share full documents planted in other stories, bringing it closer to what Wired described in 2008 as "its goal of becoming the YouTube of online document sharing." And like YouTube, Scribd makes downloading seem unnecessary. But it isn't. When documents face possible censorship, for example, sharing them on Scribd makes it particularly easy to block access. The act of scanning is intended to create an easily duplicated file, one that can be copied into and shared from anyone's hard drive. But as with so many other online cultural forms, the scan is more often hosted with convenient, centralized, easily controlled services, limiting its potential for political disruption.


While Scribd hosts scanned documents on centralized servers, other services centralize the indexes that are used to access PDFs. The sheer volume of scanned works now available means that their accessibility is often governed by algorithms, many of which involve optical character recognition (OCR), the process of converting scanned images of text into searchable text. OCR has become especially visible now that Google includes text scanned for Google Books and processed through OCR in its search results.

The full-text search option in books is extremely useful in academia and is the central aspect of the HathiTrust Digital Library, a partnership of dozens of universities that allows users to search the texts of millions of books in its collection. Search hits include only the full citation (including page number) of books in which the keywords appear, and permitted users can access those books online. HathiTrust was taken to court for violating copyright law; in June, the Second Circuit Court of Appeals in New York decided that the full-text search offered by HathiTrust is "transformative," thus falling under fair use. From the court documents: "a transformative work is one that serves a new and different function from the original work and is not a substitute for it." The US court decision is fascinating in the way it considers how technology alters our use of text, because the content of HathiTrust is (theoretically) the same as the original text; merely transforming this text into data allowed it to serve a "new and different function."

The ruling that OCR technology is transformative paves the way for much circumvention of copyright laws on scanned documents. The push-pull between the contemporary digital media landscape and copyright laws tests the assumption that availability is a fundamental good and accessibility (via search) is a proprietary service. It remains to be seen how this ruling will be applied in a world of Google Books. (The Authors Guild, which is the institution that took HathiTrust to court, also filed a class-action lawsuit against Google Books, which was dismissed in 2013.) The case of Google Books and its unknown future (Google reserves the right to offer paid subscriptions to the content it scans, for example) is just one example of the fact that scanning information does not mean liberating it. In fact, oftentimes, it means some corporations stand to gain a lot from facilitating access to said information, and then charging an entry fee. When the printed page takes a digital, informational form, it does not mean that free access will inevitably follow. While many countries are in the planning stages of digitizing their cultural heritage, Google Books attempts to present itself as a shortcut, free for governments and potentially very expensive for the public.

The Politics of Circulation

Jenny Holzer, Top Secret 24 Black U.S. Government Document (2011). Sprüth Magers.

The loosening of control of information offered by a simple desktop scanner is changing our media landscape. In the information economy, making knowledge available can be a substantial source of political agency, and indeed, many of the most shattering political realizations on the international stage were the result of whistleblowers and larger organizations offering public access to knowledge that was intentionally withheld. The double representation (of technology and of political dissent) associated with the act of scanning in these contexts is emblematic of this moment, in which we are still working out how to regulate the seemingly infinite amount of information that can be shared online. The apparent boundlessness of this technology matches the unimaginable amount of information out there, much of which is hidden from the public and much else kept behind paywalls.

As part of ongoing conversations about the effects of technology on political knowledge and participation, we should reassess the presentation of information and the way we read it. Just what the effect of digital activism on the world stage could be remains to be seen, but access to information definitely has the potential to reshuffle power structures. We just need to be very careful about the way we present it, lest we conflate access with critical assessment.