I (No Longer) Have a Web Site: Access, Authenticity, and the Restoration of GeoCities

"I Have a Website." JPEG version of screenshot from One Terabyte of Kilobyte Age Photo Op.

One year ago, a system developed by artists Olia Lialina and Dragan Espenschied began taking screenshots of now-defunct GeoCities webpages from the late 1990s as they would appear on hardware and software from that time. Every twenty minutes, a new screenshot is automatically uploaded to their Tumblr, One Terabyte of Kilobyte Age Photo Op. Friday marked the one-year anniversary of the project; to celebrate, Espenschied restored the three most reblogged and liked home pages posted there, as tracked by Lialina. This article covers the why and how of this restoration.

It may seem strange to say this about the likes of "Cute Boy Site" or "Divorced Dads Page," but the remains of the GeoCities web hosting service are a vital part of our cultural legacy. In its dial-up heyday, GeoCities was where non-specialist internet users made their first-ever webpages. Today, it exists as a vast, if partial, repository of the anxieties, hopes, and dreams of those creators, and offers a snapshot of the early popular usage of a now-ubiquitous cultural form, the webpage.

In 2009, Yahoo! pulled the plug on the GeoCities service, and millions of pages nearly winked out of existence entirely. This near-miss can be seen as a reflection of broader cultural priorities that undervalue the contributions of mere internet users, and as a stark illustration of the cultural cost of an overly onerous copyright regime, which makes public institutions reluctant to download vulnerable content for preservation purposes. Thankfully, the renegade archivist group Archive Team, and others of their ilk, stepped in. Employing what they call "distributed preservation of service attacks" (a play on the more malicious "distributed denial of service attacks"), Archive Team harvested nearly 1 terabyte of data from the GeoCities servers before they went dark, and seeded the data as a torrent.

When it comes to aging, crumbling digital artifacts, capturing the data is only half the battle. One must also provide access to this data, and doing so poses a number of thorny issues, both conceptual and technical. For the past year, artists Olia Lialina and Dragan Espenschied (Rhizome's incoming Digital Conservator) have been working through these problems for their project One Terabyte of Kilobyte Age, which re-performs the Archive Team's GeoCities data for preservationist purposes.

According to Espenschied, strategies for offering access to these pages can be "measured on two axes: authenticity… and ease of access." For example, "Graphical authenticity on the pixel level means that a GeoCities webpage will render exactly as it would to visitors in the time the page was published." To achieve this, one has to factor in the many ways in which our technological context has changed, as with the rise of anti-aliasing. As Espenschied explains, "All current operating systems render characters with smoothed out edges, and this is reflected as much in current web design as the historic aliased pixel text display influenced the web design of the past." You are almost certainly reading this article on an operating system that displays smoothed, anti-aliased text. 

Aliased pixel text display, with blocky, unsmoothed edges.

Then there's MIDI, which was a popular audio file format on the early web. MIDI tracks only store musical notation, rather than recording sound waves, allowing complex melodies to be conveyed via relatively tiny file sizes. When MIDI files were played back, they drew on a library of existing tones that were stored in a user's sound card. These tones have changed as sound cards have become more sophisticated.
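To give a rough sense of the size difference described above, here is a small Python sketch. It encodes a single standard MIDI note-on message and compares it with the space one second of uncompressed CD-quality audio would take. The function names are illustrative only, and the audio parameters (44.1 kHz, stereo, 16-bit) are an assumed baseline, not something taken from the GeoCities archive.

```python
def note_on_message(note: int, velocity: int, channel: int = 0) -> bytes:
    """Encode a MIDI note-on message: status byte, note number, velocity."""
    status = 0x90 | (channel & 0x0F)  # 0x9n means "note on" for channel n
    return bytes([status, note & 0x7F, velocity & 0x7F])


def pcm_size_bytes(seconds: float, sample_rate: int = 44100,
                   channels: int = 2, bytes_per_sample: int = 2) -> int:
    """Size of uncompressed PCM audio (assumed CD-quality defaults)."""
    return int(seconds * sample_rate * channels * bytes_per_sample)


# Middle C (note 60) at a moderate velocity. Inside a MIDI file each event
# is also preceded by a short delta-time value, which is omitted here.
middle_c = note_on_message(60, 100)
print(len(middle_c), "bytes to start a note")
print(pcm_size_bytes(1.0), "bytes for one second of recorded sound")
```

The instruction to play a note fits in a few bytes because the sound itself is supplied by the playback device, which is also why the same file can sound so different from one sound card to the next.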

Here's a MIDI file, which you can listen to through your own sound card. Here's the same MIDI file as it would have sounded through an AWE sound card, which was popular for DOS and Windows PCs. The card featured a "sound font" in which each instrument was represented by a tiny piece of audio data that was manipulated in real-time for pitch, volume, etc. Since creating a "sound font" that is this versatile and complete—at least 128 virtual "instruments"—is quite laborious, the same set of sounds was used in many sound cards and improved for later versions.

To replicate the sound, Espenschied used the software synth TiMidity, which includes a version of the AWE "sound font" that enthusiasts had ripped from the card. This software probably comes across slightly differently from an original AWE card, but it gives a closer approximation of the late-1990s sonic experience of a MIDI file.
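A conversion of this kind might look roughly like the following Python sketch, which assembles a TiMidity command line that renders a MIDI file to a WAV recording. This is not Espenschied's actual workflow, which the article does not document. The file names and the sound font path are hypothetical, and the sketch assumes a TiMidity++ installation where `-Ow` selects WAV output, `-o` names the output file, and `-x` passes a configuration line such as `soundfont <file>`.

```python
import shlex
import subprocess


def timidity_command(midi_path, wav_path, soundfont_path=None):
    """Build a TiMidity++ command that renders a MIDI file to WAV."""
    cmd = ["timidity"]
    if soundfont_path is not None:
        # Assumption: the path contains no spaces; otherwise it would need
        # quoting inside the configuration string.
        cmd += ["-x", f"soundfont {soundfont_path}"]
    cmd += [str(midi_path), "-Ow", "-o", str(wav_path)]
    return cmd


def render(midi_path, wav_path, soundfont_path=None):
    """Run the conversion; raises CalledProcessError if TiMidity fails."""
    subprocess.run(timidity_command(midi_path, wav_path, soundfont_path),
                   check=True)


# Hypothetical file names, printed rather than executed.
example = timidity_command("song.mid", "song.wav", "/opt/soundfonts/awe32.sf2")
print(shlex.join(example))
```

Recording the output once and serving the audio file means every visitor hears the same instrument set, regardless of what their own machine would do with the MIDI data.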

Another issue is that GeoCities URLs have all expired. A URL (such as http://www.geocities.com/SouthBeach/Lagoon/8322) is a crucial element of the experience of any given page, coloring how it is understood. Some pages might not even work correctly when installed on a different URL, if they include linked files with addresses on a GeoCities server. To present webpages with their original URLs intact, a proxy server must be used, which would simply redirect requests for GeoCities pages to the server where they are stored.

Displaying a webpage from an expired domain with its original URL requires the use of a proxy server.
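The core of such a proxy is a lookup from the original URL to wherever the archived copy now lives. The Python sketch below shows one way that mapping could work. The archive directory, the accepted host names, and the choice of index.html as the default document are all assumptions made for illustration, not details of the actual setup.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit

ARCHIVE_ROOT = Path("/srv/geocities")  # hypothetical location of the archive
GEOCITIES_HOSTS = {"www.geocities.com", "geocities.com"}


def local_path_for(url, root=ARCHIVE_ROOT):
    """Map an original GeoCities URL to a file in the local archive.

    Returns None for other hosts, which a proxy would pass through
    or refuse.
    """
    parts = urlsplit(url)
    if (parts.hostname or "").lower() not in GEOCITIES_HOSTS:
        return None
    relative = PurePosixPath(unquote(parts.path).lstrip("/"))
    if ".." in relative.parts:  # refuse paths that climb out of the archive
        return None
    target = root.joinpath(*relative.parts)
    # Assumption: directory-style URLs are served from index.html.
    if parts.path.endswith("/") or target.suffix == "":
        target = target / "index.html"
    return target


print(local_path_for("http://www.geocities.com/SouthBeach/Lagoon/8322"))
```

A browser configured to use the proxy would still display the geocities.com address, while the proxy quietly reads the page from the archive directory.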

Finally, as Espenschied observes, screens have changed drastically. All graphical output looks very different on CRT monitors and their specific surface-to-pixel ratios than it does on contemporary flat screens. When looking at historic webpages on an 800×600 pixel 14" CRT screen with a 60Hz refresh rate, it becomes clear why many people decided to use dark backgrounds and bright text for their designs instead of emulating paper with black text on a white background.

Thus, the most "authentic" access to a page from the GeoCities archive would involve historic hardware, historic software, and a proxy server—for most users, an unrealistic scenario.

To bring the GeoCities project to the broadest possible audience while presenting it in a way that is as consistent as possible with the original visual experience, Lialina and Espenschied set up an automated system that takes screenshots of GeoCities webpages and posts them to a Tumblr, One Terabyte of Kilobyte Age Photo Op. In a nutshell, the system loads pages in a virtualized, period-appropriate hardware and software environment and captures screenshots at 800×600 pixels. The Tumblr represents a trade-off between authenticity and ease-of-access.

For the one-year anniversary of One Terabyte of Kilobyte Age Photo Op, Espenschied restored three GeoCities websites that Lialina had determined to be the three most popular. These restorations involved a certain amount of interpretation. As Espenschied writes, 

The pixels generated by contemporary browsers are not the same as the ones rendered by a 1997/98 browser, the URLs are not the original ones; however, the interactivity comes close to the original, the graphics look fine enough and are animated. As a bonus, a first time in website restoration, the embedded MIDI files have been transformed into audio recordings using TiMidity and a Soundblaster AWE32 instrument set.

The three restored webpages and Espenschied's notes on each are presented below: 


I have a website

Espenschied writes,

All material except the counter image was present in the Archive Team’s GeoCities torrent distribution. The missing image icq.JPG probably never was uploaded by the user; it is not present on any public GeoCities mirror.

The counter image was lifted from the Wayback Machine. The original URL, http://www.geocities.com/cgi-bin/counter, was probably working with browser referrer information to assign the counter to a certain webpage. The Internet Archive’s web crawler saved the counter showing 0000 over a few years. We will not be able to reconstruct the number of visitors to the page, but at least we can imagine how it looked.

The MIDI file embedded in the page, a version of Celine Dion’s "My Heart Will Go On," is heavily damaged and produces strange noises when played back via TiMidity. I haven’t verified how it would be interpreted on a legacy system, but since the MIDI file specification is not met in this file it will for sure not reproduce a perfect version of the song. (The file is damaged or not present in all public mirrors of GeoCities.)


Cute Boy Site

Espenschied writes,

This simple home page posed no further problems. "As Long As You Love Me" by the Backstreet Boys was conserved in a perfect MIDI version. The missing image devlayy.jpg never left the author’s hard disk, it is referenced outside of the homepage’s root directory in a folder called "Annies GirlClub."


Divorced Dads Page

Espenschied writes,

From Divorced Dads, the Archive Team's copy only contains the main page. I took some missing pages and this from reocities. The downside of reocities is that there is no Last-Modified header delivered from the server. The upside is that the original HTML is less modified than on the Wayback Machine. Thankfully Wayback delivers original Last-Modified dates in extra HTTP headers, so I was able to transfer this metadata to the reocities copies.

The banner on the bottom was replaced with a generic banner ad from this particular banner exchange service from 2003, as found on the Wayback Machine.

The top of the page features a Java applet called "GeoGuide" that is referenced on many GeoCities home pages. Unfortunately, Java applets have posed issues for webcrawler-based archiving, since they are opaque blobs of code that might load further resources, for example images or object code libraries. Most crawlers wouldn’t even download the applet files because of the low likeliness that they would work later. There is no public mirror of GeoCities available that contains this applet, and until now no screenshot or other form of documentation of GeoGuide was found.

The counter used to be delivered from a personalized URL, http://www.geocities.com/cgi-bin/counter/jacquestheman, the first time it was checked on the Wayback Machine in 2003 was already producing a "file not found." Since GeoCities moved their user tracking to a separate server, visit.geocities.com, I decided to look there and indeed found four zeroes printed in a nice font, still alive. This might be the counter the page’s author customized for himself, or it might not be.

All sub pages use a non-standard font called “Paramount” <FONT FACE="Paramount">. A metadata tag <META NAME="GENERATOR" CONTENT="Mozilla/4.01 [en] (Win95; I) [Netscape]"> hints towards Windows 95 being the platform the pages were created on, but there is no information available about what this font might be. There are some freeware fonts with that name, but no font of such a name was ever included in for example the “Microsoft Plus” packs for Windows that gave users extra features and fonts; Microsoft Office never shipped with a Paramount. Since the choice of font would be too arbitrary and the likeliness of page visitors having exactly this font installed in 1997/98 to actually see it is very low, I decided to leave the browser’s default font in place.
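The Last-Modified transfer that Espenschied describes for the Divorced Dads pages can be sketched in a few lines of Python. This is an illustration under stated assumptions, not his actual tooling: it assumes the Wayback Machine exposes the original value in a header named X-Archive-Orig-Last-Modified, and the date and file used in the demonstration are made up.

```python
import os
import tempfile
from email.utils import parsedate_to_datetime

# Assumed name of the header carrying the original server's Last-Modified.
WAYBACK_HEADER = "X-Archive-Orig-Last-Modified"


def apply_last_modified(headers, local_file):
    """Copy the original Last-Modified date onto a locally stored file.

    `headers` is a mapping of HTTP response headers from the Wayback copy.
    Returns False if the header is absent.
    """
    value = next((v for k, v in headers.items()
                  if k.lower() == WAYBACK_HEADER.lower()), None)
    if value is None:
        return False
    timestamp = parsedate_to_datetime(value).timestamp()
    os.utime(local_file, (timestamp, timestamp))  # set access and modify time
    return True


# Demonstration with a throwaway file and a hypothetical header value.
with tempfile.TemporaryDirectory() as tmp:
    page = os.path.join(tmp, "index.html")
    with open(page, "w") as handle:
        handle.write("<html></html>")
    applied = apply_last_modified(
        {"X-Archive-Orig-Last-Modified": "Sat, 15 Nov 1997 10:30:00 GMT"}, page)
    restored_mtime = os.path.getmtime(page)

print(applied, restored_mtime)
```

A web server that derives Last-Modified from the file's modification time would then report the original date for the restored copy.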

