Big news: My work on Geocities is now part of the European project Digital Art Conservation. For the whole of 2012 I will be working[1] for the University of Applied Arts in Switzerland in the Analit research project, which has the stated goal of “moving from case studies to a conservation matrix” in digital archiving.

My job is to make it possible to meaningfully experience Geocities again. Of course this heap of data is a good way to examine how such artifacts can be handled. From what I have learned so far, digital conservation and archaeology are still struggling with many issues:

  • The logic of business dominates the creation and preservation of digital culture.
  • Objects considered art get a premium conservation treatment. Especially for internet art this seems weird, because if the “low culture” is not preserved next to them, the art pieces are removed from any context. Meaningful internet art is not about uploading a self-sufficient piece of data and using the net just as a distribution channel; it is deeply interwoven with the net as a whole.
  • The second kind of objects waiting in line to be conserved are off-mainstream, weird, outstanding artifacts that do not represent the day-to-day reality of digital life. Of course artifacts that refuse to be fed into the mainstream systems are very interesting, but favoring them distorts what actually happens and smells of hacker elitism.
  • The idea of original objects, authenticity and defined authorship still bleeds over from library science into interpreting digital mass culture.
  • The practices by which digital cultural artifacts were produced or used are not taken into account enough, as pointed out by Camille Paloque-Berges in the panel at Transmediale that Olia, she and I took part in.

Incorporating historic artifacts into new works of art or transferring them into other contexts seems to be the most successful strategy for preservation at the moment. For example, The Deleted City, an interactive piece by Richard Vijgen, is an attractive interface to Geocities data. It does not have much to do with what Geocities was or what it means today, but it attracts attention to it and does not work without keeping this data. Museums buying this artwork will have to make sure that they keep a copy of Geocities around.

If the goal is to use Geocities as research material or make it possible to experience it again in different grades of realism or meaningfulness, it seems crucial to work with different versions and collections. There are already a lot of contradictions and redundancies, and at the end of last year Jason Scott announced on Twitter that a new pile of Geocities files is available on archive.org. I am also hoping to cooperate with the person behind oocities to merge his and the ArchiveTeam’s collections. This situation will be typical for future archiving efforts, and I will try to create a meaningful framework to handle it with a defined ingestion process.

What will guide this “normalization work” is the firm conviction that Geocities belongs to the people. Olia and I certainly hope that it can help build an identity and history for personal computer and net users. So while this blog has some code and instructions sprinkled around, everything will now end up on github, in well-defined order and including documentation. As soon as there is something usable there, you’ll be the first to know!

Also, as the ArchiveTeam masterfully pioneered with the release of their torrent, there are other ways of keeping digital heritage alive and meaningful. We will certainly investigate several possibilities of spreading the knowledge and love.

And let me remind you that there is still this torrent seeding.


  1. Olia is still available for funding!

The web is almost 20 years old now. Throughout these years, it has transformed from a new medium to new media in the very best sense of the word – it continues to evolve. Its “stupidity”, its neutrality, which lies at the core of internet architecture, allows it to continue to grow without ever really growing old or mature.

Dealing with an eternally young medium means that we always have to deal with something new – technically as well as ideologically and aesthetically. It means that the dying of web pages, users or services is seen as a natural process. And it makes no sense to speak of a project reaching middle age, because age has no value here. Getting old is something that you don’t do on the net.

The Geocities archive provides us with the experience of getting old. Coming into contact with aged pages is an important lesson that defies the impression that on the net, everything always happens in the present.

Ruins and Templates of Geocities is the first chapter of the publication Still There. It is based on this blog’s posts tagged with “ruins”, “templates” and “alive”.

We made a new work about the web of the 90s — Once Upon. Thanks to top bloggers and twitterers it was reblogged, retweeted and got quite some attention and positive response.

But we can’t ignore that links to “Once Upon” were mostly tagged with humor, web humor, geek humor and the like. Which by itself is ok. After all, we are not deadly serious: we laugh about the architecture and appearance of today’s social networks and smile at our own obsession with designing in framesets and tables.

What really troubles us is the idea that we wanted to make fun of the WWW of the last century, and that we wanted to show how ugly Facebook etc. could look, or how ugly the web looked then. Even the most loyal commenters, even the most nostalgic ones, notice that we make the services “look ugly”[1]. Though we didn’t. Neither intentionally nor out of ignorance.

We actually find our designs beautiful (as well as design of many Geocities pages we analyze in the blog) but here everybody is free to disagree. We really insist that our Once Upon designs are browser specific: They are modular — you see the borders in between data of different formats like pictures, text, and markup itself; and they don’t hide markup — you see borders of tables, borders of frames, scrollbars, default colors of links and background. In a slightly exaggerated manner they show the web’s native aesthetics.

And, sadly, this browser specificity is exactly what is considered to be “ugly”.

Remember Ambition/Fear, the famous text by the Dutch typo gurus and founders of Emigre magazine, Zuzana Licko and Rudy VanderLans? In 1989 they reassured fellow designers:

There is nothing intrinsically “computer-like” about digitally generated images. Low-end devices such as the Macintosh do not yield a stronger inherent style than do the high-end Scitex systems, which are often perceived as functioning invisibly and seamlessly. This merely shows what computer virgins we are. High-end computers have been painstakingly programmed to mimic traditional techniques such as airbrushing or calligraphy, whereas the low-end machines force us to deal with more original, sometimes alien, manifestations. Coarse bitmaps are no more visibly obtrusive than the texture of oil paint on a canvas, but our unfamiliarity with bitmaps causes us to confuse the medium with the message. Creating a graphic language with today’s tools will mean forgetting the styles of archaic technologies and remembering the very basic of design principles.

Belief in “the basic design principles” and the desire to overcome the limitations of young technologies are mainstream. Professional screen design is all about forgetting. Forgetting and ignoring the browser, the interface, pixels, …

We should do something about it in 2012.

Happy New Year

olia


  1. See for example ubergizmo’s blog post.


During VERBINDINGEN/JONCTIONS 13 on 2011-12-04, Jason Scott answers Dragan’s question about GeoCities profiles that have apparently survived.

In his opinion, these belong to people who paid for Yahoo! Web Hosting, an option suggested to GeoCities users when Yahoo! announced its end. And because of a technical glitch, not only do the newly bought domain names lead to the old content, but so do the classic GeoCities URLs.

We just looked at Yahoo!’s help and found something else, a supposed premium GeoCities account called GeoCities Plus that promised unupdatable eternity, the WWW’s hell:

If you’re a Yahoo! GeoCities Plus customer, your friends and family can still view your web site as usual. However, you can no longer access your files or update your pages with GeoCities tools. To update your site, you’ll need to upgrade to Web Hosting.

You won’t believe it, but several hours ago, while surfing and looking for something unrelated to Geocities, we found a Geocities page that still exists! on Geocities!
Not a folder with templates by Yahoo. Not an invisible GIF.

But a real profile of a real user!

Namely famous ASCII artist Joan G. Stark.
http://www.geocities.com/spunk1111/
Last updated in 2001.

We rubbed our eyes, reloaded and Shift-reloaded, but the miracle didn’t disappear. The page is still there. And that’s not all: further research showed that Spunk’s previous account /7373 in the SoHo neighborhood is also online. http://www.geocities.com/SoHo/7373/
Last updated in 2001 as well.

Both profiles are almost identical. And there is the 3rd one — http://www.ascii-art.com/.

But it is only an index page (not updated since 2001 and squatted by porn spam); if you click enter you are back at Geocities/SoHo/7373/

What’s going on? How did it happen? Was it forgotten? Protected? Paid? Are there other survivors?

Update: the question about other survivors is answered in comments by Nick and Google http://www.google.ca/search?q=site%3Ageocities.com.
What are all these profiles doing there?

I don’t know how to begin writing about web pages made “In loving memory of -”. They’re too personal and emotionally loaded for a formal analysis. No, writing is already the next issue: I don’t even collect and categorize them, nor do I bookmark or tag them. I don’t take screenshots and can’t even “save the image as”. Which is a problem, because these images and layouts are very strong. Often unique, probably because I’m not the only user who stopped herself from appropriating parts of these tributes.

Pages of web masters in grief are loaded with the belief that through “the network of the networks” you can establish a connection with those who are no longer among us: through links, buttons, forms, applets … These pages are medium specific in the ultimate way — being a system (infrastructure) for communicating with lost ones.

A quote from Scott’s talk at the Personal Digital Archiving conference earlier this year:

“This is a site created by a mother to commemorate her lost son, who died
as an infant. What struck me, if you look at the dates, is that he died
in 1983, a full 15 years before Geocities came along, and her feelings
were still strong in two ways – she wanted to keep his memory alive, and
she saw Geocities as the way to do it.”

“Graphic, Animation, and background by Ivelisse Hernández © 1997 & 1998”

Original URL: http://www.geocities.com/SoHo/Cafe/2625/

The Wizard is still there, but you can’t build anything with it. I’m still guessing why Yahoo! keeps a lot of Geocities supplementary stuff online.

What you get by downloading the Geocities Torrent is not actually an “archive”. It contains many 7zip archive files, but the way the data in them is organized is not suited to statistical analysis. The Geocities Torrent tells the story of a great disaster and salvation, also in its structure.

For example, to answer questions like “How many Geocities accounts are contained in the torrent?” or “What was the most used divider image in a certain neighborhood?”, counting index.html files is not enough. For this, we need to know the original directory structure on the Geocities server, and since Yahoo! didn’t give anybody direct access to it, we have to rely on the information about it that was available to the Archive Team via HTTP at the time they made the copy. Also, the Archive Team had to pack the data for optimal distribution, which worked very well and created an almost entertaining downloading experience.[1] But the large number of symbolic links makes even simple counting difficult.

Users do not like case-sensitive file systems

Geocities used the powerful Apache web server on an unknown Unix-like operating system.[2] User account names, neighborhood names, and directory and file names were stored case-sensitively there, meaning that the file “Hello.html” is different from “hello.html” or “heLLo.html”. Traditionally, most users do not understand why the same name written in a different case should make a difference. “Consumer” operating systems (aka Windows and Macintosh) do not distinguish case in the file system. Most users of Geocities didn’t care about case when putting links in their HTML code; for example, they could link to http://www.geocities.com/bob/dogs/ when the actual file name on the Geocities server would call for a link like http://www.geocities.com/Bob/Dogs/.

Apparently, Geocities followed two strategies for easing their users’ pain with case-sensitivity:

Symbolic Links by Geocities

They created symbolic links in their file system that pointed from Bob to bob.[3] This means that the directory bob will always show the same content as the directory Bob, and vice versa. Symbolic links are a powerful file system feature; however, it is very easy to create train wrecks with them, for example directories that contain a link to themselves: an infinite loop in the file system. What’s worse, when looking at a site through a browser, symbolic links cannot be distinguished from real files or directories. So both Bob and bob would exist as if there were two users instead of one. And the Archive Team of course had to save both variations because, without looking inside each directory, they couldn’t know whether there was another user who went by the same name in lowercase.[4]

There are many ruins of symbolic links to be found in the torrent, especially of the type that creates infinite loops.
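
One way to spot such loops before any script walks into them could look like the following Python sketch. It is only an illustration, not part of the torrent tooling: the function name find_looping_symlinks is made up here, and it catches just one kind of loop, namely a link whose target is its own directory or one of that directory's ancestors.

```python
import os


def find_looping_symlinks(root):
    """Return symlinks under root that point at their own directory or an ancestor of it."""
    loops = []
    # os.walk does not follow symlinks by default, so the walk itself cannot get trapped.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            parent = os.path.realpath(dirpath)
            # If the target contains the link's own directory, following the link
            # leads back to the link again: an infinite loop.
            if os.path.commonpath([parent, target]) == target:
                loops.append(path)
    return sorted(loops)
```

Links that merely point sideways to another directory are not reported, so the result is a list of candidates for manual inspection rather than a complete verdict.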

mod_speling

There is a plugin for the Apache web server, mod_speling, which tries to correct wrongly typed URLs and redirects the browser to the actual URL with the correct case. It appears that at some point the Geocities server was equipped with this module; otherwise it would be a miracle that all this functioned at all. However, wget, the mirror tool used by the Archive Team to copy Geocities, will still save the file under the originally requested name. So if you ask wget to copy bob, it will be redirected to Bob and save what is found there, but still give it the local name bob.[5] And again this results in a potentially duplicate file.
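
To make the effect tangible, here is a small Python sketch of a case-ignoring lookup. It is not how mod_speling is implemented, and it only covers differences in case; the function name resolve_ignoring_case and the directory layout are assumptions for illustration.

```python
import os


def resolve_ignoring_case(root, requested):
    """Map a requested relative path onto the real on-disk path, ignoring case.

    Returns the real relative path, or None if a component is missing or ambiguous.
    """
    current = root
    real_parts = []
    for part in requested.strip("/").split("/"):
        if not os.path.isdir(current):
            return None
        entries = os.listdir(current)
        if part in entries:
            match = part  # an exact match always wins
        else:
            candidates = [e for e in entries if e.lower() == part.lower()]
            if len(candidates) != 1:
                return None  # nothing found, or several spellings compete
            match = candidates[0]
        real_parts.append(match)
        current = os.path.join(current, match)
    return "/".join(real_parts)
```

With a directory Bob/Dogs on disk, a request for bob/dogs resolves to Bob/Dogs. A mirror tool that keeps the requested spelling would store the same content under bob/dogs, which is exactly how the duplicates described above come about.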

This is not the fault of the Archive Team[6], the wget developers or Geocities. HTTP and HTML were designed in a certain way, but when millions of users are let loose on a technically well-defined standard, unpredictable things happen.

Where the Archive Team detected duplicate downloads, they replaced them with symbolic links in their copy’s file system. While this makes browsing the data much easier, it also makes it hard to decide which type of symbolic link has to be followed for which operation. If, for example, a symlink makes a whole sub-neighborhood exist twice in two different spellings, this symlink should be ignored. A symlink to a user’s account that is stored in YAHOOIDS should be followed though. It would be possible to develop a logic that takes all of this into account, but it would be prone to errors, and some research operations would have to be repeated whenever bugs in research scripts are found. And each run can take ages! So it seems like a good investment to fix the file system before going any further.

Fixing

  1. Most analysis on the Geocities Torrent will have to be conducted through HTTP and an Apache webserver running mod_speling. Redirects will have to be taken into account.
  2. All symbolic links have to be resolved. Steps:
    1. Use the command find . -type l to catch the first level of symbolic links.
    2. Use readlink to determine where symlinks are pointing and replace them with the original files (first rm the symlink, then mv the original to the symlink’s location).
    3. Repeat steps 1 and 2 until no symbolic links are left.

    Of course, every round of found symlinks has to be examined manually for infinite loops or obvious traps.

  3. Find directories with “almost equal” names, i.e. names that mod_speling would match to each other. Compare the pairs’ contents. If the contents are equal, decide which is the original and delete the other. If the contents are partly different, merge them and keep only one version. If the contents are completely different, keep both versions. (This should probably be done using diff.)
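
The symlink round in step 2 could be sketched in Python roughly like this. It mirrors the find, readlink, rm and mv sequence described above, but the function names are invented for this example, and skipping loops and dangling links is an assumption about what "examine manually" should mean in practice.

```python
import os
import shutil


def list_symlinks(root):
    """Rough equivalent of `find . -type l`: each symlink under root with its readlink target."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                found.append((path, os.readlink(path)))
    return sorted(found)


def replace_link_with_original(link):
    """rm the symlink, then mv the original into its place. Returns False if skipped."""
    target = os.path.realpath(link)
    link_dir = os.path.realpath(os.path.dirname(link))
    # A target that contains the link's own directory is a loop; leave it alone.
    looping = os.path.commonpath([link_dir, target]) == target
    if looping or not os.path.exists(target):
        return False  # loop or dangling link: needs a human look
    os.remove(link)
    shutil.move(target, link)
    return True
```

Because the original is moved away, any other link that pointed to its old location dangles afterwards. That is one reason to list the links again after every round and to look at each list before running the next one.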

Each of these operations takes from hours to days. So please bear with us for a while :)
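
For step 3, a comparison of directory pairs might start from a sketch like the one below. It assumes that symlinks have already been resolved, it treats only case differences as “almost equal” (mod_speling is more tolerant than that), and it compares file contents by hash instead of calling diff. All function names are made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def case_variant_groups(names):
    """Group names that differ only by case, e.g. Bob, bob and BOB."""
    groups = defaultdict(list)
    for name in sorted(names):
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]


def file_digests(directory):
    """Map each regular file's path relative to directory onto a SHA-1 digest of its content."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(directory):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leftover symlinks are not compared
            with open(path, "rb") as handle:
                digests[os.path.relpath(path, directory)] = hashlib.sha1(handle.read()).hexdigest()
    return digests


def classify_pair(dir_a, dir_b):
    """Return 'equal', 'partly different' or 'different' for two directories."""
    a, b = file_digests(dir_a), file_digests(dir_b)
    if a == b:
        return "equal"
    shared = [rel for rel in a if rel in b and a[rel] == b[rel]]
    return "partly different" if shared else "different"
```

The three return values correspond to the three cases in step 3: delete one copy, merge, or keep both. The decision about which copy counts as the original is deliberately left out.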


  1. Watching the user accounts pour in with the arrival of each 7zip file was simply blissful.
  2. We know because Apache generates certain kinds of index and error pages that can be found in the torrent. Also, the file system is definitely case-sensitive.
  3. It is not clear if users were also allowed to create symbolic links.
  4. If they had taken the time to compare all this, there would probably be no torrent at all.
  5. Browsers still do the same: How often did you save a PDF file with the name download.php?
  6. In fact, using the default behaviors of standard software was the best choice in this case, because now the way the data came about can easily be reconstructed. If the Archive Team had made assumptions on how the Geocities server was configured and had modified wget accordingly, a lot of data might have been lost.


Olia and Dragan reading chapter “Adding Multimedia to your GeoCities Site” (p 213)


We ordered the book, after finding this review:


Original URL: http://www.geocities.com/PicketFence/1284/oldindex.htm

The EXTERNAL LINK led to Amazon, where it is still possible to buy “Creating GeoCities Websites”, and much, much cheaper than 12 years ago: $0.10 against $39.99 in 1999. But I wouldn’t recommend doing it. Even in 1999 readers left very skeptical feedback.

May 14, 1999:

“This is absoluately a laugher, an entire book on how to design a website for ONE specific free webpage server, and unfortunately, a heavily contraversial one with their excess amount of involuntary advertising of themselves using pop-up ads [...]”

May 20, 1999:

“This book is a terrible resource on designing web pages. I suppose if you wanted your site to look like every other pitiful GeoCities site out there, then you could find a use for this book.”

And last but really last, I don’t think there will be any more. September 28, 2005:

“the book was printed in 1999, so all of the information i needed about geocities was way outdated. product sucked”