<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>One Terabyte of Kilobyte Age</title>
	<atom:link href="http://contemporary-home-computing.org/1tb/feed" rel="self" type="application/rss+xml" />
	<link>http://contemporary-home-computing.org/1tb</link>
	<description>Digging through the Geocities Torrent</description>
	<lastBuildDate>Sat, 28 Apr 2012 21:34:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
		<item>
		<title>This may take up to 20 sec</title>
		<link>http://contemporary-home-computing.org/1tb/archives/3249</link>
		<comments>http://contemporary-home-computing.org/1tb/archives/3249#comments</comments>
		<pubDate>Sat, 28 Apr 2012 21:34:17 +0000</pubDate>
		<dc:creator>olia</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[alive]]></category>
		<category><![CDATA[GIF]]></category>
		<category><![CDATA[last updated 2006]]></category>
		<category><![CDATA[webmaster]]></category>

		<guid isPermaLink="false">http://contemporary-home-computing.org/1tb/?p=3249</guid>
		<description><![CDATA[Why were animated GIFs rarely used to show film and video sequences in the second half of the 1990s (outside of porno sites)? Because even half a second of heavily compressed and downscaled, barely recognizable footage would still be &#8230; <a href="http://contemporary-home-computing.org/1tb/archives/3249">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Why were animated GIFs rarely used to show film and video sequences in the second half of the 1990s (outside of porno sites)? Because even half a second of heavily compressed and downscaled, barely recognizable footage would still be too heavy for the slow networks of that time.</p>
<p>In 1998, Shocking Blue fan Greg converted one second of a TV performance of the group&#8217;s hit song &#8220;Venus&#8221; into a GIF. It is 160×120 pixels, contains 15 frames and weighs 93KB. Greg didn&#8217;t dare to confront the visitors of the page with such a huge file.<sup><a href="http://contemporary-home-computing.org/1tb/archives/3249#footnote_0_3249" id="identifier_0_3249" class="footnote-link footnote-identifier-link" title="Just for comparison, the third picture in the Alternative Animated GIF Timeline, a video loop as common in GIFs today, is 461&times;322 pixels, 2MB, 42 frames.">1</a></sup> Instead, he shows a static image and suggests loading the animation with a click, warning that &#8220;it may take 20 sec&#8221;.</p>
<p>Starting image<br />
<img src="http://contemporary-home-computing.org/1tb/wp-content/uploads/marstill5.jpg" alt="" /></p>
<p>Animated GIF<br />
<img src="http://contemporary-home-computing.org/1tb/wp-content/uploads/shock303.gif" alt="" />  </p>
<p>The moment in the original video<br />
<a href="http://www.youtube.com/watch?v=auoArgmzqN4&#038;t=1m32s"><img src="http://contemporary-home-computing.org/1tb/wp-content/uploads/screenshot_307.png" alt="" title="venus" width="648" height="502" class="alignnone size-full wp-image-3257" /></a></p>
<p>Original URL: <a href="http://www.geocities.com/ofmang/greg/shockblu.html">http://www.geocities.com/ofmang/greg/shockblu.html</a></p>
<div class="footnotes"><hr><ol class="footnotes"><li id="footnote_0_3249" class="footnote">Just for comparison, the third picture in the <a href="http://contemporary-home-computing.org/GIF-Timeline/index1.html">Alternative Animated GIF Timeline</a>, a video loop as common in GIFs today, is 461×322 pixels, 2MB, 42 frames.<br />
 </li></ol></div>]]></content:encoded>
			<wfw:commentRss>http://contemporary-home-computing.org/1tb/archives/3249/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>word</title>
		<link>http://contemporary-home-computing.org/1tb/archives/3243</link>
		<comments>http://contemporary-home-computing.org/1tb/archives/3243#comments</comments>
		<pubDate>Wed, 25 Apr 2012 21:46:18 +0000</pubDate>
		<dc:creator>olia</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[alive]]></category>
		<category><![CDATA[last updated 2009]]></category>

		<guid isPermaLink="false">http://contemporary-home-computing.org/1tb/?p=3243</guid>
		<description><![CDATA[Original URL: http://www.geocities.com/vienna/4302/]]></description>
			<content:encoded><![CDATA[<p>
<img src="http://contemporary-home-computing.org/1tb/wp-content/uploads/screenshot_305.png" alt="" title="screenshot_305" width="712" height="150" class="alignnone size-full wp-image-3246" /><br />
</p>
<p>Original URL: <a href="http://www.geocities.com/vienna/4302/">http://www.geocities.com/vienna/4302/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://contemporary-home-computing.org/1tb/archives/3243/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>V(RM)Lcome to my page!</title>
		<link>http://contemporary-home-computing.org/1tb/archives/3226</link>
		<comments>http://contemporary-home-computing.org/1tb/archives/3226#comments</comments>
		<pubDate>Wed, 25 Apr 2012 21:13:36 +0000</pubDate>
		<dc:creator>olia</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[alive]]></category>
		<category><![CDATA[GIF]]></category>
		<category><![CDATA[last updated 2001]]></category>
		<category><![CDATA[welcome]]></category>

		<guid isPermaLink="false">http://contemporary-home-computing.org/1tb/?p=3226</guid>
		<description><![CDATA[Original URL: http://www.geocities.com/~johanh/ The page is still on(VRM)line.]]></description>
			<content:encoded><![CDATA[<p>
<img src="http://contemporary-home-computing.org/1tb/wp-content/uploads/vrmlcome8.gif" alt="null" /><br />
<br />
Original URL: <a href="http://www.geocities.com/~johanh/">http://www.geocities.com/~johanh/</a><br />
<br />
The page is still on(VRM)line. </p>
]]></content:encoded>
			<wfw:commentRss>http://contemporary-home-computing.org/1tb/archives/3226/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Authenticity/Access</title>
		<link>http://contemporary-home-computing.org/1tb/archives/3214</link>
		<comments>http://contemporary-home-computing.org/1tb/archives/3214#comments</comments>
		<pubDate>Mon, 09 Apr 2012 15:46:54 +0000</pubDate>
		<dc:creator>drx</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[meta]]></category>

		<guid isPermaLink="false">http://contemporary-home-computing.org/1tb/?p=3214</guid>
		<description><![CDATA[Access to the remains of Geocities can be measured on two axes: authenticity (how realistically the harvested data can be presented again) and ease of access (what technical requirements and what knowledge are needed to gain access at a certain &#8230; <a href="http://contemporary-home-computing.org/1tb/archives/3214">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Access to the remains of Geocities can be measured on two axes: authenticity (how realistically the harvested data can be presented again) and ease of access (what technical requirements and what knowledge are needed to gain access at a certain level of authenticity).</p>
<p><a href="http://contemporary-home-computing.org/1tb/wp-content/uploads/autheticity.png"><img class="alignnone size-full wp-image-3215" title="autheticity" src="http://contemporary-home-computing.org/1tb/wp-content/uploads/autheticity.png" alt="" width="996" height="609" /></a></p>
<p>Graphical authenticity on the pixel level means that a Geocities web page renders exactly as it would have for visitors at the time the page was published. This might be considered esoteric on the grounds that web authors could never be sure how their creations would appear to their audience, by the nature of HTML itself. However, web authors often were not aware of this fact, and people in the 1990s used different browsers on different operating systems than we do today. The dominance of, for example, Netscape browsers and a certain set of plugins meant that most people would experience a web page in a certain way that was very different from accessing it now with WebKit-based browsers.</p>
<p>What has changed most significantly in operating systems since the high times of Geocities is the display of text. All current operating systems render characters with smoothed out edges, and this is reflected as much in current web design as the historic aliased pixel text display influenced the web design of the past.</p>
<p>Low barrier access to original visual web culture can be provided by screenshots taken from virtual machines running an historic operating system and browser.</p>
<p>Today, MIDI music files do not sound at all as they used to when they dominated web audio. These audio files can be recorded from historic hardware and operating systems or rendered with emulators for contemporary listeners. This will strip them of any context, but can still give a good, easily accessible impression of how the web sounded at a certain point in time.<sup><a href="http://contemporary-home-computing.org/1tb/archives/3214#footnote_0_3214" id="identifier_0_3214" class="footnote-link footnote-identifier-link" title="The artist Ryder Ripps for example published MP3 recordings of MIDI files as online mix tapes and a vinyl record containing recordings of a selection of classic MIDI files. Of course these projects present only &amp;#8220;snapshots&amp;#8221; of a certain kind of MIDI playback sound and the selection process definitely targets musically interested people of today, but no doubt these efforts will keep the sound present.">1</a></sup></p>
<p>The web pages&#8217; original interactivity can be restored by accessing them from a mirror and employing a browser addon that re-writes the original URLs contained in the HTML to match the mirror&#8217;s URLs.<sup><a href="http://contemporary-home-computing.org/1tb/archives/3214#footnote_1_3214" id="identifier_1_3214" class="footnote-link footnote-identifier-link" title="See Tips for Torrenters on this blog for a version of a browser addon that does this. reocities takes a similar approach and even automatically changed all HTML code so it validates and is more likely to &amp;#8220;work&amp;#8221; on contemporary browsers. While this seems questionable from an archivist&amp;#8217;s point of view it surely allows broader access to the historic data.">2</a></sup></p>
<p>If the data must not be tampered with, either on the client or on the mirror server, and URLs shown in the browser&#8217;s address bar should be the exact originals, a proxy server must be put in front of the mirror server. This proxy can transparently redirect requests for, say, <em>http://www.geocities.com/</em> to <em>http://mirror.local/www.geocities.com/</em> without the need for any name server tricks or changes to the historic HTML code. Writing such a proxy is trivial, for example in <a href="http://nodejs.org/">nodejs</a>.</p>
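<p>The core of such a proxy is the mapping from the requested original URL to its place on the mirror. What follows is only a minimal sketch of that mapping in Python, not the proxy itself: the function name <code>to_mirror_url</code> and the constant <code>MIRROR_HOST</code> are made up for illustration, and the host name <em>mirror.local</em> is simply taken from the example above.</p>

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mirror host, taken from the example in the text.
MIRROR_HOST = "mirror.local"


def to_mirror_url(requested_url: str) -> str:
    """Map an original URL to its assumed location on the mirror.

    The original host name becomes the first path segment on the mirror;
    path and query string are passed through unchanged.
    """
    parts = urlsplit(requested_url)
    # Host names are case-insensitive, so normalizing them is safe.
    original_host = parts.netloc.lower()
    # A bare host without a path is treated as a request for "/".
    original_path = parts.path or "/"
    mirror_path = "/" + original_host + original_path
    return urlunsplit(("http", MIRROR_HOST, mirror_path, parts.query, ""))
```

<p>A real proxy would call a function like this for every incoming request and then fetch the result from the mirror, while the browser keeps showing the original address.</p>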
<p>As even the oldest browsers support proxy servers, one can employ this system on a virtualized historic operating system like Windows 95 or Windows 98 with authentic web browsers. This significantly raises the access barrier though, as access to such a proxy must be technically restricted, <a href="http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#access">or the Internet will collapse</a>.<sup><a href="http://contemporary-home-computing.org/1tb/archives/3214#footnote_2_3214" id="identifier_2_3214" class="footnote-link footnote-identifier-link" title="The most common problem with open proxies is, as experience from my project insert_coin shows, that people from Saudi Arabia use up a lot of bandwidth while accessing pornographic material that is blocked in their home country.">3</a></sup> Also, virtual machine software, historic operating systems and historic browsers and plugins are not easily available to most of the web users who might be interested in looking at their cultural past.</p>
<p>The largest authenticity surface is covered by using all the aforementioned techniques and running historic hardware. Current virtualization and emulation systems are quite good at re-creating all aspects of 1990s computers, except audio. There are just not enough business-critical MIDI files out there to make companies like VMWare emulate <a href="http://en.wikipedia.org/wiki/Yamaha_YM3812">OPL3 sound chips</a>. Additionally, all graphical output looks very different on <a href="http://en.wikipedia.org/wiki/Cathode_ray_tube">CRT</a> monitors and their special surface-to-pixel ratios than it does on contemporary flat screens. For example, when looking at historic web pages on an 800×600 pixel 14&#8243; CRT screen with a 60Hz refresh rate, it becomes clear why many people decided to use dark backgrounds and bright text for their designs instead of emulating paper with black text on a white background.</p>
<h3>Balancing</h3>
<p>While restoration work must be done on the right end of the scale to provide a very authentic re-creation of the web&#8217;s past, it is just as important to work on every point of the scale in between to allow the broadest possible audience to experience the most authentic re-enactment of Geocities that is comfortable for consumption on many levels of expertise and interest.</p>
<div class="footnotes"><hr><ol class="footnotes"><li id="footnote_0_3214" class="footnote">The artist Ryder Ripps for example published <a href="http://midi.mx/">MP3 recordings of MIDI files</a> as online mix tapes and a <a href="http://ryder-ripps.com/NOW_THATS_WHAT_I_CALL_MIDI/">vinyl record</a> containing recordings of a selection of classic MIDI files. Of course these projects present only &#8220;snapshots&#8221; of a certain kind of MIDI playback sound and the selection process definitely targets musically interested people of <em>today</em>, but no doubt these efforts will keep the sound present.</li><li id="footnote_1_3214" class="footnote">See <a href="http://contemporary-home-computing.org/1tb/archives/498">Tips for Torrenters</a> on this blog for a version of a browser addon that does this. <a href="http://reocities.com/">reocities</a> takes a similar approach and <a href="http://reocities.com/newhome/makingof.html">even automatically changed all HTML code so it validates</a> and is more likely to &#8220;work&#8221; on contemporary browsers. While this seems questionable from an archivist&#8217;s point of view it surely allows broader access to the historic data.</li><li id="footnote_2_3214" class="footnote">The most common problem with open proxies is, as experience from my project <a href="http://odem.org/insert_coin/">insert_coin</a> shows, that people from Saudi Arabia use up a lot of bandwidth while accessing pornographic material that is blocked in their home country.</li></ol></div>]]></content:encoded>
			<wfw:commentRss>http://contemporary-home-computing.org/1tb/archives/3214/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Verify and complete your Geocities Download &#8212; via HTTP!</title>
		<link>http://contemporary-home-computing.org/1tb/archives/3209</link>
		<comments>http://contemporary-home-computing.org/1tb/archives/3209#comments</comments>
		<pubDate>Mon, 02 Apr 2012 15:53:42 +0000</pubDate>
		<dc:creator>drx</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[meta]]></category>

		<guid isPermaLink="false">http://contemporary-home-computing.org/1tb/?p=3209</guid>
		<description><![CDATA[Many people started the adventure of downloading the Geocities Torrent, and while constantly seeding it, I saw the peer numbers drop to zero a long time ago. Anyway, I do not recommend using the torrent at the moment, since &#8230; <a href="http://contemporary-home-computing.org/1tb/archives/3209">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Many people started the adventure of downloading the Geocities Torrent, and while constantly seeding it, I saw the peer numbers drop to zero a long time ago. Anyway, I do not recommend using the torrent at the moment, since Jason Scott, now working for the Internet Archive, put all the Geocities data into the <a href="http://archive.org/details/archiveteam-geocities">Geocities Valhalla</a>.</p>
<p>These are still the same files, hard to handle because of their sheer number, and not as neatly organized as the contents of the original torrent. If you would like to complete your canceled torrent download, or if you wish to download the whole thing via HTTP, you might like this script:</p>
<p><script src="https://gist.github.com/2283219.js?file=geo-torrent-checksums.pl"></script></p>
<p>I mostly wrote it for myself to check data integrity before the first serious database ingest, but extended it to support downloading. It requires Perl 5.1x, XML::TreePP and md5sum. Hope you will find it useful.</p>
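<p>For readers who only want to see the idea behind such an integrity check, here is a minimal sketch in Python. It is not the Perl script embedded above and does not parse any metadata; it only assumes that an expected MD5 checksum for a file is already known. The function names <code>md5_of_file</code> and <code>is_intact</code> are made up for illustration.</p>

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file without loading it into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """Return True if the file's checksum matches the expected value."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

<p>A file that fails this check is a candidate for being downloaded again.</p>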
<p>(To all the peeps reading this with a feed reader: The post contains a Perl script, embedded via gist from github.)</p>
]]></content:encoded>
			<wfw:commentRss>http://contemporary-home-computing.org/1tb/archives/3209/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Designing for the Future</title>
		<link>http://contemporary-home-computing.org/1tb/archives/3192</link>
		<comments>http://contemporary-home-computing.org/1tb/archives/3192#comments</comments>
		<pubDate>Wed, 21 Mar 2012 18:29:29 +0000</pubDate>
		<dc:creator>olia</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[last updated 1997]]></category>
		<category><![CDATA[ruins]]></category>

		<guid isPermaLink="false">http://contemporary-home-computing.org/1tb/?p=3192</guid>
		<description><![CDATA[New Media researcher Anne Helmond asked if her 1997 &#8220;Unofficial Eric&#8217;s Trip Homepage&#8221; at SunsetStrip/3500/ was archived. It is there. The lo-fi (html) part is almost complete and functional; only the guest book and &#8220;Sounds&#8221; are missing. The Hi-Fi link leads to &#8230; <a href="http://contemporary-home-computing.org/1tb/archives/3192">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-3193" src="http://contemporary-home-computing.org/1tb/wp-content/uploads/screenshot_295.png" alt="" width="641" height="424" /></p>
<p>New Media researcher <a href="http://www.annehelmond.nl/">Anne Helmond</a> asked if her 1997 &#8220;Unofficial Eric&#8217;s Trip Homepage&#8221; at <em>SunsetStrip/3500/</em> was archived.</p>
<p>It is there. The lo-fi (html) part is almost complete and functional; only the guest book and &#8220;Sounds&#8221; are missing. The Hi-Fi link leads to a non-existent <em>ethi.html</em>.</p>
<p>As I mentioned in <a href="http://contemporary-home-computing.org/still-there/geocities.html">Ruins and Templates of Geocities</a>, we rarely know whether missing files were lost due to glitches during the archiving process, or to the site owner’s lack of skill or failure to maintain the files and links between them. But this time I could ask the user personally what was there.</p>
<p>Anne said that she probably never made a HiFi version and can&#8217;t remember what she planned for it.</p>
<p><img class="alignnone size-full wp-image-3196" src="http://contemporary-home-computing.org/1tb/wp-content/uploads/screenshot_296.png" alt="" width="522" height="211" /></p>
<p>It is so 1997! I remember believing that broadband was just weeks away and preparing for the near future with links like that. This could be such a case: broken links were links to files that were to be made very, very soon.</p>
<p>Original URL: <a href="http://www.geocities.com/SunsetStrip/3500/">http://www.geocities.com/SunsetStrip/3500/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://contemporary-home-computing.org/1tb/archives/3192/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Organizing the Database</title>
		<link>http://contemporary-home-computing.org/1tb/archives/3170</link>
		<comments>http://contemporary-home-computing.org/1tb/archives/3170#comments</comments>
		<pubDate>Wed, 21 Mar 2012 10:11:04 +0000</pubDate>
		<dc:creator>drx</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[meta]]></category>

		<guid isPermaLink="false">http://contemporary-home-computing.org/1tb/?p=3170</guid>
		<description><![CDATA[A Research Database for Everybody Geocities was a people&#8217;s effort, and it should continue belonging to the people, even after its demise. This is important to keep in mind when doing research on the millions of home pages that have &#8230; <a href="http://contemporary-home-computing.org/1tb/archives/3170">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h3>A Research Database for Everybody</h3>
<p>Geocities was a people&#8217;s effort, and it should continue belonging to the people, even after its demise. This is important to keep in mind when doing research on the millions of home pages that have been rescued. The results of this research should be easy to build new research upon for any interested party.</p>
<p>The scripts and system setup guides I am preparing to be published on GitHub will make it relatively comfortable to work with parts of Geocities that already have been recovered or will be recovered in the future.</p>
<p>Using a database to help make sense of this huge body of data is a very straightforward idea, but how to organize it so that it is as <em>portable</em> as the rest of the project? Additionally, the database must be able to work with multiple, contradicting versions of the same file that were harvested by different people at different points in time, using different tools, making different mistakes. And it should be possible to dump the complete database, or parts of it, hand it over to another researcher, and later merge the changes.</p>
<p>Naïve database designs<sup><a href="http://contemporary-home-computing.org/1tb/archives/3170#footnote_0_3170" id="identifier_0_3170" class="footnote-link footnote-identifier-link" title=" &amp;#8230; meaning &amp;#8220;What I used to do until now&amp;#8221; :) ">1</a></sup> usually work with <a href="http://en.wikipedia.org/wiki/Surrogate_key">surrogate keys</a> to uniquely identify records. The simplest form of such a key is a counter that increases with every new database record.<sup><a href="http://contemporary-home-computing.org/1tb/archives/3170#footnote_1_3170" id="identifier_1_3170" class="footnote-link footnote-identifier-link" title=" In for example the PostgreSQL database, these counters are called serial. ">2</a></sup> So the first entry gets an id of 1, the second an id of 2, and so forth. Another approach to generating surrogate keys is randomizing numbers or strings. Because such keys have nothing to do with the records they help identify, they are called &#8220;surrogates.&#8221;</p>
<p>This is fine for a centralized system collecting data in one place, but unfit for a distributed case as outlined above. Without knowing the state of such a counter on one computer, it is impossible to add entries on another computer without risking a key collision when the databases are merged again. Even when staying isolated inside one computer, the keys&#8217; values are, in the case of a serial counter, determined by the often arbitrary chronological order in which records are put into the database.<sup><a href="http://contemporary-home-computing.org/1tb/archives/3170#footnote_2_3170" id="identifier_2_3170" class="footnote-link footnote-identifier-link" title=" For instance, when reading a list of files to add to the database, the order in which these files are read from the disk is typically determined by the order they are stored on disk. This order is quite random, to create a predictable order would mean to sort the file names before ingest. This represents just another step where a lot could go wrong. ">3</a></sup> This is critical because it will likely be infeasible to distribute the over-sized database as a whole; instead, scripts that generate it will be distributed.</p>
<p>The solution is to use <a href="http://en.wikipedia.org/wiki/Natural_key">natural keys</a> to identify database records. These keys are generated from unique properties of the records they identify and therefore are predictable. This approach leads to certain constraints that will be explained below.</p>
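<p>The difference between the two kinds of keys can be illustrated with a small Python sketch. It is not the actual table definition; the choice of key properties (file path plus a SHA-1 digest of the contents) and the names <code>natural_key</code> and <code>ingest</code> are assumptions made purely for illustration.</p>

```python
import hashlib


def natural_key(path: str, content: bytes) -> str:
    """Build a key from properties of the record itself (assumed: path and content digest)."""
    return path + ":" + hashlib.sha1(content).hexdigest()


def ingest(files):
    """Ingest (path, content) pairs; return a serial-keyed and a natural-keyed table."""
    # Serial keys depend on the order in which records happen to arrive.
    by_serial = {number: path for number, (path, _) in enumerate(files, start=1)}
    # Natural keys depend only on the records themselves.
    by_natural = {natural_key(path, content): path for path, content in files}
    return by_serial, by_natural


# Two researchers ingest the same files, but read them from disk in a different order.
files = [
    ("www.geocities.com/a/index.html", b"page a"),
    ("www.geocities.com/b/index.html", b"page b"),
]
serial_1, natural_1 = ingest(files)
serial_2, natural_2 = ingest(list(reversed(files)))
```

<p>Here <code>serial_1</code> and <code>serial_2</code> assign the same numbers to different files, so merging them would collide, while <code>natural_1</code> and <code>natural_2</code> are identical and merge cleanly.</p>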
<h3>Truth and opinion</h3>
<p>Trying to normalize a dataset like Geocities seems like heresy in the first place. Most of the insights waiting for discovery inside of it are not <em>enumerable</em>. So the database has to reflect on this by separating truth and opinion.</p>
<p>There is little absolute truth in each file extracted from the torrent, apart from the actual data and the sparse filesystem metadata it carries: name, size, last-modified date and of course its contents. It is not even possible to say for sure from which URL each file was harvested: the classic wget<sup><a href="http://contemporary-home-computing.org/1tb/archives/3170#footnote_3_3170" id="identifier_3_3170" class="footnote-link footnote-identifier-link" title=" wget is a GNU software for automatic downloading of network data. The archiveteam recently fixed some of its problems and released wget-warc. ">4</a></sup> does not save this information and the original Geocities server used case-insensitive URLs<sup><a href="http://contemporary-home-computing.org/1tb/archives/3170#footnote_4_3170" id="identifier_4_3170" class="footnote-link footnote-identifier-link" title="See Cleaning Up the Torrent">5</a></sup>, so the same file could be retrieved from many different URLs (and indeed many duplicates can be found in the collections). Then there is the classic case of a file being named <code>www.host.com/folder/index.html</code> when it was in fact downloaded from just <code>http://www.host.com/folder/</code> without the &#8220;index.html&#8221; part, and so on.</p>
<p>So, this is <strong><em>The Truth</em></strong>:</p>
<p><script src="https://gist.github.com/2145757.js?file=table-files.sql"></script></p>
<p>For the simple case of reviving Geocities to a state that can be experienced again in a browser, a mapping of URLs to files is needed. While the URL can be figured out in most cases with a straightforward lower-casing of the file name, it is actually guesswork for some percentage of the files. Especially in the case of duplicates, it is not easy to decide which file to serve for a certain URL.</p>
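<p>The lower-casing heuristic, and the way it exposes duplicates, can be sketched in a few lines of Python. This is only an illustration: the function names are made up, the file names are assumed to start with the host name as in the examples above, and the &#8220;index.html&#8221; ambiguity is deliberately left out.</p>

```python
from collections import defaultdict


def guess_url(file_name: str) -> str:
    """Guess the URL for a harvested file by lower-casing its name."""
    return "http://" + file_name.lower()


def group_by_url(file_names):
    """Collect all files that compete for the same guessed URL."""
    candidates = defaultdict(list)
    for name in file_names:
        candidates[guess_url(name)].append(name)
    return dict(candidates)
```

<p>Every guessed URL that ends up with more than one candidate file is a case where somebody, or something, has to decide which file to serve.</p>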
<p>To solve such cases, a system of <em>agents</em> is introduced.</p>
<p><script src="https://gist.github.com/2145757.js?file=table-agents.sql"></script></p>
<p>Every <em>opinionated</em> entry in the database is signed by an <em>agent</em>. An agent might be a script that does some analysis of a file to extract information. Later, a human agent might correct mistakes the script has made, or a later version of the script might add new information.</p>
<p>A table that can be customized for each user/researcher contains information on how much weight each agent has. In the case of contradicting information, the information created by the agent with the highest weight wins. Usually, humans should win over robots.</p>
<p><script src="https://gist.github.com/2145757.js?file=table-urls.sql"></script></p>
<p><em>Different agents can have the same opinion</em> as the agent name is part of the natural key for the entry. This is vital for the weighting system to work. But <em>each agent can have only one opinion on each topic</em>. Sloppy scripts that made a mistake might undo it by adding corrected information, but under a different agent name &#8212; this is why the agent name for scripts should always contain the version number of the script.</p>
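<p>How the weighting could resolve contradictions is sketched below in Python. This is not the database implementation, just an illustration under assumptions: opinions are modeled as a dictionary keyed by file and agent name (so each agent has exactly one opinion per file), agents without an entry in the weight table count as weight 0, and all names are hypothetical.</p>

```python
def resolve(opinions, weights):
    """Pick, for every file, the opinion of the agent with the highest weight.

    opinions: dict mapping (file_key, agent_name) to an opinion, e.g. a URL
    weights:  dict mapping agent_name to a number; unknown agents count as 0
    Returns a dict mapping file_key to (agent_name, opinion).
    """
    best = {}
    for (file_key, agent), opinion in opinions.items():
        weight = weights.get(agent, 0)
        # On equal weights the opinion seen first is kept.
        if file_key not in best or weight > best[file_key][0]:
            best[file_key] = (weight, agent, opinion)
    return {key: (agent, opinion) for key, (_, agent, opinion) in best.items()}


# Hypothetical example: a script and a human disagree about one file.
opinions = {
    ("file-1", "lowercase-script-v1"): "http://www.geocities.com/vienna/4302/index.html",
    ("file-1", "despens"): "http://www.geocities.com/vienna/4302/",
    ("file-2", "lowercase-script-v1"): "http://www.geocities.com/~johanh/index.html",
}
weights = {"lowercase-script-v1": 1, "despens": 10}
winners = resolve(opinions, weights)
```

<p>Because the weight table is separate from the opinions, another researcher can reuse the same opinions and arrive at different winners simply by trusting agents differently.</p>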
<p>Humans might need different identifiers, too, if they intend to erase a mistake they have made before. I suggest using twitter handles as identifiers, and &#8220;namespacing&#8221; them with dashes if required.<sup><a href="http://contemporary-home-computing.org/1tb/archives/3170#footnote_5_3170" id="identifier_5_3170" class="footnote-link footnote-identifier-link" title=" Twitter, or any other service that allows only unique names, could be used as an authority; however this will only become a problem when hundreds of people start to research Geocities, which seems quite unlikely at the moment. ">6</a></sup> For example, while human agent &#8220;despens&#8221; made some bad decisions, &#8220;despens-wiser&#8221; might overwrite them. Of course this is only required for <em>publicly released</em> database records or scripts.</p>
<p>Stay tuned as the database gets fleshed out more with additional tables. These were just examples.</p>
<h3>About SQL</h3>
<p>There has been a development of &#8220;new wave&#8221; databases, e.g. <a href="http://couchdb.apache.org/">couchdb</a> and <a href="http://www.mongodb.org/">mongodb</a>, summarized under <a href="http://en.wikipedia.org/wiki/NoSQL"><em>NoSQL</em></a>. While they are rightfully praised for their performance and flexibility, they do not really address the problems that come up when handling complex semantic relations, as is required, I believe, for working with something like Geocities. Instead, they make this power and complexity optional by pushing it out of the database system into the software that wants to make use of this data.</p>
<p>My main motivation for creating this database is to amass knowledge and interpretations about Digital Folklore, giving researchers (including Olia and myself) a tool to verify assumptions and find surprising correlations. SQL may seem like a slightly weird language, but it is tried-and-tested, can solve complex tasks, and knowledge about it is widespread among archivists and researchers.</p>
<p>I decided to build everything on the <a href="http://www.postgresql.org/">PostgreSQL</a> database, which is a free software project, refined since 1989, with mature documentation and some nice graphical frontends<sup><a href="http://contemporary-home-computing.org/1tb/archives/3170#footnote_6_3170" id="identifier_6_3170" class="footnote-link footnote-identifier-link" title=" I prefer pgadmin. ">7</a></sup>; plus, PostgreSQL is actually reasonably fast when it comes to <em>writing</em> records.</p>
<div class="footnotes"><hr><ol class="footnotes"><li id="footnote_0_3170" class="footnote"> &#8230; meaning &#8220;What I used to do until now&#8221; :) </li><li id="footnote_1_3170" class="footnote"> In for example the PostgreSQL database, these counters are called <a href="http://www.postgresql.org/docs/8.1/static/datatype.html#DATATYPE-SERIAL">serial</a>. </li><li id="footnote_2_3170" class="footnote"> For instance, when reading a list of files to add to the database, the order in which these files are read from the disk is typically determined by the order they are stored on disk. This order is quite random, to create a predictable order would mean to sort the file names before ingest. This represents just another step where a lot could go wrong. </li><li id="footnote_3_3170" class="footnote"> <a href="http://www.gnu.org/software/wget/">wget</a> is a GNU software for automatic downloading of network data. The archiveteam recently fixed some of its problems and released <a href="http://archiveteam.org/index.php?title=Wget_with_WARC_output">wget-warc</a>. </li><li id="footnote_4_3170" class="footnote">See <a href="http://contemporary-home-computing.org/1tb/archives/2948">Cleaning Up the Torrent</a></li><li id="footnote_5_3170" class="footnote"> Twitter, or any other service that allows only unique names, could be used as an authority; however this will only become a problem when hundreds of people start to research Geocities, which seems quite unlikely at the moment. </li><li id="footnote_6_3170" class="footnote"> I prefer <a href="http://www.pgadmin.org/">pgadmin</a>. </li></ol></div>]]></content:encoded>
			<wfw:commentRss>http://contemporary-home-computing.org/1tb/archives/3170/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Another city going down in flames</title>
		<link>http://contemporary-home-computing.org/1tb/archives/3166</link>
		<comments>http://contemporary-home-computing.org/1tb/archives/3166#comments</comments>
		<pubDate>Sun, 18 Mar 2012 00:21:18 +0000</pubDate>
		<dc:creator>drx</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://contemporary-home-computing.org/1tb/?p=3166</guid>
		<description><![CDATA[FortuneCity, a free home page hosting service just like Geocities, will be closing on April 30th, 2012. ArchiveTeam is busy copying it, and if you want to help, look at the instructions. The distributed copying system with a centralized tracker seems quite &#8230; <a href="http://contemporary-home-computing.org/1tb/archives/3166">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://fortunecity.com/">FortuneCity</a>, a free home page hosting service just like Geocities, will be closing on April 30th, 2012. ArchiveTeam is busy copying it, and if you want to help, look at the <a href="http://archiveteam.org/index.php?title=FortuneCity">instructions</a>. The distributed copying system with a <a href="http://focity.heroku.com/">centralized tracker</a> seems quite sophisticated. While the Geocities download was organized on a wiki page, this time everything is fully automated and smooth. I am happy to contribute three computers on three different networks!</p>
<p>I am still working on the Geocities database, and it will be able to hold data from FortuneCity as well. Time to buy a new hard disk.</p>
]]></content:encoded>
			<wfw:commentRss>http://contemporary-home-computing.org/1tb/archives/3166/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Last Updated: Never</title>
		<link>http://contemporary-home-computing.org/1tb/archives/3139</link>
		<comments>http://contemporary-home-computing.org/1tb/archives/3139#comments</comments>
		<pubDate>Sun, 11 Mar 2012 20:34:38 +0000</pubDate>
		<dc:creator>olia</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[related]]></category>

		<guid isPermaLink="false">http://contemporary-home-computing.org/1tb/?p=3139</guid>
		<description><![CDATA[I was happy to present our imaginary social networks and services of 1997 &#8212; Once Upon &#8212; at Unlike Us, hosted by the Institute of Network Cultures in Amsterdam. It was a great event with a truly interested and &#8230; <a href="http://contemporary-home-computing.org/1tb/archives/3139">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-3150" title="screenshot_291" src="http://contemporary-home-computing.org/1tb/wp-content/uploads/screenshot_291.png" alt="" width="600" height="139" /><br />
<img src="http://contemporary-home-computing.org/1tb/wp-content/uploads/screenshot_292.png" alt="" title="screenshot_292" width="736" height="250" class="alignnone size-full wp-image-3161" /></p>
<p>I was <a href="http://networkcultures.org/wpmu/unlikeus/2012/03/10/olia-lialina-and-97-web-melancholy/">happy to present</a> our imaginary social networks and services of 1997 &#8212; <a href="http://1x-upon.com/">Once Upon</a> &#8212; at <a href="http://networkcultures.org/wpmu/unlikeus/2-amsterdam/">Unlike Us</a>, hosted by the Institute of Network Cultures in Amsterdam.</p>
<p>It was a great event with a truly interested and competent audience. The only problem was that in the end it was mostly about Facebook, even though in the Netherlands <a href="http://www.hyves.nl/">Hyves</a> still has more users than Facebook.</p>
<p>That&#8217;s why it was very important to see the presentation by <a href="http://www.philbu.net/">Philipp Budka</a> from the Department of Social and Cultural Anthropology at the University of Vienna. He talked about <a href="http://myknet.org/">Myknet</a>, an obscure Canadian social network and web hosting service.</p>
<blockquote><p><a href="http://ci-journal.net/index.php/ciej/article/view/568/450">[...] a system of personal homepages intended for remote First Nations users in Northern Ontario. This free of charge, free of advertisements, locally-supported online social environment grew from a constituency base of remote First Nations in a region where numerous communities lived without adequate residential telecom service well into the millennium (Ramirez, Aitkin, Jamieson, &amp; Richardson, 2003; Fiser, Clement, &amp; Walmark, 2006). MyKnet.org now hosts over 30,000 registered user accounts, of which approximately 25,000 represent active homepages. This is particularly notable considering that the system primarily serves members of Northern Ontario’s First Nations whose combined population is approximately 45,000 (occupying a geographic area comparable to the size of France). Equally significant is that over half of this population is under the age of 25, making MyKnet.org primarily a youth-driven online social environment.</a></p></blockquote>
<p>A <a href="http://myknet.org/users.php">simple alphabetical list of all the users</a> makes it possible to go from account to account. You can also look at the most recently updated ones and see that almost 400 pages have already been updated today, although it is only around 14:00 in Ontario now.</p>
<p>First impression: Myknet users make their pages in a particular way that is obviously shaped by some simple, loosely structured content management system or a custom-made site builder. It would be interesting to get access to it.</p>
<p>Clearly, the pages are made for low resolution displays and slow connections.</p>
<p>The most exciting fact is that they look exactly like classic homepages, but many are used like tweets or status updates. Visually and structurally it is an unexpected mix. See the examples at the top of the article.<br />
Many users have moved to Facebook, <a href="http://christinethomas.myknet.org/">leaving their Facebook badge</a> on Myknet, where it functions like a classic &#8220;this page has moved&#8221; notice.</p>
<p>Many accounts are deleted or were last updated in the previous decade, which the database interprets as never having been updated at all.</p>
<p><img class="alignnone size-full wp-image-3147" title="screenshot_289" src="http://contemporary-home-computing.org/1tb/wp-content/uploads/screenshot_2891.png" alt="" width="459" height="487" /></p>
]]></content:encoded>
			<wfw:commentRss>http://contemporary-home-computing.org/1tb/archives/3139/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Who wants to supervise my PhD?</title>
		<link>http://contemporary-home-computing.org/1tb/archives/3099</link>
		<comments>http://contemporary-home-computing.org/1tb/archives/3099#comments</comments>
		<pubDate>Mon, 05 Mar 2012 15:10:57 +0000</pubDate>
		<dc:creator>olia</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[last updated 2000]]></category>
		<category><![CDATA[meta]]></category>
		<category><![CDATA[neighborhoods]]></category>

		<guid isPermaLink="false">http://contemporary-home-computing.org/1tb/?p=3099</guid>
		<description><![CDATA[Working Title: From Heartland Neighborhood to Pinterest.com. The Rise and Fall of &#8220;organizing and sharing the things you love&#8221; Culture. Fig.1.1 Homepage of heartlandhelpinghands, Heartland&#8217;s satellite. Fig.1.2 Pinterest.com with menu &#8220;Everything&#8221; unfolded.]]></description>
			<content:encoded><![CDATA[<p> Working Title:<br />
<strong>From Heartland Neighborhood to Pinterest.com.<br />
The Rise and Fall of &#8220;organizing and sharing the things you love&#8221; Culture.</strong></p>
<p>Fig.1.1<br />
Homepage of <a href="http://www.geocities.com/heartlandhelpinghands/">heartlandhelpinghands</a>, <a href="http://www.bladesplace.id.au/geocities-neighborhoods-suburbs.html#Heartland">Heartland&#8217;s</a> satellite.<br />
<img class="alignnone size-large wp-image-3102" title="heartland helping hands" src="http://contemporary-home-computing.org/1tb/wp-content/uploads/screenshot_286-1024x739.png" alt="" /></p>
<p>Fig.1.2<br />
Pinterest.com with menu &#8220;Everything&#8221; unfolded.<br />
<img src="http://contemporary-home-computing.org/1tb/wp-content/uploads/pinterest.png" alt="" title="pinterest" width="1017" height="766" class="alignnone size-full wp-image-3121" /></p>
]]></content:encoded>
			<wfw:commentRss>http://contemporary-home-computing.org/1tb/archives/3099/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

