Tero-dump: Wikipedia to HTML converter

Wikipedia is the world's biggest free encyclopedia. Tero-dump extracts Wikipedia to normal HTML pages, so that it can be browsed offline, mirrored and copied to CD-ROMs, laptops and PDAs.

The Wikipedia.org project is based on participation. If Tero-dump is used while online, you can easily edit articles on the live Wikipedia by clicking the heading of an article. Links to non-existent articles are directed to wikipedia.org too, and are marked with a [?].
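The link handling described above can be sketched as follows. This is a minimal Python illustration, not Tero-dump's own code (which is modified Wikipedia PHP source): `render_link` is a hypothetical helper, and both the local `<first character>/<Title>.html` layout and the `https://en.wikipedia.org/wiki/` base URL are assumptions made for the example.

```python
from html import escape
from urllib.parse import quote

# Assumption: base URL of the live wiki used for non-existent articles.
WIKI_BASE = "https://en.wikipedia.org/wiki/"


def render_link(title: str, existing: set[str]) -> str:
    """Render a wiki link as HTML (hypothetical helper, for illustration only).

    Articles present in the dump get a relative link to their static page;
    non-existent articles point to the live wiki and are marked with [?].
    """
    name = title.replace(" ", "_")
    text = escape(title)
    if title in existing:
        # Assumed layout: ../<first character>/<Title>.html
        return f'<a href="../{quote(name[0])}/{quote(name)}.html">{text}</a>'
    # Not in the dump: send the reader to the live wiki, where it can be created.
    return f'<a href="{WIKI_BASE}{quote(name)}">{text}</a> [?]'
```

For example, `render_link("Helsinki", {"Helsinki"})` gives `<a href="../H/Helsinki.html">Helsinki</a>`, while a title missing from `existing` gets an external link followed by ` [?]`.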

Pages are made lighter than normal, based on the printable=yes profile. Images are not yet included, but if you copy the upload/ directory from wikipedia.org, they should work automagically (look at the img src links).
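To see which files from upload/ a page expects, you can list its img src links. A minimal Python sketch using the standard html.parser module; `ImgSrcCollector` and `upload_paths` are hypothetical helpers, and it assumes image references contain the string `upload/` (check the src values in your own pages).

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; <img ... /> also lands here.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def upload_paths(html_text: str) -> list[str]:
    """Return img src values that point into an upload/ directory."""
    parser = ImgSrcCollector()
    parser.feed(html_text)
    parser.close()
    return [src for src in parser.srcs if "upload/" in src]
```

Running `upload_paths` over every page and copying just those files is one way to pull in only the images actually referenced.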

The first page has a subject index and an alphabetical index of Wikipedia. No search engine is included. In new browsers, the alphabetical index works well with type-ahead find. If you just copy the HTML files to your hard disk, you can search Tero-dump with the same programs as any other files, such as locate (remember to run updatedb) or swish++.
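Without an indexer, a brute-force scan is enough for an occasional query, although it reads every file each time, which is exactly what locate's updatedb database and swish++'s index avoid. A minimal Python sketch, assuming the dump has been unpacked to a local directory; `search_dump` is a hypothetical helper, it matches raw HTML (tags included), and it decodes files as UTF-8 with replacement characters because the dump's character encoding is not stated here.

```python
from pathlib import Path


def search_dump(root: str, query: str) -> list[Path]:
    """Return .html files under root whose contents contain query (case-insensitive)."""
    needle = query.lower()
    hits = []
    for path in sorted(Path(root).rglob("*.html")):
        # errors="replace" keeps going if a page is not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle in text.lower():
            hits.append(path)
    return hits
```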

Tero-dump creates a static HTML dump of the wikipedia.org SQL database.

Currently, Tero-dump is a combination of edited Wikipedia source, a bunch of mod_rewrite rules and command line options for wget. When it is in a sensible form, it will be put here...

Now hosted by Helia: download wikipedia-terodump-0.1.tar.bz, a static HTML dump (164 MB)

ftp.funet.fi mirror

$ md5sum wikipedia-terodump-0.1.tar.bz
434b9537b62763defbc64eee637694cb  wikipedia-terodump-0.1.tar.bz
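The same check can be done in Python with the standard hashlib module, reading the 164 MB archive in chunks so it never has to fit in memory at once. A minimal sketch; `file_md5` and `verify` are hypothetical helpers, and the expected value is the checksum published above. MD5 catches a corrupted or truncated download, but it is not a defence against deliberate tampering.

```python
import hashlib

# Checksum published on this page for wikipedia-terodump-0.1.tar.bz.
EXPECTED_MD5 = "434b9537b62763defbc64eee637694cb"


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches expected (hex, case-insensitive)."""
    return file_md5(path) == expected.lower()
```

Usage: `verify("wikipedia-terodump-0.1.tar.bz")` should return `True` for an intact download.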

Screenshot

I am open to feedback at karvinen+terodump at-sign iki.fi

<<Tero Karvinen's homepage

Misc

My student recently created a new version. See Virtaperko 2005: Wikipedia to DVD.

The following is probably of no help to anyone, but I put some of the snippets here to avoid losing them...

wikipedia-terodump-source-2003-11-06-alpha.tar.bz - index.php, list.php, modified CVS source

wget params

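# /etc/httpd/conf/httpd.conf
RewriteEngine on
# RewriteLog and RewriteLogLevel exist up to Apache 2.2; Apache 2.4 replaced them with "LogLevel rewrite:trace1"
RewriteLog /var/log/httpd/rewrite.log
RewriteLogLevel 1

# index.html inside a one- or two-character directory is served by list.php
RewriteRule ^/wikipedia/.{1,2}/index\.html$      /home/tee/public_html/wikipedia/list.php [L]
# Any other page in such a directory is rendered as a printable article
RewriteRule ^/wikipedia/.{1,2}/(.+)\.html$       /home/tee/public_html/wikipedia/wiki.phtml?printable=yes&title=$1 [L]
# Front page
RewriteRule ^/wikipedia/index\.html$     /home/tee/public_html/wikipedia/index.php [L]
# Images, served straight from the copied upload/ directory
RewriteRule ^/wikipedia/upload/(.*)     /home/tee/public_html/wikipedia/upload/$1 [L]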
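To check which file a given URL ends up at, the rules can be replayed with Python's re module. This is a rough model only: it ignores mod_rewrite details such as URL decoding and per-directory context, and `rewrite` is a hypothetical helper. The patterns follow the rules above, with the literal dot in `.html` escaped and the article rule anchored with `$`.

```python
import re
from typing import Optional

DOCROOT = "/home/tee/public_html/wikipedia"

# (pattern, substitution) pairs in the same order as the RewriteRules above.
# Every rule carries [L], so in this model the first matching rule wins.
RULES = [
    (r"^/wikipedia/.{1,2}/index\.html$", DOCROOT + "/list.php"),
    (r"^/wikipedia/.{1,2}/(.+)\.html$", DOCROOT + "/wiki.phtml?printable=yes&title=\\1"),
    (r"^/wikipedia/index\.html$", DOCROOT + "/index.php"),
    (r"^/wikipedia/upload/(.*)", DOCROOT + "/upload/\\1"),
]


def rewrite(url_path: str) -> Optional[str]:
    """Return the rewritten target for url_path, or None if no rule matches."""
    for pattern, target in RULES:
        m = re.match(pattern, url_path)
        if m:
            # expand() substitutes \1 with the captured group, like $1 in Apache.
            return m.expand(target)
    return None
```

Rule order matters: with the article rule first, every index.html in a one- or two-character directory would load as an article titled "index".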