Personal Home Page of Edmund Horner

Chrysophylax-Search

Chrysophylax was his name, for he was of ancient and imperial lineage, and very rich. He was cunning, inquisitive, greedy, well-armoured, but not over bold. And he was mortally hungry.

— J. R. R. Tolkien, Farmer Giles of Ham
This page refers to chrysophylax-search/1.7, which is being replaced by a new version. chrysophylax-search/2.0 will become active in March 2003 and will include several improvements. Most of the data gathered by previous versions will be carried over.

What is it?

Chrysophylax-Search is an experimental web robot; experimental in the sense that it isn't expected to work perfectly. It was written by Edmund Horner in 2002, and follows numerous earlier attempts in the same vein.

This page is maintained in the spirit of the "Share results" section of Martijn Koster's Guidelines for Robot Writers. The robot itself is also meant to be a good citizen: it traverses the web slowly, keeps full results, obeys robots.txt, and provides contact information in the HTTP headers. In addition, its operator has monitored it closely, fixed numerous bugs, and observed proper operation over thousands of pages since the last bug fix.
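
The good-citizen behaviour described above (slow traversal, obeying robots.txt, contact details in the request headers) can be sketched roughly as follows. This is an illustration in Python using only the standard library, not the robot's own PHP code; the user-agent string, the contact address, and the 30-second default delay are all made-up values.

```python
import time
import urllib.request
import urllib.robotparser

# Hypothetical identity; not the robot's real User-Agent or contact address.
USER_AGENT = "example-robot/0.1"
CONTACT = "operator@example.org"
DEFAULT_DELAY = 30.0  # seconds to wait before each request (assumed value)


def load_rules(robots_txt):
    """Parse the text of a robots.txt file into a rule set."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def build_request(url):
    """Attach identification and contact details to the request headers."""
    headers = {"User-Agent": USER_AGENT, "From": CONTACT}
    return urllib.request.Request(url, headers=headers)


def delay_for(rules):
    """Use the site's Crawl-delay if it asks for more than our default."""
    requested = rules.crawl_delay(USER_AGENT)
    if requested is None:
        return DEFAULT_DELAY
    return max(DEFAULT_DELAY, float(requested))


def polite_fetch(url, rules):
    """Fetch a page only if robots.txt allows it, after waiting."""
    if not rules.can_fetch(USER_AGENT, url):
        return None  # disallowed: skip the page entirely
    time.sleep(delay_for(rules))
    with urllib.request.urlopen(build_request(url), timeout=30) as response:
        return response.read()
```

A caller would fetch and parse each host's robots.txt once with `load_rules`, then route every page request for that host through `polite_fetch`. A real robot would also track the delay per host rather than sleeping before every request.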

What does it look like?

The robot is written in PHP and uses a MySQL database. It is typically run from the command line, since the script outputs plain text without even a Content-Type header. A variety of scripts for querying the database are also in development.

Download: search-refresh.php, search-create.sql. These scripts have only been tested on the author's system, and in all likelihood will not work on yours without alteration.

What has it done so far?

(This snapshot was taken 6 March 2003.) The main area of focus at the moment is the distribution of server software in various domains, particularly in New Zealand. This table shows the number of sites running the major web server software in chosen domains. Progress has become slow in the .nz domains, and I believe that chrysophylax-search now knows about the majority of New Zealand web servers.

New Zealand Generic Top-Level Domains All Domains
.ac.nz .co.nz .cri.nz .gen.nz .govt.nz .iwi.nz .mil.nz .net.nz .org.nz .school.nz Total .biz .com .edu .gov .info .int .mil .net .org
Apache 1 (Unix) 194 894 1 13 73 1 98 189 7 1470 8 1371 83 175 30 3 5 337 463 5978
Apache 1 (Win32) 2 17 1 20 9 1 3 2 58
Apache 2 (Unix) 3 2 5 20 3 4 1 7 25 97
Apache 2 (Win32) 1 1 2 1 8 1 1 15
Apache Total 237 1058 1 15 94 1 102 229 7 1744 10 1969 105 204 32 5 5 382 537 7383
Microsoft IIS/3 1 1 1 3
Microsoft IIS/4 64 263 1 2 37 1 10 53 431 72 10 72 6 2 14 1141
Microsoft IIS/5 71 615 4 1 126 1 3 18 82 1 922 3 378 23 132 2 8 37 58 2620
Microsoft IIS/6 5 5
Microsoft Total 136 879 5 3 163 1 4 28 135 1 1355 3 455 34 204 2 14 39 72 3771
Netscape Enterprise/3 1 48 4 2 5 60 32 2 41 10 1 6 216
Netscape Enterprise/4 16 1 17 44 3 69 1 7 1 1 169
Netscape Enterprise/6 4 4 11 2 14 5 1 43
Netscape Total 1 68 4 2 6 81 88 9 127 1 22 2 8 436
Others 15 187 2 26 13 23 6 272 186 3 41 5 2 23 29 899
All Servers 389 2192 8 18 287 4 4 143 393 14 3452 13 2698 151 576 39 6 43 446 646 12489
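
A tally like the one in the table above could be produced along these lines. This is only a sketch in Python, not one of the author's PHP scripts. The rules for turning a Server header into a row label, and for grouping hosts by domain, are guesses based on the labels shown in the table, and the host names in the usage note are invented.

```python
import re
from collections import Counter


def classify_server(header):
    """Map a raw Server header to a coarse row label (assumed rules)."""
    m = re.match(r"Apache/(\d+)[^ ]*(?: \((\w+)\))?", header)
    if m:
        platform = f" ({m.group(2)})" if m.group(2) else ""
        return f"Apache {m.group(1)}{platform}"
    m = re.match(r"Microsoft-IIS/(\d+)", header)
    if m:
        return f"IIS/{m.group(1)}"
    m = re.match(r"Netscape-Enterprise/(\d+)", header)
    if m:
        return f"Enterprise/{m.group(1)}"
    return "Others"


def domain_of(host):
    """Return the column a host falls under: .xx.nz for New Zealand
    hosts, otherwise just the top-level domain."""
    labels = host.lower().rstrip(".").split(".")
    if labels[-1] == "nz" and len(labels) >= 2:
        return "." + ".".join(labels[-2:])
    return "." + labels[-1]


def tally(sites):
    """Count sites per (server label, domain) pair.

    `sites` is an iterable of (host, server_header) tuples.
    """
    counts = Counter()
    for host, header in sites:
        counts[(classify_server(header), domain_of(host))] += 1
    return counts
```

For example, `tally([("www.example.co.nz", "Apache/1.3.27 (Unix) PHP/4.3.0")])` counts one site under the label `Apache 1 (Unix)` in the `.co.nz` column.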

The author is actively working on scripts to summarise the database. He also intends to make some raw data available. (As the search engine has relatively customisable behaviour, and has been mostly used for indexing a non-public site, some data cannot be published without considering privacy.)

Where does it see itself in 5 years' time?

The TODO list for Chrysophylax-Search is roughly as follows:

Copyright (C) 2002, 2003, 2004 Edmund Horner.     $Id: index.html 2412 2007-01-23 03:08:21Z Edmund $