Downloads

On this page you can download a selection of software tools and other resources that are either useful in connection with our online utilities and the results they generate, or that we find helpful for general CGI programming. We will extend this collection in the future as required.


Treptos - a simple DOS-to-UNIX and UNIX-to-DOS text file converter

Treptos is a freeware utility for Windows that converts text files from DOS format into UNIX format and vice versa. All files generated by Site-Scan, e.g. the Google sitemap and the URL-list file, are in UNIX format, i.e. each line is terminated by "\012" (line feed). Because DOS and Windows use "\015\012" (carriage return followed by line feed) as the new-line terminator, this can lead to problems when you want to view or modify these files on a Windows system, or if your server requires them to be uploaded in DOS format. With Treptos you can easily convert these text files from one format into the other, which also makes it a valuable tool if you develop and deploy your scripts on different target systems.

Download: trep100.zip

Note: As the classic Mac OS (before Mac OS X) uses "\015" as the new-line terminator, this might lead to additional problems. Should you come across a freeware utility to convert UNIX-style text files into the Mac format, please feel free to contact us; we will add it here if found suitable.
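
If you prefer to convert line endings from within a script, the idea can be sketched in a few lines of PHP. This is only a minimal sketch and not how Treptos itself works; the function name convert_line_endings() is a hypothetical helper introduced for illustration. It accepts DOS, UNIX and classic Mac input and rewrites every line ending to the terminator you pass in.

```php
<?php
// Hypothetical helper (not part of Treptos): rewrite every line ending in
// $text to $eol, whether the input uses "\015\012", "\015" or "\012".
function convert_line_endings(string $text, string $eol = "\n"): string
{
    // "\r\n" is listed first so a DOS line ending is replaced once, not twice.
    return preg_replace('/\r\n|\r|\n/', $eol, $text);
}

$unix = "line one\nline two\n";
$dos  = convert_line_endings($unix, "\r\n");  // UNIX -> DOS
$back = convert_line_endings($dos, "\n");     // DOS -> UNIX
```

For whole files you could wrap the call in file_get_contents() and file_put_contents(), which read and write the data byte for byte.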


7-Zip - a file archiver with high compression ratio

7-Zip is a software utility for compressing and decompressing files on Windows systems, distributed under the GNU LGPL. You can use this tool for handling the compressed version of the Google sitemap, sitemap.xml.gz, as generated by Site-Scan. The following formats are supported by 7-Zip:

  • Packing / unpacking: 7z, ZIP, GZIP, BZIP2 and TAR
  • Unpacking only: RAR, CAB, ISO, ARJ, LZH, CHM, Z, CPIO, RPM, DEB and NSIS

Besides supporting various formats and achieving high compression ratios, 7-Zip offers a comprehensive command-line version. This allows it to be called via system calls from CGI scripts, shell scripts and other programs, making it a powerful Windows replacement for the gzip / gunzip commands on UNIX systems.

Download: 7z442.exe (Windows 32-bit)

Note: Other versions of 7-Zip, including Windows MSI-packages and x64 versions, are available directly from www.7-zip.org.
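
As a rough illustration of such a system call, the following PHP sketch builds the command line for packing a file into gzip format. The helper name build_gzip_command() is hypothetical, and the sketch assumes that the 7-Zip command-line executable can be called as "7z" on your system; adjust the name or path to match your installation. The actual call is left commented out so that nothing is executed by accident.

```php
<?php
// Hypothetical helper: build the 7-Zip command line that packs $source
// into the gzip archive $target.
// Assumption: $sevenZip is the name or path of the 7-Zip command-line
// executable on your system.
function build_gzip_command(string $source, string $target, string $sevenZip = '7z'): string
{
    // "a" adds files to an archive, "-tgzip" selects the gzip format.
    return $sevenZip . ' a -tgzip '
        . escapeshellarg($target) . ' ' . escapeshellarg($source);
}

$command = build_gzip_command('sitemap.xml', 'sitemap.xml.gz');

// Uncomment to run the command; an exit code of 0 means no error.
// exec($command, $output, $exit_code);
```

escapeshellarg() quotes the file names so that names containing spaces are passed to 7-Zip as single arguments.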


RFC2396 - Uniform Resource Identifiers (URI) : Generic Syntax

This RFC (Request for Comments) document describes the grammar and basic functionality of URIs. As the Google sitemaps project, and therefore Site-Scan, is mainly about URIs / URLs, it gives you detailed information about their actual meaning, their syntax and how they work.
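
To get a feeling for the components the RFC talks about, you can split a URL with PHP's built-in parse_url() function. This is only an illustration; the URL below is a made-up example and www.example.com is a placeholder host.

```php
<?php
// Made-up example URL; www.example.com is only a placeholder host.
$parts = parse_url('http://www.example.com:8080/shop/list.php?page=2#top');

// RFC 2396 component  =>  parse_url() key
// scheme              =>  'scheme'
// authority           =>  'host' and 'port'
// path                =>  'path'
// query               =>  'query'
// fragment            =>  'fragment'
foreach ($parts as $name => $value) {
    echo $name . ': ' . $value . "\n";
}
```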

Download: RFC2396.pdf


RFC2616 - Hypertext Transfer Protocol -- HTTP/1.1

This RFC (Request for Comments) document contains the specification of the Hypertext Transfer Protocol 1.1 (HTTP/1.1). The way Site-Scan crawls your website and processes the information gathered from it is based on this specification. It contains detailed information about HTTP messages, status code definitions, header fields etc., giving you the background you need to understand how Site-Scan collects the information displayed on the result page and what that information actually means.

Download: RFC2616.pdf


Code snippets

Here you can find a collection of useful snippets related to CGI programming, Perl and PHP. While Site-Scan runs on an Apache server under Linux, the development and testing of our software and dynamic webpages take place on a Windows platform, using both Apache and Microsoft IIS servers. Although it might seem that this only causes additional headaches, it actually ensures that our scripts are fully portable and can be used on the platforms mentioned above without any modifications.

PHP-pages, modification date and "If-Modified-Since" requests

Several search engines, including Google and Yahoo!, send requests that include the "If-Modified-Since" header field in order to avoid downloading content which has not changed since the last crawl, thus saving bandwidth. However, most servers ignore this conditional field when serving PHP pages and therefore cause the search engines to download the complete entities of the requested URLs. A second problem is that by default the "Last-Modified" field is not included in the response header, making it impossible for search engines (and Site-Scan) to determine the actual modification date of the page. When running PHP as an Apache module this second problem can be solved by simply including "last_modified = 1" in php.ini, but this does not work when running under IIS.

The following snippet solves both of the problems mentioned above in a server-independent way, as it relies on environment variables rather than on Apache-specific functions. The first part compares the date of the "If-Modified-Since" field with the actual time the source file was last modified and replies with a "304 - Not Modified" response where applicable. The second part adds the "Last-Modified" field to the response header, again based on the actual modification date of the source file. If your server has already added a "Last-Modified" field to the response header, the snippet simply overwrites this field without causing any harm.

Note: When using this code make sure you insert it at the very top of your page, before any document-type declaration or the <html> tag!

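<?php
// Modification time of this script, and the date sent by the client (if any).
$mod_time = getlastmod();
$since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ? strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) : false;
// Part 1: reply with 304 if the page has not changed since that date.
// strtotime() returns false for a date it cannot parse; such requests are served in full.
if ($since !== false && $mod_time !== false && $mod_time <= $since) {
    header("HTTP/1.1 304 Not Modified");
    exit;
}
// Part 2: add the Last-Modified field, formatted as an HTTP date in GMT.
if ($mod_time !== false) {
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s \G\M\T', $mod_time));
}
?>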
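
To try out the date comparison without a web server, the decision can be isolated in a small function. This is just a sketch for experimenting; is_not_modified() is a hypothetical helper name and the dates are arbitrary example values.

```php
<?php
// Hypothetical helper: true if a "304 - Not Modified" reply is appropriate.
// $if_modified_since is the raw header value, or null if the client sent none.
function is_not_modified(?string $if_modified_since, int $mod_time): bool
{
    if ($if_modified_since === null) {
        return false;  // unconditional request, send the full page
    }
    $since = strtotime($if_modified_since);
    // strtotime() returns false for a date it cannot parse.
    return $since !== false && $mod_time <= $since;
}

// Pretend the page was last modified on 15 Jan 2007, 12:00:00 GMT.
$mod_time  = gmmktime(12, 0, 0, 1, 15, 2007);
$unchanged = is_not_modified('Mon, 15 Jan 2007 12:00:00 GMT', $mod_time);
$changed   = is_not_modified('Sun, 14 Jan 2007 12:00:00 GMT', $mod_time);

var_dump($unchanged, $changed);
```

The first call reports that a 304 response is in order because the client's copy is as new as the file; the second does not, because the file changed after the date the client sent.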

PHP and custom error pages

A common problem is custom error pages which do not return the correct HTTP response code to the client, e.g. they return "200 - OK" instead of "404 - Not Found" when the requested URL was in fact not found on the server. This not only may cause Site-Scan to display invalid results, but might also cause search engines to index invalid URLs for your website. Many search engines, including Google, therefore check whether your server replies with the proper response code and might stop indexing your webpages if it does not. To add the proper HTTP response code to the header of your custom error pages you can use the header() function again. For example, if you set up a custom error page for "404 - Not Found", simply insert the following lines, as before, right at the top of your page:

<?php
header("HTTP/1.1 404 Not Found");
?>

However, this solution requires you to set up a separate custom error page for each possible error and therefore requires a bit of work. If you are running your website on Apache (at least on one of the newer versions) you can set up a single custom error page for all possible error codes and, furthermore, display the actual problem on this page itself by using the "REDIRECT_STATUS" and "REDIRECT_URL" environment variables, which store the original error code and the request URL that caused the error. For example:

<?php
$errors = array('400' => 'Bad Request',
'401' => 'Unauthorized',
'402' => 'Payment Required',
'403' => 'Forbidden',
'404' => 'Not Found',
'405' => 'Method Not Allowed',
'406' => 'Not Acceptable',
'407' => 'Proxy Authentication Required',
'408' => 'Request Timeout',
'409' => 'Conflict',
'410' => 'Gone',
'411' => 'Length Required',
'412' => 'Precondition Failed',
'413' => 'Request Entity Too Large',
'414' => 'Request-URI Too Long',
'415' => 'Unsupported Media Type',
'416' => 'Requested Range Not Satisfiable',
'417' => 'Expectation Failed',
'500' => 'Internal Server Error',
'501' => 'Not Implemented',
'502' => 'Bad Gateway',
'503' => 'Service Unavailable',
'504' => 'Gateway Timeout',
'505' => 'HTTP Version Not Supported');
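// Pick the scheme from the port; anything other than 443 is treated as plain HTTP,
// so $protocol is always defined.
$protocol = ($_SERVER['SERVER_PORT'] == 443) ? 'https://' : 'http://';
$error_url = $protocol . $_SERVER['SERVER_NAME'] . $_SERVER['REDIRECT_URL'];
$error_num = $_SERVER['REDIRECT_STATUS'];
// Fall back to a generic message for codes missing from the list above.
$error_msg = isset($errors[$error_num]) ? $errors[$error_num] : 'Error';
header("HTTP/1.1 $error_num $error_msg");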
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title><?=$error_num?> - <?=$error_msg?></title>
...
</head>
<body>
When requesting the URL <?=htmlspecialchars($error_url)?> the following problem occurred: <?=$error_num?> - <?=$error_msg?>
...
</body>
</html>

Note: When setting up custom error pages using these snippets do not forget to add the required redirect to the custom error pages to your server configuration, either via the "Custom Errors" tab under IIS or by using the "ErrorDocument" directive under Apache!
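
For Apache, the required configuration might look like the following fragment, placed in httpd.conf or in an .htaccess file. The path /error.php is only an assumed location for the error page shown above; add one line for each status code you want to handle. Note that the target has to be a local URL path: if you specify a full URL starting with "http://", Apache redirects the client to it and the original status code is lost.

```apache
# Assumed location of the custom error page: /error.php
ErrorDocument 403 /error.php
ErrorDocument 404 /error.php
ErrorDocument 500 /error.php
```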

Scan-technology by M-Press Systems