In this section you can find the latest news, updates and changes regarding our online-utilities. Please note that this website is a work in progress and we are constantly adding new features, mainly based on feedback from our users. Any changes will be announced here first in order to keep you up-to-date, so if you notice changes in the results of our online-tools, please check this section for relevant news.

Currently the following limitations apply to Site-Scan and X-Site-Scan:

Maximum number of URLs: 2000
Maximum total scanning time: 1800 seconds
Timeout limit per request: 30 seconds
Minimum delay between scans: 60 minutes
Maximum data per request: 250 kBytes
Content-types currently parsed: text/html

URLs excluded by default filters, sorted by MIME-classes:

application/*: *.ai, *.bin, *.class, *.dll, *.dmg, *.dms, *.eps, *.exe, *.gtar, *.jar, *.js, *.lha, *.lzh, *.ps, *.so, *.swf, *.tar, *.zip
audio/*: *.au, *.kar, *.mid, *.midi, *.mp2, *.mp3, *.mpga, *.ra, *.ram, *.snd, *.wav
image/*: *.bmp, *.cgm, *.gif, *.ief, *.jpe, *.jpeg, *.jpg, *.png, *.tif, *.tiff, *.wmf
text/*: *.css
video/*: *.avi, *.mov, *.mpe, *.mpeg, *.mpg, *.qt, *.wmv

Latest news:


Topic: Site-Scan now creates ROR/RSS sitemap-feeds

Site-Scan now generates sitemap-feeds in the RSS 2.0 format, enhanced with the ROR (Resources of a Resource) sitemap namespace-extension. You can therefore use these sitemap-feeds for Google and all other search-engines, with additional information on modification date and update-period made available through the respective ROR namespace-extension. As usual, Site-Scan translates the native character encoding of your webpages into UTF-8, so you can use these feeds for websites encoded in Chinese, Japanese, Cyrillic etc. while making sure the title and content of your webpages are displayed properly in the generated feeds.
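The generated feed is an ordinary RSS 2.0 channel carrying an extra ROR namespace declaration. The snippet below only illustrates the general structure; the namespace URI and element names are our best guess from public ROR examples, not taken from Site-Scan's actual output:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:ror="http://rorweb.com/0.1/">
  <channel>
    <title>Example sitemap feed</title>
    <link>http://example.com/</link>
    <description>Sitemap feed for example.com</description>
    <item>
      <title>Example page</title>
      <link>http://example.com/page.html</link>
      <!-- modification date and update-period via the ROR extension -->
      <ror:updated>2006-01-01</ror:updated>
      <ror:updatePeriod>week</ror:updatePeriod>
    </item>
  </channel>
</rss>
```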


Topic: Added new option "Include default filters" to Site-Scan

Because a number of people are unfamiliar with modifying their robots.txt file or are reluctant to do so, we added a new option "Include default filters" to Site-Scan. If selected, Site-Scan will skip certain URLs based on registered MIME file-extensions, making sure only URLs which belong in your sitemap are included. Files excluded by the default filters include certain applications, images, audio- and video-files etc.; a more detailed description of this new option can be found on our webpage on using Site-Scan. The MIME extensions currently excluded are listed at the top of this page; we might alter this list in the future based on your feedback.

Topic: Added "Clicks-from-Homepage" to the Site-Scan results

Visitors of websites tend to lose patience when they have to follow too many links before they can view the information they are looking for. Some search-engines now take this into account and will rank pages higher which are easy to reach from your homepage. In order to show you how deep each resource is located within your navigation, we added the "Clicks-from-Homepage" value, or "C-f-H" for short. It displays the minimum number of links you have to follow, starting from your homepage, in order to view the respective resource in a browser. In case this value exceeds 3, Site-Scan will display a warning; we recommend you ensure that it is possible to navigate to your important website-content without exceeding this limit.
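The C-f-H value corresponds to a breadth-first search over your site's link graph. A minimal sketch (the function name and the example graph are our own, not Site-Scan's actual code):

```python
from collections import deque

def clicks_from_homepage(links, homepage):
    """Minimum number of links to follow from the homepage to each page (BFS)."""
    depth = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:          # first visit = shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

# Hypothetical link graph: homepage links to /about and /news,
# /news links to /news/archive.
site = {"/": ["/about", "/news"], "/news": ["/news/archive"]}
print(clicks_from_homepage(site, "/"))
```

Pages whose value exceeds 3 would be the ones triggering the warning described above.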

Topic: X-Site-Scan now uses "referer" when requesting external links

Because some external URLs require a referring URL in order to respond correctly to a request, X-Site-Scan now uses the URL of the webpage from which the external link was extracted for this value. Please note that X-Site-Scan requests each external URL only once and removes all anchor-fragments from any external URLs found before issuing the request.
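The behaviour described above can be sketched in Python (function and variable names are our own, not X-Site-Scan's actual code): the anchor-fragment is stripped and the URL of the page where the link was found is sent as the Referer header.

```python
from urllib.parse import urldefrag

def build_external_request(external_url, found_on):
    """Prepare an external-link check: drop the #fragment and use the
    webpage where the link was extracted as the referring URL."""
    clean_url, _fragment = urldefrag(external_url)
    headers = {"Referer": found_on}  # the header is spelled "Referer" in HTTP
    return clean_url, headers

url, headers = build_external_request(
    "http://example.com/page#section2", "http://mysite.example/links.html")
print(url)      # fragment removed
print(headers)
```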

Topic: Use of dynamic fonts in cascading style sheets

With the increasing amount of information gathered by our online-tools, viewing the results on screens with smaller resolutions is increasingly difficult. In order to improve the readability of our results as well as our webpages, we modified our CSS-files, which now mainly use dynamic fonts instead of fixed font-sizes. For best viewing we recommend browsers which allow for easy adjustment of the font-size; from our tests we know that Opera in particular does a very good job, as it zooms text and graphics together without disturbing the basic page-layout.


Topic: Conversion to XHTML 1.0 Strict

In an effort to ensure clean coding of our webpages and interoperability with different browsers, we upgraded our website to XHTML 1.0 Strict. This includes not only the normal webpages, but also all error and message pages as well as the result-pages produced by our online tools. Should you face problems with the rendering of the new pages, we strongly recommend checking for the newest version of your browser; in case this doesn't solve your problem, please feel free to contact us, letting us know your OS, browser-type and version.


Topic: Load-indicator added

We added a load-indicator to the Site-Scan and X-Site-Scan start pages which shows the percentage of currently active scan-engines relative to the maximum number of scan-engines permitted. Should you intend to request a scan for a large website, or should you know from previous scans that the connection between your server and ours is rather slow, we recommend waiting until the actual server load drops below approximately 50% before starting the scan. You can update the load-indicator by using your browser's refresh button.


Topic: New tool added

We added a new online-utility called X-Site-Scan. Like Site-Scan it will crawl your website and extract the links from all webpages scanned; however, instead of verifying your internal links and generating a Google sitemap, this tool collects and verifies all external links found on your website. The results include a list of all external links found, together with their status and some additional useful information. It will not only show you exactly which of your external links need to be double-checked or updated, it also gives you detailed information in case an error occurred while requesting a URL, instead of displaying a generic code "500". Further information can be found on the X-Site-Scan main page and the related webpages.

With the introduction of this tool it was necessary to apply some changes to Site-Scan, as both engines share the same database and some other related software. We also used this opportunity to streamline certain portions of the code, to remove some bugs and to modify the layout of our website. Should you face any problems after these changes, please contact us.


New major release

We have added support for the rel="nofollow" attribute, which is also supported by the major search-engines. Because this required us to completely rewrite the parsing module in order to keep Site-Scan flexible for further extensions, we used the opportunity to add several other features:

Topic: Attribute rel="nofollow"

As mentioned above, Site-Scan now supports the rel="nofollow" attribute. When it comes across a tag containing this attribute, it will ignore any link within this tag. This filter is active by default, however it can be deactivated by deselecting "Include meta tags" from the Site-Scan options before starting the scan.

Topic: Duplicate content

Besides locating and displaying broken links, Site-Scan will now also find any URLs on your website which point to identical/duplicate content. This might not only lead to reduced ranking of the respective webpages, but also to incomplete indexing of your website, depending on the search-engine in question. Please refer to our webpage on Site-Scan results for how to identify and display these URLs.

Topic: Charset origin

The charsets used by Site-Scan for decoding the content of your webpages will now not only be displayed on your result pages, but also be flagged with their origin, based on Mozilla-compatible priorities. This will help you verify that user-agents are using the correct charset for decoding and displaying your webpages, and to track down any problems related to server settings, Unicode-signatures or document meta tags.

Topic: Link extraction

With the new parsing module in place, we added a number of tags from which Site-Scan will extract links while crawling your website. This might lead to a substantial increase in the number of URLs listed in your results, mainly because we added the <img>-tag. In order to control the content of your sitemap and to reduce the number of less important URLs displayed in your results, please read the section about filter-settings on our webpage explaining how to use Site-Scan.

Topic: Canonical and non-canonical URLs

Earlier versions of Site-Scan automatically corrected any URLs extracted from your website to the form of the root-URL given by you when requesting the scan. In order to support detection of duplicate content we removed this feature: Site-Scan will now display both the canonical and the non-canonical form of each URL if found on your webpages, helping you locate absolute URLs or base tags containing the unwanted form of your domain name.

Note: Given the number of modifications we applied, there is a good chance that the new version of Site-Scan might contain some bugs. Should you come across any problems, please send us your feedback.


Topic: Scan results

In order to give you some additional information about your server, we added the following points to the Site-Scan statistics: your server's IP-address and port number, your server's geographical location, and whether your server supports keep-alive connections. A detailed description of these values can be found on our Site-Scan results page. Furthermore, Site-Scan now also generates a compressed version of the Yahoo! URL-list, available as urllist.txt.gz.

Note: We also used this opportunity to upgrade some software-packages on our server. Should you face any problems while using Site-Scan or should you notice any changes in your Site-Scan results please send us your feedback.


Topic: New tool added

We added a new tool which enables you to check the Google PageRank™ of your webpages. While this information is also available from the Google toolbar, which you can download for Firefox and Internet Explorer, this tool - like all our other online-utilities - does not require you to install any software on your computer and works independently of browser and OS. You can try the tool here.


Topic: Scan results

We added support for Unicode-signatures, also known as BOMs (Byte Order Marks). If Site-Scan comes across documents containing a BOM it will use this as the base for decoding the content, ignoring any other charsets which might be defined in the HTTP response headers. To make it easy for you to identify the actual source of the charset used, charsets retrieved from Unicode-signatures are marked on the result page with the flag [BOM]. This update should also rectify problems with incomplete retrieval of meta tag content and links from pages which contain a Unicode-signature and additionally have a charset defined in the respective HTTP response header fields.
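Detecting a Unicode-signature works roughly as follows (a sketch; the longer UTF-32 signatures must be tested before the UTF-16 ones, because BOM_UTF32_LE starts with the same two bytes as BOM_UTF16_LE):

```python
import codecs

# (signature, charset name) -- longest signatures first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def charset_from_bom(raw):
    """Return the charset implied by a leading BOM, or None if there is none.
    A BOM takes precedence over any charset in the HTTP response headers."""
    for bom, charset in _BOMS:
        if raw.startswith(bom):
            return charset
    return None

print(charset_from_bom(b"\xef\xbb\xbfhello"))  # utf-8
print(charset_from_bom(b"hello"))              # None
```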

Topic: HTTP response header check

Besides enabling support for Unicode-signatures, in case of a GET request this tool will now also flag the origin of each HTTP response field, i.e. [S] for server generated fields, [D] for fields generated from meta tags inside the document and [BOM] for charsets retrieved from Unicode-signatures, making it easier to check your server settings and html-code in case of any problems.


Topic: Sitemap generation

We corrected a bug which caused Site-Scan to copy over <changefreq>-values from the old sitemap even if they contained uppercase characters, which are not valid enumerations. Site-Scan now not only checks for the correct spelling of these values, but will also automatically convert them to lowercase if required.
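The valid <changefreq> enumerations and the lowercase conversion can be sketched as follows (the set of valid values comes from the sitemaps protocol; the function name is our own):

```python
# Valid <changefreq> enumerations per the sitemaps protocol.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never"}

def normalize_changefreq(value):
    """Lowercase the value; return it only if it is a valid enumeration."""
    value = value.strip().lower()
    return value if value in VALID_CHANGEFREQ else None

print(normalize_changefreq("Weekly"))     # weekly
print(normalize_changefreq("sometimes"))  # None
```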


Topic: New tool added

In order to enable you to take a look at your server's actual HTTP response headers, and to directly check the effects of changes made to your webpages or server settings without scanning your whole website again, we added a simple and easy-to-use tool. It allows you to send a GET or HEAD request for a single URL and displays the content of your server's HTTP response header for this request, together with the response code and, in case of a redirection, the target-URL. To give it a try, please proceed to our HTTP header webpage.
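What the tool reports can be pictured by parsing a raw HTTP response head (a simplified sketch; the sample response below is made up):

```python
def parse_response_head(raw):
    """Split a raw HTTP response head into status code and header fields;
    for 3xx responses, also pick out the redirection target."""
    lines = raw.strip().splitlines()
    status_code = int(lines[0].split()[1])  # e.g. "HTTP/1.1 301 Moved Permanently"
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    target = headers.get("location") if 300 <= status_code < 400 else None
    return status_code, headers, target

sample = ("HTTP/1.1 301 Moved Permanently\r\n"
          "Location: http://example.com/new/\r\n"
          "Content-Type: text/html\r\n")
code, headers, target = parse_response_head(sample)
print(code, target)  # 301 http://example.com/new/
```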

We also used this opportunity to streamline certain portions of our scan-engines' code as well as to shorten a number of URLs. Should you come across broken links or notice changes in your Site-Scan results please contact us.


Topic: Scan results

In addition to the content-type Site-Scan will now also display the charset mapped to each URL's entity. This parameter is used for some media types to specify the charset used for the content, however for media subtypes of the "text" type (and therefore for all webpages based on "text/html") the charsets used should be limited to the ones registered with IANA. Site-Scan will not only show which charset your server is mapping to each entity, in case the URL points to data of the "text" type it will also verify whether the charset used is actually registered with IANA, making sure the content can be interpreted by HTTP/1.1-compliant user-agents including search-engines.


Topic: Scan results

Site-Scan will now also send you an email if a scan request was aborted, giving you the actual reason why it was aborted. This will help you identify and solve possible problems with your website.


Topic: Scan options

Site-Scan now lets you save the scan-options you selected for your next session, for a maximum of 90 days. As the storage of these options is based on a cookie stored by your browser, make sure your browser-settings permit the storage of cookies from our website if you want to make use of this feature.


Topic: Scan results

Besides generating a Google sitemap, Site-Scan now also generates a URL-list file for submitting your website to Yahoo!. This file contains exactly the same URLs as your Google sitemap, but in a plain text format. Please note that the file is encoded in UTF-8; it might therefore be necessary to convert the charset of the file in case your server maps a different charset to the file extension .txt and your URLs contain non-ASCII characters.

On our webpage Site-Scan results you will find more detailed information about this new feature.


Topic: Scan results

A newly added feature greatly simplifies the troubleshooting of your website in case your Site-Scan results identified any possible problems:

  • While crawling your website, Site-Scan builds a database-table containing all URLs scanned as well as all webpages on your site containing links to these URLs.
  • If your Site-Scan results indicate a problem for a certain URL, simply click on this URL in your result page and a new page listing all webpages containing links to this URL will open.
  • If you click on any URL listed on this page, the respective webpage will be opened in a new window and you can check your html source-code by using the respective function of your browser.

On our webpage Site-Scan results you will find more detailed information about this new feature.

Topic: Scan options

In order to keep your result pages slim and to reduce the amount of data transferred while scanning your website, you can now tell Site-Scan to skip URLs not only by file-extension, but also by file- or directory-name. This feature is based on the standard robot exclusion rules; however, the respective entries in your robots.txt file have to start with "./" (dot-slash) instead of a single slash. Please note that this feature is only available if you set up a separate set of rules for user-agent Site-Scan. Our section using Site-Scan contains a detailed example of how to use this new feature.
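For example, a robots.txt with a separate rule set for user-agent Site-Scan might look like this (the paths are placeholders; note the leading "./" in the Site-Scan rules):

```
User-agent: Site-Scan
Disallow: ./private/
Disallow: ./drafts/old-page.html

User-agent: *
Disallow: /private/
```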


Topic: Scan behaviour

In order to make Site-Scan even more flexible and to simplify further development of this utility, we applied some major changes to the scan-engine core:

  • Site-Scan will now first check the content-type for each collected URL, and will only request the content of those files it is able to parse, which is currently limited to "text/html".
  • Because of the above, it is no longer necessary to block certain file-extensions by default in order to limit the required bandwidth, giving you absolute freedom in deciding which URLs you want to appear in your sitemap.
  • As this could lead to an excessive number of unwanted URLs showing up in your scan results, you can now tell Site-Scan to completely ignore URLs based on disallowing file-extensions in your robots.txt file.
  • As a result of the above we are now able to increase the limits applying to Site-Scan, thus making it suitable for an increased number of websites.
  • Because we significantly increased the maximum total scanning time, the scan delay will now start once the request is actually taken over by one of the scan-engines, thus avoiding expiry of the scan delay before the request was actually processed.

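The first bullet point above amounts to a simple check on the media type before the body is requested; a sketch (the function name is our own):

```python
PARSABLE_TYPES = {"text/html"}  # the only content-type currently parsed

def should_fetch_body(content_type_header):
    """Decide from the Content-Type header (e.g. obtained via a HEAD
    request) whether the body is worth downloading and parsing."""
    media_type = content_type_header.split(";")[0].strip().lower()
    return media_type in PARSABLE_TYPES

print(should_fetch_body("text/html; charset=utf-8"))  # True
print(should_fetch_body("image/png"))                 # False
```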
For a more detailed explanation of these new features, how they affect the scanning process and how to adjust your Site-Scan settings, please refer to using Site-Scan.

Topic: Scan results

The following information was added to the Site-Scan results:

  • Average throughput (located in the Site-Scan statistics): This value shows you how many KBit/s Site-Scan was able to process when scanning your website, giving you an indication about the speed and quality of the connection during the scan.
  • Content (located in the detailed Site-Scan results table): This column contains the content-type for each URL scanned, showing you whether your server correctly defined the media-type of the URL and whether Site-Scan parsed the content of this URL.
  • In case Site-Scan was automatically redirected by the server when trying to access the root-URL specified by you, the results will now also display that a redirect took place and for which root-URL the scan was originally requested.

For more details about the interpretation of your scan results please refer to Site-Scan results.


Topic: Scan behaviour

Site-Scan now permits an initial redirection to an external root-URL when requesting the root-URL given by you. Please note that in case such a redirection is detected, Site-Scan will update its internal database with the root-URL it is redirected to, and the scan results as well as the Google sitemap will contain only URLs based on this new root-URL.


Topic: Blocked extensions

The following extensions have been added to the list of file extensions blocked by default: .avi, .exe, .zip

Topic: Scan options

Site-Scan now lets you select the default values for <changefreq> and <priority>, which will be used in case old values for these attributes are not available from an existing sitemap. This is especially useful if a sitemap is created for the very first time.


Topic: General

Release of Site-Scan beta for public use. Your feedback is welcome in order to verify the correct function of Site-Scan on different server platforms and technologies. Before notifying us about any problems, please read the section using Site-Scan thoroughly to check for known issues, and include your exact root-URL, the time and date of the scan and a detailed description of the problem when contacting us.

Scan-technology by M-Press Systems