Site-Scan

Site-Scan is not only an online Google sitemaps generator with numerous scanning options and flexible filter settings; it also collects a wealth of valuable additional information about your webpages while crawling your site. Main features include:

  • As an online utility, Site-Scan does not require command-line access to your server, nor does it require you to install any software on your server, desktop or notebook. All that is required is a browser and an email address to request a scan, view the results and download your sitemap.

  • For the same reason, Site-Scan is platform and OS independent.

  • It creates sitemaps in several formats, currently including the Google sitemaps protocol, ROR/RSS syndication feeds and the Yahoo! URL-list text format. All sitemaps are available in compressed and uncompressed versions.

  • The crawling process and the content of the generated sitemap are controlled by flexible filter settings: by the rules set in your robots.txt file and/or by the robots meta tags found in your webpages and links. In order to avoid any interference with other robots crawling your site, you can define a special set of rules for the user-agent "Site-Scan" in your robots.txt file (see the robots.txt sketch after this list).

  • Site-Scan crawls your site without any limit on the number of sub-directories; the only limitation is the maximum permitted URL length.

  • Site-Scan's intelligent crawling algorithms minimise the required bandwidth, i.e. it only requests the content of resources it can actually parse.

  • You can choose what additional attributes to include in your sitemap, and what default values you want to use for them.

  • If selected, Site-Scan will parse your existing sitemap and reuse the values found for <priority> and <changefreq> when generating your new sitemap. This means that if you have only updated some webpages, the new sitemap is ready for use without requiring any manual changes (see the sitemap sketch after this list).

  • While crawling your website, Site-Scan collects valuable information about each webpage scanned, e.g. content-type, charset, the source of the charset definition, the content of robots meta tags and whether a URL is allowed or disallowed by the rules set in your robots.txt file.

  • If Site-Scan comes across broken links, it will flag these URLs in your results. In addition, a database is generated during the scanning process: if the results show a broken link, simply click on the URL and Site-Scan shows you all webpages containing links to this URL.

  • Site-Scan also flags URLs which point to duplicate content. As with broken links, simply click on the URL and Site-Scan will list all other URLs pointing to a resource with duplicate content.

  • Site-Scan also shows you some general statistics about your server, such as the software version, physical location, protocol used and whether Keep-Alive is enabled.

  • Last but not least, Site-Scan offers mechanisms against abuse, i.e. you can block other people from requesting a scan of your website.
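
For illustration, here is a minimal robots.txt sketch with a dedicated rule set for the user-agent "Site-Scan", as mentioned in the list above. The directory names are purely hypothetical, and the exact directives Site-Scan honours are described in the Site-Scan usage section:

    # General rules for all other crawlers
    User-agent: *
    Disallow: /cgi-bin/

    # Dedicated rules applied only by the Site-Scan crawler
    User-agent: Site-Scan
    Disallow: /drafts/
    Disallow: /print/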
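
And to give an idea of the generated Google sitemap itself, below is a sketch of a single URL entry; the URL, date and namespace are placeholders and depend on the protocol version and the options you select. The <priority> and <changefreq> elements of such entries are what Site-Scan reuses from an existing sitemap:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2006-01-01</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
      </url>
    </urlset>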

But because it is an online utility, Site-Scan also has some unavoidable restrictions:

  • In order to conserve our CPU and memory resources, we have to limit the maximum number of URLs, the scanning time and the amount of data requested for each resource; the currently active limits can be found on our news webpage.

  • While Site-Scan can process data in excess of several MBit/s, the actual scanning speed for your server depends on a number of factors, e.g. routing issues, the current load on the internet and the response times of your server. Furthermore, Site-Scan and X-Site-Scan spawn more than 400 background processes when fully loaded, and all of these processes have to share the connection from our server to the internet.

Because of the above limitations, Site-Scan is mainly targeted at small to medium-sized websites; generating sitemaps for large websites is done far more effectively with software utilities running directly on the respective webserver.

First-time users:

Before you start Site-Scan, we strongly recommend that you thoroughly read the Site-Scan usage section on this website, so that you generate a Google sitemap with exactly the content you are looking for. Please note that Site-Scan gives you several options for scanning your website and generating your sitemap, and using them effectively requires some basic preparation on your side. Once you have prepared your website, you can start Site-Scan.

Regular users:

Before starting another Site-Scan to generate your Google sitemap, consider taking a look at our Site-Scan news to keep up to date with any changes and to take full advantage of the latest improvements we have implemented.
