X-Site-Scan Usage

X-Site-Scan is a modified version of Site-Scan, so similar rules apply. However, because it was specifically designed to verify your external links, there are some differences, which are explained below. If you are not familiar with the usage of Site-Scan, we strongly recommend reading that section before you continue here.

Your email-address (required)

As with Site-Scan, this is the email address to which the link for accessing the results will be mailed.

Your start-URL (required)

While Site-Scan is designed to generate sitemaps for your website and requires you to enter the root URL (or domain name) of your website, X-Site-Scan also permits you to scan only single webpages, parts of your website, or the entire website. Together with the traversing option, the start-URL determines which part(s) of your website will be scanned, as explained below under "Allow traversing".

Include robots.txt (optional, default: Yes)

The effect of this option also differs slightly from Site-Scan. Site-Scan will extract links from webpages that are disallowed by the rules in robots.txt, but exclude those webpages from your sitemap. X-Site-Scan, in contrast, will completely ignore disallowed pages and not extract any internal or external links from them. This might change in the future, but at the moment we consider it the most practical solution. As with Site-Scan, you can avoid any interference with other robots crawling your website by defining a special set of rules for user-agent "X-Site-Scan".
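To illustrate how a dedicated rule set for user-agent "X-Site-Scan" behaves, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `/private/` path are made-up examples, not part of X-Site-Scan itself:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a dedicated rule set for X-Site-Scan:
# other robots are locked out entirely, X-Site-Scan may crawl
# everything except /private/.
robots_txt = """\
User-agent: X-Site-Scan
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Allowed page: links will be extracted from it.
print(parser.can_fetch("X-Site-Scan", "http://myDomain.com/index.html"))      # True
# Disallowed page: skipped entirely, no links extracted.
print(parser.can_fetch("X-Site-Scan", "http://myDomain.com/private/a.html"))  # False
```

Because the "X-Site-Scan" entry comes with its own Disallow rules, the blanket `Disallow: /` for all other user-agents does not affect the scan.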

Include meta tags (optional, default: Yes)

If selected, X-Site-Scan will completely ignore all links on pages containing the robots meta tag "Nofollow" or "None". Additionally, individual links carrying the rel="nofollow" attribute, which is supported by most major search engines, will also be ignored, even on pages without these meta tags. If deselected, these meta tags and attributes have no influence on the scanning process.
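The two rules above (a page-wide robots meta tag versus a per-link rel="nofollow" attribute) can be sketched with Python's standard `html.parser`. This is an illustrative model of the behaviour described, not X-Site-Scan's actual implementation; the class name and sample HTML are invented:

```python
from html.parser import HTMLParser

class NofollowAwareExtractor(HTMLParser):
    """Collects links while honouring robots meta tags and rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False  # set by <meta name="robots" content="nofollow"/"none">
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = (attrs.get("content") or "").lower()
            if "nofollow" in content or "none" in content:
                self.page_nofollow = True  # entire page's links are ignored
        elif tag == "a" and attrs.get("href"):
            if "nofollow" in (attrs.get("rel") or "").lower():
                return  # single link ignored via rel="nofollow"
            self.links.append(attrs["href"])

html = """
<html><head><meta name="robots" content="index, follow"></head>
<body>
  <a href="http://example.com/a">kept</a>
  <a rel="nofollow" href="http://example.com/b">ignored</a>
</body></html>
"""
p = NofollowAwareExtractor()
p.feed(html)
# If the page-wide meta tag was found, all links are dropped.
links = [] if p.page_nofollow else p.links
print(links)  # ['http://example.com/a']
```

With "Include meta tags" deselected, a scanner would simply skip both checks and keep every extracted link.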

Allow traversing (optional, default: Yes)

This is the only option not available in Site-Scan. If traversing is disallowed, X-Site-Scan will only extract the external links from the start-URL you entered, i.e. from a single webpage. If traversing is allowed, X-Site-Scan will follow and crawl all internal links pointing to webpages in the same directory and its subdirectories. Some examples:

  • Traversing disallowed:

  • Start-URL: http://myDomain.com/myPage.html
  • > X-Site-Scan only extracts links from this webpage

  • Start-URL: http://myDomain.com/
  • > X-Site-Scan will only extract links from the document served for this request, most probably something like http://myDomain.com/index.html

  • Traversing allowed:

  • Start-URL: http://myDomain.com/myDir/myPage.html
  • > X-Site-Scan will follow links and crawl webpages located in http://myDomain.com/myDir/, http://myDomain.com/myDir/mySubdir1/, http://myDomain.com/myDir/mySubdir2/ etc.

  • Start-URL: http://myDomain.com/
  • > X-Site-Scan will crawl your entire website
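The directory rule behind these examples can be sketched as a small scope check in Python. The function name `in_scope` is invented for illustration; it models the behaviour described above under stated assumptions, not the actual X-Site-Scan code:

```python
from urllib.parse import urlparse
import posixpath

def in_scope(start_url: str, link: str) -> bool:
    """Return True if `link` lies in the start-URL's directory or a subdirectory."""
    start, target = urlparse(start_url), urlparse(link)
    if start.netloc != target.netloc:
        return False  # external link: reported as such, but never crawled
    # The crawl scope is the directory part of the start-URL.
    base_dir = posixpath.dirname(start.path) or "/"
    if not base_dir.endswith("/"):
        base_dir += "/"
    return target.path.startswith(base_dir)

# Start-URL http://myDomain.com/myDir/myPage.html -> scope is /myDir/ and below.
print(in_scope("http://myDomain.com/myDir/myPage.html",
               "http://myDomain.com/myDir/mySubdir1/x.html"))  # True
print(in_scope("http://myDomain.com/myDir/myPage.html",
               "http://myDomain.com/other/x.html"))            # False
# Start-URL http://myDomain.com/ -> scope is the entire website.
print(in_scope("http://myDomain.com/",
               "http://myDomain.com/anywhere/page.html"))      # True
```

With traversing disallowed, such a check would simply never be applied, since no internal links are followed at all.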

Please note: Whether traversing is allowed or disallowed has no influence on the other options selected. For example, if you request a scan with traversing allowed and your start-URL points to a webpage containing the robots meta tag "Nofollow", X-Site-Scan will not follow any internal URLs as long as the option "Include meta tags" is activated. The same applies to rules set in your robots.txt file.

Once you have decided on your start-URL and the options to use, you can start X-Site-Scan.

Scan-technology by M-Press Systems