Site-Scan Results

This section explains in detail how to interpret the results of your Site-Scan, which steps may be necessary to obtain a Google sitemap with exactly the content you are looking for, and what you can do to improve the accessibility of your webpages.

Scan-Statistics:

The scan statistics give you a brief overview about the scanning process:

  • Scan started:

    The time when Site-Scan first accessed your server, making it easy to locate Site-Scan's activities in your logs. According to RFC 2616, origin servers must include a Date header field in their responses; if it is missing or in an invalid format, Site-Scan will use its own time and mark the displayed date red. If your server replied with a valid Date header field but its time differs by more than one minute from the time of our server, which is synchronised with an official time server, Site-Scan will mark the date orange and you should consider adjusting your server's clock.
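The Date-header check described above can be sketched as follows; the function name and colour values are our own illustration, not part of Site-Scan itself:

```python
# Parse the RFC 2616 Date field and compare it against a reference
# clock. The one-minute tolerance mirrors the orange/red marking
# described above.
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def classify_date_header(date_value, now):
    """Return 'red' for a missing/invalid Date header, 'orange' if the
    clock differs by more than one minute, otherwise 'green'."""
    if not date_value:
        return "red"
    try:
        server_time = parsedate_to_datetime(date_value)
    except (TypeError, ValueError):
        return "red"
    if server_time.tzinfo is None:          # RFC 2616 dates are GMT
        server_time = server_time.replace(tzinfo=timezone.utc)
    skew = abs((server_time - now).total_seconds())
    return "green" if skew <= 60 else "orange"

now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(classify_date_header("Mon, 01 Jan 2024 12:00:30 GMT", now))  # green
print(classify_date_header("Mon, 01 Jan 2024 11:50:00 GMT", now))  # orange
print(classify_date_header("not a date", now))                     # red
```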

  • Server software:

    This value shows how your server identified itself to Site-Scan. Please note that less is better than more here: revealing too much information about the modules in use and their respective versions might enable attackers to exploit known vulnerabilities in your software and thus compromise your server.

  • Server protocol:

    The HTTP version supported by your server. While most servers nowadays support HTTP/1.1, a number of servers still use HTTP/1.0.

  • Server address:

    Your server's IP-address and port number.

  • Server location:

    Your server's geographical location. There are several options for resolving it; we decided to use the free GeoLiteCity database, available from http://www.maxmind.com/, as it offers the advantage that the required data is available from a database located on our server. Depending on whether your IP is contained in the database, this field shows city and country, country only, or "Unknown". For further information please refer to MaxMind's website.

  • Keep-Alive:

    Tells you whether your server supports persistent connections. If Keep-Alive is enabled, the overhead for requesting URLs is greatly reduced, making the scanning process considerably faster and more efficient; for details please refer to RFC 2616, available from our download page. If Site-Scan reaches the maximum scanning time while crawling your website and Keep-Alive is disabled, we strongly recommend enabling Keep-Alive; please refer to your server's documentation.
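The persistent-connection rule differs between HTTP versions, and can be sketched as a small helper; this is a simplified model of the header semantics, not Site-Scan's actual code:

```python
def supports_keep_alive(http_version, connection_header):
    """Rough check of persistent-connection support from a response:
    HTTP/1.1 keeps the connection open unless 'Connection: close' is
    sent, while HTTP/1.0 needs an explicit 'Connection: keep-alive'."""
    tokens = {t.strip().lower() for t in (connection_header or "").split(",")}
    if http_version == "HTTP/1.1":
        return "close" not in tokens
    return "keep-alive" in tokens

print(supports_keep_alive("HTTP/1.1", None))          # True
print(supports_keep_alive("HTTP/1.1", "close"))       # False
print(supports_keep_alive("HTTP/1.0", "keep-alive"))  # True
```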

  • URLs requested:

    The total number of URLs Site-Scan requested from your server while crawling your site. Site-Scan issues a HEAD request for every URL, but only issues a GET request for URLs pointing to content it can parse.

  • URLs scanned:

    The actual number of URLs pointing to resources which Site-Scan could parse and therefore issued a GET request in order to download and scan the content.

  • Data scanned:

    The amount of data Site-Scan processed while crawling your website. This value does not show the actual amount of data transmitted via the TCP/IP-connection, as it does not include HTTP-headers etc.

  • Time required:

    The time Site-Scan needed from issuing the first request until probing your file-not-found response, which is the last request Site-Scan issues.

  • Average throughput:

    The average throughput during the scanning process in KBit/s. This value depends on several factors, such as available bandwidth, the number of scan engines currently active, network routing etc., and therefore does not necessarily reflect the response time of your server. Furthermore, it will decrease considerably if your scan results show that most of the URLs requested were not of the content type text/html, due to the required communication overhead. Please also refer to the point "Keep-Alive".

Note: If the values for URLs requested or Time required are displayed in red, Site-Scan reached the respective limit for this value and aborted the scan; your scan results are therefore most likely incomplete. Please refer to the Site-Scan news for the limits currently applicable. If Keep-Alive is disabled and Site-Scan reaches the maximum scanning time before finishing the scan, we strongly recommend changing your server settings to permit persistent connections.

Scan-Options selected:

Displays the options you actually selected when requesting the scan:

  • Include default filters:

    Whether your scan was performed with the default filters applied. If the "Include default filters" option was selected, Site-Scan ignored all URLs ending with certain registered MIME file extensions; a list of the file extensions currently excluded can be found on our Site-Scan news page.

  • Include meta tags:

    Whether your website was scanned with or without obeying the robot meta tags found on your webpages. The option you selected before starting the scan influences the way Site-Scan crawls your website; please refer to Site-Scan usage.

  • Include robots.txt:

    This section gives you detailed information whether you selected to include the rules set in your robots.txt file, whether Site-Scan found a robots.txt file on your server and which user-agent and corresponding rules were used for the scan.

    If Site-Scan was unable to locate a robots.txt file on your server, all further information will be hidden. If you deselected "Include robots.txt" before starting Site-Scan, no rules will be displayed. If Site-Scan found the robots.txt file and you selected to include it, it will show the user-agent it used for the scan and the corresponding rules it extracted. Please note that Site-Scan only displays the rules it actually uses, i.e. "Disallow"; all other rules you set are ignored and not displayed.

    Note: When parsing your robots.txt file, Site-Scan first tries to find the user-agent "Site-Scan", with the spelling treated case-insensitively. If it cannot find it, it looks for the rules defined for the user-agent "*", the wildcard user-agent. If these rules are not defined either, no rules will apply to the scan. For further information please refer to using Site-Scan, robots.txt file.
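The lookup order described in this note can be reproduced with Python's standard-library robots.txt parser, which also matches user-agent names case-insensitively and falls back to "*":

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: site-scan
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# "Site-Scan" matches the first record despite the different spelling:
print(parser.can_fetch("Site-Scan", "/private/page.html"))  # False
print(parser.can_fetch("Site-Scan", "/tmp/page.html"))      # True
# Unknown agents fall back to the "*" record:
print(parser.can_fetch("OtherBot", "/tmp/page.html"))       # False
```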

Detailed Scan-Results:

Here you can find a list of all URLs requested during the scan. In addition, it shows for each URL whether it was included in your new sitemap and - if not - for what reason the URL was excluded. In order to give you a fast overview of your scan results, each cell is displayed in a specific colour code, making it easy to spot possible problems even with a large number of URLs scanned. In the following section we explain how to interpret these results.

  • URLs:

    This column contains an alphabetically sorted list of all URLs Site-Scan scanned while crawling your website. Currently the following colour codes are available for this value:

    Green: (The URL is included in your sitemap) A green colour code means that Site-Scan crawled the URL, the document it is pointing to does exist, and it is not excluded from your new sitemap via robots.txt or meta tags.

    Orange: (The URL is included in your sitemap) An orange colour code means that Site-Scan crawled the URL, the resource it is pointing to does exist, and it is not excluded from your new sitemap via robots.txt or meta tags. However, Site-Scan found other URLs pointing to identical (duplicate) content, which is bad news as it might be penalised by some search-engines. Typical reasons for this problem:

    • Your webpages contain links pointing to your index page including the filename, e.g. http://www.myDomain.com/index.html. In this case Site-Scan will list your root URL, e.g. http://www.myDomain.com/, as well as this URL, but both point to the same webpage. You should replace all these links with links to your root URL; furthermore, you might consider installing a permanent redirect (301) in case other websites contain links pointing to your index page with the full path.

    • You are using custom error pages, but these error pages do not return the correct status code, instead they return "200 - OK". E.g., if Site-Scan comes across broken links and is redirected to these custom error pages, it will receive the same content for each broken link found. Please make sure that your custom error pages return the correct HTTP status code.

    • You renamed some of your webpages and installed a permanent redirect for requests for the old URLs, but forgot to update these URLs on some of your own webpages. In this case the old and new URLs will point to the same content; please update all links on your webpages.
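One way to detect the duplicate-content situations listed above is to fingerprint each downloaded body and group URLs that share a fingerprint; this sketch is purely illustrative and not Site-Scan's actual implementation:

```python
import hashlib
from collections import defaultdict

def find_duplicates(pages):
    """Map each distinct body hash to the URLs that returned it and
    report every group with more than one URL."""
    groups = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

# Hypothetical scan data: root URL and /index.html serve the same page.
pages = {
    "http://www.myDomain.com/": "<html>home</html>",
    "http://www.myDomain.com/index.html": "<html>home</html>",
    "http://www.myDomain.com/about.html": "<html>about</html>",
}
print(find_duplicates(pages))
# [['http://www.myDomain.com/', 'http://www.myDomain.com/index.html']]
```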

    Red: (The URL is not included in your sitemap) A red colour code means that the URL was not included into your sitemap for one of the following reasons:

    • The HTTP status code is different from "2xx", e.g. because the respective file could not be found

    • The respective URL is excluded by the rules set in your robots.txt file for the given user-agent

    • The webpage for this URL included the robot meta tag "noindex" and is therefore excluded

    If you wanted this URL to be included, check why it has been excluded and modify your settings so that it will be included during the next Site-Scan.

    Note: If your results indicate a problem for a certain URL, Site-Scan makes it very easy to track down the source of the problem. For example, if the results show a broken link, indicated by the status code "404", simply click on the respective URL and Site-Scan will display a new webpage containing a list of all your webpages which contain links to this URL. Furthermore, in case Site-Scan located URLs pointing to identical (duplicate) content, it will display a second list including all URLs pointing to this content. If you want to check the HTML code of any webpage listed here, simply click on the respective URL and the page will be displayed in a new window; you can then use the relevant function of your browser to display the source code of the page and search it for possible errors or typos.

  • Status code:

    This column shows the HTTP status code Site-Scan received from your server when it tried to access the resource for this URL. Typical HTTP status codes showing up in the Site-Scan results include:

    • 200 - OK: The request was successful. If the requested document was not of type "text/html", Site-Scan only requested the entity-header fields for the respective resource; if the document was of type "text/html", Site-Scan additionally requested and received the complete entity.

    • 206 - Partial Content: In order to avoid excessive scanning times, Site-Scan limits the amount of data downloaded when issuing a GET request for a resource. If the size of the requested resource exceeds this limit, Site-Scan will only process the resource partially, which is then indicated by an HTTP status code 206, meaning that the respective webpage was not scanned completely.

      Note: Depending on the server platform and its actual settings, some servers reply to these requests differently: certain servers reply with 206 even though the requested resource was completely downloaded, while others reply with 200 even if the actual size of the resource exceeded the limit and it was therefore not completely downloaded. If in doubt, you can check the actual size of the resource in question with our HTTP response header tool.
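The byte-range mechanics behind the 206 status can be illustrated with a toy server-side function; the function name and the sizes are made up for the example:

```python
def serve_range(body, range_header):
    """Emulate a compliant server handling 'Range: bytes=START-END'
    requests: return (status, payload), with 206 for a partial reply
    and 200 when the requested range covers the whole resource."""
    start, end = range_header.removeprefix("bytes=").split("-")
    start, end = int(start), int(end)
    if end >= len(body) - 1:
        return 200, body          # range covers everything
    return 206, body[start:end + 1]

page = b"<html>" + b"x" * 100 + b"</html>"   # 113 bytes in total
status, payload = serve_range(page, "bytes=0-49")
print(status, len(payload))   # 206 50
status, payload = serve_range(page, "bytes=0-999")
print(status, len(payload))   # 200 113
```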

    • 401 - Unauthorized: Requests for the respective URLs require user authentication; therefore neither Site-Scan nor search-engines have access to the respective entities. URLs pointing to these resources are excluded from the sitemap.

    • 404 - Not Found: The server couldn't locate any resource matching the requested URL. This status code usually identifies broken links. To find the webpages containing the broken links, simply click on the respective URLs on your results page, and Site-Scan will display all webpages containing links to this URL, taken from the database built during the scanning process.

    • 500 - Internal Server Error: The server encountered an unexpected condition and was unable to fulfil the request. As these errors sometimes occur sporadically, Site-Scan re-issues the request a limited number of times, so if you see this code in your results there might be a persistent problem with your server.

    For a detailed explanation of all possible HTTP status codes and their meaning please refer to section 10 of RFC 2616, available from our download page (this document defines the HTTP/1.1-standard). Currently the following colour codes are available for this value:

    Green: (The respective URL is not excluded from your sitemap) A green colour code means Site-Scan received status code "200" (request succeeded) for the respective URL, which can therefore be included in your sitemap.

    Orange: (The respective URL is not excluded from your sitemap) An orange colour code means Site-Scan received status code "206" (partial content) for the respective URL: the content length of the file exceeded the limit set by Site-Scan, so it was not completely scanned. It is possible that not all links were extracted from this document; however, it is included in the sitemap.

    Red: (The respective URL is excluded from your sitemap) A red colour code means Site-Scan received anything other than a "2xx" status code; the request for the respective URL was therefore not successful and it is excluded from your sitemap. The actual status code Site-Scan received is listed in the cell, explaining why the file could not be accessed.

  • Content:

    This column shows the content type of the file the respective URL is pointing to. Please note that this value has no influence on whether a URL is included in or excluded from your sitemap; rather, it shows you whether your server correctly identified the respective resource and whether Site-Scan actually parsed its content. Currently the following colour codes are available for this value:

    Green: (The content of the resource was parsed) A green colour code means that Site-Scan could parse the content of this resource, and therefore downloaded and processed it. This is currently only the case for files of the type "text/html".

    Orange: (The content of the resource was not parsed) An orange colour code means Site-Scan could not parse the content of this resource, and therefore didn't download and process the content. However, it successfully requested the header for this file and therefore could confirm its existence.

    Red: (The content of the resource was not parsed) A red colour code means Site-Scan could not identify the content type of the resource because it is not specified, or because the resource the URL is pointing to could not be located on the server (e.g. HTTP response code 404); therefore the content was not downloaded and processed.

    Note: If a link contains a "type" attribute and the content of this attribute is a valid registered MIME type, Site-Scan will use this declaration and override any content-type declaration found in the server response headers for this URL. To show that it used the content-type declaration from the "type" attribute, it flags the value with [T].

  • Charset:

    For certain media types it is possible to label the content with a charset so that user-agents can correctly interpret the content of the resource. For the media type "text", the charset used should be registered with IANA to ensure that most user-agents can properly interpret and / or display the content of the resource. Site-Scan not only shows which charset your server mapped to the resource; for the media type "text" it also verifies that the charset used is registered with IANA, helping you to locate possible typos. For transparency, it also tells you whether the charset is currently supported by Site-Scan, i.e. whether the content was properly decoded before it was parsed. As with "Content", this value has no influence on whether a URL is included in or excluded from your sitemap. Currently the following colour codes are available for this value:

    Green: (The content is labeled with an IANA-registered charset) For content of the media type "text", a green colour code means that the data was labeled with a charset, the charset is registered with IANA, and Site-Scan recognised the charset and was therefore able to decode the data before parsing it. For content types other than "text" this value is always displayed in green, regardless of whether the content was labeled with a charset or whether the charset is registered with IANA.

    Orange: (The content is labeled with an IANA-registered charset) An orange colour code means that the content was labeled with a charset and the charset is registered with IANA; however, Site-Scan currently does not support this charset and was therefore unable to decode the content before parsing it. A special case is a resource of the media type "text" which is not labeled with any charset. In this case Site-Scan will decode the content with the default ISO-8859-1 charset as per RFC 2616; please make sure that this is the actual charset used for the encoding of this resource.

    Red: (The content is labeled with a charset not registered with IANA) A red colour code means that the content was labeled with a charset which is not registered with IANA; in this case chances are high that most user-agents (including search-engines) will have problems decoding and interpreting the content. Site-Scan will also display this value in red if the resource the URL is pointing to could not be located on the server (e.g. HTTP response code 404); in this case the value displayed will be "Unknown".

    Note: Besides displaying the charset used for decoding the respective content and verifying it against the IANA-registered charsets, Site-Scan also flags the charset with its origin. Possible flags for the origin of the charset definition, in order of priority, are:

    • [S]: The origin of the charset definition is the server response header. Most server platforms allow you to define default charsets based on content types, file extensions, directories and even single URLs. For example, if you are using Apache you can use the "AddDefaultCharset" and "AddCharset" directives for this purpose. If a charset definition is found in the server response header, it has the highest priority and overrides all other charset definitions found.

    • [BOM]: The origin of the charset definition is a Byte Order Mark (BOM). Documents encoded in UTF-8, UTF-16 and UTF-32 contain 2-4 bytes right at the beginning of the document defining the actual encoding used. These are known as a Byte Order Mark or Unicode Signature; they are compulsory for documents encoded in UTF-16 and UTF-32, while optional for documents encoded in UTF-8. Site-Scan currently supports UTF-8, UTF-16LE, UTF-16BE, UTF-32LE and UTF-32BE, where LE stands for Little Endian and BE for Big Endian. If a BOM is found, this charset definition is used if none was defined in the server response headers.

    • [D]: The origin of the charset definition is a document meta tag. HTML-based documents usually contain a charset definition in a meta tag located in the head section, e.g. "<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />". While this is the most commonly used origin for the charset definition, it has the lowest priority of the three origins.
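The three origins and their priority can be sketched as a simple cascade: server header first, then BOM, then the document meta tag. The helper below is our own illustration, not Site-Scan's code:

```python
import codecs
import re

def detect_charset(content_type_header, body):
    """Return (charset, origin_flag) using the priority described
    above; fall back to RFC 2616's ISO-8859-1 default."""
    # [S] server response header, highest priority
    if content_type_header:
        match = re.search(r"charset=([\w.-]+)", content_type_header, re.I)
        if match:
            return match.group(1).lower(), "[S]"
    # [BOM] Unicode signature; UTF-32 BOMs must be tested before UTF-16,
    # since the UTF-32LE BOM starts with the UTF-16LE BOM bytes.
    boms = [(codecs.BOM_UTF32_LE, "utf-32le"), (codecs.BOM_UTF32_BE, "utf-32be"),
            (codecs.BOM_UTF8, "utf-8"),
            (codecs.BOM_UTF16_LE, "utf-16le"), (codecs.BOM_UTF16_BE, "utf-16be")]
    for bom, name in boms:
        if body.startswith(bom):
            return name, "[BOM]"
    # [D] document meta tag, lowest priority
    match = re.search(rb'charset=["\']?([\w.-]+)', body, re.I)
    if match:
        return match.group(1).decode("ascii").lower(), "[D]"
    return "iso-8859-1", "default"

html = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
print(detect_charset("text/html; charset=ISO-8859-15", html))  # ('iso-8859-15', '[S]')
print(detect_charset("text/html", codecs.BOM_UTF8 + html))     # ('utf-8', '[BOM]')
print(detect_charset("text/html", html))                       # ('utf-8', '[D]')
```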

  • Robots.txt:

    This column shows whether rules are defined within robots.txt for the respective URL, and whether they allow it to be included or not. Currently the following colour codes are available for this value:

    Grey: (The respective URL is not excluded from your sitemap) A grey colour code means that you excluded the rules set in your robots.txt file by deselecting "Include robots.txt" before starting the scan. This has the same effect as if no robots.txt file could be located, and the respective URL is not blocked from being included into your sitemap, which is in line with the robots exclusion standard.

    Green: (The respective URL is not excluded from your sitemap) A green colour code means that Site-Scan found the robots.txt file, you selected to include the rules found in there, and the rules did not disallow the inclusion of the respective URL into your sitemap.

    Orange: (The respective URL is not excluded from your sitemap) An orange colour code means that Site-Scan couldn't locate the robots.txt file, even though you selected "Include robots.txt". In case you uploaded a robots.txt file to your server, make sure the spelling of the filename is correct (all lowercase!) and that you uploaded it into the root directory. The respective URL is not blocked from being included in your sitemap, which is in line with the robots exclusion standard.

    Red: (The respective URL is excluded from your sitemap) A red colour code means that Site-Scan found the robots.txt file, you selected to include the rules found in there, and that the rules disallowed the inclusion of the respective URL into your sitemap.

  • Index:

    This column shows the results when scanning for the "index / noindex" meta tag on the webpage for the respective URL. Currently the following colour codes are available for this value:

    Grey: (The respective URL is not excluded from your sitemap) A grey colour code means that you deselected "Include meta tags" before starting Site-Scan, the cell will show the value "Not selected" and the URL is not excluded from your sitemap.

    Green: (The respective URL is not excluded from your sitemap) A green colour code means that Site-Scan found the meta tag "index" on the respective webpage, and the URL is not excluded from your sitemap. We also show this value in green if we didn't find the "index" meta tag, but the document is not of the type "text/html".

    Orange: (The respective URL is not excluded from your sitemap) An orange colour code means that Site-Scan found neither the meta tag "index" nor "noindex" on the respective webpage, and the URL is not excluded from your sitemap. This is basically the same as "Green"; we show it in orange because we recommend using "index, follow" by default on every webpage, unless you really want "noindex" or "nofollow".

    Red: (The respective URL is excluded from your sitemap) A red colour code means that Site-Scan found the meta tag "noindex" on the respective webpage, therefore the URL is excluded from your sitemap.

    Note: In case you are using the alternative "all" or "none" directives in your robot meta tags, Site-Scan will translate them into "index, follow" or "noindex, nofollow" respectively and also show these values in your scan results.
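The translation described in this note can be sketched as a small normalisation helper (a hypothetical function, not Site-Scan's implementation):

```python
def robot_directives(meta_content):
    """Normalise a robots meta tag as described above: 'all' expands to
    'index, follow' and 'none' to 'noindex, nofollow'; return the
    (index, follow) pair that applies to the page."""
    tokens = {t.strip().lower() for t in meta_content.split(",")}
    if "all" in tokens:
        tokens |= {"index", "follow"}
    if "none" in tokens:
        tokens |= {"noindex", "nofollow"}
    index = "noindex" if "noindex" in tokens else "index"
    follow = "nofollow" if "nofollow" in tokens else "follow"
    return index, follow

print(robot_directives("all"))              # ('index', 'follow')
print(robot_directives("none"))             # ('noindex', 'nofollow')
print(robot_directives("noindex, follow"))  # ('noindex', 'follow')
```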

  • Follow:

    This column shows the results when scanning for the "follow / nofollow" meta tag on the webpage for the respective URL, which controls how Site-Scan handles the links found on this webpage. Currently the following colour codes are available for this value:

    Grey: (The links on the respective webpage are not excluded from crawling) A grey colour code means that you deselected "Include meta tags" before starting Site-Scan, the cell will show the value "Not selected" and the links found on the respective webpage will not be excluded from being crawled.

    Green: (The links on the respective webpage are not excluded from crawling) A green colour code means that Site-Scan found the meta tag "follow" on the respective webpage, and the links found on this webpage will not be excluded from being crawled. We also show this value in green if we didn't find the "follow" meta tag but the document is not of type "text/html".

    Orange: (The links on the respective webpage are not excluded from crawling) An orange colour code means that Site-Scan found neither the meta tag "follow" nor "nofollow" on the respective webpage, and the links found on this webpage will not be excluded from being crawled by this value. This is basically the same as "Green"; we show it in orange because we recommend using "index, follow" by default on every webpage, unless you really want "noindex" or "nofollow".

    Red: (The links on the respective webpage are excluded from crawling) A red colour code means that Site-Scan found the meta tag "nofollow" on the respective webpage; all links found on this webpage will therefore be discarded and excluded from being crawled.

  • Archive:

    This column shows the results of scanning for the "noarchive" meta tag on the webpage for the respective URL. This tag has no effect on the way we crawl your website, nor on whether the URL is included in or excluded from your sitemap. We only show it here for the sake of completeness, and to give you maximum control over the way search-engines index and store your webpages. Currently the following colour codes are available for this value:

    Grey: A grey colour code means you deselected "Include meta tags" before you started the Site-Scan; Site-Scan therefore doesn't scan for this meta tag, the value is unknown, and we can't tell you how search-engines will handle this webpage.

    Green: (Search-engines will archive this webpage) You selected "Include meta tags" for the Site-Scan, but Site-Scan didn't find the "noarchive" meta tag on the respective webpage. Therefore search-engines which support caching of webpages will cache the content of this URL.

    Orange: (Search-engines will not archive this webpage) You selected "Include meta tags" for your Site-Scan, and Site-Scan found the "noarchive" meta tag on the respective webpage. Therefore search-engines will not cache the content of this URL.

  • C-f-H (Clicks-from-Homepage):

    When humans visit your website, they tend to lose patience when they have to follow a large number of links before they can view the information they are looking for. Search-engines nowadays honour this by ranking webpages lower if they are buried deep inside your navigation and are therefore difficult to reach, so you should try to keep your website structure flat. Our "Clicks-from-Homepage" column shows you the minimum number of links a visitor has to follow from your homepage before he / she can display the respective resource in a browser. The threshold value used here for displaying a warning is taken from the recommendations for the MSN search-engine, which considers everything more than 3 clicks away as "too far". Currently the following colour codes are available for this value:

    Green: (The given resource can be reached within 3 clicks) The given resource can be reached by following at most 3 links starting from your homepage; there should therefore be no problem with the ranking of the given resource by search-engines.

    Orange: (The given resource cannot be reached within 3 clicks) The given resource cannot be reached by following at most 3 links starting from your homepage; there is therefore a risk that search-engines will rank the given resource lower.

    Note: Our "Clicks-from-Homepage" value is based on viewing webpages in a browser. This means that if, for example, somebody looks at your homepage, all resources embedded in and displayed with this page are 0 clicks away, because they are shown together with this webpage, even though a search-engine would have to follow another link to access each resource. If you follow a link from your homepage to another webpage, all resources displayed with that webpage are 1 click away; if that webpage contains a thumbnail with a link to the full picture, the full picture is 2 clicks away, and so on.
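The counting rule above amounts to a breadth-first search over the link graph; the sketch below uses a hypothetical page graph, with each page mapped to the links a visitor can follow from it:

```python
from collections import deque

def clicks_from_homepage(links, homepage):
    """Breadth-first search over the link graph: the homepage (and
    everything displayed with it) is 0 clicks away, pages one link
    further are 1 click away, and so on."""
    depth = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

links = {
    "/": ["/products.html", "/contact.html"],
    "/products.html": ["/products/widget.html"],
    "/products/widget.html": ["/images/widget-large.jpg"],
}
print(clicks_from_homepage(links, "/"))
```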

  • File-Not-Found response:

    Even though it is located in the very last row of the results table, this is a very important piece of information: it tells you whether Site-Scan and other robots such as search-engines can properly crawl your website. When an invalid (non-existent) URL is requested, the server should reply with a "404 - Not Found" response to tell the user-agent that this URL does not point to any valid resource on this server. However, many websites use custom error documents which return a "200 - OK" response. This is fine for human visitors, who can see the error message in their browser, but misleading for robots. For example, if a search-engine came across an invalid URL and the request returned code 200 instead of 404, the search-engine would index this URL even though it only displays a custom error page. To avoid this problem many search-engines, including Google, check the file-not-found response for each server crawled and might abandon servers having this problem to avoid indexing invalid URLs.

    Site-Scan checks the file-not-found response by means of a randomly built URL, so there is very little chance that this URL actually exists on your server. Furthermore, because some servers answer HEAD and GET requests differently, Site-Scan issues both a HEAD and a GET request for this URL and displays the results for both requests to verify your server settings. If this value is not displayed in green, make sure you sort out the problem before requesting another scan, because it might lead to the inclusion of invalid URLs in your sitemap!
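A minimal sketch of such a file-not-found check might look like this; the random-path scheme is our assumption about how the probe URL could be built, not Site-Scan's actual scheme:

```python
import secrets

def probe_url(base):
    """Build a URL that almost certainly does not exist on the server,
    in the spirit of Site-Scan's file-not-found check."""
    return f"{base}/{secrets.token_hex(16)}.html"

def file_not_found_ok(head_status, get_status):
    """Both the HEAD and the GET request for the bogus URL must come
    back 404 for the server to pass the check."""
    return head_status == 404 and get_status == 404

url = probe_url("http://www.myDomain.com")
print(url.startswith("http://www.myDomain.com/"))  # True
print(file_not_found_ok(404, 404))                 # True
print(file_not_found_ok(200, 404))                 # False: custom error page
```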

Note: Site-Scan only issues a GET request for content it can actually parse; for all other resources it only issues a HEAD request in order to reduce the amount of data transferred and the time required for the scan. Depending on your server settings, the actual content of the resource etc., GET and HEAD requests might lead to different results for certain response header fields. If in doubt, you should use our HTTP response header tool, which not only allows you to send GET or HEAD requests for the resource in question, but also permits you to check whether your server replies correctly to a conditional "If-Modified-Since" request.
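The conditional-request logic mentioned here can be illustrated with a toy server-side function implementing the RFC 2616 rule (a sketch, not the header tool itself):

```python
from email.utils import parsedate_to_datetime

def conditional_get(last_modified, if_modified_since):
    """Return the status a compliant server should send for a GET
    carrying an If-Modified-Since header: 200 with the full body if
    the resource changed after the given date, 304 otherwise."""
    since = parsedate_to_datetime(if_modified_since)
    modified = parsedate_to_datetime(last_modified)
    return 200 if modified > since else 304

print(conditional_get("Sat, 01 Jun 2024 10:00:00 GMT",
                      "Sat, 01 Jun 2024 12:00:00 GMT"))  # 304
print(conditional_get("Sun, 02 Jun 2024 09:00:00 GMT",
                      "Sat, 01 Jun 2024 12:00:00 GMT"))  # 200
```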

Google Sitemap:

This section shows the data Site-Scan was able to retrieve from your old sitemap (if any), as well as the number of URLs and optional attributes which are included into your new sitemap. Furthermore, you can view and download your newly generated Google sitemap. For detailed information on how to register and submit your sitemap with Google, please refer to the Google Sitemaps page.

  • Existing sitemap:

    This part shows you whether Site-Scan was able to find an existing sitemap, how many URLs are listed in this file, and whether the optional attributes <lastmod>, <changefreq> and <priority> are included. Possible values for "Found existing sitemap" include:

    • sitemap.xml: Site-Scan found an existing uncompressed sitemap on your server, the content is valid and any available values for <changefreq> and <priority> will be copied over to the new sitemap if selected.

    • sitemap.xml.gz: Site-Scan found an existing compressed sitemap on your server, the content is valid and any available values for <changefreq> and <priority> will be copied over to the new sitemap if selected.

    • Size exceeded: Site-Scan found an existing sitemap on your server, but the content length of the sitemap exceeded the limit currently set for Site-Scan. This is rather unlikely if the number of pages on your website is within the limits set for Site-Scan; however, a simple workaround is to keep a compressed version of your sitemap on your server. When searching for an existing sitemap, Site-Scan first tries to locate and use a compressed version, so you can keep both versions, compressed and uncompressed, on your server if required.

    • Invalid: Site-Scan found an existing sitemap on your server and it was within the size limits; however, an error occurred when parsing your old sitemap. This is usually caused by missing opening or closing tags or by the use of improperly escaped XML-entities. Most modern browsers support parsing of XML-documents and will show you possible errors in your existing sitemap. Custom error pages which do not return the correct HTTP status code are another common cause of this problem.

    • No: Site-Scan was unable to detect an existing sitemap on your server, therefore no values for optional attributes are available for the generation of your new sitemap.

    Besides displaying the sitemap status and the number of URLs contained in your existing sitemap, Site-Scan also analyses the optional attributes for each URL and displays whether they are included or not. Possible values include:

    • No: This optional attribute is not used for any URL listed in your current sitemap.

    • Partially: This optional attribute is used for some URLs listed in your existing sitemap, but not for all.

    • Yes: This optional attribute is used for all URLs listed in your current sitemap.
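The two failure modes described above, an oversized and an invalid sitemap, are easy to check for yourself. The sketch below (hypothetical URLs) parses a sitemap the way any strict XML parser would, so both a missing closing tag and an improperly escaped entity are reported as invalid, and it shows how keeping a gzip-compressed copy keeps the transferred size down:

```python
import gzip
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2007-01-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>"""

def count_urls(xml_text):
    """Return the number of <url> entries, or None if the sitemap is invalid."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    return len(root.findall(NS + "url"))

valid = count_urls(sitemap)                               # 1
truncated = count_urls(sitemap.replace("</urlset>", ""))  # None: missing closing tag
unescaped = count_urls(                                   # None: bare "&" in a URL
    sitemap.replace("example.com/", "example.com/?a=1&b=2"))

# a compressed copy is much smaller and is what Site-Scan looks for first
compressed = gzip.compress(sitemap.encode("utf-8"))
print(valid, truncated, unescaped, len(compressed) < len(sitemap))
```

The same round-trip through an XML parser is what your browser does when you open the sitemap directly, which is why it will point you at the offending line.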

  • New sitemap:

    This section shows you how many of the scanned links were excluded from your new sitemap based on the scan results, how many URLs are listed in your new sitemap(s), and which optional attributes you selected for inclusion before starting the Site-Scan.

    Note: If you selected "Include old sitemap" from the Site-Scan options and also chose to include the optional attributes <changefreq> and / or <priority>, Site-Scan will check for each scanned URL whether it is included in your existing sitemap. If it is, it will copy over the old values for these optional attributes where available. This means that if you didn't add any new pages or rename certain URLs, your new sitemap will include these values together with the latest modification date, so you can upload the new sitemap to your server without any further modifications. If you added new URLs, Site-Scan will set the values for <changefreq> to "never" and <priority> to "0.5", making it easy for you to find newly added pages in your sitemap and modify these values if required. Should your sitemap include <priority> values with more than 1 digit after the decimal point, Site-Scan will round them to the nearest value. Please note that you can also select your own default values for <changefreq> and <priority> before starting the scan.
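The merge logic described in this note can be sketched as follows (a simplified illustration with hypothetical URLs; the exact rounding mode Site-Scan uses is not documented, so round-half-up is assumed here):

```python
from decimal import Decimal, ROUND_HALF_UP

# attributes recovered from the existing sitemap (hypothetical URLs)
old_sitemap = {
    "http://www.example.com/":           {"changefreq": "weekly",  "priority": "0.8"},
    "http://www.example.com/about.html": {"changefreq": "monthly", "priority": "0.333"},
}
# URLs found during the scan; new.html was added since the old sitemap
scanned_urls = [
    "http://www.example.com/",
    "http://www.example.com/about.html",
    "http://www.example.com/new.html",
]

def round_priority(value):
    """Round a <priority> value to one digit after the decimal point."""
    return str(Decimal(value).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))

entries = []
for url in scanned_urls:
    attrs = old_sitemap.get(url)
    if attrs:   # URL was in the old sitemap: copy its values over
        entries.append((url, attrs["changefreq"], round_priority(attrs["priority"])))
    else:       # newly added URL: fall back to the defaults
        entries.append((url, "never", "0.5"))

for entry in entries:
    print(entry)
```

Scanning the output of such a merge for the "never" / "0.5" pair is exactly how you would spot the newly added pages in your generated sitemap.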

  • Download your Sitemaps:

    Here you can find links for viewing and / or downloading the different sitemaps generated by Site-Scan; all formats are available in a compressed as well as an uncompressed version. In general we recommend the use of the compressed version, especially in the case of larger sitemaps, in order to minimize the bandwidth required by Google and other search-engines to download your sitemap. In case you are running Windows and need to decompress your sitemaps for manual modifications, or would like to compress your uncompressed sitemap before uploading it to your server, we recommend the use of 7-Zip, available from our download page. Currently the following sitemap-formats are available:

    • Google Sitemaps: The xml-sitemaps format as used by Google and Yahoo! and defined on the website www.sitemaps.org. The sitemap-options available for this format are <lastmod>, <changefreq> and <priority>. Just save it in the root-directory of your website and inform Google about the URL for the sitemap.

      Note: For compatibility-reasons the general XML-entities are encoded using the "&entityName;"-format, while all other control and non-ASCII characters are encoded in the numeric (hexadecimal) format. This is also the case for the ROR/RSS sitemap-feed generated by Site-Scan as it ensures that it can be viewed directly in various browsers.
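The encoding scheme described in this note can be illustrated with a short sketch (the URL is hypothetical):

```python
# the general XML-entities are encoded by name ...
NAMED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}

def encode_url(text):
    """Encode the general XML-entities by name and all control and
    non-ASCII characters in the numeric (hexadecimal) format."""
    out = []
    for ch in text:
        if ch in NAMED:
            out.append(NAMED[ch])
        elif ord(ch) < 0x20 or ord(ch) > 0x7E:
            out.append(f"&#x{ord(ch):X};")  # ... everything else numerically
        else:
            out.append(ch)
    return "".join(out)

encoded = encode_url("http://www.example.com/caf\u00e9.html?a=1&b=2")
print(encoded)  # http://www.example.com/caf&#xE9;.html?a=1&amp;b=2
```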

    • ROR/RSS sitemap-feed: A sitemap in the RSS 2.0 syndication-feed format with additional information contained in elements using the ROR sitemap namespace-extension; this format is accessible to all search-engines which process RSS 2.0 syndication-feeds. In order to make it available, upload it to the root-directory of your website and add the following line to the <head>...</head> section of your index-page (e.g. http://www.myDomain.com/index.html):

      <link href="ror.xml" rel="alternate" type="application/rss+xml" title="ROR/RSS Sitemap" />

      In addition to the standard elements supported by Google for RSS 2.0 sitemap-feeds, this format uses the <ror:updated> element for the modification-date and the <ror:updatePeriod> element for the change-frequency; the priority value is currently not included. For further information please refer to rorweb.com.

    • Yahoo! URL-list: A plain text-file containing the URLs of your website; this format does not support any of the optional attributes. As with the other sitemaps, all URLs are listed in alphabetical order, making it easier to check that all required URLs are included. For more information about the Yahoo! URL-list file and how to submit it, please refer to the relevant Yahoo! webpage Submit Your Site (note that the usage of this service requires a Yahoo! ID).

      Note: For compatibility-reasons the URL-list files generated by Site-Scan are encoded in UTF-8. If your server maps a different charset to files with the extension .txt and your URLs contain non-ASCII characters, you should convert the file into the appropriate charset. We therefore recommend that you check the content of your URL-list in a browser to verify that your URLs are displayed properly before uploading it to your server. Furthermore, all files generated by Site-Scan use UNIX line endings, which can lead to problems when viewing them on Windows systems or uploading them to Windows-based servers. In this case you can convert them into the required DOS-format using the software-tool Treptos, available from our download page.
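Both conversions mentioned in this note, re-encoding the UTF-8 URL-list into another charset and turning UNIX line endings into DOS line endings, can also be done with a few lines of script if you prefer that over a separate tool (a sketch; the URL is hypothetical):

```python
def unix_to_dos(data: bytes) -> bytes:
    """Convert LF line endings to CRLF without double-converting existing CRLF."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

def reencode(data: bytes, target_charset: str) -> bytes:
    """Re-encode a UTF-8 URL-list into another charset, e.g. ISO-8859-1."""
    return data.decode("utf-8").encode(target_charset)

url_list = "http://www.example.com/caf\u00e9.html\n".encode("utf-8")
dos = unix_to_dos(url_list)
latin1 = reencode(url_list, "iso-8859-1")
print(dos.endswith(b"\r\n"), latin1.decode("iso-8859-1") == url_list.decode("utf-8"))
```

Note that re-encoding will fail with an error if a URL contains characters the target charset cannot represent; in that case the file has to stay in UTF-8.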

Server powered by
www.ipserverone.com
Scan-technology by M-Press Systems