Manual: Create XML Sitemap

The feature "Create XML Sitemap" of Visual SEO Studio, documented in detail.

Create XML Sitemap

The purpose of XML Sitemaps is to help search engines find new or updated pages more quickly.
Small websites usually don't need a sitemap, because search engines already visit them regularly and can easily find new content, and most blogs rarely update old pages (but a sitemap does not hurt!).
Other websites are another matter: big e-commerce sites, announcement sites and the like, with thousands of pages or more, do need partial XML Sitemaps so that search engines quickly (re)index new and changed pages that they would otherwise take ages to find among the huge quantity of content.

The best sitemaps are those generated in real time by the CMS itself: they cannot be out of date and can provide a correct lastmod value. Unfortunately, too often the integrated generator, if one is available at all, doesn't meet these basic needs, so you need an external XML Sitemap generator able to cherry-pick a selection of pages.

E-commerce sites, for example, often need to tell search engines that a subset of pages must be indexed or re-indexed quickly. Maybe the pages are special offers or promotions, or maybe the price of some articles has changed and the pages should be updated as soon as possible. These sites cannot wait for the search engine spider to find those URLs among the thousands within the e-commerce site, and they cannot rely on a sitemap listing every URL of the site.
They need a dedicated XML Sitemap listing those URLs and those URLs only; the Visual SEO Studio sitemap generator was built exactly for that.
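As an illustration of what such a dedicated sitemap contains, here is a minimal sketch that writes a sitemap for a hand-picked list of URLs. It is not the code used by Visual SEO Studio; the function name build_sitemap, the example.com URLs and the dates are assumptions made for the example. Only the urlset, url, loc and lastmod elements of the Sitemap protocol are used.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build a sitemap from (url, lastmod or None) pairs and return it as a string."""
    # Register the protocol namespace as the default one, so tags have no prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, lastmod in entries:
        node = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(node, f"{{{SITEMAP_NS}}}loc").text = url
        if lastmod:  # lastmod is optional in the protocol
            ET.SubElement(node, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical selection: only the pages that must be (re)indexed quickly.
promo_pages = [
    ("https://shop.example.com/offers/summer-sale", "2024-06-01"),
    ("https://shop.example.com/products/blue-widget", None),
]
sitemap_xml = build_sitemap(promo_pages)
print(sitemap_xml)
```

When saving to a file, an XML declaration with UTF-8 encoding would normally be added as well.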

Head info

Start URL

The address from which the spider started visiting the website. When starting a new exploration you will typically enter the website Home Page, usually the "root" address. For explorations of lists of URLs the field is not populated.

Session name

You can give your sessions an optional descriptive name. The name can be assigned when choosing the crawl parameters, or at a later time.

Domain / subdomains

An XML Sitemap can only list URLs of pages contained within a single domain / subdomain.
Thus, when you create an XML Sitemap, you have to specify which domain/subdomain you want to create it for. Only the pages within it will be shown in the tree view below, where they can be selected.
When you choose the domain entry with no subdomain, only the pages within the "naked domain" (i.e. the domain name without the www. part) can be added.

Authority

The combination of protocol, host name and, if different from the default value, port number.
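The definition above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name authority and the table of default ports are choices made for the example, not the program's implementation, and IPv6 hosts are not handled.

```python
from urllib.parse import urlsplit

# Default ports per protocol; a port equal to the default is omitted.
DEFAULT_PORTS = {"http": 80, "https": 443}

def authority(url):
    """Return protocol + host name, plus the port only when it is not the default."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        return f"{scheme}://{host}"
    return f"{scheme}://{host}:{port}"

print(authority("https://www.example.com:443/shop/item.html"))
print(authority("http://example.com:8080/index.html"))
```

Grouping crawled URLs by this value gives one group per domain/subdomain, each of which could feed its own XML Sitemap.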

Pages

The number of pages within the related domain/subdomain that can be added to the XML Sitemap.

Tree view

A tree view of the website directory structure.
Each node text is a folder or file name (URL decoded, thus in human-readable form).
For root nodes, the text is the website authority name (the combination of protocol, host name and, if different from the default value, port number).
Single pages, or all pages within a folder, can be selected with the check box beside their node. Selected pages can be added to the resulting XML Sitemap.
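To make the structure concrete, the following sketch builds a comparable tree from a list of URLs, using nested dictionaries and URL-decoded node names. It is a simplified illustration: the function name build_tree is hypothetical, the root text is taken from the URL as written (default ports are not stripped), and query strings are ignored.

```python
from urllib.parse import urlsplit, unquote

def build_tree(urls):
    """Return a nested dict: root authority -> folders -> files, names URL-decoded."""
    tree = {}
    for url in urls:
        parts = urlsplit(url)
        root = f"{parts.scheme}://{parts.netloc}"
        node = tree.setdefault(root, {})
        for segment in parts.path.split("/"):
            if segment:  # skip the empty pieces around slashes
                node = node.setdefault(unquote(segment), {})
    return tree

tree = build_tree([
    "https://example.com/caf%C3%A9/",
    "https://example.com/caf%C3%A9/menu.html",
    "https://example.com/about.html",
])
print(tree)
```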

Expand/Collapse nodes

The tree view nodes can be expanded or collapsed as you wish: individually, by clicking on the +/- symbol beside the tree node, or all at once.
For the latter case, two controls are at your disposal:

  • Clicking on the Expand All button expands all nodes. Having all nodes expanded is the default state.
  • Clicking on the Collapse All button collapses all nodes, leaving only the root nodes visible.

General options

Use Canonical path

This option is selected by default. We recommend keeping it selected.
When the option is not selected, and the "Skip non-Canonical URLs" option is also not selected, canonicalized pages (i.e. pages with a canonical link tag pointing to a different URL) can be added to the table on the right, and once exported the URL used will be the non-canonical one.

An XML Sitemap should only include canonical pages returning a "200 OK" HTTP status code. Nevertheless, it is possible to use temporary sitemaps that also include non-indexable or non-canonical pages, or pages not returning "200 OK", to help search engines discover more quickly changed page URLs or changes in canonicalization or in the robots meta tag. When unselected, this option lets you create temporary sitemaps to report pages that were canonicalized after they were indexed.
In this scenario, you should also uncheck the "Skip non-Canonical URLs" option.

Skip non-Canonical URLs

This option is selected by default. We recommend keeping it selected.
Its effect is that canonicalized pages (i.e. pages with a canonical link tag pointing to a different URL) will not be added to the table on the right, and consequently will not be added to the generated XML Sitemap, even if their check box is flagged.

An XML Sitemap should only include canonical pages returning a "200 OK" HTTP status code. Nevertheless, it is possible to use temporary sitemaps that also include non-indexable or non-canonical pages, or pages not returning "200 OK", to help search engines discover more quickly changed page URLs or changes in canonicalization or in the robots meta tag. When unselected, this option lets you create temporary sitemaps to report pages that were canonicalized after they were indexed.
In this scenario, you should also uncheck the "Use Canonical path" option.

Another typical use case is generating an XML Sitemap from a development/pre-production website where all URLs are canonicalized to the URLs of the production server.
In that case you should uncheck the option, otherwise no URL could be added to the XML Sitemap (and you should keep the "Use Canonical path" option flagged, so that the production URLs are exported).
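The interplay of the two options can be summarized with a small decision function. This is a sketch of the behavior as described in this manual, not the program's actual code; the function name exported_url and its parameters are hypothetical.

```python
def exported_url(page_url, canonical_url, use_canonical=True, skip_non_canonical=True):
    """Return the URL to write into the sitemap, or None if the page is skipped."""
    # A page is "canonicalized" when its canonical link points to a different URL.
    is_canonicalized = canonical_url is not None and canonical_url != page_url
    if is_canonicalized and skip_non_canonical:
        return None            # "Skip non-Canonical URLs" is selected
    if is_canonicalized and use_canonical:
        return canonical_url   # "Use Canonical path" is selected
    return page_url

# Default options: a canonicalized page is left out.
print(exported_url("https://example.com/p?sort=asc", "https://example.com/p"))
# Temporary sitemap: both options unchecked, the non-canonical URL is exported.
print(exported_url("https://example.com/p?sort=asc", "https://example.com/p",
                   use_canonical=False, skip_non_canonical=False))
# Development site canonicalized to production: only "Skip" unchecked.
print(exported_url("https://dev.example.com/p", "https://www.example.com/p",
                   use_canonical=True, skip_non_canonical=False))
```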

Export order

You can choose the order in which pages are exported, i.e. listed in the XML Sitemap document.
The export order does not affect how the XML Sitemap works; it is for your own convenience only.

Crawl order

Pages will be listed in the XML Sitemap document in the order they were found during the site exploration.
Since the spider visits pages using a breadth-first algorithm, this should roughly match the order of presumed priority.

Alphabetical order

Pages will be listed in the XML Sitemap document in alphabetical order of URL.
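The two export orders amount to sorting the same page list by different keys. A minimal sketch follows, assuming each page carries its URL and its visit number; the field names and the plain string comparison used for the alphabetical order are assumptions, and the program may compare URLs differently.

```python
pages = [
    {"url": "https://example.com/shop/", "visit_nr": 2},
    {"url": "https://example.com/", "visit_nr": 1},
    {"url": "https://example.com/about/", "visit_nr": 3},
]

def order_pages(pages, mode="crawl"):
    """Sort pages by visit number ("crawl") or by URL ("alphabetical")."""
    if mode == "crawl":
        return sorted(pages, key=lambda p: p["visit_nr"])
    if mode == "alphabetical":
        return sorted(pages, key=lambda p: p["url"])
    raise ValueError(f"unknown export order: {mode}")

crawl_order = [p["url"] for p in order_pages(pages, "crawl")]
alpha_order = [p["url"] for p in order_pages(pages, "alphabetical")]
print(crawl_order)
print(alpha_order)
```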

Options: Add Priority

The Sitemap protocol lets you optionally state a Priority for each URL: a value ranging from 0.0 to 1.0 that tells the search engine which URLs you deem most important and would like crawled first.

Keep in mind that the most widely used search engine, Google, completely ignores the Priority value. Other search engines officially support it, even if we have reasons to believe they largely disregard it.
For these reasons the option is not selected by default.

When the option is enabled, the program will assign each URL a priority based on its Link Depth.
Assigning it individually would not be feasible for XML documents with hundreds or thousands of URLs or more, and using the link depth makes sense when the site has a clean link structure.
For URLs with link depth greater than 2, no priority is assigned, and the protocol assumes the implicit default value of 0.5.
Notice that the priority is meant to be a relative value: assigning the same value to all URLs would not prioritize any of them. Values equal to the default 0.5 are omitted even if set in the options.
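The assignment rule can be sketched as follows. The three level values used here (1.0, 0.8, 0.6) are hypothetical placeholders for the Home Page, First level and Second level settings, and the function name priority_for is an assumption made for the example.

```python
def priority_for(depth, levels=(1.0, 0.8, 0.6)):
    """Return the priority for a link depth, or None when no priority is written."""
    # levels holds the values for depth 0 (Home Page), 1 and 2.
    if depth < 0 or depth >= len(levels):
        return None   # deeper pages get no priority; the protocol assumes 0.5
    value = levels[depth]
    if value == 0.5:
        return None   # same as the implicit default, so it is omitted
    return value

for depth in range(4):
    print(depth, priority_for(depth))
```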

Home Page

The priority to assign to the URL with link depth zero, i.e. the Home Page (assuming it was the start URL).

First level

The priority to assign to the URLs one click away from the Home Page.
In a clean link structure, this normally means the most important area pages linked by the main website menu.

Second level

The priority to assign to the URLs two clicks away from the Home Page.
In a clean link structure, this normally means the most important pages within each website main area.

Options: Add images

Image Sitemaps are a Google extension of the Sitemap protocol that lets you list, for each page, the relevant images you want the search engine to index.

The XML Sitemap generator can optionally include images.
Sitemap images are decorated with the descriptions taken from the ALT attribute.
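For orientation, this is roughly what a page entry with images looks like when built by hand. The sketch uses only the image:image and image:loc elements of Google's image extension; which element the program uses to carry the ALT description is not stated in this manual, so it is deliberately left out. The function name and the example.com URLs are assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

def sitemap_with_images(page_url, image_urls):
    """Return a one-page sitemap listing the page and its images."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    for src in image_urls:
        image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
        ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = src
    return ET.tostring(urlset, encoding="unicode")

image_sitemap_xml = sitemap_with_images(
    "https://example.com/products/blue-widget",
    ["https://example.com/img/blue-widget-front.jpg",
     "https://example.com/img/blue-widget-side.jpg"],
)
print(image_sitemap_xml)
```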

Including images for each page implies that the tool has to recognize and discard non-relevant images, i.e. those added as decorative elements that provide no informational value.
Letting the user view and cherry-pick each image for each exported page sounded cool, but given the huge number of images and pages an XML Sitemap could include, it would have been impractical.
We had to balance ease of use, a high degree of automation and quality of the result. The solution we adopted provides two ways to filter out irrelevant images: by lack of alt attribute and by number of occurrences. The two methods are not mutually exclusive.

Skip images with no ALT

This option lets you skip images with an empty or missing alt attribute; according to the HTML specifications, the alt attribute should be left empty for irrelevant (decorative) images.
This is by far the fastest method.
We recommend fully checking your site images with Images Inspector to ensure all image alt attributes are correct.

Skip images with too many occurrences

This option uses a reference count to detect irrelevant images to skip.
Since the alt attribute is often left unpopulated for relevant images too, the criterion is a maximum number of occurrences: if an image exceeds it, the image is considered irrelevant.
This empirical criterion stems from the observation that decorative images tend to be used many times across a website, while content images are used only once or just a few times.
The threshold is set by the Relevant images max references option.
On large sites this method can be significantly slower.

Relevant images max references

Sets the threshold to be used for the Skip images with too many occurrences option.
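The two filters can be sketched together as shown here. This is an illustration of the criteria described above, not the program's code: the function name relevant_images, the input layout (page URL mapped to a list of image URL and alt text pairs) and the way occurrences are counted (every appearance on every page) are assumptions.

```python
from collections import Counter

def relevant_images(pages, skip_no_alt=True, skip_frequent=True, max_references=3):
    """Filter each page's images; pages maps page URL -> list of (image URL, alt)."""
    # Count how many times each image is referenced across the whole site.
    counts = Counter(src for images in pages.values() for src, _ in images)
    result = {}
    for page, images in pages.items():
        kept = []
        for src, alt in images:
            if skip_no_alt and not (alt or "").strip():
                continue   # empty or missing alt: treated as decorative
            if skip_frequent and counts[src] > max_references:
                continue   # threshold exceeded: treated as decorative
            kept.append((src, alt))
        result[page] = kept
    return result

site = {
    "/": [("/img/logo.png", "Company logo"), ("/img/hero.jpg", "Summer sale")],
    "/a": [("/img/logo.png", "Company logo"), ("/img/spacer.gif", "")],
    "/b": [("/img/logo.png", "Company logo"), ("/img/widget.jpg", None)],
}
filtered = relevant_images(site, max_references=2)
print(filtered)
```

In this example the logo appears three times and exceeds the threshold of 2, while the spacer and the widget photo are dropped for lack of alt text.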

Page list

This grid lists all pages selected in the tree view on the left that satisfy the chosen export options.
They are the pages that would be exported to the XML Sitemap.

Path

The resource path (URL decoded, thus in human-readable form).

Canonical

The path of the canonical URL to be used for indexing (URL decoded).
When the "Use Canonical path" option is flagged (default behavior), this value is normally identical to the value in the Path column.

Depth

The depth of the page in the site link structure, also known as "link depth", i.e. the number of clicks needed to reach it starting from the Home Page.
When the "Add Priority" option is flagged, link depth is used to assign a priority value.
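Link depth as defined above is the shortest click distance from the start page, which a breadth-first visit computes naturally. The sketch below assumes the site's internal links are already available as a dictionary from each page to the pages it links to; the function name link_depths and the sample data are hypothetical.

```python
from collections import deque

def link_depths(links, start):
    """Breadth-first visit: return {page: number of clicks from start}."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:   # first time seen = shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

links = {
    "/": ["/shop/", "/about/"],
    "/shop/": ["/shop/blue-widget", "/"],
    "/about/": ["/shop/blue-widget"],
}
depths = link_depths(links, "/")
print(depths)
```

The insertion order of the resulting dictionary is also the order in which the pages were first discovered.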

Visit Nr.

The progressive number assigned to the page during the crawler exploration, i.e. the order in which it was visited.

Export XML Sitemap

Clicking on the Export XML Sitemap... button opens a dialog box asking you where to save the XML Sitemap to be generated.
Choose the desired folder and file name (XML Sitemaps can have any name you want) and the generator will create the XML Sitemap.

Don't forget to upload the newly generated XML Sitemap to the production web server.
XML Sitemaps are normally located in the root folder of the domain/subdomain they refer to, but can also be placed in a subfolder if they only list URLs within that subfolder.
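Before uploading a sitemap to a subfolder, it can be useful to check that every listed URL actually falls within that location. A minimal sketch of such a check follows; the function name urls_outside_scope and the sample URLs are assumptions, and the comparison is a plain prefix match on the path.

```python
from urllib.parse import urlsplit

def urls_outside_scope(sitemap_url, page_urls):
    """Return the page URLs not located under the sitemap's own folder."""
    sm = urlsplit(sitemap_url)
    # Folder containing the sitemap, e.g. "/blog/" for "/blog/sitemap.xml".
    base_dir = sm.path.rsplit("/", 1)[0] + "/"
    outside = []
    for url in page_urls:
        p = urlsplit(url)
        same_site = (p.scheme, p.netloc) == (sm.scheme, sm.netloc)
        if not (same_site and (p.path or "/").startswith(base_dir)):
            outside.append(url)
    return outside

bad = urls_outside_scope(
    "https://example.com/blog/sitemap.xml",
    ["https://example.com/blog/post-1",
     "https://example.com/shop/item",
     "https://www.example.com/blog/post-2"],
)
print(bad)
```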
Then, you will probably also want to submit the sitemap to your Google Search Console account (or to the equivalent services provided by other search engines). Refer to the search engine documentation to learn how to do it.

Selected Pages

The field states the number of pages listed in the table on the right, which will be exported into the XML Sitemap.