Manual: URL Suggestions

The feature "URL Suggestions" of Visual SEO Studio, documented in detail.

URL Suggestions

The URL Suggestions feature gives you detailed insights into URL issues normally ignored by other SEO auditing tools, focusing in particular on URL canonicalization issues.

Summary

The Summary tab sheet gives you an overall perspective of the reports available in URL Suggestions.
You can quickly select it at any time, even when it is not visible, by clicking the Show Summary link.

Reports table columns

Description

The descriptive name of the report. The text is an active link that, once clicked, selects the tab sheet containing the related report.

Pages

The number of pages affected by the issue detected by the report, i.e. the number of pages listed in the report.

Tot Pages

The total number of pages taken into account when computing the reports. This number is the same for all the listed reports.

Percentage

The percentage of pages detected by the specific report, computed as the ratio between the previous two values.

Keep in mind that some reports make a binary decision whether to catalog a page as affected by an issue, by checking whether the specific measured dimension exceeds a fixed (configurable) threshold.
In such cases the percentage says nothing about how the dimension is distributed. Take for example the report "Too many query parameters": it will tell you - say - all URLs with more than 2 query parameters (e.g. ...?par1=a&par2=b&par3=c), and from their number you have a percentage. But how many of them have none? How many have far too many?
To give you a much better idea of the distribution, every threshold-based report comes with a dedicated Histogram view. You can visualize it in the bottom pane.
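To make the difference concrete, here is a minimal Python sketch (with made-up URLs, not the program's internal code) showing how a binary threshold and a distribution answer different questions about the same "URL length" dimension:

from collections import Counter

# Hypothetical crawled URLs; in practice these come from the crawl session.
urls = [
    "https://example.com/",
    "https://example.com/a-fairly-long-landing-page-name.html",
    "https://example.com/en/blog/2024/06/a-very-long-post-title-goes-here.html",
]

lengths = [len(u) for u in urls]

# A binary threshold (e.g. "longer than 90 chars") yields only a percentage...
threshold = 90
affected = sum(1 for n in lengths if n > threshold)
print(f"{affected}/{len(urls)} URLs exceed {threshold} chars")

# ...while the distribution (what the Histogram view shows) is far richer.
# Bucketing lengths by tens of characters gives a rough text-mode histogram:
print(Counter(n // 10 * 10 for n in lengths))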

Pie chart icon

A mini pie chart visually reporting two important pieces of information:

  • The percentage shown in the previous column, represented by the colored pie slice.
  • The alert level of the issue investigated by the report. It is indicated by the color of the slice:
    • Red: the issue investigated has to be considered an Error.
    • Yellow: the issue investigated has to be considered a Warning.
      Please notice that a warning is not a "light error", but something the program cannot determine at this stage: it could be an actual error, or something intended.
    • Azure: the report is just Informational.

As previously said for the Percentage column, threshold-based reports are better evaluated by also looking at the Histogram bottom pane, to understand how the measured dimension is distributed.

Items break-down table

This table gives you an overview of all pages - processed and discarded - in the evaluated crawl session.

Items break-down graph

The 3D pie chart visually displays the content of the table above it. Like all 3D graphs in Visual SEO Studio, the chart can be zoomed, rotated, copied and saved at will.

Export buttons

Reports in URL Suggestions can all be exported.
In the upper-right corner of each report tab sheet you can find easy-to-spot export buttons:

  • Open in Tabular View
    Opens the listed pages in Tabular View as a subset of the whole crawl session (from there you can also export them to Excel/CSV).
  • Export to Excel
    Exports the content of the shown columns to an Excel document.
    This option is available only when the view in the tab sheet is a table (and it is also available from the table context menu).
  • Export to CSV
    Exports the content of the shown columns to a CSV file.
    This option is available only when the view in the tab sheet is a table (and it is also available from the table context menu).

Context menu

Every report in URL Suggestions provides a context menu you can trigger by right-clicking a page row:

  • Copy URL
    Copies the URL of the selected resource to the clipboard.
  • Browse URL
    Opens the URL of the selected resource in the default browser.

URL canonicalization reports

Canonicalization errors should always be fixed, because they could prevent the correct indexing of the web pages by the search engines.
Visual SEO Studio provides here a comprehensive set of reports dedicated to URL canonicalization, to pinpoint all potential canonicalization issues.

Canonicalized to noindex pages

Entry type: Error

This report detects the following scenario:
A page - suppose this is pageA - is "canonicalized"; it means it has a canonical link tag pointing to another URL:

<link rel="canonical" href="...a URL to pageB..." />

where pageB is not indexable (because it has a robots meta tag or the equivalent HTTP header set to "noindex"):

<meta name="robots" content="noindex" />

The two directives give conflicting signals to the search engine:
the first says pageA is just a copy of pageB and that the latter's URL is the one to be indexed; the second says pageB should not be indexed.
What the search engine will do is not predictable: it might discard the canonical and index pageA with its own URL, or it could take the most restrictive directive and avoid indexing both pageA and pageB.
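As an illustration only (this is not how Visual SEO Studio is implemented internally), detecting the scenario boils down to a simple lookup over the crawl data. A minimal Python sketch, assuming hypothetical per-page records holding a canonical target and a noindex flag:

# Hypothetical crawl data: for each URL, the canonical target (if any)
# and whether the page is flagged "noindex" (meta robots or HTTP header).
pages = {
    "https://example.com/pageA": {"canonical": "https://example.com/pageB", "noindex": False},
    "https://example.com/pageB": {"canonical": None, "noindex": True},
}

def canonicalized_to_noindex(pages):
    # Yield pages whose canonical target is itself marked noindex.
    for url, info in pages.items():
        target = info["canonical"]
        if target and target != url and pages.get(target, {}).get("noindex"):
            yield url, target

for page, target in canonicalized_to_noindex(pages):
    print(f"{page} is canonicalized to noindex page {target}")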

How can you fix the issue on the reported pages:
One or both of the two directives are likely wrong: sort out exactly what you want, and fix the wrong directive if you want pageA to be indexed.

Canonicalized to HTTP issue URL

Entry type: Error

This report detects the following scenario:
A page - suppose this is pageA - is "canonicalized"; it means it has a canonical link tag pointing to another URL:

<link rel="canonical" href="...a URL to pageB..." />

where the URL of pageB returns an HTTP status code different from "200 OK".

The site is giving conflicting signals to the search engine:
the canonical link tag says pageA is just a copy of pageB and that the latter's URL is the one to be indexed, but pageB's URL is not indexable because of its HTTP status code.
What the search engine will do is not predictable: it might discard the canonical and index pageA with its own URL, or it could take the most restrictive directive and avoid indexing pageA.

How can you fix the issue on the reported pages:
First you should look at what pageB's status code actually is (a quick way to check it is sketched after the list):

  • An HTTP 3xx status code, for example HTTP 301 "Moved Permanently".
    You likely should update the canonical link tag with the end URL of the redirection.
  • An HTTP 4xx status code, for example HTTP 404 "Not Found", which is clearly an error.
    Either the canonical link tag is wrong, and you should fix it, or pageB was deleted in error.
  • An HTTP 5xx status code, for example HTTP 500 "Internal Server Error".
    The server error could be a temporary condition, but you definitely have to investigate it and have it fixed.
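As referenced above, here is a quick way to check what a canonical target actually returns: a minimal Python sketch using the third-party requests library, with a placeholder URL. Redirects are deliberately not followed, so a 3xx stays visible instead of its end URL:

import requests  # third-party: pip install requests

def check_canonical_status(canonical_url):
    # HEAD is usually enough to read the status code without the body.
    resp = requests.head(canonical_url, allow_redirects=False, timeout=10)
    if resp.status_code != 200:
        location = resp.headers.get("Location", "")
        print(f"{canonical_url} -> HTTP {resp.status_code} {location}")
    return resp.status_code

# Hypothetical usage:
# check_canonical_status("https://example.com/pageB")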

Canonicalized to non-HTML page

Entry type: Warning

This report detects the following scenario:
A page - suppose this is pageA - is "canonicalized"; it means it has a canonical link tag pointing to another URL:

<link rel="canonical" href="...a URL to resourceB..." />

where resourceB is not a web page (its content type is neither "text/html" nor "application/xhtml+xml").

Having a canonical link tag pointing to a non-HTML resource is perfectly fine, but really unusual - so much so that very often it is the result of an incorrect setup. That's why the program reports it as a Warning, something to double check.
It usually happens with resources available in both HTML and PDF format: you permit crawling the PDF resource too, but to avoid duplicate content issues you use a canonical HTTP header:

Link: <...URL to the HTML version...>; rel="canonical"

...but of course there are cases where the webmaster (or the SEO person) decided to index the PDF version instead, and canonicalized the HTML page to the PDF file. In such a case it would not be an error.

How can you fix the issue on the reported pages:
First, check whether it actually is an error. If it is, fix the URL in the canonical link tag so it points to the correct resource.

Canonicalized to non-canonical page

Entry type: Error

This report detects the following scenario:
A page - suppose this is pageA - is "canonicalized"; it means it has a canonical link tag pointing to another URL:

<link rel="canonical" href="...a URL to pageB..." />

where pageB is also canonicalized to yet another resource pageC:

<link rel="canonical" href="...a URL to pageC..." />

The two directives give conflicting signals to the search engine:
the first says pageA is just a copy of pageB and that the latter's URL is the one to be indexed; the second says pageB should not be indexed because it is just a copy of pageC.
What the search engine will do is not predictable: it might discard the canonical and index pageA with its own URL, or do something else.
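A chain of canonical link tags can be followed programmatically to see where it ends (or whether it loops). A minimal Python sketch with made-up URLs, not the program's actual algorithm:

def resolve_canonical_chain(url, canonicals, max_hops=10):
    # `canonicals` maps each URL to its canonical URL (or None if absent).
    seen = [url]
    while True:
        target = canonicals.get(url)
        if target is None or target == url:
            return seen  # url is the final, self-referring canonical
        if target in seen or len(seen) >= max_hops:
            return seen + [target]  # loop or overly long chain: report it
        seen.append(target)
        url = target

# Hypothetical chain pageA -> pageB -> pageC:
canonicals = {
    "https://example.com/pageA": "https://example.com/pageB",
    "https://example.com/pageB": "https://example.com/pageC",
    "https://example.com/pageC": "https://example.com/pageC",  # self-referring
}
chain = resolve_canonical_chain("https://example.com/pageA", canonicals)
print(" -> ".join(chain))  # any chain longer than 2 entries is a candidate issue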

How can you fix the issue on the reported pages:
One or both of the two directives are likely wrong: sort out exactly what you want, and fix the wrong directive if you want pageA to be indexed.

Canonicalized to non-crawled URL

Entry type: Error

This report detects the following scenario:
A page - suppose this is pageA - is "canonicalized"; it means it has a canonical link tag pointing to another URL:

<link rel="canonical" href="...a URL to pageB..." />

where pageB cannot be crawled by a search engine, usually because the robots.txt file is blocking it.

The site is giving conflicting signals to the search engine:
the canonical link tag says pageA is just a copy of pageB and that the latter's URL is the one to be indexed, but pageB's URL is not crawlable because of the block in the robots.txt file (a robots.txt-blocked resource can actually still be indexed, but this scenario is almost certainly a mistake).
What the search engine will do is not predictable.
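The crawlability check itself can be reproduced with Python's standard urllib.robotparser module; a minimal sketch with placeholder URLs and user-agent:

from urllib.robotparser import RobotFileParser

# Hypothetical setup: the canonical target lives on example.com.
rp = RobotFileParser("https://example.com/robots.txt")
rp.read()  # fetches and parses the robots.txt file

canonical_url = "https://example.com/pageB"
if not rp.can_fetch("Googlebot", canonical_url):
    print(f"Canonical target {canonical_url} is blocked by robots.txt")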

How can you fix the issue on the reported pages:
One or both of the two directives are likely wrong: sort out exactly what you want, and fix the wrong directive if you want pageA to be indexed.

Canonicalized to out-of-session URL

Entry type: Information

This report detects the following scenario:
A page - suppose this is pageA - is "canonicalized"; it means it has a canonical link tag pointing to another URL:

<link rel="canonical" href="...a URL to pageB..." />

where pageB was not crawled by Visual SEO Spider, either because it was part of another domain or because the crawl process stopped before reaching it.

Having not visited the canonical URL, the program cannot investigate whether there is a problem in the URL canonicalization.

How can you fix the issue on the reported pages:
Check the canonical URL manually, or segment the website exploration so the program can do it for you.

Canonicalized and noindex

Entry type: Warning

This report detects the following scenario:
A page - suppose this is pageA - is "canonicalized"; it means it has a canonical link tag pointing to another URL:

<link rel="canonical" href="...a URL to pageB..." />

but at the same time pageA is not indexable because it has a robots meta tag or the equivalent HTTP header set to "noindex":

<meta name="robots" content="noindex" />

The two directives give conflicting signals to the search engine:
the first says pageA is just a copy of pageB and that the latter's URL is the one to be indexed; the second says pageA should not be indexed.
What the search engine will do is not predictable; in our experience Google takes the most restrictive directive and avoids indexing pageA (but this is not a documented behavior, thus nothing guarantees it will work the same in the future).

We decided to report this issue as a Warning and not as an Error because it is a common scenario when a self-referring canonical URL is applied by default to all pages: the noindex is usually set deliberately, but the CMS does not remove the canonical link tag in that case. The webmaster's intention to "noindex" the page is nevertheless respected, at least in Google's case.

How can you fix the issue on the reported pages:

  • If the canonical link tag is "self-referring" (i.e. it points to the same page it is applied to), and you cannot change your CMS behavior, you can most likely safely ignore the issue (a quick way to test whether a canonical is self-referring is sketched after this list).
  • If the canonical link tag is NOT "self-referring" (i.e. it points to a different URL), either the canonical link tag is wrong, or the "noindex" is. Remove the wrong one after having examined the case.
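A canonical is "self-referring" when, fragments aside, it points back to the very page declaring it. A minimal Python sketch of the test (an illustration, not the program's exact comparison rules):

from urllib.parse import urldefrag

def is_self_referring(page_url, canonical_url):
    # Compare the two URLs ignoring any #fragment part.
    return urldefrag(page_url).url == urldefrag(canonical_url).url

print(is_self_referring("https://example.com/a", "https://example.com/a"))  # True
print(is_self_referring("https://example.com/a", "https://example.com/b"))  # False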

Without canonical tag

Entry type: Warning

This report (previously named "Missing Canonical tag") lists the pages where there is no canonical link tag at all (self-referring or not).

Not having a canonical link tag in a page is perfectly fine: the page URL will be considered the default canonical URL. Nevertheless, we strongly recommend using a self-referring canonical URL (i.e. a canonical link tag with a URL pointing to the very same page it is applied on) in such cases, to avoid accidental duplicate content issues if a page were linked - internally or externally - with additional query string parameters.

For example, if the page /page.html were also linked as /page.html?par=true, a search engine would see two distinct URLs responding with the same content. A self-referring canonical link tag would prevent a duplicate content issue from occurring.
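One common policy for generating such self-referring canonicals is to drop the query string and fragment from the requested URL; whether that is appropriate is site-specific, so treat the rule below as an assumption. A minimal Python sketch:

from urllib.parse import urlsplit, urlunsplit

def self_canonical(url):
    # Build the self-referring canonical URL by dropping query and fragment.
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

print(self_canonical("https://example.com/page.html"))           # https://example.com/page.html
print(self_canonical("https://example.com/page.html?par=true"))  # https://example.com/page.html
# Both variants declare the same canonical, so search engines see one page.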

Canonical not matching URL

Entry type: Information

This report finds all "canonicalized" pages:
A page - suppose this is pageA - is "canonicalized" when it has a canonical link tag pointing to another URL:

<link rel="canonical" href="...a URL to pageB..." />

where pageB has a different URL from pageA (we are thus excluding self-referring canonical link tags).

Canonicalized URLs are perfectly fine; but since URL canonicalization is often done wrong, it is useful to have a list of all canonicalized pages with their canonical URL so you can double check.

Canonical not matching URL (Casing only)

Entry type: Error

This report detects the following scenario:
Suppose you have a page https://www.example.com/page.html which has the following canonical link tag:

<link rel="canonical" href="https://www.example.com/PAGE.html" />

It is very likely we are facing a mistake here. URL paths are case-sensitive: according to the specifications, any change in upper/lower case letters leads to a different resource.
Some web servers - most notably MS IIS - go against the protocol and treat URLs as case-insensitive, but a search engine complying with the specifications would see two distinct resources. These situations can lead to duplicate content issues.

How can you fix the issue on the reported pages:
Either the crawled URL or the canonical URL is wrong.

  • If the URL in the canonical link tag is correct, it means the spider followed a link pointing to a URL with an error in casing, and since the hosting web server is case-insensitive this led to the same resource instead of an HTTP 404 "Not Found" error. In this case, locate the "Referrer URL" and correct the broken link there.
  • If the crawled page is correct, the one in the canonical link tag is not, and should be fixed.

Duplicate Canonical tags

Entry type: Information

This report groups by canonical URL all pages canonicalized to the same URL.

A page is "canonicalized" when it has a canonical link tag pointing to another URL:

<link rel="canonical" href="...the address URL-X..." />

In the example all pages with a canonical link tag pointing to the address URL-X would be grouped under the URL-X node.

Canonicalized pages are perfectly fine; but since URL canonicalization is often done wrong, it is useful to have a list of all canonicalized pages grouped by canonical URL so you can double check.
Pages canonicalized with a self-referring URL (the canonical pages) are not grouped under the URL node, only their logical copies are.

Duplicate Canonical tags (case insensitive)

Entry type: Information

This report groups by canonical URL all pages canonicalized to the same URL, ignoring case differences in the canonical URL.

A page is "canonicalized" when it has a canonical link tag pointing to another URL:

<link rel="canonical" href="...the address URL-X..." />

In the example all pages with a canonical link tag pointing to the address URL-X (or url-x, or any other permutation in upper/lower case letters) would be grouped under the URL-X node.

Example of issue spotted with the report

Canonicalized URLs are perfectly fine; but since URL canonicalization is often done wrong, it is useful to have a list of all canonicalized URLs grouped by their canonical URL so you can double check.
Pages canonicalized with a self-referring URL (the canonical pages) are not grouped under the URL node, only their logical copies are.

Very similar to the Duplicate Canonical tags report, it helps spot letter casing errors in the canonical link tag URL path.

URL structure reports

The following reports help detect potential issues in the existing URL structure.
Changing an existing - indexed and ranked - URL should always be done with caution: once you change it, you should set up a proper HTTP 301 ("Moved Permanently") redirection from the old URL to the new one. Search engines periodically re-crawl already indexed web pages, and if instead of an HTTP 301 redirection they find an HTTP 404 ("Not Found") status code, the indexed web page will lose all the value previously gained in the eyes of the search engine.

Duplicate URLs (case insensitive)

Entry type: Warning

This report groups pages by crawled URL, ignoring case differences in the URL.

According to the official specifications, URLs are case-sensitive in the path part (not in the protocol and domain name parts) and should be compared as such by web clients (browsers and web spiders); but some servers (like MS IIS) or some CMSs can internally resolve URL paths differing only in casing to the same web page, leading to duplicate content issues for search engines.
The report helps locate such pages, normally found by the spider because it followed internal links with wrong upper/lower case characters in the URL path part.

Note: usage of the canonical link tag would solve the duplicate content issues, but not the potential crawl budget waste caused by the wrong internal links.
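Conceptually, the report groups crawled URLs under a case-insensitive key. Only the scheme and host are case-insensitive by specification, so the path must be lower-cased explicitly. A minimal Python sketch with made-up URLs:

from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical crawled URLs, as the spider might have followed them.
crawled = [
    "https://example.com/Products/Page.html",
    "https://example.com/products/page.html",
    "https://example.com/about.html",
]

groups = defaultdict(list)
for url in crawled:
    parts = urlsplit(url)
    # Scheme and host are case-insensitive anyway; the path is what matters.
    key = (parts.netloc.lower(), parts.path.lower())
    groups[key].append(url)

for key, urls in groups.items():
    if len(urls) > 1:
        print("Possible duplicate URLs (casing only):", urls)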

Example of duplicate URLs, likely caused by wrong internal linking

How can you fix the issue on the reported pages:
Locate under the URL node the nested pages that have been crawled using the wrong upper/lower case combination, then find the links pointing to that upper/lower case combination of the URL, and fix them.
The easiest way to find all links to a given URL is the Links Inspector feature; users of the free limited Community Edition can use the Find referrer link to URL context menu entry, available in all the main views, to find the referrer link followed by the spider.

URL too long

Entry type: Warning

The HTTP protocol does not set a limit on the number of characters forming a URL, but practical limitations do exist:
some web servers impose a maximum length (older ones even as low as 255 chars, but that's a thing of the past); the XML Sitemaps protocol requires URLs to be no longer than 2048 characters; and many other systems have been discovered over time to impose practical limits on URL length.

All that said, a URL is no longer an invisible detail used only by the communication protocol: it is shown by browsers in the address bar, and by search engines in their SERPs. URLs are now supposed to be "human friendly" (other than "search engine friendly"): easy to read and able to communicate the page's purpose to the user.
Overly long URLs are not optimal in this respect, so SEO professionals and agencies need tools to detect and enforce the internal policies they adopt on URL length. This report is dedicated to that task.

The threshold used at the time this guide was written is "Longer than 90 chars". Like many others, it can be customized in the program Options, reachable via the program main menu Tools -> Preferences... entry.
Over time, as the SEO world changes, we update the default thresholds to keep you up-to-date with the ever-evolving market.

Histogram showing distribution of URL length values

Note: the Histogram bottom panel gives you an overall view of how the "URL length" dimension is distributed across all the pages in the crawl session.

URL too deep

Entry type: Warning

With "URL depth" we mean the number of folders in the URL path. Folders in URL paths are useful to help categorize content, to split it by language, to ease reading URLs.
Too many folders can nevertheless be "too much", especially considering that if you want to put a focus keyword in the web page name, it will appear at the end of the URL; the longer the folders part, the more distant from the domain name, but under a SEO perspective you'd prefer the keyword part to be closer to the left part.
SEO professionals and agencies need tools to detect and enforce internal policies they adopt on the "depth" of URLs. This report is dedicated to such task.
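For illustration, URL depth can be computed by counting the path segments before the file name. The exact counting rules used by the program may differ, so treat this definition as an assumption. A minimal Python sketch:

from urllib.parse import urlsplit

def url_depth(url):
    # Number of folders in the URL path: segments before the file name.
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    # If the path ends with "/", every segment is a folder;
    # otherwise the last segment is the file name.
    return len(segments) if path.endswith("/") else max(len(segments) - 1, 0)

print(url_depth("https://example.com/en/blog/2024/post.html"))  # 3
print(url_depth("https://example.com/en/blog/2024/"))           # 3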

The threshold used at the time this guide was written is "Deeper than 4 folders". Like many others, it can be customized in the program Options, reachable via the program main menu Tools -> Preferences... entry.
Over time, as the SEO world changes, we update the default thresholds to keep you up-to-date with the ever-evolving market.

Histogram showing distribution of folder depth values

Note: the Histogram bottom panel gives you an overall view of how the "URL depth" dimension is distributed across all the pages in the crawl session.

Too many query parameters

Entry type: Warning

Query parameters are the part of the URL following the name of the resource, e.g.:

...?par1=a&par2=b&par3=c

They are used for several purposes: driving dynamic web page behavior, tracking and monitoring page visits, recognizing affiliate partners, and so on.
One of the basic SEO rules is "URLs should be SEO-friendly" (other than "user-friendly"): no useless parameters, the fewer the better, and no parameters at all is even better. For many websites, getting rid of query parameters is not an option for technical reasons; you should at least try to keep them to the bare minimum.
This report helps SEO professionals and agencies enforce internal policies on the number of URL query parameters.
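Counting query parameters is straightforward with Python's standard urllib.parse module; a minimal sketch, for illustration only:

from urllib.parse import urlsplit, parse_qsl

def query_param_count(url):
    # keep_blank_values counts parameters such as "?par=" too.
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))

print(query_param_count("https://example.com/p?par1=a&par2=b&par3=c"))  # 3
print(query_param_count("https://example.com/p"))                       # 0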

The threshold used at the time this guide was written is "With more than 2 parameters". Like many others, it can be customized in the program Options, reachable via the program main menu Tools -> Preferences... entry.
Over time, as the SEO world changes, we update the default thresholds to keep you up-to-date with the ever-evolving market.

Histogram showing distribution of number of query parameters

Note: the Histogram bottom panel gives you an overall view of how the "number of query parameters" dimension is distributed across all the pages in the crawl session.

Too many tokens

Entry type: Warning

There is evidence that only up to 16 words are taken into account in a link anchor text (source updated to 2018).
What does this have to do with URLs? It turns out anchor texts are often made of full URLs, which are tokenized into single words up to the above-mentioned limit (the sources - in German - are dated 2009, and based on an older version of the previously cited article, which detected the limit as 8 words, now superseded by the 16-word value).
The takeaway is: if you want to squeeze the most out of your URL in terms of anchor text juice, keep it within the first 16 tokens.
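Search engines do not document their exact tokenizer, so the following Python sketch is only an assumed approximation: it splits a URL into word tokens on any run of non-alphanumeric characters.

import re

def url_tokens(url):
    # Any run of non-alphanumeric characters acts as a separator.
    return [t for t in re.split(r"[^A-Za-z0-9]+", url) if t]

tokens = url_tokens("https://example.com/en/blog/my-long-post-title.html")
print(len(tokens), tokens)  # 10 tokens here; only about the first 16 may count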

The threshold used at the time this guide was written is "More than 16 tokens". Like many others, it can be customized in the program Options, reachable via the program main menu Tools -> Preferences... entry.
Over time, as the SEO world changes, we update the default thresholds to keep you up-to-date with the ever-evolving market.

Note: the Histogram bottom panel gives you an overall view of how the "number of tokens" dimension is distributed across all the pages in the crawl session.

Upper-Case file name

Entry type: Warning

On the Internet, writing in upper case is equivalent to shouting. Users have the same perception when the text is the page name in the URL.
You can use this report to spot all web pages having the file name part of the URL all in upper case.

Encoded path

Entry type: Warning

This report lists all pages where the Path part of the URL needs encoding in order to travel on the wire.

Due to a limit of the HTTP protocol, a URL "running on the wire" can only contain ASCII characters (i.e. Western characters with no diacritics). URL encoding replaces special characters (diacritics, spaces, non-Western alphabet letters, ...) with their escape sequences.
The encoding used is called percent-encoding.

There is nothing wrong per se in using encoded paths, especially if you deal with non-Western alphabets and want to optimize those URLs as well. Nevertheless, many SEO professionals and agencies working with the Western market have policies against using them, and need tools to detect their usage.

Our recommendation is to use percent-encoding in your URLs if you want to optimize them when dealing with non-Western languages.
We recommend avoiding spaces (encoded with the %20 sequence) in URLs: URLs are often copied and pasted into other programs (e.g. e-mail clients) that automatically create a link when they recognize the typical URL structure, and they would fail, truncating the link URL at the space. Use the "hyphen" sign instead.

An example of fair usage of percent-encoding:

Path: /о-компании (it translates from Russian as /about-company)
Path encoded: /%D0%BE-%D0%BA%D0%BE%D0%BC%D0%BF%D0%B0%D0%BD%D0%B8%D0%B8

An example of bad usage of percent-encoding:

Path: /my page (notice the white space)
Path encoded: /my%20page
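Both examples can be reproduced with Python's standard urllib.parse module, which implements percent-encoding; a minimal sketch:

from urllib.parse import quote, unquote

# Fair usage: a non-Western path, percent-encoded for the wire.
encoded = quote("/о-компании")
print(encoded)           # /%D0%BE-%D0%BA%D0%BE%D0%BC%D0%BF%D0%B0%D0%BD%D0%B8%D0%B8
print(unquote(encoded))  # /о-компании

# Bad usage: a space in the path survives encoding as %20.
print(quote("/my page"))  # /my%20page -- prefer /my-page instead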

Encoded file name

Entry type: Warning

This report lists all pages where the file name part of the URL needs encoding in order to travel on the wire.

Due to a limit of the HTTP protocol, a URL "running on the wire" can only contain ASCII characters (i.e. Western characters with no diacritics). URL encoding replaces special characters (diacritics, spaces, non-Western alphabet letters, ...) with their escape sequences.
The encoding used is called percent-encoding.

There is nothing wrong per se in using encoded file names, especially if you deal with non-Western alphabets and want to optimize those URLs as well. Nevertheless, many SEO professionals and agencies working with the Western market have policies against using them, and need tools to detect their usage.

Our recommendation is to use percent-encoding in your URLs if you want to optimize them when dealing with non-Western languages.
We recommend avoiding spaces (encoded with the %20 sequence) in URLs: URLs are often copied and pasted into other programs (e.g. e-mail clients) that automatically create a link when they recognize the typical URL structure, and they would fail, truncating the link URL at the space.

An example of fair usage of percent-encoding:

Path: /о-компании (it translates from Russian as /about-company)
Path encoded: /%D0%BE-%D0%BA%D0%BE%D0%BC%D0%BF%D0%B0%D0%BD%D0%B8%D0%B8

An example of bad usage of percent-encoding:

Path: /my page (notice the white space)
Path encoded: /my%20page