Manual: HTTP issues

The feature "HTTP issues" of Visual SEO Studio, documented in detail.

HTTP issues

This tab sheet lists all HTTP issues found while crawling the website.
By "HTTP issues" we mean all server responses to HTTP requests made by the spider which did not result in an HTTP 200 OK response code.

Issues are presented as Errors and Warnings. You can show or hide each kind by selecting or deselecting the related toggle button.

Clicking on an issue automatically selects the corresponding page in the table (or in the tree, depending on the type of view) of the main tab document.

Toolbar

Errors

By selecting or deselecting the Errors toggle button you can show or hide errors.
The button also reports the number of HTTP errors found.

Warnings

By selecting or deselecting the Warnings toggle button you can show or hide warnings.
The button also reports the number of HTTP warnings found.

Issues found

The number of Errors and Warnings found.

Context menu

Right-clicking on an issue row opens a context menu that lets you copy or browse the URL of the page containing the issue, navigate to the Referrer pages, and much more:

HTTP Issues context menu

Context menu command items are:

Copy URL
Copies the URL of the selected resource to the clipboard.
Browse URL
Opens the URL of the selected resource in the default browser.
Go to Referrer URL
Selects in the main view the node related to the "Referrer" URL, i.e. the address where the spider found the link to the resource.
Go to Redirection URL
Selects in the main view the node related to the URL the resource is redirected to.
Browse Referrer URL
Opens in the default browser the "Referrer" URL, i.e. the address where the spider found the link to the resource.
Browse Redirection URL
Opens in the default browser the URL the resource is redirected to.
Find pages linking to the URL
Opens in a new Tabular View all pages linking to the resource.
Find all links to the URL
Opens the Links Inspector to locate all links pointing to the resource.
Find referrer link to the URL
Selects the right pane DOM view and highlights there the HTML node where the spider found the link to the resource.

Column headers

Icon

The icon column gives an indication of the state of the crawled resource.

The icons used and their meanings are:

Error: an error occurred while exploring the resource (e.g. when a resource is not found, producing a 404 error)
Warning: not necessarily an error; it means the result of exploring the resource needs some attention

Prog. #

Indicates the progressive number assigned to the resource during the crawler exploration.

Thanks to this progressive number you can get an idea of how a search engine spider would explore your website, a piece of information you should take into account when dealing with Crawl Budget issues, which are typical of large websites.
For example, you may realize the spider takes exploration paths towards content areas you consider less important than the ones you regard as more strategic; in such a case you should intervene on the website link structure.

Note: the crawl progressive number is an approximation:
Visual SEO Studio uses an exploration pattern called Breadth-first, which has been shown to be the most efficient at finding important content in the absence of external signals; the actual exploration order can change slightly because of the parallelization used for speed during the crawl process. Using a single crawl thread you could make it strictly repeatable.
Search engines' exploration patterns are for their part highly asynchronous, and exploration priority is weighted, in Google's case, by the resource's PageRank, which could be inflated by external links.
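
As a rough illustration of the breadth-first order described above, the following Python sketch assigns progressive numbers over a small, made-up link graph. The `crawl_order` function, the `site` dictionary and the page paths are hypothetical, and the code is a single-threaded simplification, not Visual SEO Studio's actual implementation.

```python
from collections import deque


def crawl_order(links, start):
    """Return pages in breadth-first discovery order, starting from `start`.

    `links` maps each page to the list of pages it links to (hypothetical data).
    """
    order = []
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:  # each URL is queued only once
                seen.add(target)
                queue.append(target)
    return order


# Hypothetical site structure, for illustration only.
site = {
    "/": ["/products", "/blog"],
    "/products": ["/products/a", "/products/b"],
    "/blog": ["/blog/post-1", "/"],
    "/products/a": [],
    "/products/b": ["/blog/post-1"],
    "/blog/post-1": [],
}

for number, page in enumerate(crawl_order(site, "/"), start=1):
    print(number, page)
```

With a single worker the order is deterministic: pages one link away from the start are numbered before any page two links away, which is why parallel crawling can only shuffle the numbering slightly.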

Status

The HTTP response code received from the web server upon requesting the resource.

Response codes can be summarized in five standard classes:

1xx Informational response – the request was received and is still being processed (it is very unlikely you will ever see a 1xx response code)
2xx Success – the request was successfully received, understood, accepted and served (it is the response code you normally want to see)
3xx Redirection – the requested resource is no longer at the address used
4xx Client Error – the request has a syntax error or cannot be honored
5xx Server Error – the web server was unable to honor an apparently valid request
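
The five classes are determined by the first digit of the code alone. A minimal Python sketch of that mapping follows; the `status_class` function name is an assumption made for this example and is not part of the product.

```python
def status_class(code):
    """Map an HTTP status code to the name of its standard class."""
    classes = {
        1: "Informational",
        2: "Success",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    # The class is given by the hundreds digit of the code.
    return classes[code // 100]


for code in (200, 301, 404, 500):
    print(code, status_class(code))
```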

Some very common responses are 200 (OK, the standard response for successfully served HTTP requests) and 301 (Moved Permanently, used when a page URL has changed and you want neither to "break" external links to the old URL nor to lose the page's indexing on search engines, while preserving its PageRank).

Redirects work as follows: when an old URL is requested, the web server answers the client (a browser, or a search engine spider) with an HTTP 3xx code to report that the address has changed, adding the new address in the HTTP header. The browser then has to request the resource at the new address with a new HTTP call and, in case of a permanent redirect, can remember the redirection so as to avoid a double call the next time the link to the old address is clicked.
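
The client-side step of this exchange can be sketched in Python as below. The `redirect_target` function and the plain-dict shape used for the response headers are assumptions made for this example; the set of redirect codes lists the common ones and is not exhaustive.

```python
from urllib.parse import urljoin

# Common redirect status codes (not an exhaustive list).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(request_url, status, headers):
    """Return the absolute URL to request next, or None if not a redirect.

    `headers` is a plain dict of response headers (hypothetical input shape).
    """
    if status not in REDIRECT_CODES:
        return None
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("location")
    if location is None:
        return None
    # The Location value may be relative; resolve it against the request URL.
    return urljoin(request_url, location)


print(redirect_target("https://example.com/old", 301, {"Location": "/new"}))
```

The second HTTP call mentioned above would then be made to the returned address.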

Redirects can be implemented on the server side using several methods, depending on the technology used and the platform the web server is running on. Examples are: configuring the .htaccess file on Apache web servers with generic or specific rules; dedicated plugins in a WordPress installation; or, for websites built with ASP.NET, rules expressed in the web.config file, directives set in the single page, or the logic of the CMS engine in use.

Having redirects is not an error per se, but if they are detected, as normally happens, during a normal site crawl navigating internal links, it is a sign that such internal links were not updated after the URLs changed. It is recommended to update the internal links with the new URLs in order not to slow down the user navigation experience and not to waste the crawl budget allotted by the search engine.

Particular attention should be given to the 4xx response codes, which Visual SEO Studio rightly reports as errors.
The 4xx codes you will stumble upon are usually 404 (Resource not found) and the nearly identical 410 (Gone, i.e. the resource no longer exists). Their presence is a symptom of a broken link that should be corrected, because users and search engines cannot reach the link destination page.

5xx response codes are errors that occurred on the web server while it was trying to build the resource to return to the browser or the spider.
They could be a temporary issue, but they should normally not be ignored: it is better to report them to the developer and investigate on the server side. 5xx errors are a very bad user experience, make visitors abandon the website, and can potentially cause de-indexation by the search engines if repeated over time.

For a more in-depth description of HTTP response codes you can consult the following page on Wikipedia: HTTP status codes

Status Code

The textual description of the HTTP response code received from the web server upon requesting the resource.
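
For reference, the standard reason phrases can be looked up with Python's `http.HTTPStatus`. This is only a sketch: the `status_text` helper is hypothetical, and the text shown in this column may differ from the standard phrases or be whatever the server actually sent.

```python
from http import HTTPStatus


def status_text(code):
    """Return the standard reason phrase for a status code, if known."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        # Non-standard codes have no registered phrase.
        return "Unknown"


for code in (200, 301, 404, 410):
    print(code, status_text(code))
```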

URL

Uniform Resource Locator, the resource address.

For better search engine optimization it is preferable to have "friendly" URLs (i.e. URLs that anticipate the page content) that are not too long.

Authority Name

The combination of protocol, host name and, if different from the default value, port number.

An important piece of information you can see from the Authority Name is, for example, whether the URL is protected by the secure HTTPS protocol.

It can also be handy to have the authority name shown when exploring URL lists or sites with several sub-domains.
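
A small Python sketch of how such a value could be derived from a URL is shown below. The `authority_name` function and the exact output format are assumptions made for this example; the product may format the value differently.

```python
from urllib.parse import urlsplit

# Default port for each protocol; a port equal to the default is omitted.
DEFAULT_PORTS = {"http": 80, "https": 443}


def authority_name(url):
    """Combine protocol, host name and (only if non-default) port number."""
    parts = urlsplit(url)
    name = f"{parts.scheme}://{parts.hostname}"
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        name += f":{port}"
    return name


print(authority_name("https://www.example.com/about"))
print(authority_name("http://example.com:8080/index.html"))
```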

Path (encoded)

The resource path, with URL encoding when required.

Due to a limitation of the HTTP protocol, a URL "running on the wire" can only contain ASCII characters (i.e. Western characters with no diacritics). URL encoding replaces special characters (diacritics, spaces, letters of non-Western alphabets, ...) with their escape sequences.

Many URLs are composed of ASCII characters only and, since they need no encoding, the encoded and decoded versions of their path look the same. Let's have a look at an example URL written in Cyrillic:

Path: /о-компании (a typical URL path for a company page, it translates from Russian as /about-company)

Since the HTTP protocol cannot convey non-ASCII characters, in order to permit these human-readable URL paths the browser transparently encodes the characters before sending them on the wire to request the resource from a web server, transforming the example path into:

Path (encoded): /%D0%BE-%D0%BA%D0%BE%D0%BC%D0%BF%D0%B0%D0%BD%D0%B8%D0%B8

The encoding used is called percent-encoding.
Visual SEO Studio by default shows URLs and Paths in their decoded, human-readable form, but you might want to see the encoded version to investigate URL issues.
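
The encoded and decoded forms of the example path can be reproduced with Python's standard library, as in this short sketch (the variable names are chosen for the example only):

```python
from urllib.parse import quote, unquote

decoded_path = "/о-компании"

# quote() writes each UTF-8 byte of a non-ASCII character as %XX;
# "/" and "-" are left untouched.
encoded_path = quote(decoded_path)
print(encoded_path)

# unquote() reverses the transformation.
print(unquote(encoded_path))
```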

Path (decoded)

The resource path (URL decoded, thus in human-readable form).

Redirected To

The value of the Location HTTP header, used with 30x redirect status codes.
If the URL contains non-ASCII characters, it is shown with URL encoding.

Upon a 30x response code the browser will navigate to the URL stated in the Location HTTP header.

Redirected To (decoded)

The HTTP header Location, used with 30x redirect status codes (URL decoded).

Referrer URL (decoded)

The complete URL of the resource from which the link to the present resource was followed (URL decoded).

The crawl paths taken by a bot during a website exploration help you understand the website link structure.
The Referrer URL is not necessarily the only URL linking to the resource, just the one the Visual SEO Studio spider followed to discover it.
You can locate all links to the resource with the context menu entry Find all links to the URL.

Referrer Path

The path of the URL of the resource from which the link to the present resource was followed.

Referrer Path (decoded)

The path of the URL of the resource from which the link to the present resource was followed (URL decoded).

Depth

The depth of the page in the site link structure, also known as "link depth", i.e. the number of clicks needed to reach it starting from the Home Page.

Knowing a page's depth from the main URL is important because search engines give more or less importance to a page according to its distance from the main URL: the closer it is, the more important it is.
Note: this is a simplification; in the case of Google, for example, the Home Page is usually the page with the greatest PageRank (a Google measure to assess the importance of a page; other search engines use similar models), so the pages connected to the Home Page by a single link are the ones receiving the most PageRank.

Furthermore, the greater the distance, the less likely the page is to be reached and explored by the search engine spiders, because of the normally limited Crawl Budget (simplifying: the number of pages a search engine can explore within a certain time slot when visiting a website).

Thus, place the pages you want to give more weight to closer to the Home Page.

Link Depth is also important from a user perspective: it would be hard for users to find a piece of content starting from the Home Page if it takes many clicks to reach it.
A common usability rule of thumb holds that each page should be reachable in three clicks or less. This is not always possible for very large websites; nevertheless you should choose a link structure that minimizes each page's link depth.
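
To make the notion concrete, the sketch below computes link depths over a small, made-up link graph and lists the pages that need more than three clicks. The `link_depths` function, the `site` data and the threshold check are assumptions made for this example, not the product's implementation.

```python
from collections import deque


def link_depths(links, home):
    """Return the minimum number of clicks from `home` to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first discovery gives the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site structure, for illustration only.
site = {
    "/": ["/shop"],
    "/shop": ["/shop/shoes"],
    "/shop/shoes": ["/shop/shoes/running"],
    "/shop/shoes/running": ["/shop/shoes/running/model-x"],
    "/shop/shoes/running/model-x": [],
}

depths = link_depths(site, "/")
too_deep = [page for page, depth in depths.items() if depth > 3]
print(too_deep)
```

In this made-up structure, adding a direct link from the Home Page to the deepest page would bring its depth down to 1.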