Manual: Page language directives
The feature "Page language directives" of Visual SEO Studio, documented in detail.
Page language directives
While both Google and Yandex support hreflang tags, Bing does not; it uses the 'content-language' meta tag or HTTP header instead. Other tools (software translators, screen readers for visually impaired people, and so on) look for the "lang" attribute in the HTML. So there are several language directives, and a webmaster has to ensure they all match.
Example of page with conflicting language directives
The alternate/hreflang attribute helps the search engine decide which of a set of "coupled" pages to show in a localized SERP, but Google and other search engines do not use it to determine the language of a page. For that they rely on language detection.
Page language directives view in Visual SEO Studio
Keeping all these elements in mind, we decided to provide a tool that shows, for every single page, all the possible language directives so that you can see whether they match. We also added a state-of-the-art language detector to test the language of the title, description, and main content.
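The idea of checking whether directives match can be illustrated with a minimal Python sketch. It is not the program's actual logic: the function names are hypothetical, and comparing only the primary language subtag (so that "en" and "en-GB" count as matching) is an assumption made for the example.

```python
def primary_subtag(tag):
    """Return the primary language subtag, e.g. 'en' for 'en-GB'."""
    return tag.strip().lower().replace("_", "-").split("-")[0]


def conflicting_directives(directives):
    """Tell whether the declared directives disagree on the language.

    `directives` maps a directive name to a language tag, or to None
    when the page does not declare that directive. Undeclared
    directives are ignored (an assumption of this sketch).
    """
    languages = {primary_subtag(v) for v in directives.values() if v}
    return len(languages) > 1


page = {
    "hreflang": "en-GB",
    "lang": "en",
    "content-language meta": None,
    "content-language header": "it",
}
print(conflicting_directives(page))
```

A stricter check could also compare region subtags; whether that is desirable depends on how the site is localized.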
Toolbar
Page URL
The address of the selected page.
Column headers
Self-referring hreflang tag
The self-referring hreflang tag, be it from the HTML, the HTTP header, or the XML Sitemap.
An example of the tag in the HTML syntax is:
<link rel="alternate" hreflang="en" href="...self-referring URL..." />
where "self-referring URL" means the URL of the very page that contains the tag.
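As an illustration, a self-referring hreflang tag could be located in the HTML with Python's standard html.parser module. This is only a sketch under simple assumptions: the class and function names are hypothetical, only the HTML form of the tag is handled (not the HTTP header or XML Sitemap), and URLs are compared as exact strings without any normalization.

```python
from html.parser import HTMLParser


class HreflangCollector(HTMLParser):
    """Collects (hreflang, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        hreflang = attributes.get("hreflang")
        href = attributes.get("href")
        if "alternate" in rel_values and hreflang and href:
            self.alternates.append((hreflang, href))


def self_referring_hreflang(html, page_url):
    """Return the hreflang value whose href equals page_url, or None."""
    collector = HreflangCollector()
    collector.feed(html)
    for hreflang, href in collector.alternates:
        if href == page_url:  # exact match only, an assumption of this sketch
            return hreflang
    return None
```

For a page at https://example.com/it/ that lists both an "en" and an "it" alternate, the function would return the hreflang of the entry whose href is the page's own URL.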
'lang' attribute in HTML tag
The lang attribute of the HTML tag (or xml:lang in case of XHTML pages).
An example of use of the attribute in the HTML5 syntax is:
<html lang="en">
An example of use of the attribute in the XHTML syntax is:
<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
If both the lang and xml:lang attributes are used, the latter takes priority and the former is discarded.
The lang attribute is evaluated by Bing to understand the page language, along with the content-language tag and the content-language HTTP header.
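The priority rule between lang and xml:lang can be sketched in Python with the standard html.parser module. The names below are hypothetical and the sketch only inspects the first html tag it finds; it is not meant to describe how the program itself reads the attribute.

```python
from html.parser import HTMLParser


class HtmlLangReader(HTMLParser):
    """Reads the language declared on the first <html> tag."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self._seen_html = False

    def handle_starttag(self, tag, attrs):
        if tag == "html" and not self._seen_html:
            self._seen_html = True
            attributes = dict(attrs)
            # xml:lang wins over lang when both are present
            self.lang = attributes.get("xml:lang") or attributes.get("lang")


def html_lang(html):
    """Return the declared page language, or None if there is none."""
    reader = HtmlLangReader()
    reader.feed(html)
    return reader.lang


print(html_lang('<html lang="en" xml:lang="it">'))
```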
'content-language' meta tag
The content-language meta tag is a "poor man's" way to express a content language when you cannot use the HTTP response header.
An example of the tag in the HTML syntax is:
<meta http-equiv="content-language" content="en">
The tag has been deprecated in the HTML5 specification, in favour of the lang attribute.
The content-language tag is evaluated by Bing to understand the page language, along with the lang attribute and the content-language HTTP header.
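For illustration, the value of the content-language meta tag could be read as follows. This is a minimal Python sketch using the standard html.parser module; the class and function names are hypothetical, and only the first matching tag is considered.

```python
from html.parser import HTMLParser


class MetaContentLanguageReader(HTMLParser):
    """Reads the first <meta http-equiv="content-language"> value."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.value is not None:
            return
        attributes = dict(attrs)
        # the http-equiv keyword is matched case-insensitively
        if (attributes.get("http-equiv") or "").lower() == "content-language":
            self.value = attributes.get("content")


def meta_content_language(html):
    """Return the content of the content-language meta tag, or None."""
    reader = MetaContentLanguageReader()
    reader.feed(html)
    return reader.value
```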
'content-language' HTTP header
The content-language HTTP response header expresses the language of the page content.
An example of the HTTP header is:
Content-Language: en
The content-language HTTP header is evaluated by Bing to understand the page language, along with the content-language tag and the lang attribute.
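The HTTP specification allows the header to carry a comma-separated list of language tags (for example "de, en"), so code reading it should not assume a single value. A minimal Python sketch follows; the function name is hypothetical, and lowercasing the tags is a convenience of the example, since language tags are case-insensitive.

```python
def parse_content_language(header_value):
    """Split a Content-Language header value into lowercase language tags."""
    return [
        part.strip().lower()
        for part in header_value.split(",")
        if part.strip()
    ]


print(parse_content_language("en"))
print(parse_content_language("de-DE, en"))
```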
Title Language (auto-detected)
The language used within the title tag, automatically detected by the program by examining its text.
Language detection can be inaccurate. It needs at least five words to attempt recognition, and the more words the better. Longer texts can be recognized with greater precision, but take more computation time.
The entire title text is used for the test. The title tag often contains a trailing brand name that could deceive the language recognition system, so the result is more precise when applied to longer titles.
Titles can also mix languages, which could likewise deceive the software.
Description Language (auto-detected)
The language used within the meta description tag, automatically detected by the program by examining its text.
Language detection can be inaccurate. It needs at least five words to attempt recognition, and the more words the better. Longer texts can be recognized with greater precision, but take more computation time.
The entire meta description text is used for the test. Terms from mixed languages could deceive the software.
Content Language (auto-detected)
The language used within the page main content, automatically detected by the program by examining its text.
Language detection can be inaccurate. It needs at least five words to attempt recognition, and the more words the better. Longer texts can be recognized with greater precision, but take more computation time.
Since the program could potentially examine hundreds of thousands of texts or more, for performance's sake only the first twenty words found within the body tag inner HTML are examined, which according to our tests is a good trade-off between performance and precision.
Content Sample
The sample of text made of the first twenty words found in the body tag inner HTML, used to automatically detect the language of the page main content.
The text is shown so you can judge whether it is representative of the page language, which is useful when it does not match the languages detected from other parts of the page.
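To give an idea of how such a sample could be obtained, here is a minimal Python sketch that collects the first twenty words of visible text inside the body tag. It is not the program's actual implementation: the names are hypothetical, and skipping the text of script and style elements is an assumption of the example.

```python
from html.parser import HTMLParser


class BodyTextCollector(HTMLParser):
    """Collects text found inside <body>, skipping script and style."""

    SKIPPED = {"script", "style"}  # assumption: their text is not content

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and not self.skip_depth:
            self.chunks.append(data)


def content_sample(html, max_words=20):
    """Return the first max_words words of text found in the body."""
    collector = BodyTextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    words = " ".join(collector.chunks).split()
    return " ".join(words[:max_words])
```

If the sample turns out to be made of navigation labels or cookie notices rather than real content, that would explain a detected language that disagrees with the rest of the page.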