Manual: Page language directives

The feature "Page language directives" of Visual SEO Studio, documented in detail.

Page language directives

While both Google and Yandex support hreflang tags, Bing does not; it relies instead on the 'content-language' meta tag or HTTP header. Other tools (software translators, screen readers for visually impaired people, and so on) look for the "lang" attribute in the HTML. A webmaster therefore has several language directives to keep consistent with one another.

Example of page with conflicting language directives

The alternate/hreflang attribute helps a search engine decide which of a set of "coupled" pages to show in a localized SERP, but Google and other search engines do not use it to determine the language of a page. They rely on language detection for that.

Page language directives view in Visual SEO Studio

With all these elements in mind, we built a tool that shows, for every single page, all the possible language directives, so that you can check whether they match. We also added a state-of-the-art language detector to test the language of the title, description, and main content.

Toolbar

Page URL

The address of the selected page.

Column headers

Self-referring hreflang tag

The self-referring hreflang tag, whether it comes from the HTML, the HTTP header, or the XML Sitemap.
An example of the tag in the HTML syntax is:

<link rel="alternate" hreflang="en" href="...self-referring URL..." />

where "self-referring URL" means the URL of the very page that contains the tag.
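As an illustration of what "self-referring" means in practice, the following is a minimal Python sketch, not the program's actual implementation. The names HreflangCollector and self_referring_hreflang are hypothetical, and the exact string comparison of URLs is a simplifying assumption (a real checker would probably normalize URLs first). It only looks at the HTML, not at the HTTP header or the XML Sitemap.

```python
from html.parser import HTMLParser


class HreflangCollector(HTMLParser):
    """Collects (hreflang, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute names arrive lowercased; valueless attributes are None.
        a = {name: (value or "") for name, value in attrs}
        rel_tokens = a.get("rel", "").lower().split()
        if "alternate" in rel_tokens and a.get("hreflang") and a.get("href"):
            self.alternates.append((a["hreflang"], a["href"]))


def self_referring_hreflang(html, page_url):
    """Return the hreflang values whose href equals page_url.

    Assumption: URLs are compared as exact strings, with no normalization.
    """
    collector = HreflangCollector()
    collector.feed(html)
    collector.close()
    return [lang for lang, href in collector.alternates if href == page_url]
```

An empty result means the page declares alternates, if any, only for other URLs and has no self-referring tag in its HTML.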

'lang' attribute in HTML tag

The lang attribute of the HTML tag (or xml:lang in case of XHTML pages).

An example of use of the attribute in the HTML5 syntax is:

<html lang="en">

An example of use of the attribute in the XHTML syntax is:

<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">

If both the lang and xml:lang attributes are used, the latter has priority; the former will be discarded.
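The priority rule above can be sketched in a few lines of Python. This is an illustrative sketch only; HtmlTagLang and effective_lang are hypothetical names, and the code reads just the first html tag it encounters.

```python
from html.parser import HTMLParser


class HtmlTagLang(HTMLParser):
    """Records the lang and xml:lang attributes of the first <html> tag."""

    def __init__(self):
        super().__init__()
        self.seen_html = False
        self.lang = None
        self.xml_lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            a = dict(attrs)
            self.lang = a.get("lang")
            self.xml_lang = a.get("xml:lang")


def effective_lang(html):
    """Return xml:lang when present, otherwise lang, otherwise None."""
    parser = HtmlTagLang()
    parser.feed(html)
    parser.close()
    # xml:lang wins over lang when both are declared.
    return parser.xml_lang if parser.xml_lang is not None else parser.lang
```

For a page declaring both lang="en" and xml:lang="it", the sketch reports "it".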

The lang attribute is evaluated by Bing to understand the page language, along with the content-language tag and the content-language HTTP header.

'content-language' meta tag

The content-language meta tag is a "poor man's" way to express a content language when you cannot use the HTTP response header.
An example of the tag in the HTML syntax is:

<meta http-equiv="content-language" content="en">

The tag is deprecated in the HTML5 specification, which recommends the lang attribute instead.

The content-language tag is evaluated by Bing to understand the page language, along with the lang attribute and the content-language HTTP header.

'content-language' HTTP header

The content-language HTTP response header expresses the language of the page content.
An example of the HTTP header is:

Content-Language: en

The content-language HTTP header is evaluated by Bing to understand the page language, along with the content-language tag and the lang attribute.
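Once the declared directives have been collected, checking that they agree is straightforward. The sketch below is a hedged illustration rather than the program's own logic. The function names are hypothetical, and it assumes that two values "match" when their primary language subtags are equal, so "en-GB" and "en-US" are treated as consistent. It also assumes that a content-language value listing several languages (for example "en, de") counts as a mismatch.

```python
def primary_subtag(tag):
    """Reduce a language tag to its primary subtag: 'en-GB' -> 'en'."""
    if not tag:
        return None
    return tag.strip().replace("_", "-").split("-")[0].lower() or None


def header_languages(value):
    """Split a content-language value such as 'en, de' into single tags."""
    return [part.strip() for part in (value or "").split(",") if part.strip()]


def directives_match(hreflang=None, html_lang=None, meta=None, header=None):
    """Return True when all declared directives share one primary language.

    Missing directives (None or empty) are ignored, so a page that
    declares nothing is trivially consistent.
    """
    declared = [hreflang, html_lang]
    declared += header_languages(meta) + header_languages(header)
    primaries = {primary_subtag(tag) for tag in declared}
    primaries.discard(None)
    return len(primaries) <= 1
```

A stricter variant could compare full tags, region included, if regional targeting matters for the site.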

Title Language (auto-detected)

The language used within the title tag, automatically detected by the program by examining its text.

Language detection can be inaccurate. It needs at least five words to attempt a recognition, the more the better. Long texts can be recognized with greater precision, but take more computation time.
The entire title text is used for the test. The title tag often contains a trailing branded name that could deceive the language recognition system, so the result is more precise when applied to longer titles.
Titles can also contain mixed languages, which can likewise deceive the software.

Description Language (auto-detected)

The language used within the meta description tag, automatically detected by the program by examining its text.

Content Language (auto-detected)

The language used within the page main content, automatically detected by the program by examining its text.

Language detection can be inaccurate. It needs at least five words to attempt a recognition, the more the better. Long texts can be recognized with greater precision, but take more computation time.
Since the program could potentially examine hundreds of thousands of texts or more, for performance's sake only the first twenty words found within the body tag inner HTML are examined, which according to our tests is a good trade-off between performance and precision.
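The sampling described above can be approximated as follows. This is a rough Python sketch under stated assumptions, not the program's real extraction code: BodyTextCollector, content_sample and can_attempt_detection are hypothetical names, words are split on whitespace, and text inside script and style elements is skipped. How the actual tool tokenizes and filters text is not documented here.

```python
from html.parser import HTMLParser


class BodyTextCollector(HTMLParser):
    """Collects whitespace-separated words found inside <body>."""

    SKIPPED = {"script", "style"}  # assumption: not part of visible text

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and not self.skip_depth:
            self.words.extend(data.split())


def content_sample(html, max_words=20):
    """Return the first max_words words of the body text as one string."""
    collector = BodyTextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    return " ".join(collector.words[:max_words])


def can_attempt_detection(text, min_words=5):
    """Detection needs at least five words to attempt a recognition."""
    return len(text.split()) >= min_words
```

The resulting sample string is what would then be handed to a language detector of your choice.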

Content Sample

The sample of text made of the first twenty words found in the body tag inner HTML, used to automatically detect the language of the page main content.

The text is shown so that you can judge whether it is representative of the page language, which is useful when the detected language does not match the languages detected from other parts of the page.