Subdomains and Search Engines

Sub-domains have always aroused great curiosity among those who do search engine optimization. Are they seen as part of the main site, or as distinct web sites? Entire strategies have been built upon erroneous assumptions, leading to great waste and sometimes damage.

If we take as true the assumption - erroneous most of the time - that a subdomain is seen as a site distinct from the main site, someone will always come up with ideas like:
I could create some subdomains, and there add links to the main domain, so that they would count as external links.
Or:
Ouch, my blog/forum/ecommerce/etc. is in a subdomain, and the links it receives do not pass authority to my main site. I have to move it to a subfolder!
(Such reasoning also reveals some confusion about the concept of Google PageRank; let's ignore the issue for the moment).

If you have ever found yourself thinking along these lines, keep in mind that most of the time Google treats a sub-domain roughly as if it were a sub-folder.
What does “most of the time” mean?
I'll explain it soon, but in short: if you have control over both the main domain and the sub-domains, your sub-domains will almost certainly be treated as sub-folders.

What is a Subdomain?

The DNS (Domain Name System) lets you organize sub-domains below your own domain. Strictly speaking, in company.com “company” is a sub-domain of the first level domain “com” assigned by ICANN; nevertheless, in common usage the term “sub-domain” refers to the part of the domain name below (to the left of) the assignee's domain. A sub-domain is commonly understood to be managed and administered by the same entity (company, person, organization, etc.) to whom the main domain is assigned.
To clarify, if company.com is the assigned second level domain, each name below it is a sub-domain: blog.company.com, forum.company.com, store.company.com, www.company.com, etc.
Notice that in the example I also mentioned www. We are so used to seeing it that we think of it as an alias of the main domain, but both historically and technically it is not: it is just a well-established convention.

Given the complete DNS name, it is not always obvious which part is the registered domain and which is an internal subdomain. Or better, it is not obvious which part identifies the registered entity.
Let's see some examples:

  • in subdomain.company.com, "subdomain" - a third level domain - is part of company.com; both are managed by the same entity/authority;
  • in company.co.uk, "company" - a third level domain too - is instead an entity distinct from co.uk; “company” here is equivalent to “company” in the previous point;
  • in sub.company.com, “sub” is an integral part of company.com, but...
  • in blogname.blogspot.com it is not so: “blogname” is an entity distinct from blogspot.com

No algorithmic solution exists to distinguish them starting only from the complete name and the position of its elements: you need rules and exceptions.
This is because some national Domain Registrars allow registration at the third level (or also at the second, with restrictions in some cases). The domain co.uk is a well known example.
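To make the problem concrete, here is a minimal Python sketch of the naive position-based approach. The function name is hypothetical and the rule "keep the last two labels" is exactly the kind of assumption that fails.

```python
def naive_registered_domain(hostname: str) -> str:
    """Naive guess: the registered domain is always the last two labels."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


for host in ("sub.company.com", "company.co.uk", "blogname.blogspot.com"):
    # Only the first result identifies the registered entity correctly;
    # the other two return the suffix the entity was registered under.
    print(host, "->", naive_registered_domain(host))
```

The guess works for sub.company.com, but returns co.uk and blogspot.com for the other two names, losing the part that actually identifies the entity.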

How to distinguish the subdomain and the registered domain?

The only safe way to distinguish the assignee entities is to have a list with all possible suffixes and exceptions.

Such a list exists: the Public Suffix List, a public database maintained by Mozilla and published under an open license, where all possible cases are detailed.
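The idea of a list-driven lookup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SAMPLE_SUFFIXES is a tiny hand-picked sample, not the real list, the function name is hypothetical, and the wildcard and exception rules that the real list also contains are left out.

```python
# Tiny hand-picked sample of suffixes, for illustration only.
# The real Public Suffix List has thousands of entries and changes over time.
SAMPLE_SUFFIXES = {"com", "uk", "co.uk", "blogspot.com"}


def registered_domain(hostname, suffixes=SAMPLE_SUFFIXES):
    """Return the longest known suffix plus one label to its left, or None."""
    labels = hostname.lower().rstrip(".").split(".")
    # Candidates are tried from longest to shortest, so "co.uk" wins over "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the hostname is itself a suffix
            return ".".join(labels[i - 1:])
    return None  # no known suffix matched


for host in ("sub.company.com", "company.co.uk", "blogname.blogspot.com"):
    print(host, "->", registered_domain(host))
```

With the list in hand, the position of a label no longer matters: the same code tells company.co.uk and blogname.blogspot.com apart from sub.company.com.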

The Public Suffix List was created to allow browser vendors to solve privacy issues when dealing with supercookies.
A couple of examples to better understand:

  • A page in blog.site.com should normally be able to access a cookie created in www.site.com
  • But a page in company1.co.uk must be prevented from accessing a cookie created in company2.co.uk

Both examples concern third level domains, yet the difference is huge.
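The two cookie cases can be sketched as a check on the domain a cookie asks to be valid for. This is a simplified illustration, not a browser's actual implementation: the function name and the sample set of suffixes are assumptions, and real browsers apply further rules.

```python
# Hand-picked sample of public suffixes (assumption, not the real list).
SAMPLE_SUFFIXES = {"com", "uk", "co.uk"}


def cookie_domain_allowed(host: str, cookie_domain: str) -> bool:
    """May a page on `host` set a cookie valid for `cookie_domain`?"""
    host = host.lower().rstrip(".")
    cookie_domain = cookie_domain.lower().lstrip(".")
    # A cookie scoped to a public suffix would be shared by unrelated sites:
    # this is the "supercookie" a browser has to refuse.
    if cookie_domain in SAMPLE_SUFFIXES:
        return False
    # Otherwise the cookie domain must be the host itself or one of its parents.
    return host == cookie_domain or host.endswith("." + cookie_domain)


print(cookie_domain_allowed("www.site.com", "site.com"))    # shared with blog.site.com
print(cookie_domain_allowed("company1.co.uk", "co.uk"))     # refused
```

A cookie scoped to site.com is accepted and becomes visible to blog.site.com too, while no cookie can be scoped to co.uk, so company1 and company2 stay isolated.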

Browsers need to distinguish public suffixes for other tasks too: sorting lists of URLs, highlighting the main domain in the address bar, etc.


Firefox address bar: the recognized entity is highlighted.
The blog part is not highlighted because it is correctly recognized as a part of the highlighted entity.

Browser vendors are not the only ones who need to distinguish the part assigned by the Domain Registrar; nowadays many other products have similar needs. For example:

  • Certification authorities that issue SSL/TLS digital certificates have to distinguish between main domain and subdomains in order to issue various types of certificates (single domain, wildcard, multi-domain, etc.).
  • Browsers, again, need to correctly handle the SSL/TLS certificates of the previous point.
  • Search engines need to understand whether a subdomain belongs to the same entity as the main domain in order to compute internal domain/site metrics*, pass site-wide penalties, tell internal links from external links, give webmasters access to a unified administration console, etc.
  • Some SEO tools need to emulate search engine behaviour as closely as possible. For example, Visual SEO Studio uses the Public Suffix List to accomplish various tasks; the most evident is understanding a site's “boundaries” by telling external links from internal links.

In the specific case of the Google search engine, I have never found documentation or public statements in which Google asserted that it uses such a database, although it is known that the Google Chrome browser uses the Public Suffix List.
It makes sense to hypothesize that Google uses the Public Suffix List for Google Search as well; alternatively, it would have to maintain its own equivalent list and keep it in sync with the Mozilla public list.
Most probably they do exactly this even for Google Chrome, to handle exceptions not included in the Public Suffix List.

* John Mueller from Google recently admitted - somewhat reluctantly - that Google uses a metric that somehow measures a site's authority (not to be confused with DA from Moz). That's all we know at the moment. My personal hypothesis is that it could be a measure - maybe multi-dimensional, not restricted to a simple number - reflecting the idea of authority as estimated by the AI trained by Quality Raters.

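Note: in Public Suffix List terminology, a "public suffix" is the part of the name under which domains can be registered (e.g. com or co.uk). What we refer to here as "assigned domain" or "registered domain" is the public suffix plus one more label to its left.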

Questions and Answers about subdomains

After having clarified what subdomains are and who administers them, here is a set of questions - many are variations on the same theme - we are often asked.

Is it better to structure my contents in subdomains or in subfolders for Google?

Google has repeatedly stated that the two solutions are equivalent: subdomains are treated as subdirectories (we are referring to the scenario where they are administered by the same entity, you).


Matt Cutts, 2012: Should I structure my site using subdomains or subdirectories?


John Mueller, 2017: Subdomain or subfolder, which is better for SEO?

On the web you will find case studies attempting to demonstrate that organic visits increased after migrating from subdomains to subdirectories (or vice versa); none of them is sufficiently solid in my opinion.

And how about multi-lingual sites? Are subdomains or subfolders better for Google?

It's the same as the question before: for Google they are equivalent.
A solution based on subdomains or subfolders is a good choice when each piece of content - or nearly each - is translated into all supported languages. It is also a perfect case for implementing alternate/hreflang. It is highly recommended to use generic domain extensions (.com, .org, etc.), because Google binds most national extensions to a single country (e.g. .it for Italy), and this would make it harder to compete in foreign SERPs. From Google Search Console it is possible to geo-localize distinct subfolders.
There is also a third solution to mention: registering distinct domains on national ccTLDs (e.g. .it, .fr, etc.) in case you want a more flexible presence for a specific localization. In such a case the targeting is not only for a language, but also for a specific country: Google links most national domains to a single country, without allowing you to set a different one. The solution also implies taking on the administrative burden of registering several ccTLDs.
A mixed approach is also possible, with some languages hosted in subfolders or subdomains of a generic top level domain site, and some languages/countries managed with dedicated sites.
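For the subfolder layout, the alternate/hreflang annotations of a translated page can be generated mechanically. The sketch below assumes a hypothetical URL scheme where each language lives in a folder named after its language code; the function name and the example URLs are made up for illustration.

```python
def hreflang_links(base_url: str, path: str, languages: list) -> list:
    """Build one <link rel="alternate"> tag per language version of a page.

    Assumes (hypothetically) URLs shaped like https://company.com/<lang>/<path>.
    """
    base = base_url.rstrip("/")
    page = path.lstrip("/")
    links = []
    for lang in languages:
        href = f"{base}/{lang}/{page}"
        links.append(f'<link rel="alternate" hreflang="{lang}" href="{href}" />')
    return links


for tag in hreflang_links("https://company.com", "/pricing", ["en", "it", "fr"]):
    print(tag)
```

Every language version of the page should carry the full set of tags, including the one pointing to itself.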

OK, for Google subdomains or subfolders are equivalent, but which solution do you suggest?

As always, “it depends” is the universal answer. Generally speaking there are practical differences: subdomains require a network system administrator, each obeys its own robots.txt, and not all SSL/TLS certificates work with subdomains (other than the conventional www.), thus in most cases I suggest using subfolders for simplicity's sake, not because of a hypothetical preference by Google.
Yet there are cases where subdomains could be a correct and better choice, for example to move to another server the load of a sizable section that is logically separated and has high traffic and resource demands, like a highly trafficked e-commerce store or a forum.
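Two of the practical differences mentioned above can be shown with a short sketch. The function names are hypothetical; the wildcard check is a simplification in which the asterisk stands for exactly one leftmost label, and it is not a full certificate validation.

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url: str) -> str:
    """robots.txt is looked up per host, so each subdomain has its own file."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def wildcard_covers(cert_name: str, hostname: str) -> bool:
    """Does a certificate name such as '*.company.com' cover `hostname`?"""
    cert_name, hostname = cert_name.lower(), hostname.lower()
    if not cert_name.startswith("*."):
        return cert_name == hostname
    # The asterisk replaces one label only: compare everything after it.
    first_label, _, rest = hostname.partition(".")
    return bool(first_label) and rest == cert_name[2:]


print(robots_txt_url("https://blog.company.com/2017/post?x=1"))
print(robots_txt_url("https://company.com/blog/2017/post"))
print(wildcard_covers("*.company.com", "blog.company.com"))
print(wildcard_covers("*.company.com", "company.com"))
```

A blog in a subfolder shares the robots.txt of the main site, while a blog in a subdomain needs its own; likewise, a certificate issued only for company.com and www.company.com would not cover blog.company.com.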

If I receive external links to my subdomain, will the main domain benefit from them?

If there are internal links pointing from the subdomain to the main domain, PageRank will flow as through any other link. PageRank is not a metric related to a “site”: it is related to each single page, and can flow through links to other pages, be they internal or external.

Is it true that if I have the blog/forum/store/etc… in a subdomain, the main domain would not benefit from the “juice” from links on external sites pointing to the subdomain?

This is basically a close variant of the previous question. No, it's not true. PageRank flows through the links in the subdomain pointing to the main domain, which thus benefits from it.
What might change, independently of the adopted solution, is the internal link structure. If the blog/forum/store/etc. section has links toward the main domain, PageRank will flow toward it; if there are no links - or they are nofollow - it will not.
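The point can be illustrated with a toy computation based on the simplified textbook formulation of PageRank, which is certainly not what Google runs today. The three URLs, the damping factor and the function name are assumptions chosen for the example.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Simplified textbook PageRank over a {page: [linked pages]} graph."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # A page without outgoing links spreads its rank over all pages.
            targets = links.get(page) or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


ARTICLE = "https://other.org/article"      # external page linking to the blog
POST = "https://blog.company.com/post"     # page on the subdomain
HOME = "https://company.com/"              # page on the main domain

with_link = {ARTICLE: [POST], POST: [HOME], HOME: []}
without_link = {ARTICLE: [POST], POST: [], HOME: []}

print(round(pagerank(with_link)[HOME], 3))
print(round(pagerank(without_link)[HOME], 3))
```

The score is computed page by page and host names play no role in it: the home page scores higher only when the blog post actually links to it.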

Does Google see subdomains as stand-alone sites, or as part of the main domain?

Ah, I see you read almost nothing of the present article!
If you are asking yourself the question, i.e. if both domain and subdomains are under your control, Google is perfectly capable of understanding it and of considering the whole as a single website.
Domains that let users have personal stand-alone spaces are - or should be - registered in the Public Suffix List.

I've been told that external inbound links are important for SEO, and that subdomains are seen as external sites. If I created many subdomains and added in them links to my main domain, would I get ranking benefits as if they were links from other domains?

One of the two things you have been told is wrong: it is true that external inbound links are important, because they carry PageRank and relevance for the anchor text; it is not true that subdomains are seen as external sites. In your case, if you own both the main domain and the subdomains, they are all considered as belonging to the same entity, and subdomains are treated as part of the main site, as if they were subdirectories. Links from the subdomains would be seen as internal links.

OK I understood that subdomains are seen as part of the main site if I do not include my domain in the Public Suffix List. But what if I registered my domain in the Public Suffix List and created many subdomains with links to the main domain, they would be seen as external links; would I obtain ranking advantages?

This is a more interesting question.
First, remember that PageRank flows from URL to URL, independently of the site they belong to, and the new subdomains would be like new sites with the minimal amount of PageRank assigned by the algorithm (exactly as if they were internal pages).
Keep in mind that, with the links in the subdomains seen as external links, you would also have to be careful about the percentage of exact-match anchors, because Penguin filters would be active.
In terms of PageRank, to gain an advantage your subdomains would themselves need to receive external inbound links, so that their links to the main domain carry some PageRank. In many cases the link building effort would be the same as getting those links to point to the main domain directly.
You might also try to link only a few subdomains from the other subdomains, and then link from those few to the main domain. Or more complex variants...
The new links to site.com would all be seen as coming from the sites sub1.site.com, sub2.site.com, etc. I cannot exclude that Google already has ways in place to discard the benefits of such links from subdomains; it is a common scenario with providers of free or very cheap web space.
This topic would surely merit further experimental investigation. If you wish to do black hat this way, keep in mind that whoever enlists in the Public Suffix List does so in broad daylight.

I have a main domain and a subdomain linking to each other. Do I risk a penalty?

Again: the subdomain is seen as part of the main site, as if it were a subdirectory, and the links between subdomain and main domain are thus seen as internal links. Penalties for link exchanges (or link rings) are measures to counter black hat techniques among separate sites; they do not exist for internal links.

Is it true that if I use a subdomain, it is seen as a stand-alone site and I have to restart from scratch for the whole SEO and link building?

Analogous to the previous answer: no, a subdomain is seen in the vast majority of cases as part of the main site. Also notice that PageRank flows towards URLs, no matter if they are internal or external.

If in the subdomain I had links toward the main domain, should I beware of exact-match links to avoid Penguin-like penalties?

The subdomain is seen as part of the main site (except in rare instances where they are administered by two distinct entities), and for internal links Google has clearly stated that an exact-match link anchor is not an issue.
Exact-match links can cause problems only in the case of external links, because they might suggest the links were created artificially to alter PageRank. Internal links are by definition always spontaneous and not caused by spam; at worst they might cause keyword stuffing problems, like normal internal text.

In case of a penalty on a third level domain, do I risk being penalized on the second level domain too?

It depends on the penalty. Google is known to have imposed highly "sectorial" penalties, concerning only parts of a site, or single pages/URLs.
Nevertheless, in my experience manual penalties usually affect the entire site, making it completely disappear from the SERP.
Of course we are talking about the usual case where the subdomain is seen as part of the main site on the main domain, because it is owned by the same entity.

Are there domains offering their users web space in subdomains, not listed in the Public Suffix List?

Unfortunately yes. Every domain offering independent user space should register in the Public Suffix List, mainly to avoid security problems; but this does not always happen, likely because their owners ignore it.
One of the most notable cases is *.wordpress.com, not listed in the Public Suffix List.
In the images below you can see how Firefox - which uses the Public Suffix List - treats differently a site hosted under wordpress.com compared to a site hosted under blogspot.com when highlighting the main domain (recognized entity) in the address bar.


The part highlighted as recognized entity is only wordpress.com; the subdomain is not seen by Firefox as an entity independent from wordpress.com


The analogous case with blogspot.com, registered by Google in the Public Suffix List, is different

I hope no one will be tempted to experiment vulnerabilities with supercookies; I don't know if and how wordpress.com deals with the problem.
Registration policies for the Public Suffix List require that only the domain owner can request inclusion in the list. My hope is that this article could lead wordpress.com users to solicit such a request.

Is it true that it is better to keep the forum in a subdomain because, being more easily subject to spam, it would negatively influence the main domain if it were hosted in a subfolder?

Google is quite efficient in understanding that the entity owning the website and the forum is the same. For Google, subfolder and subdomain are the same in this case.
For sections with UGC (User Generated Content), the correct solution is to manage them with antispam tools and to have content approved by moderators. Hosting the forum on a subdomain could still be a correct solution for other reasons (e.g. to balance load on another server).

Conclusions

Some may find this article controversial: searching on the Internet you will find a lot of opinions about sub-domains being considered different sites from the registered domain. I tried to demonstrate that this is wrong most of the time, and explained how browsers distinguish an administrative entity from the host name part of a URL.

The Public Suffix List has existed for years, but I found that the large majority of SEO professionals have never heard of it. I have read some who even speculated that a search engine could decide how to treat a sub-domain based on whether the main domain "had many of them"; others - more sensibly - thought of a complex artificial intelligence.

You might also find downloadable Excel spreadsheets to get the domain name from the URL - i.e. getting only the main site part - something you can't really accomplish with a simple Excel formula. Those spreadsheets can usually handle only a very limited subset of all possible cases: every national domain name registrar has its own set of policies and exceptions that change over time.

While we have no insight into what a search engine like Google actually does to treat sub-domains, all answers in this article are compatible with public declarations given by Google employees and representatives.
We know the Public Suffix List is used by Google for its Chrome browser, and it makes perfect sense that Google would also use it for its search engine to tackle similar problems. Some notable cases like wordpress.com are not included in the list (and only a domain owner can request an amendment), so it is very likely that search engines keep an extended copy of the list internally, kept up to date with the official one.

The main purpose of the present article is to spread knowledge among the SEO community and to correct mistakes and false myths which could lead to wrong business decisions. I hope at least to have made a dent, and to have sparked the curiosity to investigate further the Public Suffix List and its applications.

And what's your take? Do you have better insight into the subject?