Introduction: What is Inbound.org?
Inbound.org is a bookmarking site dedicated to the world of inbound marketing. Launched in February 2012, the site has averaged 44,000 visits per month from the start (source: http://inbound.org/about).
The main purpose of a bookmarking site is to give visibility and drive traffic to the bookmarked sites. Being indexed by search engines and making it easier for a search bot to find links to crawl is probably a by-product; I don't know how high it ranks among Inbound.org's goals.
Yet, if driving search traffic to the bookmarked sites is also a goal - and it seems to be, since all the URLs are followed - then on-site SEO should be taken into account too.
What we will examine is how the site fares with common on-site SEO issues.
Canonicalization issues: www vs non-www
The first test I usually perform is checking the www vs non-www issue.
Surprisingly, site pages respond to both URL versions, e.g.:
http://www.inbound.org/
http://inbound.org/
both point to the home page. The same happens for every site URL.
Why this is wrong:
- A search engine might find an inbound URL in the www.inbound.org form and index a copy of the page as a different resource. Potentially every page could be indexed twice (in the case of Inbound.org, internal links are all absolute and in the canonical non-www form, so a non-canonical link from outside could potentially harm only the page it points to).
- The link juice of inbound URLs in the www. form is probably wasted.
What they could do:
- 301-redirect all incoming www. requests to the non-www version
- add a rel="canonical" link tag on every page (they don't)
- set the non-www version as the preferred domain in Google Webmaster Tools (I bet they already did)
Note: a search for [site:www.inbound.org] returns no entries, so the issue is not causing an index duplication problem, at least not yet.
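To verify the first recommendation once it is in place, a quick check like the following is enough. It is just a minimal sketch using the Python requests library; the exact redirect target is my assumption of the desired behaviour, not the site's actual setup.

```python
# Minimal sketch: the www. host should answer with a 301 pointing to the
# bare-domain version of the same URL.
import requests

response = requests.get("http://www.inbound.org/", allow_redirects=False)
location = response.headers.get("Location", "")

if response.status_code == 301 and location.startswith("http://inbound.org"):
    print("OK: www. requests are 301-redirected to the non-www version")
else:
    print(f"No 301 redirect in place (got HTTP {response.status_code})")
```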
Canonicalization issues: multiple URLs pointing to the same resource
I've found at least eight different URLs pointing to the Home Page content:
http://inbound.org/
http://inbound.org/articles/all/hot
http://inbound.org/articles/all
http://inbound.org/articles
http://www.inbound.org/
...and all other www. variants
Googling for [site:inbound.org] and [site:inbound.org/articles] you can see the non-www versions are all indexed!
Why this is wrong:
- the PageRank of the page is split among the URL variants (this is an over-simplification, but you get the idea)
- the link juice passed to the bookmarked URLs is diluted
What they could do:
- 301-redirect all incoming requests to the canonical version
- or at least use the canonical link tag on every page variant (a quick check is sketched below)
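The canonical tag recommendation is also easy to audit. Here is a rough sketch, again with the requests library, that fetches the home page variants listed above and reports whether each one declares a rel="canonical" link element (at the time of writing, none does):

```python
# Rough audit sketch: does each home-page variant declare a canonical URL?
import re
import requests

variants = [
    "http://inbound.org/",
    "http://inbound.org/articles",
    "http://inbound.org/articles/all",
    "http://inbound.org/articles/all/hot",
]

for url in variants:
    html = requests.get(url).text
    match = re.search(r'<link[^>]+rel=["\']canonical["\'][^>]*>', html, re.IGNORECASE)
    print(url, "->", match.group(0) if match else "no canonical link tag")
```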
Crawl budget issues
Searching for [site:inbound.org] returns quite a bunch of results:
We all know the total returned is completely bogus, but when such a high number is returned, you'd better check your site structure!
Examining their pagination, you can quickly gather they have about 610 pages dedicated to article links, 7 tools pages, 9,000 users... all the indexable pages should hardly amount to a third of that figure.
It's time to have a closer look at their link structure with my favorite Free SEO Audit Software and Web Spider!
There's no way I'm going to crawl more than 30 thousand URLs, nor am I going to DoS their site with tens of requests per second; 500 URLs will probably be enough to assess the site, politely crawled with a courtesy delay of 10 seconds (they did not specify a Crawl-delay in their robots.txt file).
Let's take a nap while Visual SEO Studio does the job for me... here we are:
Visual SEO Studio highlights the "G-Time", i.e. the time it would take Googlebot to crawl the site. The simulated crawl-delay was set to the maximum you can set via GWT, but even taking a 33s value - more probable for a big web site - one can quickly realize that crawling more than 30,000 pages (and not just the first 500 of our test) would take Googlebot many days.
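The back-of-the-envelope math behind that claim is simple; here it is as a tiny Python snippet, using the figures quoted above:

```python
# Rough "G-Time" estimate: total crawl time ~ number of URLs x delay per request.
pages = 30_000            # indexable URLs suggested by the [site:inbound.org] count
crawl_delay_seconds = 33  # the delay value discussed above

total_seconds = pages * crawl_delay_seconds
print(f"~{total_seconds / 86_400:.1f} days for a full crawl")  # ~11.5 days
```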
My guess is that the site's "crawl budget" is not optimized, and Googlebot wastes too much time crawling "thin content".
Crawl budget issue I: flag links
A first glance at the crawl view, where pages are shown in visit order along with their link depth, reveals a high number of HTTP issues within the limited sample of 500 requests:
For each bookmarked URL, a spider also follows the "flag" link!
Why this is wrong:
- The link points to content that is not meant to be indexed; letting a search bot spider it wastes crawl budget (hundreds of hours of spidering) that could be dedicated to crawling important content, accelerating its indexing.
- It also consumes bandwidth on a high-traffic web site
- Generally speaking - not specific to SEO - such an operation should never be performed with an HTTP GET: a crawl could involuntarily flag all the bookmarks. In our case this doesn't happen because the task is subject to Twitter authentication and the spider is redirected to the home page.
What they could do:
- mark the links with the rel="nofollow" attribute
- block the articles/flag path in robots.txt (see the sketch below)
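For the robots.txt option, the rule would be a single Disallow line. The sketch below (using Python's standard urllib.robotparser, with made-up article ids) shows how a well-behaved crawler would then skip the flag URLs while still fetching the articles themselves:

```python
# Hypothetical robots.txt rule and its effect on a polite crawler.
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /articles/flag
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# The numeric ids are made up, just to illustrate the path matching:
print(parser.can_fetch("*", "http://inbound.org/articles/flag/1234"))  # False: skipped
print(parser.can_fetch("*", "http://inbound.org/articles/1234"))       # True: still crawled
```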
Crawl budget issue II: article pages
Bookmarked URLs are external links, not followed by Visual SEO Studio, so why did we see an additional URL request per bookmark?
Switching to Index View we see that Inbound.org dedicates a page for each bookmarked link:
Where are the links to such pages?
Another inspection with Firebug clears it up:
It's the comment link that actually points to the dedicated page.
One might think this is not really an issue but a design choice: comments might add value, and I've checked that they are indeed indexed; nevertheless...
Why this is, in my humble opinion, wrong:
- only a small minority of the bookmarks have comments; in all other cases we end up with 18,000 pages of "thin content" adding no value and, again, costing hundreds of hours of crawl budget.
What they could do:
- add the rel="nofollow" attribute to the link...
- ...except when there already are comments (as sketched below)
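In template terms the rule is a one-liner. The sketch below is purely illustrative (the URL pattern and function name are mine, not Inbound.org's): nofollow the comment link when there is nothing worth indexing behind it, follow it when comments add value.

```python
# Illustrative only: conditionally nofollow the per-bookmark "comments" link.
def comment_link(article_path, comment_count):
    rel = "" if comment_count > 0 else ' rel="nofollow"'
    return f'<a href="{article_path}"{rel}>{comment_count} comments</a>'

print(comment_link("/articles/12345", 0))  # nofollowed: comment-less "thin content" page
print(comment_link("/articles/12345", 7))  # followed: comments are worth indexing
```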
Duplicate content issues
Let's now have a look at the HTML Suggestions provided by Visual SEO Studio.
The first thing to catch the eye is the number of duplicated titles (130 pages out of the 500 visited):
Why this is wrong:
- HTML titles should always be unique within a site, just like URLs are.
A title tag carries a strong weight in search engine ranking algorithms and should be used to give the search engine a clue about what the page covers. In this case the pages are for the most part category pages, and the only hint the search engine gets is the URL.
What they could do:
- add a distinctive title tag to the pages worth indexing (and to the others as well; humans matter too) - see the sketch after this list
- add the robots noindex meta tag to the category pages
- solve the duplicate content issues where different URLs point to the same resource
- consider adding the rel="nofollow" attribute to some of the views
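As an example of the first point, titles for the category and pagination views could be generated from data the page already has. A minimal sketch follows; the category names and the title format are my own invention:

```python
# Illustrative sketch: build a distinctive title for paginated category views.
def page_title(category, page_number, site_name="inbound.org"):
    title = f"{category.capitalize()} articles"
    if page_number > 1:
        title += f" - page {page_number}"
    return f"{title} | {site_name}"

print(page_title("analytics", 1))  # Analytics articles | inbound.org
print(page_title("analytics", 7))  # Analytics articles - page 7 | inbound.org
```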
Conclusions?
This is not an in-depth SEO audit; one could go on and investigate pagination issues, the use of query string parameters, load times... but that's not the point of the article.
Behind Inbound.org there are people who really get SEO; they are probably already aware of the issues I outlined, but as the saying goes, "the shoemaker's son always goes barefoot".
And you, are you neglecting the SEO love your assets deserve?
Update (November 4th, 2012):
Discussion is ongoing on the Inbound.org article page, where this post gained some popularity.
See also:
- Getting Started with Visual SEO Studio
- Google Plus SEO Audit by Dan Petrovic