Auditing an HTTP to HTTPS migration

You installed an SSL/TLS certificate; now it’s time to test it and find what prevents the “Secure” label from appearing. Learn how to audit an HTTP to HTTPS site migration.

Migrating a website to HTTPS means all page URLs change. There are plenty of good guides and checklists out there explaining how to prepare the migration. Above all, we recommend Aleyda’s The HTTP to HTTPS Migration Checklist.

What you’ll find here is a description of how you can find all the potential issues of an HTTP to HTTPS migration, using Visual SEO Studio.

I suggest creating a new project dedicated to the site migration.

Crawl your HTTP version first

If you haven’t set up the 301 redirects yet, take the opportunity to make a full crawl of your site with the SEO audit tool spider. Visual SEO Studio saves crawl sessions automatically as it goes, so you’ll always find them later for inspection (if the redirects are already in place, worry not: you’ll just have to do some more work later).

Crawl View, before migrating to HTTPS

If there are some on-site issues (e.g. broken links leading to 404 or 30x pages), better fix them now.

Setting all the 301 redirects

I’m not going to teach anything new here: you need to redirect every possible call to an old http:// URL to its new https:// version, using an HTTP 301 “Moved Permanently” status code.
You want to do this for several good reasons: preventing double indexation (with the two versions of each page competing with each other), preserving the PageRank of the old URLs by transferring it to the new ones, and so on.
How to set up the proper 301 redirects is beyond the scope of this guide. It depends on your web server platform (Apache, Nginx, IIS), on your CMS, the server-side technology, and the possible limitations imposed by your host. I assume you already covered this.
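As an illustration only, here is a minimal sketch of what such a rule could look like on Apache with mod_rewrite enabled (the rule and its placement in .htaccess are assumptions; your platform, CMS or host may require a completely different approach):

```apache
# Redirect every plain-HTTP request to its https:// equivalent
# with a 301 "Moved Permanently" status code.
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```

On Nginx the same effect is usually achieved with a dedicated server block for port 80 issuing a `return 301`.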

Crawling your new HTTPS site

Now repeat the website exploration, this time starting from the https:// address.
The desired scenario is that the crawled structure matches exactly 1:1 the one of the http:// crawl. This will likely not happen at the first attempt, and you’ll probably see something like this:

Crawl View with mixed http and https items

Locating and fixing all internal links still pointing to the HTTP version

See all those items under the http:// property?
They all come from old hard-coded absolute links the SEO spider found pointing to the old version.

Note: in the example case they were not explored, because the http://[DOMAIN]/robots.txt URL returned a value unexpected for robots.txt files and we chose the (non-default) crawl option to treat the case as a Disallow.

There are several ways to locate those outdated links.

Locating old links, first method

The first one is trivial and quick, and I suggest you use it first:
Right-click on each item and select the entry “Go to Referrer URL”.

Context menu “Go to Referrer URL”

The program will highlight the page where the bot found the link with the first occurrence of the wrong URL.
You can then expand the “Page links” bottom pane, and locate (sorting by URL or using the “Find in grid...” option) the row with the link you searched for.
Then you can invoke the “Show in DOM” (or alternatively “Show in code”) option; the culprit link will be highlighted in DOM View to help you understand where to fix it. Fix your template or page in your CMS, and repeat for all other cases.

Context menu “Show in DOM”

DOM View, with the link highlighted

Locating old links, second method

The first method will only catch the first occurrence found of each URL; most of the time it will be a template link, so fix it there and you fix it everywhere.
For further refinement you can use the “Find in-links” entry to find them all.

Context menu “Find in-links”

The command will inspect all pages within the crawl session and return a list of those linking to the old URL.
From there you can continue as described for the first method.

If your site is not too big, I suggest performing incremental crawls to avoid seeing things you have already fixed.
You will soon fix all old links within the site templates, and reduce the content links to a minimum.

Locating old links, third method

You can locate in one shot all pages linking to old internal http:// URLs if you have the Professional Edition (or an active 30-day Trial) of Visual SEO Studio, leveraging the powerful Data Extraction engine.

Data Extraction, finding all old links

Open the Data Extraction tab and create a new set by clicking on the New button.

You need to add three extraction columns:

Column name: “A http www”
XPath to content: “//a[starts-with(@href, 'http://www.example.com')]/@href”
What to extract: “InnerText”
Extract only first element: false

Column name: “A http non-www”
XPath to content: “//a[starts-with(@href, 'http://example.com')]/@href”
What to extract: “InnerText”
Extract only first element: false

Column name: “A https non-www”
XPath to content: “//a[starts-with(@href, 'https://example.com')]/@href”
What to extract: “InnerText”
Extract only first element: false

Warning: remember to substitute the domain name “example.com” with your own domain name!
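If you want to double-check what those three extraction columns would catch outside the tool, here is a small stand-alone sketch using only Python’s standard library (the class name and the sample page are invented for illustration; substitute your own domain in the prefixes, just as in the columns above):

```python
from html.parser import HTMLParser

class OldLinkFinder(HTMLParser):
    """Collects href values of <a> tags still pointing at pre-migration
    URLs, mirroring the three Data Extraction columns described above."""
    PREFIXES = (
        "http://www.example.com",   # "A http www"
        "http://example.com",       # "A http non-www"
        "https://example.com",      # "A https non-www"
    )

    def __init__(self):
        super().__init__()
        self.old_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href", "")
        if href.startswith(self.PREFIXES):
            self.old_links.append(href)

page = """<html><body>
<a href="http://www.example.com/about">About</a>
<a href="https://www.example.com/contact">Contact</a>
</body></html>"""

finder = OldLinkFinder()
finder.feed(page)
print(finder.old_links)  # only the first link still uses the old scheme/host
```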

Give the set a name – e.g. “links to non httpswww” – and save it for future use.

Click on the “Extract Data” button, and all wrong links will be extracted.

Here too you have the “Show in DOM” and “Show in code” commands at your disposal.

If you prefer, you can export the extracted data to Excel or CSV file:

Data Extraction, Export to Excel

Checking whether the redirects are working properly

Remember you crawled your http:// site before adding the redirects? (If you didn’t, don’t worry, just follow along.) Time to audit those redirects now.

Note: you’ll need the Professional Edition now, or an active Professional Trial.

Open your old crawl session in Tabular View, ensure the “URL” column is visible, and export its items to Excel (we have already seen how to export to Excel; it works the same on every Visual SEO Studio grid, and Tabular View also sports an easy-to-spot button).

With Excel (or OpenOffice, or LibreOffice), copy all cell values (except the header) of the “URL” column.

Now launch the “Crawl URL List” command (available in the Visual SEO Studio Professional Edition).

Crawl URL list to test redirections

Make sure you have the “For robots.txt file, treat a redirection to [other domain]/robots.txt as 'full allow'” option enabled.

Click the “Crawl” button.

The expected outcome is that all URLs return an HTTP 301 status code, pointing to the equivalent https:// version.
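If you prefer to spot-check a few redirects by hand, a short script can perform the same verification. This is just a sketch using Python’s urllib (the function names are my own, and the network call obviously requires the site to be reachable):

```python
import urllib.request
import urllib.error

def expected_https(old_url):
    # The https:// URL an old http:// URL should 301 to (scheme swap only).
    return old_url.replace("http://", "https://", 1)

class _NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None makes urllib raise HTTPError instead of following
    # the redirect, so we can inspect the 301 response itself.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def audit_redirect(old_url):
    """Return (status code, Location header) for one old URL."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        opener.open(old_url, timeout=10)
        return (200, None)  # answered directly: the redirect is missing
    except urllib.error.HTTPError as e:
        return (e.code, e.headers.get("Location"))

# A redirect is correct when audit_redirect(u) == (301, expected_https(u)).
```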

Note: in case you don't have a pre-migration crawl session, you can always attempt to produce the old URL list using the new HTTPS crawl: export to Excel, find and replace all instances of "https://" with "http://" in the URL column, copy the column content and paste it in the "Crawl URL List" (the program will take care of removing duplicated entries).
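The scheme replacement and deduplication described in the note can also be scripted; a minimal sketch (the function name is made up):

```python
def https_to_http(urls):
    """Turn an HTTPS crawl export into the old http:// URL list,
    dropping duplicates while preserving order."""
    seen = set()
    old_urls = []
    for url in urls:
        old = url.replace("https://", "http://", 1)
        if old not in seen:
            seen.add(old)
            old_urls.append(old)
    return old_urls

print(https_to_http([
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/",      # duplicate, will be dropped
]))
# → ['http://example.com/', 'http://example.com/about']
```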

Locating and fixing all mixed-content pages

An important aspect to watch after a migration to HTTPS is avoiding mixed content.

What is mixed content?

When a page (with an https:// address) serves embedded content – images, CSS files, scripts… – partially from http:// addresses, we are in a case of mixed content.

Why don’t we want pages serving mixed content?

See the symbol near the URL in the address bar?

HTTPS address with mixed content

You might think it indicates an http:// site, but it actually is the https:// version.
What? No “lock” symbol and “Secure” label?
Nope: click on the “i” symbol, and you’ll see:

HTTPS address with mixed content, detail message

Press F12 (developer tools), select the “Network” tab, add the “Scheme” column, refresh the page, and you’ll see there actually are contents served from http:// URLs.

Chrome developer tools showing mixed content for a single page

If you want to see the comforting lock symbol with the “Secure” label, you must get rid of mixed content and serve everything under HTTPS.

HTTPS address with no mixed content, with the reassuring lock symbol

F12 is handy for a single page, but how to bulk check and find all mixed content pages?
Luckily, you have a powerful tool to detect all pages with mixed content: Visual SEO Studio (you need the Professional Edition again, as we’ll be using Data Extraction again):

Open the Data Extraction tab and create a new set by clicking on the New button.

You need to add four extraction columns:

Column name: “IMG”
XPath to content: “//img[starts-with(@src, 'http://')]/@src”
What to extract: “InnerText”
Extract only first element: false

Column name: “JS”
XPath to content: “//script[starts-with(@src, 'http://')]/@src”
What to extract: “InnerText”
Extract only first element: false

Column name: “LINK tag (CSS, canonical)”
XPath to content: “//link[starts-with(@href, 'http://')]/@href”
What to extract: “InnerText”
Extract only first element: false

Column name: “FORM action”
XPath to content: “//form[starts-with(@action, 'http://')]/@action”
What to extract: “InnerText”
Extract only first element: false

No need to indicate the exact domain here, as we want to locate all http:// contents, regardless of their source.
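The same four checks can be reproduced outside the tool; the sketch below uses only Python’s standard library (the class name and the sample snippet are invented for illustration):

```python
from html.parser import HTMLParser

# Tag/attribute pairs matching the four extraction columns above.
MIXED = {"img": "src", "script": "src", "link": "href", "form": "action"}

class MixedContentFinder(HTMLParser):
    """Collects (tag, URL) pairs for embedded resources still
    referenced over plain http://."""
    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        attr = MIXED.get(tag)
        if attr is None:
            return
        value = dict(attrs).get(attr, "")
        if value.startswith("http://"):
            self.hits.append((tag, value))

html = """<img src="http://cdn.example.com/logo.png">
<script src="https://example.com/app.js"></script>"""

f = MixedContentFinder()
f.feed(html)
print(f.hits)  # only the image is served over plain HTTP
```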

Data Extraction, finding mixed content

Give the set a name – e.g. “Mixed content” – and save it for future use.

Click on the “Extract Data” button, and URLs causing mixed content issues will be extracted.

Proceed to fix the issues as you did before with the wrong links; rinse and repeat.

Audit XML sitemaps and robots.txt

Once you switch your site to HTTPS you also have to ensure your XML Sitemaps only list https:// URLs. Ditto for your robots.txt file: if it lists Sitemaps, ensure it points to the HTTPS versions.
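As a quick sanity check on a single sitemap file, a few lines of Python can flag any <loc> that is not served over HTTPS (the function name is mine; the namespace URI is the standard one from the sitemap protocol):

```python
import xml.etree.ElementTree as ET

# Default namespace used by the XML sitemap protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def non_https_locs(sitemap_xml):
    """Return every <loc> in the sitemap not starting with https://."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text for loc in root.iter(NS + "loc")
            if not (loc.text or "").startswith("https://")]

sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/old</loc></url>
  <url><loc>https://example.com/new</loc></url>
</urlset>"""

print(non_https_locs(sample))  # → ['http://example.com/old']
```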

You can inspect your XML Sitemaps manually; if there are a lot of them (imagine the case of an index sitemap enumerating tens or hundreds), you can ask Visual SEO Studio to do all the legwork for you.

Invoke the command “Crawl Sitemap...”

Note: you’ll need the Professional Edition or an active Professional Trial here.

Auditing Sitemaps

and see whether each item is correctly crawled or is skipped because it has the wrong scheme.

Conclusions

I hope you found this tutorial useful. We only scratched the surface of the functionalities provided by Visual SEO Studio.

Migrating a big site to HTTPS involves careful planning and several steps beyond the scope of this tutorial.
For complex sites, you need a tool to automate your audit, and Visual SEO Studio is a great companion for the task!