Explore only part of a site

Sometimes you are interested in auditing only part of a big site, without having to crawl it all. Learn how to tell the SEO spider to limit its exploration.

Let's see how we can instruct the Visual SEO Studio crawler to restrict exploration to specific sections of a website.

Explore only pages within a folder

You can restrict exploration to a single folder, starting from a page within that very same folder.

Crawl option: crawl only within a folder

Just set the Start URL to the address of a page within the folder, and untick the option "Crawl also outside of start folder" (note: this option is only enabled when the option above it is unticked).

An example of such a scenario is when you only want to crawl the articles of a blog hosted under the /blog/ path.

Keep in mind that the crawler can only explore pages if it finds links pointing to them.
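To make the folder restriction concrete, here is a minimal sketch of the kind of check involved. It is purely illustrative, not Visual SEO Studio's actual code: a URL qualifies when it is on the same host and its path starts with the Start URL's folder.

```python
from urllib.parse import urlparse

def in_start_folder(url: str, start_url: str) -> bool:
    """Illustrative sketch: is `url` within the folder of `start_url`?

    Not the tool's actual logic; it just shows the idea behind
    unticking "Crawl also outside of start folder".
    """
    start = urlparse(start_url)
    # The start folder is everything up to the last '/' in the path.
    folder = start.path.rsplit("/", 1)[0] + "/"
    parsed = urlparse(url)
    return parsed.netloc == start.netloc and parsed.path.startswith(folder)

# With Start URL https://example.com/blog/first-post, only URLs
# under /blog/ on the same host pass the check.
```

With that Start URL, https://example.com/blog/second-post would be crawled, while https://example.com/about would not.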

Explore from the root excluding some crawl paths

Maybe you want to crawl from the home page, yet still exclude some sections.
All it takes is adding the paths to exclude to the "Custom 'Disallow' path" field located in the "Advanced Settings" tab. You can add as many directives as you want.

Crawl option: exclude some crawl paths

Each row follows the syntax of the robots.txt Disallow directive, and wildcards are also supported:

  • * represents any sequence of characters within the URL path
  • $ represents the end of the URL path
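The wildcard rules above can be sketched as a small matcher. This is an illustration of the robots.txt-style syntax, not the crawler's own implementation: '*' becomes "any sequence of characters" and a trailing '$' anchors the pattern to the end of the path.

```python
import re

def disallow_matches(pattern: str, path: str) -> bool:
    """Sketch of robots.txt-style Disallow matching with wildcards.

    Supports '*' (any character sequence within the path) and a
    trailing '$' (end of the path). Illustrative only.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn '*' back into '.*'.
    regex = "^" + re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    return re.search(regex, path) is not None
```

For example, "/private/" matches "/private/page.html", and "/*.pdf$" matches "/files/report.pdf" but not "/files/report.pdf.html".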

Other ways to restrict crawl

By default the crawler explores all URLs within a site, including subdomains, and crosses HTTP/HTTPS boundaries (i.e. it treats the http:// and https:// versions as part of the same site).
You can prevent such behaviours by unticking the two crawl options "Cross HTTP/HTTPS boundaries" and "Crawl sub-domains" in the main "Crawl settings" tab.
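The effect of those two options can be sketched as a "same site" check. Again, this is a simplified illustration under assumed semantics, not the tool's actual implementation:

```python
from urllib.parse import urlparse

def same_site(url: str, site_url: str,
              cross_http_https: bool = True,
              crawl_subdomains: bool = True) -> bool:
    """Sketch: does `url` count as part of the same site as `site_url`?

    Mimics the "Cross HTTP/HTTPS boundaries" and "Crawl sub-domains"
    options described above; illustrative only.
    """
    a, b = urlparse(url), urlparse(site_url)
    # Different schemes are allowed only when crossing http/https.
    if a.scheme != b.scheme and not (
        cross_http_https and {a.scheme, b.scheme} <= {"http", "https"}
    ):
        return False
    host_a, host_b = a.hostname or "", b.hostname or ""
    if host_a == host_b:
        return True
    # Treat e.g. blog.example.com as a subdomain of example.com.
    return crawl_subdomains and host_a.endswith("." + host_b)
```

With both options ticked (the defaults), http://example.com and https://blog.example.com are both considered part of the site https://example.com; unticking either option narrows the crawl accordingly.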