There are so many things under development in Visual SEO Studio that on Sunday 27th I published an intermediate release: ... (suspense) ... "Painkiller", the new version of the Site Audit tool. It addresses all the major pain points reported by the user base.
New crawling policies
The first thing you'll notice in this release is that the "Crawl a Site" dialog has been simplified.
In the majority of cases users just need to enter the Start URL and click a button, without being distracted by advanced options.
Visual SEO Studio crawl dialog
But it's not only a cosmetic change:
All unnecessary slow-downs during the crawl process have been removed.
Visual SEO Studio's crawler, the Pigafetta bot, has always been an extremely polite web citizen, and still strives to be one, today and tomorrow.
When crawling a site with an automated bot, you are consuming the server resources and bandwidth of a property which is not yours. Normal visitors do the same, but they are likely to convert and are thus welcomed by site owners. Robots do not convert. You can give a service back, like search engines do, or at the very least you have to be polite. That's what the Robots Exclusion Protocol is for.
And I was always concerned with crawl speed. You have absolutely no right to bomb other people's sites with what looks like a DoS attack. So I made Pigafetta always respect the crawl-delay directive, or apply a courtesy delay when none was specified. I allowed crawling at full speed only on sites a user could demonstrate they administer.
On top of that, the crawler pipeline is designed to adapt to the server response time, and will never overload a site.
There are far too many badly behaved bots out there, and to me ethics matter.
But I erred a little too much on the side of caution.
I realized I was hurting my own user base for no good reason, and my users have always been keen to remind me of it.
While a search engine crawler can afford to respect even large courtesy delays, a site crawler can't.
Visual SEO Studio, while still striving to be polite, is supposed to save its users' time. It is appreciated for its data visualization and powerful data extraction and reporting features, but what users perceive first is crawling speed.
...so I made a few resolutions
Something had to be done.
There are three major changes in the crawl process:
- you can now crawl at full speed even sites you do not administer
- when no crawl-delay is specified in the robots.txt file, no extra courtesy delay is added
- when a crawl-delay directive is found, it is respected up to a maximum of 2 seconds (previously it was 10 seconds).
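The delay rules in this list boil down to a small decision about how long to wait between two requests. Here is a minimal Python sketch of that decision. It is not Visual SEO Studio's actual code: the names `effective_delay`, `delay_from_robots` and `MAX_CRAWL_DELAY`, as well as the sample robots.txt content, are assumptions made for illustration. Only the 2 seconds cap and the "no directive, no delay" rule come from the list above.

```python
from urllib.robotparser import RobotFileParser

# Upper limit for a requested crawl-delay, in seconds (the value stated above).
MAX_CRAWL_DELAY = 2.0


def effective_delay(crawl_delay):
    """Return the pause between requests, in seconds (hypothetical helper).

    crawl_delay is the value read from robots.txt, or None when the
    file has no crawl-delay directive for the bot.
    """
    if crawl_delay is None:
        return 0.0  # no directive: no extra courtesy delay is added
    return min(float(crawl_delay), MAX_CRAWL_DELAY)


def delay_from_robots(robots_txt, user_agent="Pigafetta"):
    """Parse robots.txt text and return the delay to apply for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return effective_delay(parser.crawl_delay(user_agent))


if __name__ == "__main__":
    # A site asking for 10 seconds gets the capped delay.
    print(delay_from_robots("User-agent: *\nCrawl-delay: 10\n"))
    # A site with no crawl-delay directive gets no added delay.
    print(delay_from_robots("User-agent: *\nDisallow: /private/\n"))
```

Python's standard `RobotFileParser` only reads whole-number crawl-delay values, so this sketch would treat a fractional value as "not specified"; a real crawler may well parse the directive differently.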
It might sound silly, but these simple decisions weighed on my conscience. Not only do I not want to waste other people's bandwidth and resources if I can't give them something back, I also don't want to be the one who sells a gun and says "it's up to the holder's conscience not to use it for doing bad".
What convinced me to do that?
Last point first:
The "10 seconds courtesy delay" rule as a "fair" value is many years old. I didn't invent it; it can still be found in many old pages discussing how a web crawler should be built.
But the web has changed: bandwidth is now much larger and web servers have greater scalability. Even normal web browsers fire bursts of many concurrent HTTP requests (think about pages with many external CSS and script files, images, and so on).
I realized that having an adaptive crawler already made Pigafetta a low-footprint consumer.
Adding a courtesy delay when not explicitly requested did no further good to the visited web site; it just made my users wait longer. So no more courtesy delay when not requested.
Now about the crawl-delay upper limit set to 2 seconds:
It already changed with 0.8.6, but again, the web has changed and now 2 seconds is what's considered "fair" (even Googlebot often uses it). Search engines can afford more, but an interactive site crawler can't.
Site owners can always decide to block Pigafetta, and SEOers can always crawl their sites with much less polite crawlers.
I already mentioned the adaptive crawler. There simply is no way to DoS a site with Visual SEO Studio, because it respects the server throughput. So I decided to drop the "administrators only" limitation as well (by the way, there still are good reasons to declare your administered sites, more on this later).
In fact, the first user to appreciate the result of the change is me.
I have always disciplined myself to do all site audits the way the rest of you do, and I have to admit the new policies really change the user experience. Interactive crawl is now enjoyable and fast.
A few words more about the crawl speed
Yes, competitors' crawlers usually crawl sites faster. Sometimes much faster, sometimes in a comparable way. And sometimes they might crash slow web servers.
Visual SEO Studio is different. By design it doesn't use a parallel pipeline; it crawls a web site in strict breadth-first sequential order. This way not only can it adapt to the web server throughput, it can also reconstruct the site link structure, valuable information normally hidden by tabular-only data.
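To illustrate why a strictly sequential breadth-first visit makes the link structure recoverable, here is a small Python sketch. It is not the Pigafetta implementation: the site is a hypothetical in-memory dictionary mapping each page to the pages it links to, and `crawl_breadth_first` is a name made up for the example. A real crawler would fetch each page over HTTP at the point marked in the comments.

```python
from collections import deque


def crawl_breadth_first(start, links_of):
    """Visit pages one at a time in breadth-first order.

    links_of maps a page to the list of pages it links to (a stand-in
    for fetching and parsing the page). Returns the visit order and a
    dict mapping each page to the page where it was first discovered.
    """
    order = []
    found_from = {start: None}  # the start page has no parent
    queue = deque([start])
    while queue:
        page = queue.popleft()
        # A real crawler would download `page` here, one request at a
        # time, so the next request never starts before the server has
        # answered the previous one.
        order.append(page)
        for target in links_of.get(page, []):
            if target not in found_from:  # first discovery wins
                found_from[target] = page
                queue.append(target)
    return order, found_from


if __name__ == "__main__":
    site = {
        "/": ["/products", "/blog"],
        "/products": ["/products/a", "/"],
        "/blog": ["/products/a", "/blog/post-1"],
    }
    order, found_from = crawl_breadth_first("/", site)
    print(order)
    print(found_from)
```

Because pages are taken from the queue in discovery order, `found_from` records for every page the shallowest page that links to it, which is enough to draw the site as a tree rather than as a flat table of URLs.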
That said, there's still room for improvement in crawl speed. I estimate I could squeeze out about 25-30% more speed on responsive sites, just by keeping the polite sequential exploration and parallelizing data storage and user interface updates. But that's food for future releases!
There is also other news about crawling options:
You can now choose to avoid exploring sub-domains. The default is to crawl them, as it better emulates the way a search engine sees your site as a whole, but now you can choose not to, as you might want to limit the scope of the crawl process. Thanks to the user who suggested the feature to me.
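As a rough idea of what such a scope check might look like, here is a Python sketch. It is an assumption-laden illustration, not the program's real logic: the function `in_crawl_scope` and its `include_subdomains` flag are hypothetical names, and the site is identified simply by the host name of the Start URL.

```python
from urllib.parse import urlsplit


def in_crawl_scope(url, start_url, include_subdomains=True):
    """Tell whether url belongs to the site being crawled (hypothetical helper).

    The site is identified here by the host name of start_url; hosts
    ending with "." + that name are treated as its sub-domains.
    """
    host = (urlsplit(url).hostname or "").lower()
    site = (urlsplit(start_url).hostname or "").lower()
    if not host or not site:
        return False  # relative or malformed URLs are out of scope here
    if host == site:
        return True
    return include_subdomains and host.endswith("." + site)


if __name__ == "__main__":
    start = "http://example.com/"
    print(in_crawl_scope("http://blog.example.com/post", start))
    print(in_crawl_scope("http://blog.example.com/post", start,
                         include_subdomains=False))
```

The suffix test is a deliberate simplification: with a Start URL on `www.example.com`, a sibling host such as `blog.example.com` would not match, so a real implementation would need a more careful notion of "same site".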
Also, I said there still might be good reasons to declare your administered sites. When you do, for instance, you can now even spoof the user agent. And you will be able to do more in the near future...
Visual SEO Studio habla Español
I'm really happy to announce Visual SEO Studio now speaks Spanish as well!
Visual SEO Studio with Spanish User Interface
Spanish language support has always been on my wish list; I only lacked native-speaking SEOers volunteering to help me. Then, while I was chatting with two Italian friends who both speak the language (one used to live in Mexico, the other in Spain), they unexpectedly offered their help.
I normally champion the mantra "translations by native speakers", but the alternative was to postpone for I don't know how long. They are not into SEO but both are tech savvy, and I can at least read it and am learning a little more Spanish to take on the challenge.
I could never thank my friends Mauro Larosa and Silvano Parodi enough; with their enthusiasm and iterative help we made it.
I would never have had a Spanish version so soon without their patient, selfless help. Thank you!
The two guys had to work largely "in the dark", without context and without being able to see the end result, and I did a patient job trying to fill all the gaps with my tiny knowledge of the language (that is: I can read it with enough ease and hold a simple conversation in it, but it takes me great effort to write even short sentences).
So yes, I am the one to blame for all the mistakes, as I checked every sentence with my poor Spanish and surely added to the mess.
I was confident I could find a skilled native-speaking SEOer to at least test a private beta once there was something to show, and in fact a volunteer arrived as soon as I announced the availability of a private beta for those who dared to try it.
Jose Luis Santana Blasco came to the rescue and was kind and patient enough to point out the most evident mistakes I made. Thank you Jose Luis, you helped me hugely.
Again, any mistakes left - and for sure there are tons - are my fault, as I pushed to add Spanish language support as "experimental" into the 0.8.10 release I was going to publish.
...but now "Painkiller" is out, and every native Spanish-speaking SEO can enjoy the version, curse me for the terrible mistakes I made, and hopefully help me make a better product.
So let me celebrate :)
Fixed long-standing issues
I wish I had a penny for each time I've been bugged by the same recurring problem. A crash experienced by about 3% of my user base - quite a lot of people - and I had no clue what it was. The automated diagnostic didn't help, and I just had to hope they left their e-mail address along with the crash report so I could tell them to restart the program (users were usually able to get past it by restarting the program, or by restarting the PC).
"Luckily" enough a similar but much rarer crash happened to just two users. It had nearly the same root cause and symptoms, but made me realize what was probably going on: a hard-to-reproduce crash occurring in some cases when a program instance was already running.
I'll spare you the boring details, but I'm glad to say it should be over now!
There are several other fixes and minor improvements. For those having trouble falling asleep, I can recommend reading the full release notes.
Conclusions, and what's next
This release is an important stepping stone. It closes the most common instability issues, and changes the crawling experience for the better while keeping the spider well behaved.
It also fully inaugurates the new development and deployment cycle I have worked on over the last few months.
As I stated at the beginning of the article, this is to me an intermediate release.
There are many new things I'm working on in the development branch that I hope to release soon. Some I already announced on the occasion of previous releases, and some I want to keep as a surprise.
So stay tuned!
In the meantime, enjoy this new release!
If you tried Visual SEO Studio in the past, time to give it a second look.
If you haven't tried it yet, time to give it a go.
So don't waste any more time, install the latest version and save tons of time with your SEO audits!
Comments are open on the linked Google+ page.