Release Notes for version 0.8.29.3

Product release notes detail every modification made in the release. Find out what changed in Visual SEO Studio version 0.8.29.3.

Visual SEO Studio 0.8.29.3

Published: Tuesday, April 28, 2015

This is mainly a stability and performance release. It also improves character set encoding detection and lays the foundation for important improvements coming soon.

New Features:

  • Full support for BOM (Byte Order Mark) encoding detection. According to the most recent specifications, the BOM, when present, takes precedence over the charset in the HTTP "Content-Type" header. The software now correctly recognizes the BOM for UTF-8, UTF-16, UTF-16 big endian, UTF-32 and UTF-32 big endian in all downloaded content (HTML pages, XML sitemaps, robots.txt files, etc.).
  • In addition to the character set read (if present) from the HTTP "Content-Type" header, the tool now also stores and shows as page properties the charset read (if present) from the BOM and from the meta charset tag. The "Character Set" page property shows the result of cascading these three, in order: "BOM", "Character Set (HTTP)", "Character Set (Meta)".
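The cascade described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the tool's own code (which runs on .NET): the function names `detect_bom`, `charset_from_content_type`, `charset_from_meta` and `resolve_charset` are hypothetical, and the regular expressions are simplified stand-ins for a full HTML encoding sniffer. The BOM is checked first, then the HTTP charset, then (for HTML only) a `charset` or `http-equiv` meta tag, and finally UTF-8 is used as the fallback.

```python
import re
from typing import Optional, Tuple

# BOM signatures. The UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so it must be tested before it.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]

# Simplified patterns (assumptions, not a complete encoding sniffer).
HTTP_CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)
# Matches <meta charset="..."> (HTML5) as well as
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET_RE = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


def detect_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (encoding, BOM length) if data starts with a known BOM."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from an HTTP Content-Type value."""
    match = HTTP_CHARSET_RE.search(content_type or "")
    return match.group(1).lower() if match else None


def charset_from_meta(data: bytes) -> Optional[str]:
    """Look for a meta charset declaration in the first 1024 bytes."""
    match = META_CHARSET_RE.search(data[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def resolve_charset(data: bytes, content_type: Optional[str],
                    is_html: bool = True) -> Tuple[str, str]:
    """Apply the cascade BOM -> HTTP -> meta -> UTF-8 default.

    Returns (encoding, source), where source names the step that won.
    """
    bom = detect_bom(data)
    if bom:
        return bom[0], "BOM"
    http_charset = charset_from_content_type(content_type)
    if http_charset:
        return http_charset, "HTTP"
    if is_html:
        meta_charset = charset_from_meta(data)
        if meta_charset:
            return meta_charset, "Meta"
    # Fallback used for HTML pages and for robots.txt alike.
    return "utf-8", "default"


if __name__ == "__main__":
    page = b"\xef\xbb\xbf<html><head><meta charset=\"windows-1252\"></head></html>"
    print(resolve_charset(page, "text/html; charset=ISO-8859-1"))
```

Because the BOM check runs first, the example page resolves to UTF-8 even though both the HTTP header and the meta tag declare something else, matching the precedence stated in the notes. The returned `source` tells which of the three page properties (or the default) decided the final "Character Set".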

Performance:

  • Reduced memory footprint during crawling. HTML parsing should also be slightly faster, although the difference may not be noticeable.
  • The component that stores crawled pages has been rewritten to be much faster. Since it already ran in the background, the speed-up is most noticeable when crawling several sites in parallel or when well into the crawl of a large site.
  • Minor performance improvement at startup when the configuration file is written or read for the first time. The change is actually a workaround that allows users with unpatched WinXP + .NET 3.5 to use the product, and it also resulted in a minor speed-up.
  • Another minor improvement at startup makes the software UI respond faster when displaying the Start Page.
  • Crawling: a small percentage improvement in performance.

User Experience / Usability:

  • A progress bar is now shown when deleting a large crawl session takes long enough to block the application UI.
  • When the user is asked to upgrade an old project, the project name is now shown.
  • Manage Crawl Sessions form: the 'Name' and 'Start URL' columns are wider, making their values easier to read.
  • Custom filters: unsaved changes in the expression rows grid are now also persisted by pressing the Save button on the toolstrip (a small UX goodie).

Various:

  • UI: a new icon identifies VSS both on the desktop and in the software forms. People kept telling me the old one was too ugly as a desktop icon (yes, I made it myself), and many didn't recognize that it represents a chameleon (and not any of the other reptiles it has been called), so I had a new one made by a pro. I'm not sure it wins me over; let's hear the users' feedback.
  • All .dll and .exe files are now digitally signed, not only the .msi installer. This should minimize the chance that antivirus software mistakenly reports them as suspicious and blocks them.
  • UI: when no character set is specified for a page, the default UTF-8 encoding used is reported in lower-case.
  • VSS verification: added support for BOM-detected encoding.
  • Custom Filters UI: if there is no localized string for a queryable property or an expression operator, then the name of the enum value is shown in the combo box.
  • The tool is now more robust: even if the process is killed during a crawl, the saved data keeps its integrity and can be deleted. No more orphaned pages, and no more being unable to remove a forcibly interrupted crawl session.
  • When removing a crawl session, the tool now also removes any "orphaned" pages. Orphaned pages can no longer be created, but they may still exist in some old crawls.
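One way to get the behaviour described above is to delete a session and sweep orphaned pages inside a single database transaction. The sketch below assumes a SQLite store with a hypothetical two-table schema (`sessions`, `pages`) and a hypothetical `delete_session` helper; the release notes do not describe the actual storage layout. Relying on SQLite's transactional guarantees, a process killed mid-deletion should leave either the old state or the new one after the journal is rolled back, rather than a half-deleted session.

```python
import sqlite3

# Hypothetical schema: each page row records the crawl session it belongs to.
SCHEMA = """
CREATE TABLE sessions (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE pages (id INTEGER PRIMARY KEY, session_id INTEGER, url TEXT);
"""


def delete_session(conn: sqlite3.Connection, session_id: int) -> int:
    """Delete a crawl session, its pages, and any orphaned pages.

    Everything runs in one transaction: `with conn` commits on success and
    rolls back if an exception is raised. Returns the number of pages removed.
    """
    with conn:
        removed = conn.execute(
            "DELETE FROM pages WHERE session_id = ?", (session_id,)
        ).rowcount
        conn.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
        # Sweep pages left behind by older, forcibly interrupted crawls.
        removed += conn.execute(
            "DELETE FROM pages WHERE session_id IS NULL "
            "OR session_id NOT IN (SELECT id FROM sessions)"
        ).rowcount
    return removed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO sessions (id, name) VALUES (?, ?)",
                     [(1, "old crawl"), (2, "new crawl")])
    conn.executemany("INSERT INTO pages (session_id, url) VALUES (?, ?)",
                     [(1, "/a"), (1, "/b"), (2, "/c"), (99, "/orphan")])
    conn.commit()
    print(delete_session(conn, 1))  # 3: two pages of session 1 plus one orphan
```

Doing the orphan sweep on every deletion, rather than only once, also cleans up old crawls created before orphans became impossible.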

Fixes:

  • Fixed a real-world crash at startup that prevented quite a number of people with an improperly patched WinXP + .NET 3.5 from using the product. A workaround should now prevent the problem from occurring.
  • Fixed a real-world crash occurring when the base tag contained multiple slashes. It affected one user, sorry.
  • Spider/Parser: pages whose character set was declared in a meta tag rather than in the HTTP header were treated as UTF-8. This affected a small, but non-zero, percentage of web sites, whose text encoding went bananas. The software now recognizes both "http-equiv" and "charset" (HTML5) meta tags.
  • The default encoding for robots.txt, when neither a BOM nor an HTTP header charset is provided, is now UTF-8. It was previously assumed to be ISO-8859-1 in accordance with RFC 2616 (HTTP/1.1, section 3.7.1), but reality has superseded that (for HTML we already defaulted to UTF-8, based on statistical evidence).
  • HTTP Issues: fixed selectors (errors/warnings) for some uncommon status codes.
  • Crawl/Index/Table Views: fixed colour and icon for some uncommon HTTP status codes.
  • Fixed a crashing condition that could occur when renaming the last crawled session. No real users were affected.
  • Disabled the message "Max Depth X exceeded, skipping Y (Referrer is Z)" during crawling: when emitted continuously in huge quantities (which could happen in some scenarios, especially when X is low), it made the UI unresponsive after a while. Not really a fix yet, just a workaround (a proper fix will come soon).
  • Custom Filters UI: if an expression row already existed and the user created a new row, the operators combo box showed wrong values taken from the existing row.
  • Custom Filters UI: when switching from a unary queryable property to a binary queryable property the value combo was not enabled/disabled correctly.
  • Crawling: better robustness against uncommon content-type values in encoding detection.
  • Custom filters: notation "html/title" fixed as "html/head/title".
  • UI, grids and property window: when showing URLs (as in the case of broken links) containing newline characters, those characters were stripped, making it impossible to see what the problem was. They are now replaced with their textual representations "\r" and "\n".
  • UI, tree views: when showing URLs (as in the case of broken links) or titles containing newline characters, the node text was unreadable. They are now replaced with their textual representations "\r" and "\n".
  • Administered Sites form: flickering during the load phase fixed.
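The two newline-related display fixes above amount to replacing control characters with visible escape sequences before the text reaches a grid cell or tree node. A minimal Python sketch follows; `escape_newlines` is a hypothetical helper name, and the tool itself runs on .NET, so this only illustrates the idea.

```python
def escape_newlines(text: str) -> str:
    # Replace each carriage return with the two characters backslash + "r"
    # and each line feed with backslash + "n", so the characters stay
    # visible on a single line instead of being stripped or breaking layout.
    return text.replace("\r", "\\r").replace("\n", "\\n")


if __name__ == "__main__":
    broken_link = "https://example.com/products\r\n/item"
    print(escape_newlines(broken_link))  # https://example.com/products\r\n/item
```

The replacement order does not matter, because neither substitution introduces a new CR or LF character.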