Visual SEO Studio 1.4.0
New feature: Data Extraction
Readability Analysis extended
Inbound links computation
Custom filters enriched
'Copy XPath' helper
'Find in grid' more powerful
Conclusions, and what’s next

 

New feature: Data Extraction

Data Extraction lets you data-mine crawled pages, scraping content based on XPath selectors. Results are collected in a neat table.
It works on pre-crawled data sets, so you can repeatedly test, adjust your aim and refine; not having to repeat the crawl each time saves tons of time.

Visual SEO Studio Data Extraction

You can add any number of custom columns. All of them are resizable, movable, hidable and sortable.
When an XPath expression returns a collection of nodes, you can choose whether to keep only the first one (the default behaviour) or keep them all. In the latter case, the additional results are reported in the same column to ease data analysis and consumption. The choice can be set separately for each column of extracted data.

Remember when we detailed the full timetable of the BrightonSEO Sept 2016 speeches before it was officially published? We used the then-internal version of "Data Extraction". Here is how to do it:

First we crawl the page https://www.brightonseo.com/speakers/ with the Visual SEO Studio spider.
When selecting crawl parameters, we uncheck the option "Crawl also outside of Start Folder". This produces a set of pages all nested within the start URL folder; each page is dedicated to a single speaker and contains their name, profile, social media links, and the list of their speeches.

Index View of the crawled pages

With the browser "Inspect" tool and some XPath basics it's easy to produce a few XPath expressions locating the data juice we want to show in a table.

Data Extraction parameters example

In our example:

Speaker:
/html/body/div/div[1]/section/div/div/div/div[2]/h2

Speech(es):
//h2[@class='schedule__event']
Note: here we deselected the option "Extract only first element in case of multiple results", because some speakers lectured more than once.

Twitter address:
//a[contains(@href, 'twitter.com')]/@href

LinkedIn address:
//a[contains(@href, 'linkedin.com')]/@href

The only thing left to do is to click the "Extract Data" button.
Et voilà, your data is served:

Data Extraction result
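
For readers who like to see the logic in code, here is a minimal Python sketch of the same kind of extraction. It is not how Visual SEO Studio works internally. The sample markup, the `speaker` class and the helper names `extract_text` and `extract_href` are all hypothetical. The standard library parser supports only a subset of XPath, so `contains()` and `/@href` are emulated in plain Python. The `first_only` flag mimics the "Extract only first element in case of multiple results" option.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified speaker page (well-formed so the stdlib parser accepts it).
SAMPLE_PAGE = """
<html><body>
  <h2 class="speaker">Jane Doe</h2>
  <h2 class="schedule__event">Talk one</h2>
  <h2 class="schedule__event">Talk two</h2>
  <a href="https://twitter.com/janedoe">Twitter</a>
  <a href="https://www.linkedin.com/in/janedoe">LinkedIn</a>
</body></html>
"""


def extract_text(root, path, first_only=True):
    """Return the text of the nodes matched by a (limited) XPath expression."""
    values = [(el.text or "").strip() for el in root.findall(path)]
    return values[:1] if first_only else values


def extract_href(root, fragment, first_only=True):
    """Emulate //a[contains(@href, fragment)]/@href with a Python filter."""
    values = [a.get("href") for a in root.iter("a")
              if fragment in (a.get("href") or "")]
    return values[:1] if first_only else values


root = ET.fromstring(SAMPLE_PAGE)

# One table row per crawled page; each key plays the role of a custom column.
row = {
    "Speaker": extract_text(root, ".//h2[@class='speaker']"),
    "Speech(es)": extract_text(root, ".//h2[@class='schedule__event']",
                               first_only=False),
    "Twitter": extract_href(root, "twitter.com"),
    "LinkedIn": extract_href(root, "linkedin.com"),
}
print(row)
```

Running the same functions over every page of a saved crawl would produce one row per speaker, which is the shape of the table shown above.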

Extracted data cells have "Show in code" and "Show in DOM" context menu options. In addition, the "Content", "DOM" and "Session" views are available in the right panel.

"Show in DOM" in Data Extraction

You can browse the page behind each item's URL thanks to the "Browse URL" context menu item.

Remember: each grid in Visual SEO Studio can be exported to Excel or CSV formats.

Export to Excel

Note: "Data Extraction" is available only in the Professional Edition. You can evaluate it for free for 30 days by registering the Trial version.

Readability Analysis extended

With release 1.3.0 we introduced Readability Analysis, a content auditing tool able to compute the readability of texts in English, French, Italian and Spanish.
Readability scores are only the icing on the cake: the tool itself also provides word, sentence and letter counts for page sections located by XPath expressions. So we decided to add an "Other languages" option to be used for – guess what? – other languages.

"Other languages" option in Readability Analysis

Note: we recommend always setting the correct language if available, even if you are only interested in the counters and not in the readability scores: correct sentence computation takes into account the most common abbreviations for the selected language. For example, the text "Mr. Brown and Dr. Fox" is a single sentence in English.
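
To see why the language setting matters for the counters, here is a minimal Python sketch of an abbreviation-aware sentence counter. It is only an illustration, not the tool's actual algorithm. The function name `count_sentences` and the short abbreviation list are assumptions made for the example.

```python
# Hypothetical, deliberately short list; a real one would be much longer.
ENGLISH_ABBREVIATIONS = frozenset({"mr.", "mrs.", "dr.", "e.g.", "i.e.", "etc."})


def count_sentences(text, abbreviations=frozenset()):
    """Count sentences, ignoring full stops that belong to known abbreviations."""
    tokens = text.split()
    count = 0
    for i, token in enumerate(tokens):
        is_last = i == len(tokens) - 1
        if token[-1] in ".!?" and token.lower() not in abbreviations:
            count += 1  # genuine sentence terminator
        elif is_last:
            count += 1  # trailing text without terminal punctuation
    return count


text = "Mr. Brown and Dr. Fox"
print(count_sentences(text))                         # no language knowledge
print(count_sentences(text, ENGLISH_ABBREVIATIONS))  # English abbreviations known
```

Without the abbreviation list, every full stop is treated as the end of a sentence and the count is inflated. With it, the example text counts as one sentence.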

Inbound links computation

This feature was highly requested by the user base, so we prioritized it: you can now compute the internal incoming links for every single crawled page. The feature is available in all major views via a context menu, and in the "Page Links" bottom view.

"Find incoming links" context menu

We debated how to implement the feature and asked several users about their work habits. Computing the incoming links for all pages at crawl time, or when loading from the database, would have made invoking the command immediate, but it would have had downsides: it would have slowed down those two operations every single time, and for big sites it would have added a significant memory overhead.
Checking a page's incoming links is something you do occasionally, so we decided to compute them only when needed. For huge sites it might take some time, while for small sites it is quite fast.
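
The trade-off can be sketched in a few lines of Python. This is an illustration of the two strategies, not the product's code. The `CRAWL` data and both function names are hypothetical. The crawl is assumed to be stored as a mapping from each page to the set of pages it links to.

```python
# Hypothetical crawl result: page URL -> set of internal URLs it links to.
CRAWL = {
    "/": {"/about", "/blog"},
    "/about": {"/"},
    "/blog": {"/", "/about"},
}


def incoming_links(outgoing, target):
    """On demand: scan the whole crawl only when a page is inspected."""
    return sorted(source for source, links in outgoing.items() if target in links)


def build_incoming_index(outgoing):
    """Eager alternative: precompute incoming links for every page up front."""
    index = {page: set() for page in outgoing}
    for source, links in outgoing.items():
        for dest in links:
            index.setdefault(dest, set()).add(source)
    return index


print(incoming_links(CRAWL, "/about"))
```

The on-demand version pays a full scan on each request and keeps nothing in memory. The eager version answers instantly but has to build and hold an index for every page, even when nobody asks for it.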

Custom Filters enriched

Inbound links computation doesn't stop here: it is also a handy operand for Custom Filters. This means you can import and crawl a list of URLs, even from different domains, and query whether they point to a specific page.

Incoming links in Custom Filters

"Copy XPath" helper

The most recent content auditing features, Readability Analysis and Data Extraction, both leverage XPath expressions to locate page sections. So why not add a helper to save you some time writing them?
That's how we decided to add a new context menu command in the DOM view:

"Copy XPath" in DOM view

Of course you can use similar functions within your browser of choice, and often it is easier to first spot visually where the DOM element is. Yet sometimes we really missed the feature when working on Data Extraction, and we imagined our users would appreciate it too.

There is never a unique XPath expression to locate an element, and often you really need to edit it yourself to get the desired outcome. Nevertheless, our XPath detection is optimized to return the shortest expression that reaches the desired node in the DOM structure (we noticed it performs very similarly to Chrome, while Firefox tends to return more verbose strings).
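
As a rough illustration of how such a helper can keep expressions short, here is a Python sketch. It is an assumption about one plausible approach, not the algorithm used by Visual SEO Studio, Chrome or Firefox. The function name `build_xpath` is hypothetical. The sketch walks up from the element, stops early at the nearest ancestor with an `id`, and adds a positional index only when siblings share the same tag.

```python
import xml.etree.ElementTree as ET


def build_xpath(root, target):
    """Build a short XPath for `target`, anchoring on the nearest id if any."""
    # ElementTree nodes have no parent pointer, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not None:
        if node.get("id"):
            # An id anchors the path; no need to climb further
            # (this assumes ids are unique within the page).
            steps.append(f"//*[@id='{node.get('id')}']")
            break
        parent = parents.get(node)
        if parent is None:
            steps.append(f"/{node.tag}")
        else:
            same_tag = [child for child in parent if child.tag == node.tag]
            if len(same_tag) > 1:
                # XPath positions are 1-based.
                steps.append(f"/{node.tag}[{same_tag.index(node) + 1}]")
            else:
                steps.append(f"/{node.tag}")
        node = parent
    return "".join(reversed(steps))


doc = ET.fromstring(
    "<html><body>"
    "<div id='main'><p>a</p><p>b</p></div>"
    "<div><span>x</span></div>"
    "</body></html>"
)
second_p = doc.find(".//div[@id='main']")[1]
span = doc.find(".//span")
print(build_xpath(doc, second_p))
print(build_xpath(doc, span))
```

An expression anchored on an `id` is short and survives layout changes above the anchor, while a purely positional path breaks as soon as a wrapping element is added. That is one reason manual editing is often still needed.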

"Find in grid" more powerful

Another improvement requested by users and beta testers: while you could search every grid, the tool only returned the first occurrence of the searched text.
We listened to them, and now the "Search value in grid" feature is a full-fledged search: you can jump to the next and previous matches, and decide whether the search is case sensitive.
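
The behaviour can be pictured with a small Python sketch. The function name `find_in_grid`, its parameters and the sample grid are hypothetical, and the wrap-around at the ends of the grid is an assumption of the example rather than a documented behaviour of the tool.

```python
def find_in_grid(grid, text, start=(0, -1), forward=True, case_sensitive=False):
    """Return (row, col) of the next or previous cell containing `text`, or None."""
    if not grid or not grid[0]:
        return None
    cols = len(grid[0])
    total = len(grid) * cols
    needle = text if case_sensitive else text.lower()
    position = start[0] * cols + start[1]  # flatten (row, col) to one index
    step = 1 if forward else -1
    for offset in range(1, total + 1):
        # Wrap around the grid; the starting cell is checked last.
        row, col = divmod((position + step * offset) % total, cols)
        cell = str(grid[row][col])
        if needle in (cell if case_sensitive else cell.lower()):
            return (row, col)
    return None


GRID = [
    ["/home", "Home page"],
    ["/about", "About us"],
    ["/home/news", "News"],
]

first = find_in_grid(GRID, "home")
print(first, find_in_grid(GRID, "home", start=first))
```

Passing the last match back as `start` gives "find next", and `forward=False` gives "find previous".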

"Search value in grid" dialog

Conclusions, and what's next

This release adds a lot of value to Visual SEO Studio, and we are proud of the way we implemented it. We collected every use case we could think of and reproduced them all, honing the tool to help you perform your tasks better, quicker and with more ease.

There are several other improvements in this release. For a full and boring list, please consult the Release Notes.

Development of new features doesn't stop here: we have more coming in the next few months.

Now, 'The Miner' wants to dig.
Update Visual SEO Studio to the latest release and let it extract the gems for you!

