In this chapter we will work on a real Recipe, from definition to extraction.
Our goal will be to extract the table on this page:
Here are the steps to follow:
1. Open the website to be analyzed in a Chrome tab.
2. Click the Free / Batch Data Collector icon to launch the program on top of this Chrome tab. This points the program at the page source you’d like to analyze.
3. In the Advanced Interface of Free / Batch Data Collector, choose a name for your Recipe:
4. Back on the original webpage, look for the parent element of the table. We recommend you use Chrome’s Inspect feature, which can be accessed by right-clicking on the first cell of the table and choosing Inspect from the menu.
5. In the Inspect panel we can see that the repetitive element is table#customers tr, that is, every row (tr) of the table whose id is customers.
6. The child columns of the repetitive element will be td, with Instance n. running from 0 to 2, since we want to extract a three-column table.
7. For each column set the Type drop-down to text.
Here’s our completed Recipe:
8. When you’re ready to proceed, click Empty Results and Extract.
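To make the selector logic of steps 5 to 7 more concrete, here is a minimal Python sketch that reproduces it outside the tool, using only the standard library. It is not how Free / Batch Data Collector works internally. The sample markup, the class name RowExtractor and the function extract_rows are all hypothetical, chosen for illustration. The sketch treats each tr inside the table with id customers as one repetitive element and reads td instances 0 to 2 as text. Rows with fewer than three td cells, such as a header row made of th cells, are skipped, which may differ from what the tool does.

```python
from html.parser import HTMLParser

# Hypothetical sample markup standing in for the page being analyzed.
SAMPLE_HTML = """
<table id="customers">
  <tr><th>Company</th><th>Contact</th><th>Country</th></tr>
  <tr><td>Acme Ltd</td><td>Jane Doe</td><td>Italy</td></tr>
  <tr><td>Globex</td><td>John Roe</td><td>Spain</td></tr>
</table>
"""


class RowExtractor(HTMLParser):
    """Collects the text of the td cells in each tr of one table (by id)."""

    def __init__(self, table_id, columns):
        super().__init__()
        self.table_id = table_id
        self.columns = columns      # number of td instances to keep (0..columns-1)
        self.in_table = False
        self.current_row = None     # list of cell texts while inside a tr
        self.current_cell = None    # list of text fragments while inside a td
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.current_row = []
        elif self.current_row is not None and tag == "td":
            self.current_cell = []

    def handle_data(self, data):
        if self.current_cell is not None:
            self.current_cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self.current_cell is not None:
            self.current_row.append("".join(self.current_cell).strip())
            self.current_cell = None
        elif tag == "tr" and self.current_row is not None:
            # Keep only rows that have enough td cells; th-only rows are skipped.
            if len(self.current_row) >= self.columns:
                self.rows.append(self.current_row[: self.columns])
            self.current_row = None
        elif tag == "table" and self.in_table:
            # Simplification: nested tables are not handled in this sketch.
            self.in_table = False


def extract_rows(html, table_id="customers", columns=3):
    """Return one list of cell texts per matching table row."""
    parser = RowExtractor(table_id, columns)
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(row)
```

Each inner list corresponds to one repetitive element, and positions 0, 1 and 2 in it correspond to the three td instances defined in the Recipe.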
At the bottom of the page you’ll see the generated JSON code and three options for exporting it. The first exports the collected data in Excel format.
The others let you view the extracted data without passing it to a spreadsheet.
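If you prefer to post-process the JSON yourself, a short script can turn it into a spreadsheet-friendly file. The sketch below assumes the JSON is a list of objects, one per row, keyed by column name. That shape, the sample values and the function name json_to_csv are assumptions made for illustration, so check them against the JSON the tool actually shows you.

```python
import csv
import io
import json

# Hypothetical JSON, assumed to be a list of objects with one key per column.
SAMPLE_JSON = (
    '[{"company": "Acme Ltd", "contact": "Jane Doe", "country": "Italy"},'
    ' {"company": "Globex", "contact": "John Roe", "country": "Spain"}]'
)


def json_to_csv(json_text):
    """Convert a JSON list of flat objects into CSV text."""
    records = json.loads(json_text)
    if not records:
        return ""
    # Use the keys of the first record as the column headers.
    fieldnames = list(records[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(json_to_csv(SAMPLE_JSON))
```

The resulting CSV text can be saved to a file and opened in Excel or any other spreadsheet program.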
Quick and simple, no?