Alon Ghelber is a Product Executive from Tel-Aviv who specializes in VPNs, proxies, scraping, and CX.
Welcome to the new way of scraping the web. In the following guide, we will scrape BestBuy product pages, without writing any parsers, using one simple library: Scrapezone SDK.
This SDK does all of the heavy lifting for us, so all we need to do is import it and start scraping. The library allows scraping most of the leading eCommerce sites, but to keep it specific we will use it to scrape Best Buy here. Specifically, we will get the title, description, price, and rating information.
This data can be used in many interesting scenarios, and if you are building a software product that uses parsed data from eCommerce websites, this can save you thousands of dollars and months of development. The SDK will avoid anti-bot detection for you, use different IP addresses and user fingerprints, and spread the scrape you send over thousands of parallel workers, so the rate and amount of data you can scrape are almost endless.
Why Scraping as a Service?
You must have heard a lot about SaaS solutions, and maybe you are even building one yourself. The internet software world is turning into one that allows you to focus only on what you are building, without writing code that already exists as a product somewhere. Just like you don’t run your own servers or your own email service anymore, Scraping as a Service turns web scraping into an easy, usable service that just gets you the data you need.
Before we start, let’s set up the work environment
In order to scrape the BestBuy product details, we will use the Scrapezone Node.js SDK. Node makes it easy to import and use external libraries.
If you are unfamiliar with Node.js, download and install it according to the installation guide. After installing, you can validate that Node is installed on your system using the command:
node -v
If you see the version printed, you are all set. My current version is v14.15.0, but any node version above 8.0 should do.
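The same check can also be done from inside a script, which is handy if the scraper will run on machines you do not control. The snippet below is a minimal sketch: the helper names are made up for this example, and the minimum of 8 simply mirrors the rule of thumb above.

```javascript
// Minimum major version, taken from the "8.0 or above" rule of thumb in this guide.
const MIN_MAJOR = 8;

// Parse the major version from a Node.js version string such as "14.15.0".
function nodeMajorVersion(version = process.versions.node) {
  return parseInt(version.split('.')[0], 10);
}

// True when the given (or current) Node.js version meets the minimum.
function isSupported(version = process.versions.node) {
  return nodeMajorVersion(version) >= MIN_MAJOR;
}

console.log(`Node ${process.versions.node} supported: ${isSupported()}`);
```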
If you don’t have one yet, register for a free Scrapezone Account here. This account will be loaded with 1,000 free scraping credits, which should get you started. After verifying your email address, you should log in to Scrapezone Dashboard to get your scraping username and password. The details will be in Home -> API Information.
Create a new folder named bestbuy_scraper, and initialize a new Node project in it.
Open Terminal in that folder and type:
npm init -y
This will create a new standard npm project named ‘bestbuy_scraper’ in this folder. As you can see, the folder now contains a ‘package.json’ file.
If you are new to Node.js, package.json is the file that defines the dependencies and project information.
Now, to install the Scrapezone SDK, type:
npm install scrapezone-node-sdk
This will add scrapezone-node-sdk to the project as a library.
Scraping BestBuy Products: The Code
Create a new file and name it ‘index.js’. Open it in your favourite editor and paste the following code:
const ScrapezoneClient = require('scrapezone-node-sdk');
const scrapezoneClient = new ScrapezoneClient("<YOUR_USER>", "<YOUR_PASS>");
query: [ 'https://www.bestbuy.com/site/sony-wh-1000xm4-wireless-noise-cancelling-over-the-ear-headphones-black/6408356.p?skuId=6408356', 'https://www.bestbuy.com/site/sony-wi-1000xm2-wireless-noise-canceling-in-ear-headphones-black/6395364.p?skuId=6395364'
]}).then(results => console.log(results));
Paste the username and password from Scrapezone Dashboard instead of <YOUR_USER>, <YOUR_PASS>, and you’re all set.
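If you would rather not keep credentials in the source file, one option is to read them from environment variables. This is only a sketch: SCRAPEZONE_USER and SCRAPEZONE_PASS are names chosen for this example, not something the SDK defines, and loadCredentials is a hypothetical helper.

```javascript
// Read the Scrapezone credentials from environment variables.
// SCRAPEZONE_USER and SCRAPEZONE_PASS are hypothetical names picked for this sketch.
function loadCredentials(env = process.env) {
  const username = env.SCRAPEZONE_USER;
  const password = env.SCRAPEZONE_PASS;
  if (!username || !password) {
    throw new Error('Set SCRAPEZONE_USER and SCRAPEZONE_PASS before running.');
  }
  return { username, password };
}

// Usage with the constructor from index.js:
// const { username, password } = loadCredentials();
// const scrapezoneClient = new ScrapezoneClient(username, password);
```

With this in place, the placeholders in index.js can be replaced by the two values returned from loadCredentials().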
This code sends an API request to Scrapezone and then polls for the results. For two product pages, the scraping time should be under 20 seconds, and for 1,000 pages under 8 minutes.
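The SDK takes care of the polling for you, so you never need to write it yourself. For readers curious about the pattern, here is a generic sketch of "submit, then poll until done". It is not the SDK's actual implementation: checkStatus is a hypothetical async function that is assumed to return an object shaped like { done, results }.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `checkStatus` repeatedly until it reports completion or attempts run out.
// `checkStatus` is a hypothetical async function returning { done, results }.
async function pollUntilDone(checkStatus, { intervalMs = 1000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status.done) {
      return status.results;
    }
    // Wait before the next check, but not after the final attempt.
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`Job not finished after ${maxAttempts} attempts`);
}
```

A fixed interval keeps the sketch short; a real client might back off gradually between checks.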
In this code, we use a BestBuy scraper, but you can use any of the official Scrapezone Scrapers, documented here.
The code is very basic. The SDK receives an object with two properties: scraper_name and query. The scraper name specifies which scraper will be used, and the query is a list of URLs to scrape.
To run the code, open a terminal in this folder and type:
node index.js
Once the pages are scraped, the SDK will return a parsed JSON response, which we print to the console.
The SDK is limited to 1,000 URLs in one query, so in case you need to scrape more URLs, it is possible to split the request into multiple chunks and send them in parallel.
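Splitting a long URL list could look something like the sketch below. The chunk and scrapeInChunks helpers are written for this example and are not part of the SDK; scrapeFn is a hypothetical function you supply that takes an array of URLs, sends it as one query (for instance by wrapping the SDK call from index.js), and returns a promise for an array of results.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Send every chunk in parallel and merge the results in the original order.
// `scrapeFn` is a hypothetical function: (urls) => Promise<Array>.
async function scrapeInChunks(urls, scrapeFn, maxPerQuery = 1000) {
  const batches = chunk(urls, maxPerQuery);
  const perBatch = await Promise.all(batches.map((batch) => scrapeFn(batch)));
  // Flatten one level; concat is used so this also runs on older Node versions.
  return [].concat(...perBatch);
}
```

Promise.all rejects as soon as any one chunk fails, so if partial results matter to you, consider catching errors inside scrapeFn and returning an empty array for a failed chunk.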
I hope this guide saved you a lot of precious time and that you find it very useful. Good luck and I hope you will be building amazing products using this cool technology.