A step-by-step tutorial to analyse the sentiment of Amazon product reviews with the FastText API
This blog provides a detailed, step-by-step tutorial on using FastText for text classification. As a working example, we perform sentiment analysis of customer reviews on Amazon.com. We also show how the reviews of a particular product can be scraped and fed to the trained model, so that the predicted sentiments can be used to judge the quality of a product from customer feedback before purchase.
What is FastText?
Text classification has become an essential component of the commercial world, whether it is used in spam filtering or in analysing the sentiments of tweets or of customer reviews on e-commerce websites, perhaps its most ubiquitous applications.
FastText is an open-source library developed by Facebook AI Research (FAIR), dedicated to simplifying text classification. FastText can train on millions of example texts in hardly ten minutes on a multi-core CPU, and the trained model can then classify raw, unseen text among more than 300,000 categories in less than five minutes.
Pre-Labelled Dataset for Training
A manually annotated dataset of Amazon reviews obtained from Kaggle.com, containing a few million reviews, was used for training the model after conversion to the FastText format.
The data format for FastText is as follows:
__label__<X> __label__<Y> ... <Text>
where X and Y represent the class labels. Each label consists of the prefix __label__ followed by the class name and a space. Below is a sample review from the dataset:
__label__2 Great CD: My lovely Pat has one of the GREAT voices of her generation. I have listened to this CD for YEARS and I still LOVE IT. When I'm in a good mood it makes me feel better. A bad mood just evaporates like sugar in the rain. This CD just oozes LIFE. Vocals are jusat STUUNNING and lyrics just kill. One of life's hidden gems. This is a desert isle CD in my book. Why she never made it big is just beyond me. everytime I play this, no matter black, white, young, old, male, female EVERYBODY says one thing "Who was that singing ?"
The label __label__1 signifies that the reviewer gave either 1 or 2 stars for the product, while __label__2 indicates a 4 or 5 star rating.
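As a sketch of how raw star ratings could be mapped to this format (a minimal illustration with hypothetical (stars, text) pairs; the Kaggle dataset already ships in labelled form):

```python
# Map star ratings to FastText labels: 1-2 stars -> __label__1 (negative),
# 4-5 stars -> __label__2 (positive); 3-star reviews are dropped as neutral.
def to_fasttext_line(stars, text):
    if stars <= 2:
        label = "__label__1"
    elif stars >= 4:
        label = "__label__2"
    else:
        return None  # skip neutral reviews
    return f"{label} {text}"

reviews = [(5, "Great CD: one of the GREAT voices of her generation."),
           (1, "Batteries died within a year."),
           (3, "It was okay.")]

# Keep only the non-neutral, labelled lines.
lines = [line for line in (to_fasttext_line(s, t) for s, t in reviews) if line]
print("\n".join(lines))
```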
Training FastText for Text Classification
Pre-process and Clean Data
Execute the following command to generate a preprocessed and cleaned training data file after normalizing text case and removing unwanted characters.
cat <path to training file> | sed -e "s/\([.\!?,'/()]\)/ \1 /g" | tr "[:upper:]" "[:lower:]" > <path to pre-processed output file>
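The same normalization can also be sketched in Python, as a rough equivalent of the sed/tr pipeline above (padding the same punctuation characters with spaces, then lower-casing):

```python
import re

def preprocess(line):
    # Pad the punctuation characters handled by the sed command with spaces,
    # then lower-case everything (the tr step).
    line = re.sub(r"([.!?,'/()])", r" \1 ", line)
    return line.lower()

print(preprocess("Great CD: I LOVE IT!"))
```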
Next, download and extract the FastText source:
$ wget https://github.com/facebookresearch/fastText/archive/v0.1.0.zip
$ unzip v0.1.0.zip
Move to the fastText directory and build it:
$ cd fastText-0.1.0
$ make
Running the binary without any argument will print the high level documentation, showing the different use cases supported by fastText:
>> ./fasttext
usage: fasttext <command> <args>

The commands supported by fasttext are:

  supervised              train a supervised classifier
  quantize                quantize a model to reduce the memory usage
  test                    evaluate a supervised classifier
  predict                 predict most likely labels
  predict-prob            predict most likely labels with probabilities
  skipgram                train a skipgram model
  cbow                    train a cbow model
  print-word-vectors      print word vectors given a trained model
  print-sentence-vectors  print sentence vectors given a trained model
  nn                      query for nearest neighbors
  analogies               query for analogies
In this tutorial, we mainly use the supervised, test and predict subcommands, which correspond to learning (and using) a text classifier.
Training the model
The following command is used to train a model for text classification:
./fasttext supervised -input <path to pre-processed training file> -output <path to save model> -label __label__
On completion of training, a model file with the .bin extension, containing the trained classifier, is created in the given location.
Optional parameters for improving models
Increasing number of epochs for training
By default, the model is trained on each example for 5 epochs. To increase this for better training, we can specify the -epoch argument:
./fasttext supervised -input <path to pre-processed training file> -output <path to save model> -label __label__ -epoch 50
Specify learning rate
The learning rate can be tuned with the -lr parameter; ideal values lie in the range 0.1 - 1.0, and the default value is 0.1. Here's how we specify this parameter:
./fasttext supervised -input <path to pre-processed training file> -output <path to save model> -label __label__ -lr 0.5
Using n-grams as features
This is a useful step for problems where word order matters, especially sentiment analysis. It instructs FastText to use the concatenations of consecutive tokens in an n-sized window as features for training. We use the -wordNgrams parameter for this (ideal values lie between 2 and 5):
./fasttext supervised -input <path to pre-processed training file> -output <path to save model> -label __label__ -wordNgrams 3
Test and Evaluate the Model
The following command tests the model on a pre-annotated test dataset, compares the original labels with the predicted labels for each review, and reports evaluation scores in the form of precision and recall values.
Precision is the fraction of the labels predicted by FastText that are correct, while recall is the fraction of the true labels that were successfully predicted.
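To make these metrics concrete, here is a small sketch with hypothetical labels for the single-label case, where each review has exactly one true label and one prediction, so precision@1 and recall@1 coincide:

```python
def precision_recall_at_1(true_labels, predicted_labels):
    # With one true label per example and one prediction each,
    # precision@1 == recall@1 == fraction of exact matches.
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    score = correct / len(true_labels)
    return score, score

true = ["__label__2", "__label__1", "__label__2", "__label__1"]
pred = ["__label__2", "__label__1", "__label__1", "__label__1"]
print(precision_recall_at_1(true, pred))  # -> (0.75, 0.75)
```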
./fasttext test <path to model> <path to test file> k
where the parameter k specifies that the model should predict the top k labels for each review.
Evaluating our trained model on a test set of 400,000 reviews yields a precision and recall of 91%, and training completes very quickly.
Analyse Sentiments of Real-Time Customer Reviews of Products on Amazon.com
Scrape Amazon Customer Reviews
We use an existing Python library to scrape reviews from Amazon product pages.
To set up the module, type the following in your command prompt/terminal:
pip install amazon-review-scraper
Here's a sample script to scrape the reviews of a particular product, given the URL of its web page:
from amazon_review_scraper import amazon_review_scraper

url = input("Enter URL: ")
start_page = input("Enter Start Page: ")
end_page = input("Enter End Page: ")
time_upper_limit = input("Enter upper limit of time range (Example: Entering the value 5 would mean the program will wait anywhere from 0 to 5 seconds before scraping a page. If you don't want the program to wait, enter 0): ")
file_name = "amazon_product_review"

scraper = amazon_review_scraper.amazon_review_scraper(url, start_page, end_page, time_upper_limit)
scraper.scrape()
scraper.write_csv(file_name)
Note: create the output directory expected by the scraper, if it does not exist already, for the scraper to function properly.
The above code scrapes the reviews from the given URL and writes them to an output CSV file. We then clean and pre-process these scraped reviews exactly as we did the training data, separating each punctuation character from the text with a space as in the training file, and store them in a separate txt file for prediction of sentiments.
Prediction of Sentiments of Scraped Data
The following command is used to predict the sentiment labels of the pre-processed scraped reviews and store them in a file:
./fasttext predict <path to model> <path to test file> k > <path to prediction file>
where k signifies that the model will predict the top k labels for each review.
The labels predicted for the above reviews are as follows:
__label__2 __label__1 __label__2 __label__2 __label__2 __label__2 __label__2 __label__2 __label__1 __label__2 __label__2
These predictions are quite accurate, as verified manually. The prediction file can then be used for further detailed analysis and visualisation purposes.
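For instance, a few lines of Python can summarize the predicted labels into an overall sentiment distribution for the product (label names as in the training data):

```python
from collections import Counter

def sentiment_summary(predicted_labels):
    # Tally predictions: __label__2 = positive, __label__1 = negative,
    # and return each label's share of the total.
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

labels = ["__label__2", "__label__1", "__label__2", "__label__2",
          "__label__2", "__label__2", "__label__2", "__label__2",
          "__label__1", "__label__2", "__label__2"]
print(sentiment_summary(labels))
```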
Thus, in this blog, we learnt how to use the FastText API for text classification, scrape Amazon customer reviews for a given product, and predict their sentiments with the trained model for analysis.
If you have any queries or suggestions, I would love to hear about it. Please write to me at [email protected]