Text summarizer using deep learning made easy

1 – Building your deep network online

We will be using Google Colab for our work. This enables us to use their free GPU time to build our network (this blog gives even more insight into the free ecosystem for your deep learning project).

You have 2 main options to build your Google Colab notebook:

  1. Build a new empty Colab notebook.
  2. Build from GitHub; you can use this repo, which collects the different notebooks used in this series.

You can find the details on how to do this in this blog.

Having your code on Google Colab enables you to:

  1. Connect to Google Drive (put your datasets onto Google Drive).
  2. Use free GPU time.

You can find how to connect to Google Drive in this blog.
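For reference, mounting your Google Drive from inside a Colab notebook takes just a couple of lines using Colab's own helper:

```python
# run inside a Colab cell; it will ask you to authorize access to your Drive
from google.colab import drive

drive.mount('/content/drive')  # your Drive files then appear under /content/drive
```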

2 – Let's represent words

Since our task is an NLP task, we need a way to represent words. There are 2 main approaches, which we will discuss:

  1. Either providing the network with a representation for each word. This is called word embedding, which simply means representing a certain word by an array of numbers. There are multiple pre-trained word embeddings available online; one of them is GloVe vectors (a minimal loading sketch follows after this list).
  2. Or letting the network learn the representations by itself.
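As a small illustration, loading pre-trained GloVe vectors into a Python dictionary can be as simple as the sketch below (the file name and vocabulary here are placeholders, not values from this series):

```python
import numpy as np

def load_glove(path, vocab):
    """Load pre-trained GloVe vectors for the words in our vocabulary.

    Each line of a GloVe file is: word v1 v2 ... vd
    """
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word = parts[0]
            if word in vocab:
                embeddings[word] = np.asarray(parts[1:], dtype="float32")
    return embeddings

# hypothetical usage
# vocab = {"the", "match", "score"}
# glove = load_glove("glove.6B.100d.txt", vocab)
```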

3 – The datasets used

For this task we use a dataset of news articles and their headlines; the most popular choice is the CNN/Daily Mail dataset. The news body is used as the input to our model, while the headline is used as the target summary output.

These datasets can be found easily online. We will use 2 main approaches for working with them:

  1. Using the raw data itself and manually applying preprocessing to it (a small example follows after this list).
  2. Using a preprocessed version of the data, which is what the most recent research uses.
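To make the first option concrete, here is a tiny, hypothetical cleaning pass over an article/headline pair; real pipelines do considerably more (tokenization, sentence splitting, vocabulary building):

```python
import re

def simple_preprocess(text, max_tokens=400):
    """A very rough cleaning pass: lowercase, strip unusual characters,
    and truncate long articles (a common step before feeding the encoder)."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9.,!?' ]+", " ", text)
    tokens = text.split()
    return tokens[:max_tokens]

# each training example pairs an article body (input) with its headline (target)
article = "Manchester United beat Chelsea 2-1 in a dramatic final ..."
headline = "United edge Chelsea 2-1"
example = (simple_preprocess(article), simple_preprocess(headline, max_tokens=30))
```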

4 – Models used

Here I will briefly talk about the models that, if God wills, will be included in the coming series. I hope you enjoy it.

A. Corner Stone model

To implement this task, researchers use a deep learning model that consists of 2 parts: an encoder, which understands the input, represents it in an internal representation, and feeds that representation to the other part of the network, the decoder.

The main deep learning network used for these 2 parts is an LSTM (long short-term memory), which is a modification of the RNN.

In the encoder we mainly use a multi-layer bidirectional LSTM, while in the decoder we use an attention mechanism (more on this later).
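To give a feel for the overall shape of such a model, here is a minimal PyTorch sketch of a bidirectional LSTM encoder feeding an LSTM decoder with teacher forcing. It is only an illustration of the encoder-decoder idea; it omits attention and is not the exact architecture from any of the papers:

```python
import torch
import torch.nn as nn

class Seq2SeqSummarizer(nn.Module):
    """A minimal encoder-decoder sketch: a bidirectional LSTM reads the article,
    a unidirectional LSTM generates the summary one word at a time."""

    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        # decoder hidden size matches the concatenated encoder directions
        self.decoder = nn.LSTM(emb_dim, 2 * hidden_dim, batch_first=True)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, article_ids, summary_ids):
        # encode the article
        enc_emb = self.embedding(article_ids)
        _, (h, c) = self.encoder(enc_emb)
        # merge the forward/backward final states into one decoder initial state
        h0 = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)
        c0 = torch.cat([c[0], c[1]], dim=-1).unsqueeze(0)
        # decode with teacher forcing (feed the reference summary during training)
        dec_emb = self.embedding(summary_ids)
        dec_out, _ = self.decoder(dec_emb, (h0, c0))
        return self.out(dec_out)  # logits over the vocabulary at each step

model = Seq2SeqSummarizer(vocab_size=50_000)
```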

B. Pointer Generator

But researchers found 2 main problems with the above implementation, as discussed in the ACL 2017 paper Get To The Point: Summarization with Pointer-Generator Networks (the authors also have a truly amazing blog you should see).

The 2 problems are:

  1. The inability of the network to copy facts (like names and match scores): it doesn't copy words, it generates them, so it is sometimes incapable of reproducing facts correctly.
  2. Repetition of words.

This research builds on these 2 main problems and tries to fix them. I have modified their repo to work inside a Jupyter notebook on Google Colab.
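The heart of the pointer generator is a soft switch p_gen that blends a "generate from the vocabulary" distribution with a "copy from the source via attention" distribution. The sketch below shows only that blending step for a single decoding step; the shapes and names are my own simplification, not the authors' code:

```python
import torch

def pointer_generator_dist(vocab_dist, attn_dist, src_ids, p_gen):
    """Blend the decoder's vocabulary distribution with a copy distribution
    built from the attention weights over the source article.

    vocab_dist: (batch, vocab_size)  softmax over the generation vocabulary
    attn_dist:  (batch, src_len)     attention weights over source tokens
    src_ids:    (batch, src_len)     vocabulary ids of the source tokens
    p_gen:      (batch, 1)           probability of generating vs. copying
    """
    gen_part = p_gen * vocab_dist
    # scatter the remaining attention mass onto the words that appear in the source
    copy_part = torch.zeros_like(vocab_dist)
    copy_part.scatter_add_(1, src_ids, (1.0 - p_gen) * attn_dist)
    return gen_part + copy_part
```

Because the copy distribution puts probability directly on words from the article, the model can reproduce names and numbers it could never generate from a fixed vocabulary.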

C. Using Reinforcement learning with deep learning

I am still researching this work, but it is truly interesting research about combining the two fields. It actually uses the pointer generator (like in implementation B), and uses the same preprocessed version of the data.

This is the research, and it uses this repo for its code.

They are actually trying to fix 2 main problems with the corner stone implementation, which are:

  1. During training, the decoder uses (1) the output from the encoder, (2) the actual ground-truth summary and (3) its own current output for the next step, while in testing it has no ground truth, since that is exactly what needs to be generated, so it only uses (1) the output from the encoder and (2) its own current output for the next step. This mismatch causes an exposure problem.
  2. The training of the network relies on a metric for measuring the loss that is different from the metric used in testing: the training metric is the cross-entropy loss, while the testing metric (as discussed below) is a non-differentiable measure such as BLEU or ROUGE (a loss sketch follows after this list).
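One way to address both problems at once, and the direction this research takes, is to add a policy-gradient (reinforcement learning) term that directly rewards ROUGE, mixed with the usual cross-entropy loss. The sketch below shows that idea in rough form; the variable names and the mixing weight are my own simplification, not the paper's code:

```python
import torch

def mixed_summarization_loss(log_probs_sampled, reward_sampled, reward_greedy,
                             ml_loss, gamma=0.99):
    """Self-critical policy-gradient term blended with maximum likelihood.

    log_probs_sampled: (batch,) sum of log-probabilities of a sampled summary
    reward_sampled / reward_greedy: (batch,) ROUGE scores of the sampled and
                                    greedily decoded summaries
    ml_loss: scalar cross-entropy loss on the ground-truth summary
    gamma:   mixing weight between the RL and ML terms (set close to 1 in practice)
    """
    # the greedy summary acts as a baseline, reducing variance of the gradient
    advantage = reward_sampled - reward_greedy
    rl_loss = -(advantage.detach() * log_probs_sampled).mean()
    return gamma * rl_loss + (1.0 - gamma) * ml_loss
```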

I am currently working on implementing this approach in a Jupyter notebook, so if God wills it, you will see more updates on this in the near future.

5 – Summary Evaluation

To evaluate a summary, we use non-differentiable measures such as BLEU and ROUGE. They simply count the words that the generated summary has in common with the reference summary: the more overlap, the better. Most of the above approaches score from 32 to 38 ROUGE points.
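As a toy illustration of the idea, here is a simplified ROUGE-1 F-score (unigram overlap); real evaluations use the official ROUGE toolkit:

```python
from collections import Counter

def rouge1_f(candidate_tokens, reference_tokens):
    """A toy ROUGE-1 F-score: unigram overlap between the generated summary
    and the reference summary."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f("united edge chelsea".split(), "united beat chelsea 2 1".split()))  # 0.5
```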

I hope you enjoyed this quick overview of the series. My main focus in these blogs is to present the topic of text summarization in an easy and practical way, providing you with actual code that is runnable on any computer, without the need for a powerful GPU, and to connect you to the latest research on this topic. Please show your support by clapping for this blog, and don't forget to check out the code of these blogs.

In the coming blogs, if God wills it, I will go through the details of building the corner stone implementation, which all the modern research is actually based upon. We will use the word embedding approach, and we will use the raw data and manually apply preprocessing.

In later blogs, if God wills it, we will go through modern approaches: how to create a pointer generator model to fix the problems mentioned above, and how to combine reinforcement learning with deep learning.
