Recommendation algorithms now shape nearly all the information we get from the internet, from basic search results on Google to the news feed on Instagram.
Having the internet suggest what to read, listen to, or watch next has become so familiar that we no longer appreciate it as much as we used to.
But recommending the right content and getting users to actually view it is one of the biggest factors behind a significant portion of a company’s profit.
So the industry has developed many methods to improve the quality of recommendations, and along the way it has drawn in academic researchers to address the more scientific questions in the field.
As we move towards a generation in which videos communicate better than plain text or images, it is worth looking into how YouTube, Google’s video giant subsidiary, manages to deliver effective content recommendations to its users.
And for a company that derives most of its profits from ad placement, the recommendation algorithm lies at the heart of that lucrative business.
YouTube’s Deep Recommendation Algorithm
Covington et al. (2016) introduce the recommendation system architecture shown in the figure below.
(figure taken from the paper Covington et al. 2016.)
It is a process of narrowing down videos through two deep learning models: (1) a candidate generation model, and (2) a ranking model (both depicted as funnel-shaped figures in the diagram). These models are aimed at solving the following classification problem:
Given the video, user, and context features (training data), what is the probability of the user watching a certain video next?
The problem is posed as an extreme multi-class classification, where there exist millions of classes (videos), and the job is to model the probability of the user watching each video given the training features.
The models are trained in a supervised manner, which means they are trained by optimizing a loss function on labeled training data. In this case, the inputs are the video and user/contextual features, and the label is the video the user actually watched next.
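To make the "extreme multi-class classification" framing concrete, here is a minimal sketch of a softmax over videos. It assumes each video and each user is represented by an embedding vector and that the score is their dot product; the names `watch_probabilities`, `cross_entropy`, and the toy sizes are made up for illustration. A full softmax like this is only feasible at toy scale, and a production system with millions of videos would need some form of sampling during training.

```python
import numpy as np

def watch_probabilities(user_vec, video_embs):
    """Softmax over all videos: P(video i | user) is proportional to exp(v_i . u)."""
    logits = video_embs @ user_vec      # one score per video (class)
    logits = logits - logits.max()      # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

def cross_entropy(user_vec, video_embs, watched_idx):
    """Supervised loss: negative log-probability of the video actually watched."""
    return float(-np.log(watch_probabilities(user_vec, video_embs)[watched_idx]))

# Toy data (assumption): 1,000 "videos" with 16-dimensional embeddings.
rng = np.random.default_rng(0)
video_embs = rng.normal(size=(1000, 16))
user_vec = rng.normal(size=16)

probs = watch_probabilities(user_vec, video_embs)
loss = cross_entropy(user_vec, video_embs, watched_idx=42)
```

Training would adjust the embeddings (and the network that produces the user vector) to push this loss down over many (user, watched video) pairs.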
Candidate Generation Model
The candidate generation model, shown in the figure below, takes several kinds of user/contextual data as input:
- Watch history
- Search history
- Geographic information
- Age, gender, logged-in state, etc.
- Video age
(figure taken from Covington et al. 2016)
With these data, the model is trained to predict the class probabilities. At test time, because computing the class probabilities for every video in real time is too expensive, the algorithm selects the top N videos with an approximate nearest neighbor algorithm that simply searches for the videos lying closest to the generated user vector.
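The retrieval step can be sketched as follows. This is an exact, brute-force search used as a stand-in for the approximate nearest neighbor algorithm, and it assumes "closest" means the highest dot product between the user vector and each video embedding; `top_n_videos` and the toy data are hypothetical.

```python
import numpy as np

def top_n_videos(user_vec, video_embs, n):
    """Return the indices of the n videos scoring highest against the user vector."""
    scores = video_embs @ user_vec
    # argpartition picks the n best scores without sorting the whole catalogue
    top = np.argpartition(-scores, n - 1)[:n]
    # sort only those n candidates, best first
    return top[np.argsort(-scores[top])]

# Toy catalogue (assumption): 10,000 videos with 32-dimensional embeddings.
rng = np.random.default_rng(1)
video_embs = rng.normal(size=(10_000, 32))
user_vec = rng.normal(size=32)

candidates = top_n_videos(user_vec, video_embs, n=5)
```

An approximate index trades a little accuracy for speed by avoiding the score computation against every video, which is what makes this step affordable at the scale of millions of videos.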
Ranking Model
The ranking model is similar to the candidate generation model, but it is allowed to access more information about the user and videos because the search space has now been narrowed down (from millions to a few hundred) by the candidate generation model. The model structure is depicted in the figure below.
(figure taken from Covington et al. 2016)
The input of the model now consists of richer features compared to the candidate generation model:
- Video history
- Search history
- User’s past interaction with similar videos
- Video language & User language
- Time passed since the last watch
With these features, the model is trained to predict the expected watch time of the videos, using weighted logistic regression at the output of the final layer. At test time, the videos with the highest expected watch time are suggested to the users.
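A rough sketch of the weighted logistic regression idea, following the description in Covington et al. (2016): watched impressions are weighted by their watch time, unwatched impressions get unit weight, and at serving time the exponential of the logit is used as the score. The function names and the toy numbers below are made up for illustration, and the logits would normally come from the network’s final layer.

```python
import numpy as np

def weighted_logistic_loss(logits, clicked, watch_time):
    """Logistic loss where watched impressions count in proportion to watch time."""
    # positives are weighted by watch time, negatives by 1
    weights = np.where(clicked == 1, watch_time, 1.0)
    # binary cross-entropy written as softplus(z) - y * z, computed stably
    per_example = np.logaddexp(0.0, logits) - clicked * logits
    return float(np.sum(weights * per_example) / np.sum(weights))

def expected_watch_time_score(logits):
    """Serving-time score: the learned odds exp(z), used to rank candidates."""
    return np.exp(logits)

# Toy impressions (made-up numbers): two watched, three ignored.
clicked = np.array([1, 0, 0, 1, 0])
watch_time = np.array([120.0, 0.0, 0.0, 60.0, 0.0])  # seconds

logits = np.array([2.0, -1.0, 0.5, 3.0, -2.0])        # hypothetical model outputs
loss = weighted_logistic_loss(logits, clicked, watch_time)
ranking = np.argsort(-expected_watch_time_score(logits))
```

To see why the odds relate to watch time, consider a model with only a bias term b: the weighted loss on the toy data is minimized when exp(b) equals the total watch time divided by the number of ignored impressions, here 180 / 3 = 60 seconds.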
Their experiment section, which we will not go into in much detail here, verifies the following:
- The candidate generation model performs better when multiple types of features are used together (Figures 4 and 6 in the paper)
- The ranking model performs better with wider and deeper hidden layers (Table 1 in the paper)
The main take-away from this work is that they empirically verified that such extra-large-scale models also work as expected for a recommendation task.
It would have been more interesting if they had shown how effective their recommendations were compared to baseline methods, e.g. naive collaborative filtering with matrix factorization. Still, they did show that adding more layers and making the models more complex with richer features does help performance, given enough computation power and lengthy engineering work.
Recommender system in Stan World
The recommender system in Stan World will be a driving force for understanding user behaviors and making the World more explorable for them.
With a rich amount of data from the users, including their preferences, travel/search/purchase history, and gaze movements, we hope to provide users with a tailored set of advertisements, content, and entertainment.
A high-quality recommendation is crucial to maintaining user loyalty (avg. user-retention rate + avg. user-engagement rate) in the World, and this leads to more interesting content and fun for the users, a higher chance of being discovered for Virtual Resort owners, and more revenue for the platform.
To conclude, there are three points that we should bear in mind when building our own recommendation system at Stan World.
Deep learning can be effectively applied to recommendation systems.
This is good news. We can exploit the representational power of deep learning to develop a recommendation system. Recommendation is all about understanding the user’s preference and suggesting whatever is most similar to that preference.
So the objective is to accurately capture that abstract concept of “similarity”, and deep learning is well suited to such problems.
Training and deploying such a system online incurs great computational cost, and reasonable performance is only possible with an ample amount of training data.
This is bad news, especially for the early stage of development. It is commonly referred to as the cold start problem, where the system cannot predict anything useful due to the lack of existing data at the initial stage of deployment.
However, we have solutions to this problem:
1) Unlike YouTube, a platform where users join to discover who/what they like, Stan World is a platform where users join to visit resorts launched by who/what they already like; they already have at least one specific destination in mind. A resort visit already provides the first layer of preference data.
2) The main activities inside the resort (games, content, events) will provide further in-depth data, a secondary layer of preference data. Some examples of the activities are casual games like this recently launched quiz game that has successfully gained traction in growth and revenue, or this soon-to-launch digital content magazine with a top YouTuber in that category.
3) Users on our platform are incentivized and rewarded with STAN coins (our platform’s token) for high user engagement, retention rate, and participation in our surveys.
The model requires a lot of trial-and-error work to balance performance, complexity, and user-friendly deployment schemes.
Industry-scale models tend to require a very large management effort, and because the models are inherently complex, they are usually very difficult to debug or control. This is something we must be really careful about when deploying the algorithm online, as we do not want to startle the users. Preventing such cases will require a lot of thorough testing, both offline and online.
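As one example of what offline testing could look like, here is a small sketch of recall@k computed on held-out interactions. This metric is a common choice rather than something specified in this post or taken from the paper, and `recall_at_k` and the item names are hypothetical.

```python
def recall_at_k(recommended, held_out, k):
    """Fraction of held-out items that appear among the top-k recommendations."""
    if not held_out:
        return 0.0  # nothing to recover for this user
    hits = len(set(recommended[:k]) & set(held_out))
    return hits / len(held_out)

# Toy example (made-up items): what the model suggested vs. what the
# user actually went on to visit in the held-out period.
recommended = ["resort_a", "resort_b", "resort_c", "resort_d"]
held_out = {"resort_b", "resort_d", "resort_z"}

score = recall_at_k(recommended, held_out, k=3)
```

Averaging such a score over many users gives a cheap sanity check before a model is exposed to real users in an online test.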
As our recommendation system becomes smarter, user satisfaction will increase thanks to more relevant content and advertisements.
In the next post, I will discuss in more depth the solutions and implementations mentioned above: attracting more active users, building the required database, training the model, and iterating on these processes to ultimately improve the model based on the users’ feedback.
(Disclaimer: The Author is an Engineer at Stan World)