← Home

Core components of recommender systems

May 8, 2021

Recsys is usually taught starting from its taxonomy (collaborative filtering, content-based, hybrid), which confused me at first. In this post I take another path: I explain the core components of a recsys first and then draw the connection to the taxonomy. Essentially, the objective of most recsys is to map items and users onto the same vector space through a learning objective, so that we can determine which items to recommend to a user or which items to show on a similar-items page. This post is largely based on this wonderful talk by James Kirk from Spotify, so I highly recommend watching the video to gain further understanding.

Main components of a recommendation system:

  1. Interaction Matrix
  2. User and Item Features
  3. User and Item Representation Function
  4. Learning Objective
  5. Prediction Function

Let’s go through them one by one!


Interaction Matrix

The interaction matrix is the main component of a recsys, usually of size $M \times N$, where $M$ is the number of users and $N$ is the number of items. Note that the user-item definition in a recsys is not restricted to an actual user or item: the user is the entity that receives and acts on the recommendation, whereas the item is the passive entity being recommended to the user.

The matrix contains the interaction value between a user and an item. The interaction value can be:

Image Source: James Kirk’s slides
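To make the matrix concrete, here is a minimal sketch of building a sparse $M \times N$ interaction matrix with `scipy`. The interaction triples below are hypothetical (e.g., purchase counts); unobserved user-item pairs are simply absent from the sparse structure.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical implicit interactions: (user_id, item_id, value),
# e.g. value = number of purchases.
interactions = [(0, 1, 3), (0, 2, 1), (1, 0, 2), (2, 2, 5)]

users, items, values = zip(*interactions)
n_users, n_items = 3, 4  # M users x N items

# Sparse M x N interaction matrix; unobserved pairs stay implicit zeros.
R = csr_matrix((values, (users, items)), shape=(n_users, n_items))
print(R.toarray())
```

A sparse format matters in practice: real interaction matrices have millions of users and items but only a tiny fraction of observed entries.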

The interaction value should reflect the effect we want our recommender system to produce. For example, if we want users to purchase more given our recommendations, then the interaction value should be related to purchases. If we want users to engage more with our platform, then the interaction value should be related to engagement.

We might also want to preprocess the interaction matrix before using it. In some settings, users can give negative explicit interactions. Negative interactions can be a rich signal for the recsys to learn from, but we need to be careful in handling them: some loss functions, such as learning-to-rank losses, cannot accommodate negative signals. The interaction matrix is also usually sparse, so we need to handle missing values. With explicit interactions, we can’t just replace them with zeroes, as a zero indicates that the user does not like the item; instead, we model with the sparse data as-is. With implicit interactions, it is much safer to replace missing values with zeroes, as a zero indicates no action from the user (no views, no plays, etc.). Missing values are usually what we want to predict with our recsys: recommending items that a particular user hasn’t interacted with yet.
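The explicit/implicit distinction above can be sketched in a few lines of numpy. The data here is made up for illustration: explicit ratings keep a mask of observed entries, while implicit counts safely default to zero.

```python
import numpy as np

# Explicit feedback: ratings 1-5, np.nan = user never rated the item.
explicit = np.array([[5.0, np.nan, 1.0],
                     [np.nan, 4.0, np.nan]])

# Train only on observed entries -- filling with zero would wrongly
# assert "the user disliked this item".
mask = ~np.isnan(explicit)

# Implicit feedback (e.g. play counts): missing really means "no action",
# so a dense matrix of zeroes is a safe starting point.
implicit_raw = {(0, 2): 7, (1, 0): 2}
implicit = np.zeros((2, 3))
for (u, i), count in implicit_raw.items():
    implicit[u, i] = count
```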


User/Item Features

There are two types of features in a recsys: indicator features, which uniquely identify each user/item (e.g., a one-hot ID), and metadata features, which describe their attributes.

In a lot of libraries out there, if we do not specify any metadata features for items and users and rely on the interaction matrix alone, then by default only the indicator features are used. The model then only learns a representation that is unique to every user/item.

Image Source: James Kirk’s slides


User/Item Representation Function

In a recsys we want to find good representations for users and items. A representation is typically a low-dimensional vector that encodes information about the user or item. Getting the representation involves transforming the user/item features via a representation function. Some examples of representation functions:

  1. Linear Kernels
  2. Neural Networks (Fully Connected, Transformers, Word2vec, Autoencoder)
  3. Passthrough (None)

Depending on the design, there are many ways to choose a representation function. Linear kernels are effective with both indicator and metadata features, and are also the natural choice for matrix factorization. If we have unstructured data such as text or images that we want to utilize, we can use a separate architecture like word2vec or an autoencoder to get a text/image representation and then feed it into the recsys representation function. We can also use different representation functions for different features: for example, linear kernels for indicator and metadata features and passthrough for text/image (directly using the representation output from word2vec or the autoencoder).
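A linear kernel is just a learned embedding matrix applied to the features. The sketch below uses a hypothetical item feature matrix (indicator plus metadata columns); note that with pure indicator features the feature matrix is the identity, and this reduces to the per-item embedding lookup of classic matrix factorization.

```python
import numpy as np

rng = np.random.default_rng(0)

n_items, n_features, n_dims = 4, 7, 8

# Hypothetical item feature matrix (indicator + metadata columns).
item_features = rng.integers(0, 2, size=(n_items, n_features)).astype(float)

# Linear kernel: a feature-embedding matrix, learned during training.
# Each item's representation is the sum of its features' embeddings.
W = rng.normal(size=(n_features, n_dims))
item_repr = item_features @ W  # shape: (n_items, n_dims)
```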


Prediction Function

A prediction function outputs the estimated relevance of items to a particular user; it converts user/item representations into a prediction. The prediction itself can vary depending on the design: it can be a score that tells us how relevant an item is to a user, or a relevance ranking over several items for a particular user. Some examples of prediction functions:

  1. Dot product
  2. Cosine similarity
  3. Euclidean distance
  4. Neural network
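The first three prediction functions are one-liners once we have the representations. The toy vectors below are made up; note the sign convention differs: higher dot product or cosine similarity means more relevant, while lower Euclidean distance means more relevant.

```python
import numpy as np

user = np.array([0.5, -1.0, 2.0])
item = np.array([1.0, 0.0, 1.5])

dot = user @ item                                                # higher = better
cosine = dot / (np.linalg.norm(user) * np.linalg.norm(item))     # higher = better
euclidean = np.linalg.norm(user - item)                          # lower = better
```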

Learning Objective

This is essentially our loss function, which lets us uncover the best representations for users and items. The learning objective converts the user/item representations and predictions into a loss to learn the best parameters for the recsys model. The learning objective can have a huge impact on the output of the recommender system: changes in the learning objective can dramatically affect how our recsys feels to the user. Examples of loss function characteristics:

Due to these differences in nature, we must choose the learning objective that best fits our goal. Some examples of loss functions:

  1. Root-mean-square error
  2. KL Divergence
  3. Alternating least squares
  4. Bayesian personalized ranking (learning-to-rank)
  5. Weighted approximate-rank pairwise, WARP (learning-to-rank)
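As one concrete example from the list, here is a minimal sketch of the Bayesian personalized ranking loss for a single training triple. It is pairwise: rather than predicting an absolute score, it pushes an interacted (positive) item's score above a non-interacted (negative) one.

```python
import numpy as np

def bpr_loss(user_repr, pos_item_repr, neg_item_repr):
    """Bayesian personalized ranking loss for one (user, pos, neg) triple."""
    pos_score = user_repr @ pos_item_repr
    neg_score = user_repr @ neg_item_repr
    # -log sigmoid(pos - neg): small when the positive item already
    # outranks the negative one, large when the ranking is wrong.
    return -np.log(1.0 / (1.0 + np.exp(-(pos_score - neg_score))))

u = np.array([1.0, 0.0])
pos = np.array([1.0, 0.0])    # item the user interacted with
neg = np.array([-1.0, 0.0])   # sampled non-interacted item
```

In practice the negative item is randomly sampled from the user's non-interacted items each step, and the loss is minimized by gradient descent over the representation parameters.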

Combining it all together

Image Common architecture of a recommender system. Source: James Kirk’s slides

There we have it: the architecture of a recommender system. Let’s try to design recommender systems for some example cases:

For You page. The example case is to recommend items on a homepage, personalized for every user:

Similar items. The example case is to recommend items given another item:

With this architecture, we can also incorporate more complex feature representations into the system. For example, if we want to utilize a user-user social graph, we can introduce a graph neural network that takes the user graph as input (user features) and outputs a representation to be further processed (user representation).

For you page utilizing social graph structure:

Simpler recsys such as user-based and item-based models that do not require training also fit this architecture. Let’s take a look:

User-based recsys:

Item-based recsys:
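In the architecture's terms, an item-based recsys uses a passthrough representation function (each item's representation is just its interaction-matrix column) and cosine similarity as the prediction function, with no learning objective at all. A minimal sketch with a toy matrix:

```python
import numpy as np

# Toy interaction matrix: rows = users, columns = items.
R = np.array([[5, 4, 0, 1],
              [4, 5, 0, 0],
              [0, 1, 5, 4]], dtype=float)

# Passthrough representation: each item is its interaction column.
# Prediction function: cosine similarity between item columns.
norms = np.linalg.norm(R, axis=0, keepdims=True)
item_sim = (R / norms).T @ (R / norms)   # (n_items, n_items)

# Most similar item to item 0 (last index is item 0 itself).
most_similar = np.argsort(item_sim[0])[-2]
```

A user-based recsys is the mirror image: passthrough over the matrix's rows, then cosine similarity between users.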

Is there a recsys model that does not fit into this architecture?

Yes! In some cases we can use the interaction matrix to design the training data rather than using it as our label. You can find some examples here, where the author creates pseudo-sentences of products to feed into word2vec training. This setting does not use the interaction matrix as the label.

Thanks for reading! Contact me on twitter if you spot any mistake.