Oct 5, 2021 4 min read

What is a recommender system?

A recommender system produces a list of recommendations, such as movies to watch, products to buy, or news to read.

Different Categories of Recommender Systems

Recommender systems can be divided into two main types: content-based and collaborative filtering recommender systems.
Collaborative filtering is further split into memory-based and model-based approaches, both described later in this post.

Content-Based Recommender System

A content-based recommender system builds a user profile from the items the user has previously rated. For example, a Kindle user might give a book a five-star rating if they really like it. Similarly, a news reader will finish reading a news item if they really care about it. Behavior of this kind signals the user's interests.
Content-based recommender systems derive their recommendations from a user's past data: which items the user purchased, rated, or paid attention to.
Take a news recommender as an example. News has features like category (Politics, Finance, Sports, Entertainment, etc.) or location (local, national, or international). Natural Language Processing techniques such as TF-IDF scoring can be used to extract such features and feed them into recommender learners.
In this approach, a profile is created for each user and each item, and each profile is represented as a vector.
- Item vector: a vector with value 1 for words that have a high TF-IDF score and 0 for all others.
- User vector: a vector of the user's numerical features.
Finally, we build a model, based on the Item vector and User vector, that explains the observed user-item interactions.
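The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the four headlines are made-up toy data, the threshold of 0.1 for a "high" TF-IDF score is arbitrary, and the user vector is assumed to be the mean of the item vectors the user liked (the text does not prescribe how the user vector is built). The helper names `tfidf`, `item_vectors`, and `recommend` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus (illustrative data, not from a real news feed).
news = {
    "n1": "election vote parliament election today",
    "n2": "stock market bank stock today",
    "n3": "football match goal football today",
    "n4": "bank election budget vote today",
}

def tfidf(docs):
    """TF-IDF per document, with tf = count / doc length and idf = log(N / df)."""
    tokens = {d: text.split() for d, text in docs.items()}
    df = Counter(w for words in tokens.values() for w in set(words))
    n_docs = len(docs)
    return {
        d: {w: (c / len(words)) * math.log(n_docs / df[w])
            for w, c in Counter(words).items()}
        for d, words in tokens.items()
    }

def item_vectors(scores, vocab, threshold):
    """Binary item vectors: 1 for words whose TF-IDF exceeds the threshold, else 0."""
    return {d: [1 if s.get(w, 0.0) > threshold else 0 for w in vocab]
            for d, s in scores.items()}

def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0

def recommend(liked, items):
    """Rank unseen items by similarity to the mean vector of the liked items.

    Assumption: the user vector is the average of the liked item vectors.
    """
    size = len(next(iter(items.values())))
    profile = [sum(items[d][k] for d in liked) / len(liked) for k in range(size)]
    unseen = [d for d in items if d not in liked]
    return sorted(unseen, key=lambda d: cosine(profile, items[d]), reverse=True)

vocab = sorted({w for text in news.values() for w in text.split()})
items = item_vectors(tfidf(news), vocab, threshold=0.1)
print(recommend(["n1"], items))  # ranking for a user who liked only "n1"
```

The word "today" appears in every headline, so its inverse document frequency is zero and it never enters an item vector. A user who liked the politics item `n1` is matched to `n4`, which shares "election" and "vote" with it.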

Collaborative Filtering Method

Collaborative filtering uses the feedback of other users to recommend items. These systems evaluate the quality of an item based on peer review.
Collaborative filtering recommender systems are based on the past interactions recorded between users and items to produce new recommendations. These interactions are stored in the so-called "user-item interactions matrix".

[Figure: user-item interactions matrix. Source: https://towardsdatascience.com/introduction-to-recommender-systems-6c66cf15ada]

For example, suppose the media service company Netflix wants to develop a recommender system based on users' historical movie ratings. A simplified user-item interactions matrix might look as follows:

User	The Godfather	Gone with the Wind	Green Book	The Big Short	Soul
Steven	5	1	4	4	2
Jessica	2	2	4	2	5
Tom	5	4	4	5	1
Beatrice	4	4	5	5	3

There are two categories of collaborative filtering recommender systems:
- Memory-based approach
  - Users and items are represented directly by their past interactions (large sparse vectors).
  - Similarity (measured by cosine similarity, Pearson's correlation, Euclidean distance, etc.) is the main tool for producing a recommendation list.
- Model-based approach
  - New representations of users and items are built from a model (usually based on dimension reduction techniques such as PCA, matrix factorization, or factor analysis).
  - Models (regression, neural nets, etc.) are then used to predict user ratings of unrated items.
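As a rough illustration of the memory-based approach, the sketch below uses the ratings table above and pretends that Steven's rating of "Soul" is unknown. It predicts that rating as a similarity-weighted average of the other users' ratings. The weighting scheme and the function names `cosine_on_common` and `predict` are assumptions made for this example; they are one common choice, not the only one.

```python
import math

movies = ["The Godfather", "Gone with the Wind", "Green Book", "The Big Short", "Soul"]

# Ratings from the table above; None marks a rating hidden for this demo.
ratings = {
    "Steven":   [5, 1, 4, 4, None],
    "Jessica":  [2, 2, 4, 2, 5],
    "Tom":      [5, 4, 4, 5, 1],
    "Beatrice": [4, 4, 5, 5, 3],
}

def cosine_on_common(u, v):
    """Cosine similarity restricted to items that both users have rated."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    dot = sum(a * b for a, b in pairs)
    norm_u = math.sqrt(sum(a * a for a, _ in pairs))
    norm_v = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(user, item_idx, ratings):
    """Similarity-weighted average of other users' ratings for one item."""
    num = den = 0.0
    for other, row in ratings.items():
        if other == user or row[item_idx] is None:
            continue
        sim = cosine_on_common(ratings[user], row)
        num += sim * row[item_idx]
        den += abs(sim)
    return num / den if den else None

sims = {name: cosine_on_common(ratings["Steven"], row)
        for name, row in ratings.items() if name != "Steven"}
nearest = max(sims, key=sims.get)
prediction = predict("Steven", movies.index("Soul"), ratings)
print(nearest, round(prediction, 2))
```

Because all ratings are positive, raw cosine similarities are all close to 1 and the prediction ends up near the plain average. Mean-centering the ratings first (as Pearson's correlation does) usually separates users better.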

0. Some Technical Details

0.1. Similarity Measures

Cosine similarity:

\[Cos(x, y)=\frac{\sum_{i=1}^{n} x_{i} y_{i}}{\sqrt{\sum_{i=1}^{n} x_{i}^{2}} \sqrt{\sum_{i=1}^{n} y_{i}^{2}}}\]

Jaccard Similarity:

\[J(X, Y)=\frac{|X \cap Y|}{|X \cup Y|}\]
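Jaccard similarity works on sets rather than rating vectors, so a small sketch helps to show how ratings could be turned into sets. Here it is assumed, purely for illustration, that a user "likes" a movie when the rating in the table above is 4 or higher; the function name `jaccard` is likewise hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Movies rated 4 or higher in the table above (the threshold is an assumption).
liked = {
    "Steven": {"The Godfather", "Green Book", "The Big Short"},
    "Jessica": {"Green Book", "Soul"},
    "Tom": {"The Godfather", "Gone with the Wind", "Green Book", "The Big Short"},
}

print(jaccard(liked["Steven"], liked["Tom"]))      # 3 shared out of 4 distinct movies
print(jaccard(liked["Steven"], liked["Jessica"]))  # 1 shared out of 4 distinct movies
```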

Pearson's Correlation:

\[\rho = \frac{\sum_{i}\left(r_{ui} - \bar{r}_{u}\right) \left(r_{vi}-\bar{r}_{v}\right)}{\sqrt{\sum_{i} \left(r_{ui}-\bar{r}_{u}\right)^{2}}\sqrt{\sum_{i} \left(r_{vi}-\bar{r}_{v}\right)^{2}}}\]
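A minimal sketch of Pearson's correlation between two users' rating vectors, in the standard form with squared deviations under each square root. The ratings come from the table earlier in the post; the function name `pearson` and the choice to return 0.0 for a constant vector are assumptions of this example.

```python
import math

def pearson(u, v):
    """Pearson's correlation of two equally long rating vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    if ss_u == 0 or ss_v == 0:
        return 0.0  # assumption: treat a constant rating vector as uncorrelated
    return cov / (math.sqrt(ss_u) * math.sqrt(ss_v))

steven = [5, 1, 4, 4, 2]
jessica = [2, 2, 4, 2, 5]
tom = [5, 4, 4, 5, 1]

print(round(pearson(steven, tom), 3))      # positive: similar tastes
print(round(pearson(steven, jessica), 3))  # negative: opposing tastes
```

Subtracting each user's mean removes the effect of generous or harsh raters, which is why Steven and Jessica come out negatively correlated here even though all their ratings are positive numbers.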

Euclidean Distance:

\[ d(X, Y)=\sqrt{\sum_{i=1}^{n}\left(x_{i}-y_{i}\right)^{2}}\]

Manhattan Distance:

\[d(X, Y)=\sum_{i=1}^{n}\left|x_{i}-y_{i}\right|\]

Minkowski Distance:

\[d(X, Y)=\left(\sum_{i=1}^{n}\left|x_{i}-y_{i}\right|^{p}\right)^{\frac{1}{p}}\]
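The three distances above are closely related: the Minkowski distance reduces to the Manhattan distance for p = 1 and to the Euclidean distance for p = 2. The sketch below shows this on two rating vectors from the table earlier in the post; the function name `minkowski` is chosen for this example.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equally long vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

steven = [5, 1, 4, 4, 2]
tom = [5, 4, 4, 5, 1]

manhattan = minkowski(steven, tom, p=1)  # sum of absolute differences
euclidean = minkowski(steven, tom, p=2)  # square root of summed squared differences
print(manhattan, round(euclidean, 4))
```

Note that these are distances, not similarities: a smaller value means the two users are closer.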

0.2. Collaborative Filtering Algorithm

Assume that we use a linear regression model to predict a user's rating as follows:

\[y^{(i, j)} = (\theta ^ {(j)})^{T} x ^{(i)} + u^{(i, j)}\]

where \(y^{(i,j)}\) is the rating that user j gives to item i, \(x^{(i)}\) is the feature vector of item i, \(\theta^{(j)}\) is the weight vector of user j, and \(u^{(i,j)}\) is the error term.

In this case, the collaborative filtering algorithm can be summarized in 3 steps:

1. Initialize \(x ^{(1)}, \ldots, x ^{(n_{m})}\) and \(\theta ^ {(1)}, \ldots, \theta ^ {(n_{u})}\) to small random values.

2. Minimize

\[ J(x ^{(1)}, \ldots, x ^{(n_{m})}, \theta ^ {(1)}, \ldots, \theta ^ {(n_{u})}) = \frac{1}{2} \sum_{(i,j): r(i,j)=1} ((\theta^{(j)})^{T} x ^{(i)} - y ^{(i, j)})^{2} + \frac{\lambda}{2}\sum_{i=1} ^{n_{m}} \sum_{k=1} ^{n} (x_{k} ^{(i)}) ^{2} +\frac{\lambda}{2}\sum_{j=1} ^{n_{u}} \sum_{k=1} ^{n} (\theta_{k} ^{(j)}) ^{2}\]

using gradient descent (or an advanced optimization algorithm). Here \(n_{m}\) is the number of items, \(n_{u}\) the number of users, \(n\) the number of features, and \(r(i,j)=1\) if user j has rated item i. E.g. for every \( i= 1, \ldots, n_{m} \), \( j = 1, \ldots, n_{u} \) and \( k = 1, \ldots, n \):

\[ x_{k} ^{(i)} := x_{k} ^{(i)} - \alpha \left( \sum _ {j: r(i,j)=1}((\theta^{(j)})^{T} x ^{(i)} - y ^{(i, j)}) \theta_{k} ^{(j)} + \lambda x_{k} ^{(i)}\right)\]

\[ \theta_{k} ^{(j)} := \theta_{k} ^{(j)} - \alpha \left( \sum _ {i: r(i,j)=1}((\theta^{(j)})^{T} x ^{(i)} - y ^{(i, j)}) x_{k} ^{(i)} + \lambda \theta_{k} ^{(j)}\right)\]

3. For a user with weight vector \(\theta\) and a movie with (learned) feature vector \(x\), predict a star rating of \(\theta ^{T} x\).
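The three steps can be sketched in plain Python as follows. This is a minimal illustration, not a tuned implementation: the cost is taken as the squared prediction error plus the two regularization terms (which is what the gradient updates imply), the ratings come from the table earlier in the post with Steven's rating of "Soul" hidden, and the number of features, learning rate, regularization strength, and step count are arbitrary assumptions. The names `cost` and `train` are hypothetical.

```python
import random

# Y[i][j]: rating of item i by user j (columns: Steven, Jessica, Tom, Beatrice).
# None marks an unrated pair, i.e. r(i, j) = 0.
Y = [
    [5, 2, 5, 4],     # The Godfather
    [1, 2, 4, 4],     # Gone with the Wind
    [4, 4, 4, 5],     # Green Book
    [4, 2, 5, 5],     # The Big Short
    [None, 5, 1, 3],  # Soul (Steven's rating hidden for the demo)
]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cost(X, Theta, Y, lam):
    """Squared error over rated pairs plus L2 penalties on X and Theta."""
    err = sum((dot(Theta[j], X[i]) - y) ** 2
              for i, row in enumerate(Y) for j, y in enumerate(row) if y is not None)
    reg = sum(v * v for x in X for v in x) + sum(v * v for t in Theta for v in t)
    return 0.5 * err + 0.5 * lam * reg

def train(Y, n_features=2, alpha=0.005, lam=0.1, steps=3000, seed=0):
    rng = random.Random(seed)
    n_m, n_u = len(Y), len(Y[0])
    # Step 1: small random initial values.
    X = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)] for _ in range(n_m)]
    Theta = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)] for _ in range(n_u)]
    # Step 2: batch gradient descent with simultaneous updates of X and Theta.
    for _ in range(steps):
        # Prediction error on rated pairs; unrated pairs contribute nothing.
        E = [[dot(Theta[j], X[i]) - Y[i][j] if Y[i][j] is not None else 0.0
              for j in range(n_u)] for i in range(n_m)]
        new_X = [[X[i][k] - alpha * (sum(E[i][j] * Theta[j][k] for j in range(n_u))
                                     + lam * X[i][k])
                  for k in range(n_features)] for i in range(n_m)]
        new_Theta = [[Theta[j][k] - alpha * (sum(E[i][j] * X[i][k] for i in range(n_m))
                                             + lam * Theta[j][k])
                      for k in range(n_features)] for j in range(n_u)]
        X, Theta = new_X, new_Theta
    return X, Theta

X, Theta = train(Y)
# Step 3: predicted rating of "Soul" (item 4) for Steven (user 0).
print(round(dot(Theta[0], X[4]), 2))
```

Both parameter sets are updated from the same error matrix before either is overwritten, so each step uses the gradient at a single point. In practice a vectorized library would replace the nested list comprehensions.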

Announcement: The content above is credited to many resources. All blogs on this website are study notes, not copyright publications.