What is a recommender system?

  • A recommender system produces a list of recommendations, such as movies to watch, products to buy, or news to read.

Different Categories of Recommender Systems

  • Recommender systems can be divided into two main types: content-based and collaborative filtering recommender systems.

  • Before plunging into the various concepts, let's show the relationship among different recommender systems.

  1. Content-Based Recommender System
  • A content-based recommender system builds a user profile from the items the user has previously rated. For example, a Kindle user might give a book a five-star rating if they really like it. Similarly, a news reader will finish reading a news item if they really care about it. Behaviors like these signal the user's interests.

  • Content-based recommender systems derive their recommendations from a user's past data: the items the user purchased, rated, or paid attention to.

  • Take a news recommender as an example. News items have features such as category (Politics, Finance, Sports, Entertainment, etc.) or location (local, national, or international). Natural Language Processing techniques such as TF-IDF scoring can be used to extract such features from the text and feed them into the recommender model.

  • In this approach, a profile is created for each user and each item, and each profile is represented as a vector.

    • Item vector: a vector containing 1 for words with a high TF-IDF score and 0 for all other words.
    • User vector: a vector containing the user's numerical features.
  • Finally, we build a model, based on the Item vector and User vector, that explains the observed user-item interactions.
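
  • The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three headlines, the 0.05 cut-off for a "high" TF-IDF score, and the choice to build the user vector as the mean of the liked item vectors are all hypothetical and not part of the original notes.

```python
import math
from collections import Counter

# Hypothetical toy corpus (item id -> headline text); not taken from the notes.
docs = {
    "n1": "news election vote parliament vote",
    "n2": "news match goal striker goal",
    "n3": "news election budget parliament tax",
}

def tfidf(docs):
    """Return {item: {word: tf * log(N / df)}} for whitespace-tokenized text."""
    tokens = {item: text.split() for item, text in docs.items()}
    df = Counter()
    for words in tokens.values():
        df.update(set(words))  # document frequency of each word
    n_docs = len(tokens)
    return {
        item: {w: (c / len(words)) * math.log(n_docs / df[w])
               for w, c in Counter(words).items()}
        for item, words in tokens.items()
    }

scores = tfidf(docs)
vocab = sorted({w for s in scores.values() for w in s})
THRESHOLD = 0.05  # assumed cut-off for a "high" TF-IDF score

# Item vector: 1 for words with a high TF-IDF score, 0 otherwise.
item_vectors = {
    item: [1 if s.get(w, 0.0) >= THRESHOLD else 0 for w in vocab]
    for item, s in scores.items()
}

# User vector (assumption): mean of the vectors of the items the user liked.
liked = ["n1"]
user_vector = [sum(item_vectors[i][k] for i in liked) / len(liked)
               for k in range(len(vocab))]

# Score the unseen items by their dot product with the user vector.
unseen = [i for i in docs if i not in liked]
ranked = sorted(
    unseen,
    key=lambda i: sum(u * v for u, v in zip(user_vector, item_vectors[i])),
    reverse=True,
)
print(ranked)
```

  • The word "news" appears in every headline, so its TF-IDF score is zero and it never enters an item vector. A user who liked the politics item n1 is therefore matched to n3, which shares "election" and "parliament", rather than to the sports item n2.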

  2. Collaborative Filtering Method
  • Collaborative filtering uses the feedback of other users to recommend items. These systems evaluate the quality of an item based on peer review.

  • Collaborative filtering recommender systems are based on the past interactions recorded between users and items to produce new recommendations. These interactions are stored in the so-called "user-item interactions matrix".

https://towardsdatascience.com/introduction-to-recommender-systems-6c66cf15ada

  • For example, suppose the media service company Netflix intends to develop a recommender system based on users' historical ratings of movies. A simplified user-item interactions matrix might look as follows:
    User       The Godfather   Gone with the Wind   Green Book   The Big Short   Soul
    Steven     5               1                    4            4               2
    Jessica    2               2                    4            2               5
    Tom        5               4                    4            5               1
    Beatrice   4               4                    5            5               3
  • There are two categories of collaborative filtering recommender systems:
    • Memory-based approach:
      • Users and items are represented directly by their past interactions (large sparse vectors).
      • Recommendations are driven by similarity between users or between items, measured by cosine similarity, Pearson's correlation, Euclidean distance, etc.
    • Model-based approach:
      • New representations of users and items are built from a model, usually based on a dimension reduction technique such as PCA, matrix factorization, or factor analysis.
      • Models (regression, neural nets, etc.) are then used to predict user ratings of unrated items.
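
  • As a rough sketch of the memory-based approach, the example below reuses the ratings matrix above and predicts one rating as a similarity-weighted average of the other users' ratings. Two things are assumptions made for illustration: Jessica's rating of Soul is hidden so that there is something to predict, and similarity is taken to be the cosine similarity over the movies both users have rated.

```python
import math

movies = ["The Godfather", "Gone with the Wind", "Green Book",
          "The Big Short", "Soul"]
ratings = {
    "Steven":   [5, 1, 4, 4, 2],
    "Jessica":  [2, 2, 4, 2, None],  # assumption: this rating is hidden
    "Tom":      [5, 4, 4, 5, 1],
    "Beatrice": [4, 4, 5, 5, 3],
}

def cosine_on_common(a, b):
    """Cosine similarity restricted to the items both users have rated."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(user, item_idx):
    """Similarity-weighted average of the other users' ratings of one item."""
    num = den = 0.0
    for other, row in ratings.items():
        if other == user or row[item_idx] is None:
            continue
        sim = cosine_on_common(ratings[user], row)
        num += sim * row[item_idx]
        den += abs(sim)
    return num / den if den else None

prediction = predict("Jessica", movies.index("Soul"))
print(round(prediction, 2))
```

  • Because every similarity here is positive, the prediction is a weighted average of the three known ratings of Soul (2, 1, and 3). It cannot leave that range, which is a known limitation of this simple scheme.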

0. Some Technical Details

0.1. Similarity Measures

  1. Cosine similarity:

\[Cos(x, y)=\frac{\sum_{i=1}^{n} x_{i} y_{i}}{\sqrt{\sum_{i=1}^{n} x_{i}^{2}} \sqrt{\sum_{i=1}^{n} y_{i}^{2}}}\]

  2. Jaccard Similarity:

\[J(X, Y)=\frac{|X \cap Y|}{|X \cup Y|}\]

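  3. Pearson's Correlation:

\[\rho = \frac{\sum_{i}\left(r_{ui} - \bar{r}_{u}\right) \left(r_{vi}-\bar{r}_{v}\right)}{\sqrt{\sum_{i} \left(r_{ui}-\bar{r}_{u}\right)^{2}}\sqrt{\sum_{i} \left(r_{vi}-\bar{r}_{v}\right)^{2}}}\]

where \(r_{ui}\) is the rating of user u on item i, \(\bar{r}_{u}\) is the mean rating of user u, and the sums run over the items rated by both users u and v.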

  4. Euclidean Distance:

\[ d(X, Y)=\sqrt{\sum_{i=1}^{n}\left(x_{i}-y_{i}\right)^{2}}\]

  5. Manhattan Distance:

\[d(X, Y)=\sum_{i=1}^{n}\left|x_{i}-y_{i}\right|\]

  6. Minkowski Distance:

\[d(X, Y)=\left(\sum_{i=1}^{n}\left|x_{i}-y_{i}\right|^{p}\right)^{\frac{1}{p}}\]
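
The six measures above translate almost directly into code. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and returning 0.0 when a denominator is zero is a convention assumed here rather than part of the definitions.

```python
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def jaccard(xs, ys):
    xs, ys = set(xs), set(ys)
    union = xs | ys
    return len(xs & ys) / len(union) if union else 0.0

def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = (math.sqrt(sum((a - mean_x) ** 2 for a in x))
           * math.sqrt(sum((b - mean_y) ** 2 for b in y)))
    return num / den if den else 0.0

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

# Rating vectors of Steven and Tom from the example matrix.
steven = [5, 1, 4, 4, 2]
tom = [5, 4, 4, 5, 1]

print(cosine(steven, tom), pearson(steven, tom))
print(euclidean(steven, tom), manhattan(steven, tom))
```

Minkowski distance generalizes the two preceding distances: p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.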

0.2. Collaborative Filtering Algorithm

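  • Assume that we use a linear regression model to predict a user's rating as follows:

\[y^{(i, j)} = (\theta ^ {(j)})^{T} x ^{(i)} + u^{(i, j)}\]

where \(y^{(i,j)}\) is the rating that user j gives to item i, \(x^{(i)}\) is the feature vector of item i, \(\theta^{(j)}\) is the weight vector of user j, and \(u^{(i,j)}\) is the error term. In what follows, \(n_{m}\) is the number of items, \(n_{u}\) is the number of users, \(n\) is the number of features, and \(r(i,j)=1\) if user j has rated item i.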

  • In this case, the collaborative filtering algorithm can be summarized in 3 steps:
  1. Initialize \(x ^{(1)}, \ldots, x ^{(n_{m})}\) and \(\theta ^ {(1)}, \ldots, \theta ^ {(n_{u})}\) to small random values.
  2. Minimize

\[ J(x ^{(1)}, \ldots, x ^{(n_{m})}, \theta ^ {(1)}, \ldots,  \theta ^ {(n_{u})}) = \frac{1}{2} \sum_{(i,j): r(i,j)=1} ((\theta^{(j)})^{T} x ^{(i)} - y ^{(i, j)})^{2} + \frac{\lambda}{2}\sum_{i=1} ^{n_{m}} \sum_{k=1} ^{n} (x_{k} ^{(i)}) ^{2}  +\frac{\lambda}{2}\sum_{j=1} ^{n_{u}} \sum_{k=1} ^{n} (\theta_{k} ^{(j)}) ^{2}\]

using gradient descent (or a more advanced optimization algorithm). For example, for every \( i= 1, \ldots, n_{m} \), \( j = 1, \ldots, n_{u} \), and \( k = 1, \ldots, n \):

\[ x_{k} ^{(i)} := x_{k} ^{(i)} - \alpha \left( \sum _ {j: r(i,j)=1}((\theta^{(j)})^{T} x ^{(i)} - y ^{(i, j)}) \theta_{k} ^{(j)} + \lambda x_{k} ^{(i)}\right)\]

\[ \theta_{k} ^{(j)} := \theta_{k} ^{(j)} - \alpha \left( \sum _ {i: r(i,j)=1}((\theta^{(j)})^{T} x ^{(i)} - y ^{(i, j)}) x_{k} ^{(i)} + \lambda \theta_{k} ^{(j)}\right)\]

  3. For a user with weight vector \(\theta\) and a movie with (learned) feature vector \(x\), predict a star rating of \(\theta ^{T} x\).
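
  • The three steps can be sketched in plain Python on the movie ratings example. This is a minimal illustration, not a tuned implementation: the number of features (2), the learning rate \(\alpha = 0.005\), the regularization strength \(\lambda = 0.1\), the step count, and the hidden rating (Jessica's rating of Soul) are all assumptions chosen for the example.

```python
import random

# Y[i][j]: rating of item i by user j; None means r(i, j) = 0 (unrated).
# Columns: Steven, Jessica, Tom, Beatrice.
Y = [
    [5, 2, 5, 4],     # The Godfather
    [1, 2, 4, 4],     # Gone with the Wind
    [4, 4, 4, 5],     # Green Book
    [4, 2, 5, 5],     # The Big Short
    [2, None, 1, 3],  # Soul (assumption: Jessica's rating is hidden)
]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cost(X, Theta, Y, lam):
    """Regularized squared-error cost J over the rated (i, j) pairs."""
    err = sum((dot(Theta[j], X[i]) - Y[i][j]) ** 2
              for i in range(len(X)) for j in range(len(Theta))
              if Y[i][j] is not None)
    reg = sum(v * v for row in X for v in row)
    reg += sum(v * v for row in Theta for v in row)
    return 0.5 * err + 0.5 * lam * reg

def train(Y, n=2, alpha=0.005, lam=0.1, steps=3000, seed=0):
    rng = random.Random(seed)
    n_m, n_u = len(Y), len(Y[0])
    # Step 1: initialize x and theta to small random values.
    X = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n_m)]
    Theta = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n_u)]
    history = [cost(X, Theta, Y, lam)]
    # Step 2: minimize J with simultaneous gradient descent updates.
    for _ in range(steps):
        # Prediction error on rated pairs, 0 where r(i, j) = 0.
        E = [[dot(Theta[j], X[i]) - Y[i][j] if Y[i][j] is not None else 0.0
              for j in range(n_u)] for i in range(n_m)]
        X_new = [[X[i][k] - alpha * (sum(E[i][j] * Theta[j][k]
                                         for j in range(n_u)) + lam * X[i][k])
                  for k in range(n)] for i in range(n_m)]
        Theta_new = [[Theta[j][k] - alpha * (sum(E[i][j] * X[i][k]
                                                 for i in range(n_m))
                                             + lam * Theta[j][k])
                      for k in range(n)] for j in range(n_u)]
        X, Theta = X_new, Theta_new
        history.append(cost(X, Theta, Y, lam))
    return X, Theta, history

X, Theta, history = train(Y)
# Step 3: predict the hidden rating (item 4 = Soul, user 1 = Jessica).
prediction = dot(Theta[1], X[4])
print(history[0], history[-1], prediction)
```

  • Both updates are computed from the same error matrix before either parameter set is overwritten, which matches the simultaneous update implied by the two equations above.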

Announcement: The content above is credited to many resources. All blogs on this website are study notes, not copyright publications.