How Does Netflix Recommend K-Dramas For Me: Matrix Factorization

Introduction

Recommendation systems are one of the most fascinating applications of machine learning. Sites like YouTube, Rakuten Viki, and Netflix use them extensively to recommend the movies and videos we may want to watch. In this blog we’re going to focus on one question: “How do these work?” One method they use a lot is called matrix factorization, and that’s what we are going to learn today.

Netflix Story

So, let’s get started. Let’s say there are four users, Sarah Kim, Jason Park, Emily Lee, and Daniel Choi, and five K-dramas: Crash Landing on You, Goblin, Descendants of the Sun, Itaewon Class, and Hotel Del Luna.

The users are going to rate each of those K-dramas from one to five stars, and our goal is to predict these ratings. So let’s say that Sarah rated Hotel Del Luna 4 out of 5. We record that in a big table where the rows are the users and the columns are the dramas. We do that for every user and every drama and we get something like this.
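To make the table concrete, here is a minimal sketch in plain Python. Only Sarah's 4 for Hotel Del Luna comes from the story above; every other number is a made-up placeholder, and `rating_of` is a hypothetical helper name.

```python
# Hypothetical 1-5 star ratings. Only Sarah's 4 for Hotel Del Luna is from the text.
users = ["Sarah Kim", "Jason Park", "Emily Lee", "Daniel Choi"]
dramas = [
    "Crash Landing on You",
    "Goblin",
    "Descendants of the Sun",
    "Itaewon Class",
    "Hotel Del Luna",
]

# ratings[i][j] is the rating that user i gave to drama j
ratings = [
    [3, 1, 1, 1, 4],  # Sarah Kim
    [1, 2, 4, 3, 1],  # Jason Park
    [3, 1, 1, 1, 4],  # Emily Lee
    [4, 3, 5, 4, 5],  # Daniel Choi
]

def rating_of(user, drama):
    """Look up one cell of the table by user name and drama title."""
    return ratings[users.index(user)][dramas.index(drama)]

print(rating_of("Sarah Kim", "Hotel Del Luna"))  # → 4
```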

In recommendation systems, everything revolves around relationships, similarities, or simply dependencies between users and items. These dependencies help the algorithm predict ratings and, from those, make recommendations. Let’s take a simple example of how to use dependencies to get ratings.

Let’s take the above table as an example and assume Emily hasn’t watched Itaewon Class. How do you think Emily would rate it? If you take a closer look at the table, you can see that Sarah and Emily have very similar preferences. Since Sarah rated Itaewon Class 1 out of 5, there is a good chance that Emily would also rate the drama 1 out of 5 stars.
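The "find someone with similar taste and borrow their rating" idea can be sketched like this. The ratings are hypothetical placeholders, and the similarity measure (average absolute gap over dramas both users rated) is just one simple choice, not necessarily what a real system would use.

```python
# Hypothetical ratings; None marks a drama the user has not rated yet.
ratings = {
    "Sarah Kim":   [3, 1, 1, 1, 4],
    "Jason Park":  [1, 2, 4, 3, 1],
    "Emily Lee":   [3, 1, 1, None, 4],
    "Daniel Choi": [4, 3, 5, 4, 5],
}
ITAEWON_CLASS = 3  # column index of "Itaewon Class" in each list

def mean_abs_diff(a, b):
    """Average absolute rating gap over the dramas both users have rated."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return sum(abs(x - y) for x, y in pairs) / len(pairs)

def predict_from_nearest(target, column):
    """Borrow the rating of the most similar user who has rated `column`."""
    candidates = [
        u for u in ratings
        if u != target and ratings[u][column] is not None
    ]
    nearest = min(
        candidates,
        key=lambda u: mean_abs_diff(ratings[target], ratings[u]),
    )
    return nearest, ratings[nearest][column]

nearest_user, predicted = predict_from_nearest("Emily Lee", ITAEWON_CLASS)
print(nearest_user, predicted)  # → Sarah Kim 1
```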

Since this is an example table with only 4 users and 5 dramas, it’s easy to find dependencies just by peeking at the table. However, in real-world scenarios with millions of users and dramas, it is much harder to figure out all those dependencies. Sometimes the dependencies are not direct. For instance, let’s say Jason Park likes romantic dramas, Emily likes comedy dramas, and Daniel likes rom-coms. Then the ratings given by Daniel Choi could be the sum of the ratings given by Jason Park and Emily Lee. So the question is, how do we figure out all these dependencies at the same time? The answer is that we use a concept called “Matrix Factorization”.
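An indirect dependency like this one is easy to check on a toy example. The three rows below are hypothetical ratings, picked so that Daniel's row is exactly Jason's row plus Emily's row.

```python
# Hypothetical ratings for the same five dramas, in the same order
jason = [1, 2, 4, 3, 1]   # likes romance
emily = [3, 1, 1, 1, 4]   # likes comedy
daniel = [4, 3, 5, 4, 5]  # likes rom-coms

# Add Jason's and Emily's ratings drama by drama
combined = [j + e for j, e in zip(jason, emily)]

print(combined)            # → [4, 3, 5, 4, 5]
print(combined == daniel)  # → True
```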

Matrix Factorization

First, remember what factorization means. Factorization is like breaking a larger number into smaller pieces. Imagine we have 12, and we factor it into 4×3. Twelve is a big number, but we express it as a product of smaller numbers. That is what matrix factorization is about too. We have a large, complicated matrix, and we’re going to express it as the product of two smaller matrices, say a and b.

So now the question is how we are going to find a and b. The answer is simple: we find features.

Features

We need to find matrices a and b whose product is the big one, and for that we are going to use features. The features of a drama can be anything from a simple feature like the cast to a complicated feature like the plot. But for now let’s keep it simple and say we have two features, romance and comedy. Each one tells us how much comedy or romance a drama has. Now I’m going to introduce the dot product, which is a way to guess a rating based on how much a user likes comedy and romance and how much of each a drama has. Let’s say we somehow figured out that Sarah likes comedy and doesn’t like romantic dramas, and that we’ve managed to weight each drama on our two features. For example, Hotel Del Luna has a weight of 4 in comedy and 1 in romance. We then take the dot product of Sarah’s preferences and the drama’s feature weights, and we get Sarah’s rating for that drama.
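Here is that dot product as a small sketch. Hotel Del Luna's weights (4 for comedy, 1 for romance) come from the text. Sarah's preference values of 1 and 0 are an assumption that simply encodes "likes comedy, doesn't like romance".

```python
def dot(a, b):
    """Multiply matching elements and add the results together."""
    return sum(x * y for x, y in zip(a, b))

# Both vectors are ordered as [comedy, romance]
sarah_preferences = [1, 0]  # assumed: likes comedy, does not like romance
hotel_del_luna = [4, 1]     # from the text: 4 in comedy, 1 in romance

predicted_rating = dot(sarah_preferences, hotel_del_luna)
print(predicted_rating)  # → 4  (1*4 + 0*1)
```

The result matches the 4 stars that Sarah gave Hotel Del Luna earlier.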

We repeat the process for every user and drama to get the rating table, which we will later use for recommendations. If you look at the table, you can see that the ratings of Sarah and Emily are the same because they have the same preferences.
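Repeating the dot product for every user and drama is a matrix product. A minimal sketch, assuming hypothetical preference and weight values (only Hotel Del Luna's 4 and 1 are from the text):

```python
# One row per user, ordered as [comedy, romance] (assumed values)
user_preferences = [
    [1, 0],  # Sarah Kim: comedy only
    [0, 1],  # Jason Park: romance only
    [1, 0],  # Emily Lee: comedy only
    [1, 1],  # Daniel Choi: both
]

# One row per feature, one column per drama (hypothetical weights,
# except the last column, Hotel Del Luna, which is from the text)
drama_features = [
    [3, 1, 1, 1, 4],  # comedy
    [1, 2, 4, 3, 1],  # romance
]

def matmul(a, b):
    """Entry (i, j) is the dot product of row i of a and column j of b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

ratings = matmul(user_preferences, drama_features)
for row in ratings:
    print(row)
# → [3, 1, 1, 1, 4]
# → [1, 2, 4, 3, 1]
# → [3, 1, 1, 1, 4]
# → [4, 3, 5, 4, 5]
```

Sarah's and Emily's rows come out identical because their preference rows are identical.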

Storage

Now that we have learned to calculate the ratings from drama features and user preferences, the next question is how we store this matrix. When building software products, one of the most important factors we consider is the space used by the data. As software developers, we always try to find an optimal way to store the data and keep the costs as low as possible. In our case, the space required for the rating matrix grows with the number of users times the number of dramas. So as we scale the product, the storage cost grows too.

One way is to not store the ratings matrix itself, and instead store the users matrix and the dramas matrix. Let’s say we have 1000 dramas and 100 features; then the drama matrix has a total of 100k entries. With 2000 users and 100 features, the user matrix has 200k entries. So in total we have 300k entries, compared to 2M entries (2000 × 1000) when storing the rating matrix itself. And when we need a particular entry of the rating matrix, we take the dot product of that user’s feature vector from the user matrix and that drama’s feature vector from the drama matrix.
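The arithmetic behind those numbers, as a quick sketch (the variable names are just for illustration):

```python
n_users, n_dramas, n_features = 2000, 1000, 100

# Storing every rating directly: one entry per (user, drama) pair
full_entries = n_users * n_dramas

# Storing the two factor matrices instead
user_matrix_entries = n_users * n_features
drama_matrix_entries = n_dramas * n_features
factored_entries = user_matrix_entries + drama_matrix_entries

print(full_entries)      # → 2000000
print(factored_entries)  # → 300000
```

The saving grows as the catalog grows, because the full table scales with users times dramas while the factors scale with users plus dramas.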

Now comes the main question: “How did we decide on the users’ preferences and the features? How do we find the factorization?” For human beings it seems like an obvious task. We can just have a glance and say person A likes comedy and person B likes action. But the computer doesn’t know this. It has to figure out what the features are and what weight each person and each drama gives to each feature. This is not obvious for machines, and that is where we need machine learning. So this is what we’re going to answer now: “How do we find the right features and weights (that is, the factorization)?”

Finding Factorization

So let’s say a machine learning practitioner hands the computer a ratings matrix, and the computer tries to learn the factor matrices for it. It repeatedly updates its factor matrices so that their product gets closer and closer to the ratings matrix. In machine learning, a common way of doing this is called gradient descent.

Gradient Descent

First, we start with a random user preference matrix and a random feature weights matrix: a matrix with random values on the left for the user features, and a matrix with random values on top for the drama features. We no longer have comedy and romance; instead, we have two unnamed features, f1 and f2. How many features to use is something we have to pick ourselves; for now, let’s try two. We then take the dot product for each user and drama pair, and keep adjusting the values until we hopefully get a pretty good match with our original rating matrix.

Each entry in the product of these two matrices is a dot product, which we put here. To find it, we multiply the corresponding elements of a row and a column and add the results together.

For example, the dot product of row 1 and column 1 is 1.44. We do this for every entry and obtain a matrix full of entries. They are probably not the ones we want, so we compare the result to the original matrix, the target matrix, to see how far off we are. We compare every entry and see how we’re doing.
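A random starting point can be sketched as follows. The target ratings are hypothetical placeholders and the seed is an arbitrary choice, so the numbers will not match the 1.44 in the walkthrough. For convenience, the drama factors are stored as one short list per drama rather than as a wide matrix "on top".

```python
import random

rng = random.Random(42)  # fixed seed (arbitrary) so every run starts the same
N_FEATURES = 2           # the unnamed features f1 and f2

# Hypothetical target ratings: 4 users (rows) by 5 dramas (columns)
target = [
    [3, 1, 1, 1, 4],
    [1, 2, 4, 3, 1],
    [3, 1, 1, 1, 4],
    [4, 3, 5, 4, 5],
]

# N_FEATURES random values in [0, 1) for each user and for each drama
user_factors = [[rng.random() for _ in range(N_FEATURES)] for _ in target]
drama_factors = [[rng.random() for _ in range(N_FEATURES)] for _ in target[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# The current guess: one dot product per (user, drama) pair
guess = [[dot(u, d) for d in drama_factors] for u in user_factors]

# How far each guessed entry is from the target entry
for t_row, g_row in zip(target, guess):
    print([round(t - g, 2) for t, g in zip(t_row, g_row)])
```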

For instance, we compare the 1.44 on the top-left corner with this 3 over here. The computer comes with the 1.44 and says, “How am I doing?” We have a three and say, “Well, I guess you’re doing all right, but you’re not exactly at my 3. What do we need to do to get this 1.44 closer to the 3?” We need to increase the 1.44 by increasing these values by a little bit and these values by a little bit.
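One such nudge can be sketched for a single entry. The starting values are hypothetical, chosen only so that their dot product is 1.44. The step size `LEARNING_RATE` is an arbitrary placeholder for "a little bit", and scaling each nudge by the gap and by the matching value on the other side is one common way to do it.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

target = 3
user_row = [1.2, 0.8]   # hypothetical values for [f1, f2]
drama_col = [0.8, 0.6]  # hypothetical values for [f1, f2]
LEARNING_RATE = 0.1     # how big "a little bit" is (arbitrary choice)

before = dot(user_row, drama_col)  # about 1.44
gap = target - before              # positive, so the guess is too low

# Nudge both sides, each in proportion to the gap and to the other side's value
new_user_row = [p + LEARNING_RATE * gap * q for p, q in zip(user_row, drama_col)]
new_drama_col = [q + LEARNING_RATE * gap * p for p, q in zip(user_row, drama_col)]

after = dot(new_user_row, new_drama_col)
print(round(before, 2), round(after, 2))
```

After one step the guess is still not 3, but it is closer than 1.44 was.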

Then we move on to the next entry and repeat the same process. We do this repeatedly with some luck and a lot of computer power until we get to something pretty good. I didn’t say how much we increase or decrease the values; I just said a small value. To know the small value, we need to define an error function, a way to tell the computer how much to increase or decrease. We repeat this process until we get to something we want.

Error Function

In machine learning it’s not enough to just tell the computer “hey, your matrices don’t work.” We need to be able to quantify how badly the matrices don’t work. If we say the error is 6, then the computer knows a little more, because it can try to make that number smaller. This is the idea behind an error function.

Here is how it works in our case. The computer comes in with two matrices and we say, “Hey, you were wrong by 10.3,” for example. The computer comes back with slightly better matrices and we say, “You were wrong by 5.32.” Then it comes back with even better matrices and we say, “You were wrong by 1.23,” and maybe that’s small enough, so we’re happy to take that as the answer. Now let’s see how to find this error and how to define an error function that tells the computer how far it is from the actual answer.

Let’s recall that this is where we want to get. Let’s look again at the same random numbers we had before and the entries we get in the matrix. On the left, we are going to compare this 1.44 with the 3. How do we compare 1.44 with 3? The difference seems like the right place to start. But is it 3 minus 1.44, or 1.44 minus 3? It’s hard to tell which one is better, because one is positive and the other is negative. To get rid of this ambiguity, we take the square of the difference: (3 minus 1.44) squared. That is always positive, and it tells us how far apart these numbers are. So that’s the error between the first two entries. Then we move on to the second entry and do the same thing: we take the square of the difference, and so on and so forth.
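As a sketch, the error function is just the sum of those squared differences over all entries. The 3 and the 1.44 are from the walkthrough; the other numbers in the tiny example are made up, and `squared_error` is a hypothetical name.

```python
def squared_error(target, guess):
    """Sum of squared differences between matching entries of two matrices."""
    return sum(
        (t - g) ** 2
        for t_row, g_row in zip(target, guess)
        for t, g in zip(t_row, g_row)
    )

# The first entry from the walkthrough: target 3, guess 1.44
print(round((3 - 1.44) ** 2, 4))  # → 2.4336

# A tiny 2x2 example where only the top-left entry is wrong
target = [[3, 1], [1, 2]]
guess = [[1.44, 1], [1, 2]]
print(round(squared_error(target, guess), 4))  # → 2.4336
```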

So how do you decrease an error? The common method is gradient descent, and I won’t get into much technical detail here. You decrease the error by taking the derivative of the error with respect to each value and taking a small step in the opposite direction, which is the direction in which the error goes down. The derivative tells us whether each value should go up or down, and by how much, in order to decrease the error. Eventually we get the right numbers in the matrix on top and in the matrix on the left, and that gives us the factorization. So that’s pretty much it, that’s the matrix factorization algorithm. Now let’s look at predicting ratings.
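Putting the pieces together, here is a minimal sketch of the whole loop. It visits one entry at a time (a stochastic flavor of gradient descent), and the constant factor 2 from the derivative of the squared error is folded into the learning rate. The target ratings, seed, learning rate, and number of passes are all arbitrary assumptions for illustration.

```python
import random

# Hypothetical target ratings: 4 users (rows) by 5 dramas (columns)
target = [
    [3, 1, 1, 1, 4],
    [1, 2, 4, 3, 1],
    [3, 1, 1, 1, 4],
    [4, 3, 5, 4, 5],
]
N_FEATURES = 2
LEARNING_RATE = 0.01
EPOCHS = 2000  # number of full passes over the table

rng = random.Random(0)
users = [[rng.random() for _ in range(N_FEATURES)] for _ in target]
dramas = [[rng.random() for _ in range(N_FEATURES)] for _ in target[0]]

def predict(u, d):
    """Dot product of user u's factors and drama d's factors."""
    return sum(users[u][k] * dramas[d][k] for k in range(N_FEATURES))

def total_error():
    return sum(
        (target[u][d] - predict(u, d)) ** 2
        for u in range(len(target))
        for d in range(len(target[0]))
    )

initial_error = total_error()
for _ in range(EPOCHS):
    for u in range(len(target)):
        for d in range(len(target[0])):
            diff = target[u][d] - predict(u, d)
            for k in range(N_FEATURES):
                p, q = users[u][k], dramas[d][k]  # values before this update
                users[u][k] += LEARNING_RATE * diff * q
                dramas[d][k] += LEARNING_RATE * diff * p
final_error = total_error()

print(round(initial_error, 2), round(final_error, 4))
```

The second printed number should be much smaller than the first. The learned features usually won't line up with "comedy" and "romance"; they are whatever two directions reproduce the table best.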

Predict Ratings

To predict ratings we first need the ratings matrix, so this is what the Netflix matrix looks like. In reality not every user rates every drama, so the matrix is actually very sparse and it looks like this.

What we want to do is fill in the blanks, so we take this matrix and find a way to factor it into something on top and something on the left. The way to factor it is the same as before: we use gradient descent, and the error is now the sum of the squared differences over only the entries that we have. We skip the entries that we don’t have. Once we have found these two matrices, we can fill in the blanks by taking the dot products of rows and columns, like this.

So these white ones are the predicted ratings for the user and drama pairs that don’t have a rating yet. Now what do we do? Well, for example, Jason Park comes in and he hasn’t seen drama 1 and drama 3. Which one do we recommend? It turns out that between drama 1 and drama 3, drama 3 has the higher predicted rating, a 5, so we recommend drama 3 to Jason Park. And that is pretty much it.
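The same loop works on a sparse table if it simply skips the blanks. This is a minimal sketch with hypothetical ratings, where `None` marks a missing entry and Jason is missing the first and third dramas; the seed, learning rate, and number of passes are arbitrary, and `recommend` is a hypothetical helper.

```python
import random

dramas = [
    "Crash Landing on You",
    "Goblin",
    "Descendants of the Sun",
    "Itaewon Class",
    "Hotel Del Luna",
]
# Hypothetical sparse ratings; None means "not rated yet"
ratings = {
    "Sarah Kim":   [3, 1, 1, 1, 4],
    "Jason Park":  [None, 2, None, 3, 1],
    "Emily Lee":   [3, 1, 1, None, 4],
    "Daniel Choi": [4, 3, 5, 4, 5],
}
N_FEATURES, LEARNING_RATE, EPOCHS = 2, 0.01, 3000

rng = random.Random(0)
user_f = {u: [rng.random() for _ in range(N_FEATURES)] for u in ratings}
drama_f = [[rng.random() for _ in range(N_FEATURES)] for _ in dramas]

def predict(user, d):
    return sum(p * q for p, q in zip(user_f[user], drama_f[d]))

for _ in range(EPOCHS):
    for user, row in ratings.items():
        for d, r in enumerate(row):
            if r is None:
                continue  # blanks contribute nothing to the error
            diff = r - predict(user, d)
            for k in range(N_FEATURES):
                p, q = user_f[user][k], drama_f[d][k]
                user_f[user][k] += LEARNING_RATE * diff * q
                drama_f[d][k] += LEARNING_RATE * diff * p

def recommend(user):
    """Return the unseen drama with the highest predicted rating."""
    unseen = [d for d, r in enumerate(ratings[user]) if r is None]
    best = max(unseen, key=lambda d: predict(user, d))
    return dramas[best], predict(user, best)

title, score = recommend("Jason Park")
print(title, round(score, 2))
```

Which drama wins depends on the made-up numbers, so treat the output as an illustration of the mechanics rather than a real recommendation.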

Outro

Thank you very much, as usual. I’ve tried my best to explain the matrix factorization method for recommendation systems. I appreciate you taking the time to read this, and I hope you enjoyed learning about it! In the future, I plan to write more posts on machine learning and computer vision. If you are interested in learning about computer vision models, check out my latest blog on the architecture of vision language models, explained in layman’s terms.

Understanding Vision Language Model Architecture: From Iron Man to Reality

✍🏽 Check out my profile for more content like this.

🥇 Sign up for my email newsletter to receive updates on new posts! Have fun reading!

How Does Netflix Recommend K-Dramas For Me? Matrix Factorization was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.

