You're building a Netflix clone :material-netflix:. You have a dataset of movie reviews, where each review is a (user_id, movie_id, rating) triplet.
- `movie_ids` are integers in the range [0, Nmovies)
- `user_ids` are integers in the range [0, Nusers)
- `ratings` are integers in the range [1, 5]

import random
 
Nmovies = 10
Nusers = 10
Nreviews = 30
 
movie_ids = random.choices(range(Nmovies), k=Nreviews)
user_ids = random.choices(range(Nusers), k=Nreviews)
ratings = random.choices(range(1,6), k=Nreviews)
 
print(movie_ids)
# [4, 9, 8, 7, 3, ... ]
 
print(user_ids)
# [1, 7, 4, 1, 2, ... ]
 
print(ratings)
# [1, 3, 2, 1, 1, ... ]

- Build a compressed sparse matrix where element (i, j) gives user i's rating of movie j.
- Normalize the movie vectors (column vectors) so that each of them has unit length.
- Calculate the Euclidean distance between normalized movie 2 and normalized movie 4.
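
As a rough sketch of the first step only, the triplets can be passed to `scipy.sparse.coo_matrix` and then converted to a compressed format. The hand-written triplets below are hypothetical (they are not the random data above) and were chosen to reproduce the three-user, two-movie example in this section. Choosing CSC over CSR is an assumption made here because the later steps work column by column.

```python
from scipy import sparse

# Hypothetical triplets for illustration: 3 users, 2 movies.
user_ids = [0, 1, 2]
movie_ids = [0, 1, 0]
ratings = [1, 1, 3]

# COO takes (data, (row_indices, col_indices)); rows are users, columns are movies.
coo = sparse.coo_matrix((ratings, (user_ids, movie_ids)), shape=(3, 2))

# CSC stores each column contiguously, which suits per-movie operations.
# Caution: the conversion sums duplicate (user, movie) entries, so repeated
# reviews in randomly generated data would be added together.
reviews = coo.tocsc()

print(reviews.toarray())
# [[1 0]
#  [0 1]
#  [3 0]]
```

With the random data, the same call would use `shape=(Nusers, Nmovies)`. How duplicate reviews should be handled is not specified by the problem, so summing them is only the default behaviour of the conversion, not a recommendation.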
 
For example, if our Netflix clone had three users and two movies with a review matrix like this
[[1 0]
 [0 1]
 [3 0]]
The normalized movie vectors would be
[[0.32  0. ]
 [0.    1. ]
 [0.95  0. ]]
The two normalized movie vectors are orthogonal unit vectors, so the Euclidean distance between them is √2 ≈ 1.41.
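
The numbers in this small example can be checked with a short script. This is a minimal sketch of one possible approach, not the intended solution: it scales the columns by right-multiplying with a diagonal matrix of inverse norms, and it assumes every movie has at least one review (a movie with no reviews would have a zero norm and cause a division by zero).

```python
import numpy as np
from scipy import sparse

# The 3-user, 2-movie review matrix from the example, stored in CSC format.
reviews = sparse.csc_matrix(np.array([[1, 0], [0, 1], [3, 0]], dtype=float))

# Column norms: square root of the sum of squared entries in each column.
col_norms = np.sqrt(np.asarray(reviews.multiply(reviews).sum(axis=0)).ravel())

# Right-multiplying by diag(1 / norm) rescales each column to unit length.
normalized = (reviews @ sparse.diags(1.0 / col_norms)).tocsc()

# Euclidean distance between the two normalized movie columns.
diff = normalized[:, 0] - normalized[:, 1]
distance = float(np.sqrt(diff.multiply(diff).sum()))

print(np.round(normalized.toarray(), 2))
print(round(distance, 2))  # 1.41
```

The matrix stays sparse throughout; `toarray()` is called only for printing.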