Ever wondered how Amazon or YouTube knows which books, movies or products you will probably like? In this short example you will see a simple way to measure how similar the tastes of two people are. This can help to recommend movies, books or products that one of the two does not know yet.

First, we create a dictionary that holds the movie ratings of several people. Each person's name maps to a second dictionary of movie titles and their ratings, which range from 1 to 5.

    ratings = {
        'Alan': {'Jurassic Park': 1, 'Terminator': 5, 'Gone with the wind': 2.5,
                 'Superman Returns': 5, 'Groundhog day': 2.5, 'The notebook': 2.5},
        'Thomas': {'Jurassic Park': 4, 'Terminator': 5, 'Gone with the wind': 1.5,
                   'Superman Returns': 5.0, 'The notebook': 2.5, 'Groundhog day': 5},
        'Michael': {'Jurassic Park': 3, 'Terminator': 2.5, 'Superman Returns': 5,
                    'The notebook': 4.0},
        'Hillary': {'Terminator': 5, 'Gone with the wind': 2.5, 'The notebook': 4.5,
                    'Superman Returns': 4.0, 'Groundhog day': 2.5},
        'Alex': {'Jurassic Park': 2.5, 'Terminator': 4.0, 'Gone with the wind': 2.0,
                 'Superman Returns': 2.5, 'The notebook': 2.5, 'Groundhog day': 2.0},
        'Julian': {'Jurassic Park': 1.5, 'Terminator': 3.0, 'Gone with the wind': 2.0,
                   'Superman Returns': 2.5, 'The notebook': 2.5, 'Groundhog day': 2.0},
        'Anna': {'Jurassic Park': 2.5, 'Terminator': 4.0, 'The notebook': 2.5,
                 'Superman Returns': 5.0, 'Groundhog day': 5},
        'Toby': {'Terminator': 4.5, 'Groundhog day': 1.5, 'Superman Returns': 4.0,
                 'Gone with the wind': 2.5},
    }

    # All of Toby's ratings, a blank line, then only his rating for Terminator
    print(ratings['Toby'])
    print()
    print(ratings['Toby']['Terminator'])

At the end, we print all of Toby's ratings and then only his rating for Terminator. Looking up individual ratings like this lets us compare them with other people's ratings. Just as we can calculate the distance between two points in space, we can calculate how far apart two people's ratings are. As an example, take the two points (1, 1) and (3, 6) in 2D space and calculate the distance between them:

    from math import sqrt

    # Distance between the points (1, 1) and (3, 6): sqrt(29), about 5.385
    print(sqrt(pow(3 - 1, 2) + pow(6 - 1, 2)))
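The calculation above is the two-dimensional case of the general Euclidean distance between two points p and q with n coordinates each. Reading the code as the points p = (1, 1) and q = (3, 6), the formula and the worked example are:

$$
d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}
$$

$$
d\big((1, 1), (3, 6)\big) = \sqrt{(3 - 1)^2 + (6 - 1)^2} = \sqrt{4 + 25} = \sqrt{29} \approx 5.39
$$

In the ratings example, each movie that two people have both rated plays the role of one coordinate, so n is the number of shared movies.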

This straight-line distance is called the Euclidean distance. Next, we apply the same idea in a function that compares two people in our dictionary, treating each movie they have both rated as one dimension:

    # Return a distance-based similarity score for personA and personB
    def distance(dictionary, personA, personB):
        # Collect the movies that both people have rated
        si = {}
        for item in dictionary[personA]:
            if item in dictionary[personB]:
                si[item] = 1

        # No movies in common: no basis for comparison
        if len(si) == 0:
            return 0

        # Add up the squared rating differences for the shared movies
        sum_of_squares = sum(pow(dictionary[personA][item] - dictionary[personB][item], 2)
                             for item in si)

        # Map to a score in (0, 1]: identical ratings give 1
        return 1 / (1 + sum_of_squares)

    print(distance(ratings, 'Toby', 'Julian'))

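The function distance first takes a dictionary, here ratings, and then two names, Toby and Julian, whose tastes it compares in a multidimensional space with one dimension per movie. It collects the movies that both of them have rated and returns 0 if there are none. It then adds up the squared rating differences for those shared movies. Note that it never takes the square root, so this is the squared Euclidean distance. Finally, it turns the sum into a similarity score with 1/(1+sum_of_squares): identical ratings give a score of 1, the score drops towards 0 as the tastes drift apart, and the added 1 prevents a division by zero. For Toby and Julian the shared movies are Terminator, Groundhog day, Superman Returns and Gone with the wind, and the sum of squares is 1.5² + 0.5² + 1.5² + 0.5² = 5.0, so the result is 1/6, or about 0.167. This is lower than the score for Toby and Alex, which is 0.25, so Toby's taste is closer to Alex's than to Julian's.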
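To connect the score back to recommendations, here is a minimal sketch, assuming Python 3, that scores every other person against Toby and lists the closest matches. The helper top_matches and its parameter n are hypothetical names introduced only for this illustration. distance is restated in condensed form (same behavior, with a list of shared movies instead of the si dictionary) so that the block runs on its own.

```python
# Minimal sketch (Python 3): rank everyone by similarity to one person.
# top_matches is a hypothetical helper, not part of the original post.

ratings = {
    'Alan': {'Jurassic Park': 1, 'Terminator': 5, 'Gone with the wind': 2.5,
             'Superman Returns': 5, 'Groundhog day': 2.5, 'The notebook': 2.5},
    'Thomas': {'Jurassic Park': 4, 'Terminator': 5, 'Gone with the wind': 1.5,
               'Superman Returns': 5.0, 'The notebook': 2.5, 'Groundhog day': 5},
    'Michael': {'Jurassic Park': 3, 'Terminator': 2.5, 'Superman Returns': 5,
                'The notebook': 4.0},
    'Hillary': {'Terminator': 5, 'Gone with the wind': 2.5, 'The notebook': 4.5,
                'Superman Returns': 4.0, 'Groundhog day': 2.5},
    'Alex': {'Jurassic Park': 2.5, 'Terminator': 4.0, 'Gone with the wind': 2.0,
             'Superman Returns': 2.5, 'The notebook': 2.5, 'Groundhog day': 2.0},
    'Julian': {'Jurassic Park': 1.5, 'Terminator': 3.0, 'Gone with the wind': 2.0,
               'Superman Returns': 2.5, 'The notebook': 2.5, 'Groundhog day': 2.0},
    'Anna': {'Jurassic Park': 2.5, 'Terminator': 4.0, 'The notebook': 2.5,
             'Superman Returns': 5.0, 'Groundhog day': 5},
    'Toby': {'Terminator': 4.5, 'Groundhog day': 1.5, 'Superman Returns': 4.0,
             'Gone with the wind': 2.5},
}


def distance(dictionary, personA, personB):
    """Distance-based similarity score in (0, 1]; 0 if no movies are shared."""
    shared = [item for item in dictionary[personA] if item in dictionary[personB]]
    if not shared:
        return 0
    # Squared Euclidean distance over the shared movies
    sum_of_squares = sum((dictionary[personA][item] - dictionary[personB][item]) ** 2
                         for item in shared)
    return 1 / (1 + sum_of_squares)


def top_matches(dictionary, person, n=3):
    """Hypothetical helper: the n people whose ratings are most similar to person's."""
    scores = [(other, distance(dictionary, person, other))
              for other in dictionary if other != person]
    # Highest similarity first; the sort is stable, so ties keep dictionary order
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]


for name, score in top_matches(ratings, 'Toby'):
    print(f"{name}: {score:.3f}")
```

With these ratings the loop prints Hillary (0.444), Alan (0.308) and Alex (0.250). Sorting by score is the simplest possible choice here; movies that these closest matches rated highly and Toby has not seen yet would be natural candidates to suggest to him.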

Want to take part in our own survey? Please follow this link.
