Mongo

Members
  • Posts

    1

Profile Information

  • Application Season
    2013 Spring
  • Program
    PhD Information Systems

  1. I'm working on transforming one set of data into another based on a certain variable (length). Here's what the actual problem looks like:

    list1 = ['red', 'yellow', 'blue']
    doc1 = ['yellow', 'green', 'red']
    list2 = ['red', 'yellow', 'green', 'black', 'purple', 'brown']
    doc2 = ['yellow', 'red', 'blue', 'grey', 'pink', 'pale', 'colours', 'indigo']

    Jaccard similarity between list1 and doc1 gives a score of 0.667.
    Jaccard similarity between list2 and doc2 gives a score of 0.182.

    The first comparison has two overlaps (red and yellow) and gets a higher score than the second comparison, which has the same number of overlapping items. Hence the larger the compared items are, the smaller the similarity score, and vice versa.

    My goal is to determine a transformation/normalisation factor that cancels out the effect of the size difference and measures similarity based on the actual overlap. Here's my attempt: I multiplied the similarity scores by the log of the average length of the compared items.

    First comparison: average item length = 3, final score = log(3) * 0.667 = 0.73277
    Second comparison: average item length = 7, final score = log(7) * 0.182 = 0.35416

    Multiplying by the items' length favours longer items, thus reducing the difference in scores that results from the different sizes (lengths). However, my method didn't reduce the score margin enough, so I'm looking for a method that will cancel out the effect of item sizes and focus on similarity based on overlaps. Any ideas?
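
    For concreteness, here is a minimal sketch of the two steps described above. The helper names jaccard and length_adjusted are illustrative, not from the original post, and note that textbook Jaccard (intersection over union) gives 0.5 and about 0.167 for these pairs, so the quoted 0.667 and 0.182 may come from a slightly different variant of the score.

        import math

        def jaccard(a, b):
            """Textbook Jaccard similarity: |A & B| / |A | B|."""
            a, b = set(a), set(b)
            return len(a & b) / len(a | b)

        def length_adjusted(a, b):
            """Jaccard score scaled by the log of the average item length,
            i.e. the normalisation attempt described in the post."""
            avg_len = (len(a) + len(b)) / 2
            return math.log(avg_len) * jaccard(a, b)

        list1 = ['red', 'yellow', 'blue']
        doc1 = ['yellow', 'green', 'red']
        list2 = ['red', 'yellow', 'green', 'black', 'purple', 'brown']
        doc2 = ['yellow', 'red', 'blue', 'grey', 'pink', 'pale', 'colours', 'indigo']

        print(jaccard(list1, doc1), length_adjusted(list1, doc1))  # 0.5, ~0.549
        print(jaccard(list2, doc2), length_adjusted(list2, doc2))  # ~0.167, ~0.324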