# calculation of distance using Kullback–Leibler divergence


Hi,

I come from a biology background, so this may be a basic question for you. I have two plots, for example:

Plot 1:

X-axis values:
535.255111, 536.258228, 537.26097, 538.26361, 539.266194, 540.268735
Y-axis values:
0.7, 0.23474151, 1, 0.00980891, 0.00116291, 0.0001162

Plot 2:

X-axis values:
535.255111, 536.258228, 537.26097, 538.26361, 539.266194, 540.268735
Y-axis values:
1, 0.33474151, 0.06663174, 0.00980891, 0.00116291, 0.0001162

Now I need to design a scoring function that shows how well these two plots fit each other. It should be symmetric and 2D, meaning the score should depend on both the x-axis and the y-axis values.

I thought of using the Kullback–Leibler divergence for Gaussians.

What I intend to do is draw a Gaussian curve around each point in the two plots, pair the points up (1st point of plot 1 with 1st point of plot 2, and so on), calculate the overlap of each pair of curves, and finally sum the overlaps from all the pairs.

If the overlap is perfect on both the x-axis and the y-axis, the fit should be 1 (100%). If there are changes in the x-axis or y-axis values, the fit should be lower. If the y-axis values are the same but there is a significant change in the x-axis values, the score should be close to zero, because the overlap will be very small.
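The overlap idea described above can be sketched in Python as follows. This is only one possible reading, under stated assumptions: each point gets a 2D Gaussian with the same widths `sigma_x` and `sigma_y` (hypothetical parameters, with arbitrary default values), the overlap of a pair is measured with the Bhattacharyya coefficient rather than the KL divergence (the coefficient is symmetric and lies between 0 and 1), and the overlaps are averaged rather than summed so that the result does not grow with the number of points.

```python
import math

# Data copied from the post: (x, y) pairs for each plot.
X = [535.255111, 536.258228, 537.26097, 538.26361, 539.266194, 540.268735]
PLOT_1 = list(zip(X, [0.7, 0.23474151, 1, 0.00980891, 0.00116291, 0.0001162]))
PLOT_2 = list(zip(X, [1, 0.33474151, 0.06663174, 0.00980891, 0.00116291, 0.0001162]))


def gaussian_overlap_score(plot_a, plot_b, sigma_x=0.01, sigma_y=0.1):
    """Mean overlap of paired 2D Gaussians, in the range [0, 1].

    sigma_x and sigma_y are assumed widths (not from the post); they set
    how quickly the score drops when a point moves along each axis.
    """
    if len(plot_a) != len(plot_b) or not plot_a:
        raise ValueError("plots must be non-empty and have the same length")
    total = 0.0
    for (xa, ya), (xb, yb) in zip(plot_a, plot_b):
        # Differences measured in units of the Gaussian widths.
        dx = (xa - xb) / sigma_x
        dy = (ya - yb) / sigma_y
        # Bhattacharyya coefficient of two Gaussians with equal covariance.
        total += math.exp(-(dx * dx + dy * dy) / 8.0)
    return total / len(plot_a)


print(gaussian_overlap_score(PLOT_1, PLOT_1))  # identical plots
print(gaussian_overlap_score(PLOT_1, PLOT_2))  # the two plots from the post
```

With this choice, identical plots score exactly 1, swapping the two plots gives the same score, and a shift along x that is large compared with `sigma_x` pushes the score towards zero even when the y values match.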

I hope I have presented the idea clearly. It would be helpful if anyone could provide an implementation of the formula for this problem, as I'm not strong in maths.

##### Share on other sites

Hmm. I'm not 100% sure this is on topic, but anyway.

Are your points always going to be discrete like this, or are these samples from some continuous functions? In the latter case any norm of the difference between the functions will work.

If you have a finite set of points and the first point from sample one (X_1,Y_1) is always to be compared with the first point from sample two (A_1,B_1), etc., why not just do a sum of squares: \sum_{i=1}^n [ (X_i-A_i)^2 + (Y_i-B_i)^2 ]
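As a minimal sketch, the sum of squares above could be computed like this in Python. The function name and the representation of each sample as a list of `(x, y)` tuples are assumptions made for illustration.

```python
def sum_of_squares(sample_one, sample_two):
    """Sum of squared distances between paired points (X_i, Y_i) and (A_i, B_i)."""
    if len(sample_one) != len(sample_two):
        raise ValueError("both samples must have the same number of points")
    return sum(
        (x - a) ** 2 + (y - b) ** 2
        for (x, y), (a, b) in zip(sample_one, sample_two)
    )


# Small made-up example: only the first pair differs, by 3 in x and 4 in y.
print(sum_of_squares([(0, 0), (1, 1)], [(3, 4), (1, 1)]))  # 25
```

The result is 0 for identical samples and grows without bound as they diverge, so it is a distance rather than a score between 0 and 1.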

I don't think using a KL divergence is the right approach.

##### Share on other sites

My points will always be discrete.

Will this solution work for discrete points too?

I think this sum of squares depends on the number of points, which I cannot predict. I need a score of 1 for the best fit, and the score should decrease towards zero for a very bad fit. It should be symmetric and independent of the number of points.
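One possible way to meet these requirements, sketched here under assumptions that are not part of the thread, is to divide the sum of squares by the number of points and map the result into the range 0 to 1 with a decaying exponential. The `scale` parameter is hypothetical: it sets how large a mean squared difference counts as a bad fit, and it would need to be chosen for the data (x and y may also need separate scaling if their units differ).

```python
import math


def mean_squared_distance(plot_a, plot_b):
    """Sum of squared differences between paired points, divided by the point count."""
    if len(plot_a) != len(plot_b) or not plot_a:
        raise ValueError("plots must be non-empty and have the same length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2
        for (xa, ya), (xb, yb) in zip(plot_a, plot_b)
    )
    return total / len(plot_a)


def fit_score(plot_a, plot_b, scale=1.0):
    """Score in (0, 1]: 1 for identical plots, approaching 0 for a very bad fit.

    scale is an assumed tuning parameter, not something given in the thread.
    """
    return math.exp(-mean_squared_distance(plot_a, plot_b) / scale ** 2)


a = [(0.0, 0.0), (1.0, 1.0)]
b = [(0.5, 0.0), (1.0, 2.0)]
print(fit_score(a, a))          # identical plots
print(fit_score(a, b))          # same value as fit_score(b, a)
print(fit_score(a + a, b + b))  # repeating the points does not change the score
```

Because the differences are squared, swapping the two plots gives the same score, and because the sum is averaged, the score does not depend on how many points there are.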
