Hello,
I was trying to implement a calculation of similarity between data distributions based on boxplots and their five-number summaries.
The distance is based on the differences in the minimum, first quartile (Q1), median, third quartile (Q3), and maximum across variables (see attached picture).
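For reference, here is the distance written out with the weights read off the `get.distance()` function in the EDIT. The attached picture is not reproduced here and may present it differently. $Q_p$ denotes the sample $p$-quantile as returned by R's `quantile()`:

$$
d(x, y) = \frac{1}{2}\Big(\lvert \min x - \min y \rvert + 2\,\lvert Q_{0.25}(x) - Q_{0.25}(y) \rvert + 2\,\lvert \operatorname{med} x - \operatorname{med} y \rvert + 2\,\lvert Q_{0.75}(x) - Q_{0.75}(y) \rvert + \lvert \max x - \max y \rvert\Big)
$$

Every term is an absolute difference, so $d(x, x) = 0$ and $d(x, y) = d(y, x)$. For $n$ variables, only the $n(n-1)/2$ pairs above the diagonal of the $n \times n$ distance matrix need computing.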
My issue is that I am having a hard time working out a viable strategy to calculate the distance pairwise across n variables.
Assuming I have a toy dataset as follows:
```r
set.seed(123)  # make the random toy data reproducible
a <- rnorm(30, 30, 5)
b <- rnorm(30, 40, 5)
c <- rnorm(30, 50, 5)
df <- data.frame(a = a, b = b, c = c)
```
What would be a way to work out the similarity index pairwise across the 3 variables?
Any pointer in the right direction would be appreciated.
Thank you
Gm
EDIT
I came up with some code to calculate the distance between a pair of variables:
```r
get.distance <- function(x, y) {
  d <- 0.5 * (abs(min(x) - min(y)) +                                    # minimum
    2 * abs(quantile(x, probs = 0.25) - quantile(y, probs = 0.25)) +  # 1st quartile
    2 * abs(median(x) - median(y)) +                                  # median
    2 * abs(quantile(x, probs = 0.75) - quantile(y, probs = 0.75)) +  # 3rd quartile
    abs(max(x) - max(y)))                                             # maximum
  return(unname(d))  # drop the "25%" name that quantile() attaches
}
```
Now I am wondering how I can use it to automatically work out the distance pairwise across any number of variables.
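One way to do this is sketched below. It assumes all columns of the data frame are numeric and keeps the same weights as `get.distance()`. The helper names `fivenum.distance()` and `pairwise.distance()` are hypothetical, my own for illustration, not from any package. The first helper gets all five summary statistics from a single `quantile()` call and takes a weighted sum of the absolute differences. The second helper visits every unordered pair of columns once via `combn()` and fills a symmetric matrix.

```r
# Hypothetical helper: weighted distance between the five-number summaries of
# two numeric vectors, using the same weights as get.distance() in the post.
fivenum.distance <- function(x, y) {
  probs <- c(0, 0.25, 0.5, 0.75, 1)   # min, Q1, median, Q3, max
  w <- 0.5 * c(1, 2, 2, 2, 1)         # weight for each absolute difference
  qx <- quantile(x, probs = probs, names = FALSE)
  qy <- quantile(y, probs = probs, names = FALSE)
  sum(w * abs(qx - qy))
}

# Hypothetical helper: apply dist.fun to every pair of columns of a data frame
# (all columns assumed numeric) and return a symmetric n x n matrix.
pairwise.distance <- function(data, dist.fun = fivenum.distance) {
  vars <- names(data)
  n <- length(vars)
  m <- matrix(0, nrow = n, ncol = n, dimnames = list(vars, vars))
  if (n > 1) {
    for (p in combn(n, 2, simplify = FALSE)) {   # each unordered pair once
      d <- dist.fun(data[[p[1]]], data[[p[2]]])
      m[p[1], p[2]] <- d
      m[p[2], p[1]] <- d                          # the distance is symmetric
    }
  }
  m
}

# Usage with toy data like the one in the post
set.seed(123)
df <- data.frame(a = rnorm(30, 30, 5), b = rnorm(30, 40, 5), c = rnorm(30, 50, 5))
D <- pairwise.distance(df)
print(round(D, 2))
print(as.dist(D))   # lower triangle only, as a "dist" object
```

Because `dist.fun` is an argument, your own function can be plugged in instead, e.g. `pairwise.distance(df, get.distance)`. If the quartiles should match the box edges that `boxplot()` draws, `fivenum()` (Tukey's hinges) is an alternative to `quantile()`. Note that the two can give slightly different Q1/Q3 values for the same data.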