`rdist` provides a common framework for calculating distances. There are three main functions: `rdist` computes the pairwise distances between observations in one matrix and returns a `dist` object, `pdist` computes the pairwise distances between observations in one matrix and returns a matrix, and `cdist` computes the distances between observations in two matrices and returns a matrix. In particular, the `cdist` function is often missing from other distance packages. All calculations involving `NA` values consistently return `NA`.
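To make the three return shapes concrete, the following base-R sketch reproduces them for the Euclidean case without using the package. It is only an illustration of the output formats, not the package's implementation; the matrices `X` and `Y` and the helper `cross_dist` are hypothetical names introduced here, and rows are assumed to be the observations.

```r
# Base-R illustration of the three output shapes (Euclidean distance only).
X <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)   # 3 observations (rows)
Y <- matrix(c(0, 0,
              0, 1), ncol = 2, byrow = TRUE)   # 2 observations (rows)

# Compact "dist" object: stores only the lower triangle of the pairwise distances.
d_compact <- stats::dist(X, method = "euclidean")

# Full symmetric n x n matrix holding the same pairwise distances.
d_full <- as.matrix(d_compact)

# Hypothetical helper: distances between every row of A and every row of B.
cross_dist <- function(A, B) {
  out <- matrix(NA_real_, nrow(A), nrow(B))
  for (i in seq_len(nrow(A))) {
    for (j in seq_len(nrow(B))) {
      out[i, j] <- sqrt(sum((A[i, ] - B[j, ])^2))
    }
  }
  out
}
d_cross <- cross_dist(X, Y)   # nrow(X) x nrow(Y) matrix

print(class(d_compact))
print(dim(d_full))
print(dim(d_cross))
```

The compact object holds n(n-1)/2 values, the full matrix n x n values, and the cross-distance matrix one row per observation of the first input and one column per observation of the second.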
rdist(X, metric = "euclidean", p = 2L)

pdist(X, metric = "euclidean", p = 2)

cdist(X, Y, metric = "euclidean", p = 2)
Argument | Description |
---|---|
X, Y | A matrix |
metric | The distance metric to use |
p | The power of the Minkowski distance |
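A short usage sketch of the three signatures follows. It assumes the rdist package is installed and that observations are stored as rows; the random data and the object names `d1`, `d2`, `d3` are made up for illustration. The block is skipped when the package is not available.

```r
# Usage sketch; runs only if the rdist package is installed.
if (requireNamespace("rdist", quietly = TRUE)) {
  set.seed(1)
  X <- matrix(rnorm(20), nrow = 5)   # 5 observations, 4 variables (assumed layout)
  Y <- matrix(rnorm(12), nrow = 3)   # 3 observations, 4 variables

  # Pairwise distances within X, returned as a "dist" object.
  d1 <- rdist::rdist(X)

  # Pairwise distances within X, returned as a matrix.
  d2 <- rdist::pdist(X, metric = "manhattan")

  # Distances between observations of X and observations of Y, returned as a matrix.
  d3 <- rdist::cdist(X, Y, metric = "minkowski", p = 3)

  print(class(d1))
  print(dim(d2))
  print(dim(d3))
}
```

The `p` argument only matters for the Minkowski metric, which is why it is passed in the last call alone.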
Available distance measures are (written for two vectors \(v\) and \(w\)):

- `"euclidean"`: \(\sqrt{\sum_i (v_i - w_i)^2}\)
- `"minkowski"`: \((\sum_i |v_i - w_i|^p)^{1/p}\)
- `"manhattan"`: \(\sum_i |v_i - w_i|\)
- `"maximum"` or `"chebyshev"`: \(\max_i |v_i - w_i|\)
- `"canberra"`: \(\sum_i \frac{|v_i - w_i|}{|v_i| + |w_i|}\)
- `"angular"`: \(\cos^{-1}(\operatorname{cor}(v, w))\)
- `"correlation"`: \(\sqrt{\frac{1 - \operatorname{cor}(v, w)}{2}}\)
- `"absolute_correlation"`: \(\sqrt{1 - |\operatorname{cor}(v, w)|^2}\)
- `"hamming"`: \(\sum_i 1_{v_i \neq w_i} / \sum_i 1\)
- `"jaccard"`: \(\sum_i 1_{v_i \neq w_i} / \sum_i 1_{v_i \neq 0 \lor w_i \neq 0}\)
- a user-defined function: any function that defines a distance between two vectors.
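The formulas above can be written directly in base R for a single pair of vectors. The sketch below is a plain transcription for checking values by hand, not the package's code; the function names and test vectors are hypothetical. Each function takes two vectors and returns one number, which is the shape of function the last list item describes.

```r
# Plain base-R transcriptions of the listed formulas for two vectors v and w.
euclidean <- function(v, w) sqrt(sum((v - w)^2))
minkowski <- function(v, w, p = 2) sum(abs(v - w)^p)^(1 / p)
manhattan <- function(v, w) sum(abs(v - w))
chebyshev <- function(v, w) max(abs(v - w))
canberra  <- function(v, w) sum(abs(v - w) / (abs(v) + abs(w)))
angular   <- function(v, w) acos(cor(v, w))
correlation <- function(v, w) sqrt((1 - cor(v, w)) / 2)
absolute_correlation <- function(v, w) sqrt(1 - abs(cor(v, w))^2)
hamming   <- function(v, w) sum(v != w) / length(v)
jaccard   <- function(v, w) sum(v != w) / sum(v != 0 | w != 0)

# Numeric example vectors (no position where both entries are zero,
# so the Canberra terms are all well defined).
v <- c(1, 2, 3, 4)
w <- c(2, 2, 1, 8)

# Binary example vectors for the Hamming and Jaccard measures.
a <- c(1, 0, 1, 0, 0)
b <- c(1, 1, 0, 0, 0)

# Minkowski reduces to Manhattan for p = 1 and to Euclidean for p = 2.
print(all.equal(minkowski(v, w, p = 1), manhattan(v, w)))
print(all.equal(minkowski(v, w, p = 2), euclidean(v, w)))

print(chebyshev(v, w))
print(hamming(a, b))
print(jaccard(a, b))
```

With these vectors the Hamming value counts 2 mismatches out of 5 positions, while the Jaccard value divides the same 2 mismatches by the 3 positions where at least one entry is nonzero.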