rdist: an R package for distances

rdist provides a common framework for calculating distances. There are three main functions:

rdist computes the pairwise distances between observations in one matrix and returns a dist object,
pdist computes the pairwise distances between observations in one matrix and returns a matrix, and
cdist computes the distances between observations in two matrices and returns a matrix.

In particular, a function like cdist is often missing from other distance packages. All calculations involving NA values consistently return NA.
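To make the two-matrix case and the NA rule concrete, here is a minimal base-R sketch of what a cdist-style computation produces for the Euclidean metric. The helper name cross_euclidean is hypothetical and is not the package's implementation; it only illustrates the shape of the result and how NA propagates.

```r
# Hypothetical helper, not part of rdist: Euclidean distance between every
# row of X and every row of Y, returned as an nrow(X) x nrow(Y) matrix.
cross_euclidean <- function(X, Y) {
  stopifnot(is.matrix(X), is.matrix(Y), ncol(X) == ncol(Y))
  out <- matrix(NA_real_, nrow = nrow(X), ncol = nrow(Y))
  for (i in seq_len(nrow(X))) {
    for (j in seq_len(nrow(Y))) {
      # sum() propagates NA, so any pair involving an NA value yields NA
      out[i, j] <- sqrt(sum((X[i, ] - Y[j, ])^2))
    }
  }
  out
}

X <- rbind(c(0, 0), c(3, 4), c(NA, 1))  # third observation contains an NA
Y <- rbind(c(0, 0), c(6, 8))
D <- cross_euclidean(X, Y)
print(D)  # 3 x 2 matrix; the third row is entirely NA
```

The result has one row per observation in X and one column per observation in Y, which is the layout described above for cdist.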

rdist(X, metric = "euclidean", p = 2L)

pdist(X, metric = "euclidean", p = 2)

cdist(X, Y, metric = "euclidean", p = 2)
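A short usage sketch of the three signatures above, assuming the rdist package is installed. The input matrices are made-up example data; the return types noted in the comments follow the descriptions given at the top of this page.

```r
library(rdist)  # assumption: the rdist package is installed

set.seed(1)
X <- matrix(rnorm(10), nrow = 5)  # 5 observations, 2 variables (example data)
Y <- matrix(rnorm(6), nrow = 3)   # 3 observations, 2 variables (example data)

d_obj <- rdist(X)                                  # dist object
d_mat <- pdist(X, metric = "manhattan")            # 5 x 5 matrix
d_xy  <- cdist(X, Y, metric = "minkowski", p = 3)  # 5 x 3 matrix

print(class(d_obj))
print(dim(d_mat))
print(dim(d_xy))
```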

Arguments

X, Y	A matrix
metric	The distance metric to use: one of the names listed under Details, or a function
p	The power of the Minkowski distance

Details

Available distance measures are (written for two vectors v and w):

"euclidean": \(\sqrt{\sum_i(v_i - w_i)^2}\)
"minkowski": \((\sum_i|v_i - w_i|^p)^{1/p}\)
"manhattan": \(\sum_i(|v_i-w_i|)\)
"maximum" or "chebyshev": \(\max_i(|v_i-w_i|)\)
"canberra": \(\sum_i(\frac{|v_i-w_i|}{|v_i|+|w_i|})\)
"angular": \(\cos^{-1}(cor(v, w))\)
"correlation": \(\sqrt{\frac{1-cor(v, w)}{2}}\)
"absolute_correlation": \(\sqrt{1-|cor(v, w)|^2}\)
"hamming": \((\sum_i 1_{v_i \neq w_i}) / \sum_i 1\)
"jaccard": \((\sum_i 1_{v_i \neq w_i}) / \sum_i 1_{v_i \neq 0 \lor w_i \neq 0}\)
Alternatively, metric can be any function that defines a distance between two vectors.
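As a worked check of several of the formulas above, the following base-R sketch evaluates them directly for two small vectors. It does not call the package; the vectors v and w and the choice p = 3 are illustrative assumptions.

```r
v <- c(1, 2, 3)
w <- c(4, 6, 3)  # |v - w| = (3, 4, 0)

euclidean <- sqrt(sum((v - w)^2))                 # 5
manhattan <- sum(abs(v - w))                      # 7
chebyshev <- max(abs(v - w))                      # 4
p <- 3
minkowski <- sum(abs(v - w)^p)^(1 / p)            # 91^(1/3)
canberra  <- sum(abs(v - w) / (abs(v) + abs(w)))  # 3/5 + 4/8 + 0/6 = 1.1
hamming   <- sum(v != w) / length(v)              # 2/3

# Cross-check the Euclidean value against base R's stats::dist
euclidean_base <- as.numeric(dist(rbind(v, w)))

print(c(euclidean = euclidean, manhattan = manhattan, chebyshev = chebyshev,
        minkowski = minkowski, canberra = canberra, hamming = hamming))
```

Setting p = 1 or p = 2 in the Minkowski line reproduces the Manhattan and Euclidean values, which is a quick way to sanity-check the role of p.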
