Pairwise distance and ordination
-
allel.pairwise_distance(x, metric, chunked=False, blen=None)
Compute pairwise distance between individuals (e.g., samples or haplotypes).
Parameters: x : array_like, shape (n, m, …)
Array of m observations (e.g., samples or haplotypes) in a space with n dimensions (e.g., variants). Note that the order of the first two dimensions is swapped compared to what is expected by scipy.spatial.distance.pdist.
metric : string or function
Distance metric. See the documentation for the function scipy.spatial.distance.pdist() for a list of built-in distance metrics.
chunked : bool, optional
If True, use a block-wise implementation to avoid loading the entire input array into memory. A distance matrix is calculated for each block of the input array, and the results are summed to produce the final output. This matches the standard implementation only for metrics that are additive over dimensions (e.g., 'cityblock' or 'sqeuclidean'); for other metrics (e.g., 'euclidean') the result will differ.
blen : int, optional
Block length (number of rows along the first dimension) to use for the chunked implementation.
Returns: dist : ndarray, shape (m * (m - 1) / 2,)
Distance matrix in condensed form.
Examples
>>> import allel
>>> g = allel.GenotypeArray([[[0, 0], [0, 1], [1, 1]],
...                          [[0, 1], [1, 1], [1, 2]],
...                          [[0, 2], [2, 2], [-1, -1]]])
>>> d = allel.stats.pairwise_distance(g.to_n_alt(), metric='cityblock')
>>> d
array([ 3.,  4.,  3.])
>>> import scipy.spatial
>>> scipy.spatial.distance.squareform(d)
array([[ 0.,  3.,  4.],
       [ 3.,  0.,  3.],
       [ 4.,  3.,  0.]])
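To illustrate why chunked=True only reproduces the standard result for some metrics, here is a minimal sketch of block-wise summation. It assumes the scheme described under chunked (compute a condensed matrix per block of rows, then sum); the helper names condensed_distances, chunked_distances, cityblock and euclidean are hypothetical, and plain NumPy stands in for scipy.spatial.distance.pdist.

```python
import numpy as np


def condensed_distances(x, pair_dist):
    """Condensed distance vector between the columns of x.

    x has shape (n_dimensions, n_observations), the orientation used by
    allel.pairwise_distance (transposed relative to scipy's pdist).
    """
    m = x.shape[1]
    return np.array(
        [pair_dist(x[:, i], x[:, j]) for i in range(m) for j in range(i + 1, m)],
        dtype=float,
    )


def cityblock(u, v):
    return np.abs(u - v).sum()


def euclidean(u, v):
    return np.sqrt(((u - v) ** 2).sum())


def chunked_distances(x, pair_dist, blen):
    """Compute distances for each block of blen rows and sum the results."""
    m = x.shape[1]
    total = np.zeros(m * (m - 1) // 2)
    for start in range(0, x.shape[0], blen):
        total += condensed_distances(x[start:start + blen], pair_dist)
    return total


# Alternate-allele counts from the example above (variants x samples).
x = np.array([[0, 1, 2],
              [1, 2, 2],
              [1, 2, 0]])

print(condensed_distances(x, cityblock))        # [3. 4. 3.]
print(chunked_distances(x, cityblock, blen=2))  # [3. 4. 3.], additive metric agrees
print(condensed_distances(x, euclidean))        # sqrt([3, 6, 5])
print(chunked_distances(x, euclidean, blen=1))  # [3. 4. 3.], does not agree
```

Cityblock distance is a sum over dimensions, so summing per-block results is exact; Euclidean distance takes a square root of that sum, so the per-block results cannot simply be added.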
-
allel.plot_pairwise_distance(dist, labels=None, colorbar=True, ax=None, imshow_kwargs=None)
Plot a pairwise distance matrix.
Parameters: dist : array_like
The distance matrix in condensed form.
labels : sequence of strings, optional
Sample labels for the axes.
colorbar : bool, optional
If True, add a colorbar to the current figure.
ax : axes, optional
The axes on which to draw. If not provided, a new figure will be created.
imshow_kwargs : dict-like, optional
Additional keyword arguments passed through to matplotlib.pyplot.imshow().
Returns: ax : axes
The axes on which the plot was drawn.
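Drawing a condensed matrix as an image requires expanding it to square form first. The sketch below shows that expansion with a hypothetical helper, condensed_to_square; in practice scipy.spatial.distance.squareform does the same job, and the resulting array is what an image function such as matplotlib's imshow would display.

```python
import numpy as np


def condensed_to_square(dist):
    """Expand a condensed distance vector into a symmetric square matrix."""
    dist = np.asarray(dist, dtype=float)
    # Solve k = n * (n - 1) / 2 for the matrix size n.
    n = int(round((1 + np.sqrt(1 + 8 * dist.size)) / 2))
    if n * (n - 1) // 2 != dist.size:
        raise ValueError("length is not a valid condensed distance matrix size")
    square = np.zeros((n, n))
    # triu_indices walks the upper triangle row by row, matching condensed order.
    rows, cols = np.triu_indices(n, k=1)
    square[rows, cols] = dist
    square[cols, rows] = dist
    return square


# Condensed matrix from the pairwise_distance example.
square = condensed_to_square([3.0, 4.0, 3.0])
print(square)  # symmetric 3 x 3 matrix with zeros on the diagonal
```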
-
allel.pairwise_dxy(pos, gac, start=None, stop=None, is_accessible=None)
Convenience function to calculate a pairwise distance matrix using nucleotide divergence (a.k.a. Dxy) as the distance metric.
Parameters: pos : array_like, int, shape (n_variants,)
Variant positions.
gac : array_like, int, shape (n_variants, n_samples, n_alleles)
Per-genotype allele counts.
start : int, optional
Start position of region to use.
stop : int, optional
Stop position of region to use.
is_accessible : array_like, bool, shape (len(contig),), optional
Boolean array indicating accessibility status for all positions in the chromosome/contig.
Returns: dist : ndarray
Distance matrix in condensed form.
See also
allel.model.ndarray.GenotypeArray.to_allele_counts
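As a rough illustration of what Dxy measures, the sketch below computes, for every pair of samples, the fraction of allele pairs (one allele from each sample) that differ at each variant, sums this over variants in the region, and divides by the number of bases. It assumes an inclusive start/stop region in which every base is accessible (the is_accessible case is not covered), and the names dxy_pair and pairwise_dxy_sketch are hypothetical; this is not the library's own code.

```python
import numpy as np


def dxy_pair(pos, ac1, ac2, start, stop):
    """Mean divergence per base between two sets of allele counts.

    pos: variant positions, shape (n_variants,)
    ac1, ac2: allele counts, shape (n_variants, n_alleles)
    start, stop: inclusive region bounds; every base is treated as accessible.
    """
    pos = np.asarray(pos)
    in_region = (pos >= start) & (pos <= stop)
    ac1 = np.asarray(ac1, dtype=float)[in_region]
    ac2 = np.asarray(ac2, dtype=float)[in_region]
    # Number of allele pairs with one allele drawn from each side.
    n_pairs = ac1.sum(axis=1) * ac2.sum(axis=1)
    # Of those, the pairs carrying the same allele.
    n_same = (ac1 * ac2).sum(axis=1)
    # Fraction of differing pairs per variant; variants with no called
    # alleles on either side contribute zero.
    with np.errstate(divide="ignore", invalid="ignore"):
        diff = np.where(n_pairs > 0, (n_pairs - n_same) / n_pairs, 0.0)
    return diff.sum() / (stop - start + 1)


def pairwise_dxy_sketch(pos, gac, start, stop):
    """Condensed Dxy matrix over all pairs of samples in gac."""
    gac = np.asarray(gac)  # shape (n_variants, n_samples, n_alleles)
    n_samples = gac.shape[1]
    return np.array([
        dxy_pair(pos, gac[:, i], gac[:, j], start, stop)
        for i in range(n_samples)
        for j in range(i + 1, n_samples)
    ])


# Two variants, three diploid samples, two alleles.
pos = [1, 5]
gac = [[[2, 0], [0, 2], [2, 0]],
       [[1, 1], [1, 1], [2, 0]]]
print(pairwise_dxy_sketch(pos, gac, start=1, stop=10))  # approx. [0.15 0.05 0.15]
```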
-
allel.pcoa(dist)
Perform principal coordinate analysis of a distance matrix, a.k.a. classical multi-dimensional scaling.
Parameters: dist : array_like
Distance matrix in condensed form.
Returns: coords : ndarray, shape (n_samples, n_dimensions)
Transformed coordinates for the samples.
explained_ratio : ndarray, shape (n_dimensions,)
Proportion of the total variance explained by each dimension.
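The standard recipe for classical multidimensional scaling can be sketched in a few lines of NumPy: square the distances, double-centre them to obtain a Gram matrix, and scale its eigenvectors by the square roots of the eigenvalues. This is a minimal sketch under that textbook recipe, with a hypothetical function name pcoa_sketch; details such as how negative eigenvalues are handled may differ from allel.pcoa.

```python
import numpy as np


def pcoa_sketch(dist):
    """Classical MDS of a condensed distance vector (illustrative sketch)."""
    dist = np.asarray(dist, dtype=float)
    # Expand the condensed vector to a symmetric square matrix.
    n = int(round((1 + np.sqrt(1 + 8 * dist.size)) / 2))
    square = np.zeros((n, n))
    rows, cols = np.triu_indices(n, k=1)
    square[rows, cols] = dist
    square[cols, rows] = dist

    # Double-centre -d^2 / 2 to obtain the Gram (inner-product) matrix.
    e = -0.5 * square ** 2
    gram = (e - e.mean(axis=0, keepdims=True)
            - e.mean(axis=1, keepdims=True) + e.mean())

    # Eigendecomposition of the symmetric Gram matrix, largest eigenvalue first.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Snap round-off noise to zero and drop negative eigenvalues, which
    # appear when the distances cannot be embedded exactly in Euclidean space.
    eigvals[np.isclose(eigvals, 0)] = 0
    keep = eigvals >= 0
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]

    coords = eigvecs * np.sqrt(eigvals)
    explained_ratio = eigvals / eigvals.sum()
    return coords, explained_ratio


# Three points on a line at 0, 1 and 3: condensed distances d01, d02, d12.
coords, explained_ratio = pcoa_sketch([1.0, 3.0, 2.0])
print(explained_ratio)  # first dimension explains (numerically) all the variance
```

Because the input points lie on a line, the first coordinate alone reproduces the original distances; the sign of each axis is arbitrary, as with any eigenvector.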
-
allel.condensed_coords(i, j, n)
Transform square distance matrix coordinates to the corresponding index into a condensed, 1D form of the matrix.
Parameters: i : int
Row index.
j : int
Column index.
n : int
Size of the square matrix (length of first or second dimension).
Returns: ix : int
Index into the condensed distance matrix.
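A condensed matrix stores the upper triangle row by row, so for i < j the index can be computed directly: rows 0 to i - 1 contribute n*i - i*(i + 1)/2 entries, and cell (i, j) is a further j - i - 1 steps along row i. The sketch below uses a hypothetical name, condensed_index; rejecting diagonal cells and swapping i and j when i > j are choices made here for illustration.

```python
import itertools


def condensed_index(i, j, n):
    """Index of square-matrix cell (i, j) in a condensed distance vector."""
    if i == j:
        raise ValueError("diagonal elements are not stored in condensed form")
    if i > j:
        i, j = j, i  # the matrix is symmetric, so (j, i) maps to the same index
    if not 0 <= i < j < n:
        raise IndexError("coordinates out of range")
    # Entries in rows 0..i-1, then the offset of column j within row i.
    return n * i - i * (i + 1) // 2 + (j - i - 1)


# Check against the row-major order of the upper triangle for n = 5.
n = 5
expected = {pair: k for k, pair in enumerate(itertools.combinations(range(n), 2))}
assert all(condensed_index(i, j, n) == k for (i, j), k in expected.items())
print(condensed_index(1, 2, 4))  # 3
```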
-
allel.condensed_coords_within(pop, n)
Return indices into a condensed distance matrix for all pairwise comparisons within the given population.
Parameters: pop : array_like, int
Indices of samples or haplotypes within the population.
n : int
Size of the square matrix (length of first or second dimension).
Returns: indices : ndarray, int
Indices into the condensed distance matrix.
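One way to obtain within-population indices is to enumerate every unordered pair of population members and map each pair to its condensed index. The sketch below assumes that approach, with hypothetical helpers condensed_index and within_indices; the indices can then be used to slice a condensed matrix, for example to average distances within a population.

```python
import itertools

import numpy as np


def condensed_index(i, j, n):
    # Row-major upper-triangle position of (i, j); order of i and j is irrelevant.
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)


def within_indices(pop, n):
    """Condensed indices for all unordered pairs of members of pop."""
    return np.array(
        [condensed_index(i, j, n) for i, j in itertools.combinations(sorted(pop), 2)],
        dtype=int,
    )


dist = np.array([3.0, 4.0, 3.0])  # condensed matrix from the pairwise_distance example
print(within_indices([0, 2], 3))               # [1]
print(dist[within_indices([0, 2], 3)].mean())  # 4.0
```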
-
allel.condensed_coords_between(pop1, pop2, n)
Return indices into a condensed distance matrix for all pairwise comparisons between two populations.
Parameters: pop1 : array_like, int
Indices of samples or haplotypes within the first population.
pop2 : array_like, int
Indices of samples or haplotypes within the second population.
n : int
Size of the square matrix (length of first or second dimension).
Returns: indices : ndarray, int
Indices into the condensed distance matrix.
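Between-population indices follow the same idea, pairing every member of the first population with every member of the second. The sketch below assumes disjoint populations and uses hypothetical helpers condensed_index and between_indices; averaging the selected entries gives a mean between-population distance.

```python
import itertools

import numpy as np


def condensed_index(i, j, n):
    # Row-major upper-triangle position of (i, j); order of i and j is irrelevant.
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)


def between_indices(pop1, pop2, n):
    """Condensed indices for every (pop1 member, pop2 member) pair."""
    if set(pop1) & set(pop2):
        raise ValueError("populations must not share members")
    return np.array(
        [condensed_index(i, j, n) for i, j in itertools.product(pop1, pop2)],
        dtype=int,
    )


dist = np.array([3.0, 4.0, 3.0])  # condensed matrix from the pairwise_distance example
idx = between_indices([0], [1, 2], 3)
print(idx)               # [0 1]
print(dist[idx].mean())  # 3.5
```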