Linkage disequilibrium¶
-
allel.stats.ld.
rogers_huff_r
(gn, fill=nan)[source]¶ Estimate the linkage disequilibrium parameter r for each pair of variants using the method of Rogers and Huff (2008).
Parameters: gn : array_like, int8, shape (n_variants, n_samples)
Diploid genotypes at biallelic variants, coded as the number of alternate alleles per call (i.e., 0 = hom ref, 1 = het, 2 = hom alt).
Returns: r : ndarray, float, shape (n_variants * (n_variants - 1) // 2,)
Matrix in condensed form.
Examples
>>> import allel >>> g = allel.GenotypeArray([[[0, 0], [1, 1], [0, 0]], ... [[0, 0], [1, 1], [0, 0]], ... [[1, 1], [0, 0], [1, 1]], ... [[0, 0], [0, 1], [-1, -1]]], dtype='i1') >>> gn = g.to_n_alt(fill=-1) >>> gn array([[ 0, 2, 0], [ 0, 2, 0], [ 2, 0, 2], [ 0, 1, -1]], dtype=int8) >>> r = allel.stats.rogers_huff_r(gn) >>> r array([ 1. , -1.00000012, 1. , -1.00000012, 1. , -1. ], dtype=float32) >>> r ** 2 array([ 1. , 1.00000024, 1. , 1.00000024, 1. , 1. ], dtype=float32) >>> from scipy.spatial.distance import squareform >>> squareform(r ** 2) array([[ 0. , 1. , 1.00000024, 1. ], [ 1. , 0. , 1.00000024, 1. ], [ 1.00000024, 1.00000024, 0. , 1. ], [ 1. , 1. , 1. , 0. ]])
-
allel.stats.ld.
rogers_huff_r_between
(gna, gnb, fill=nan)[source]¶ Estimate the linkage disequilibrium parameter r for each pair of variants between the two input arrays, using the method of Rogers and Huff (2008).
Parameters: gna, gnb : array_like, int8, shape (n_variants, n_samples)
Diploid genotypes at biallelic variants, coded as the number of alternate alleles per call (i.e., 0 = hom ref, 1 = het, 2 = hom alt).
Returns: r : ndarray, float, shape (m_variants, n_variants )
Matrix in rectangular form.
-
allel.stats.ld.
windowed_r_squared
(pos, gn, size=None, start=None, stop=None, step=None, windows=None, fill=nan, percentile=50)[source]¶ Summarise linkage disequilibrium in windows over a single chromosome/contig.
Parameters: pos : array_like, int, shape (n_items,)
The item positions in ascending order, using 1-based coordinates..
gn : array_like, int8, shape (n_variants, n_samples)
Diploid genotypes at biallelic variants, coded as the number of alternate alleles per call (i.e., 0 = hom ref, 1 = het, 2 = hom alt).
size : int, optional
The window size (number of bases).
start : int, optional
The position at which to start (1-based).
stop : int, optional
The position at which to stop (1-based).
step : int, optional
The distance between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows.
windows : array_like, int, shape (n_windows, 2), optional
Manually specify the windows to use as a sequence of (window_start, window_stop) positions, using 1-based coordinates. Overrides the size/start/stop/step parameters.
fill : object, optional
The value to use where a window is empty, i.e., contains no items.
percentile : int or sequence of ints, optional
The percentile or percentiles to calculate within each window.
Returns: out : ndarray, shape (n_windows,)
The value of the statistic for each window.
windows : ndarray, int, shape (n_windows, 2)
The windows used, as an array of (window_start, window_stop) positions, using 1-based coordinates.
counts : ndarray, int, shape (n_windows,)
The number of items in each window.
Notes
Linkage disequilibrium (r**2) is calculated using the method of Rogers and Huff (2008).
-
allel.stats.ld.
locate_unlinked
(gn, size=100, step=20, threshold=0.1, chunked=False, blen=None)[source]¶ Locate variants in approximate linkage equilibrium, where r**2 is below the given threshold.
Parameters: gn : array_like, int8, shape (n_variants, n_samples)
Diploid genotypes at biallelic variants, coded as the number of alternate alleles per call (i.e., 0 = hom ref, 1 = het, 2 = hom alt).
size : int
Window size (number of variants).
step : int
Number of variants to advance to the next window.
threshold : float
Maximum value of r**2 to include variants.
blen : int, optional
Block length to use for chunked computation.
Returns: loc : ndarray, bool, shape (n_variants)
Boolean array where True items locate variants in approximate linkage equilibrium.
Notes
The value of r**2 between each pair of variants is calculated using the method of Rogers and Huff (2008).
-
allel.stats.ld.
plot_pairwise_ld
(m, colorbar=True, ax=None, imshow_kwargs=None)[source]¶ Plot a matrix of genotype linkage disequilibrium values between all pairs of variants.
Parameters: m : array_like
Array of linkage disequilibrium values in condensed form.
colorbar : bool, optional
If True, add a colorbar to the current figure.
ax : axes, optional
The axes on which to draw. If not provided, a new figure will be created.
imshow_kwargs : dict-like, optional
Additional keyword arguments passed through to
matplotlib.pyplot.imshow()
.Returns: ax : axes
The axes on which the plot was drawn.