F-statistics¶
-
allel.stats.fst.
weir_cockerham_fst
(g, subpops, max_allele=None, blen=None)[source]¶ Compute the variance components from the analyses of variance of allele frequencies according to Weir and Cockerham (1984).
Parameters: g : array_like, int, shape (n_variants, n_samples, ploidy)
Genotype array.
subpops : sequence of sequences of ints
Sample indices for each subpopulation.
max_allele : int, optional
The highest allele index to consider.
blen : int, optional
Block length to use for chunked computation.
Returns: a : ndarray, float, shape (n_variants, n_alleles)
Component of variance between populations.
b : ndarray, float, shape (n_variants, n_alleles)
Component of variance between individuals within populations.
c : ndarray, float, shape (n_variants, n_alleles)
Component of variance between gametes within individuals.
Examples
Calculate variance components from some genotype data:
>>> import allel >>> g = [[[0, 0], [0, 0], [1, 1], [1, 1]], ... [[0, 1], [0, 1], [0, 1], [0, 1]], ... [[0, 0], [0, 0], [0, 0], [0, 0]], ... [[0, 1], [1, 2], [1, 1], [2, 2]], ... [[0, 0], [1, 1], [0, 1], [-1, -1]]] >>> subpops = [[0, 1], [2, 3]] >>> a, b, c = allel.stats.weir_cockerham_fst(g, subpops) >>> a array([[ 0.5 , 0.5 , 0. ], [ 0. , 0. , 0. ], [ 0. , 0. , 0. ], [ 0. , -0.125, -0.125], [-0.375, -0.375, 0. ]]) >>> b array([[ 0. , 0. , 0. ], [-0.25 , -0.25 , 0. ], [ 0. , 0. , 0. ], [ 0. , 0.125 , 0.25 ], [ 0.41666667, 0.41666667, 0. ]]) >>> c array([[ 0. , 0. , 0. ], [ 0.5 , 0.5 , 0. ], [ 0. , 0. , 0. ], [ 0.125 , 0.25 , 0.125 ], [ 0.16666667, 0.16666667, 0. ]])
Estimate the parameter theta (a.k.a., Fst) for each variant and each allele individually:
>>> fst = a / (a + b + c) >>> fst array([[ 1. , 1. , nan], [ 0. , 0. , nan], [ nan, nan, nan], [ 0. , -0.5, -0.5], [-1.8, -1.8, nan]])
Estimate Fst for each variant individually (averaging over alleles):
>>> fst = (np.sum(a, axis=1) / ... (np.sum(a, axis=1) + np.sum(b, axis=1) + np.sum(c, axis=1))) >>> fst array([ 1. , 0. , nan, -0.4, -1.8])
Estimate Fst averaging over all variants and alleles:
>>> fst = np.sum(a) / (np.sum(a) + np.sum(b) + np.sum(c)) >>> fst -4.3680905886891398e-17
Note that estimated Fst values may be negative.
-
allel.stats.fst.
hudson_fst
(ac1, ac2, fill=nan)[source]¶ Calculate the numerator and denominator for Fst estimation using the method of Hudson (1992) elaborated by Bhatia et al. (2013).
Parameters: ac1 : array_like, int, shape (n_variants, n_alleles)
Allele counts array from the first population.
ac2 : array_like, int, shape (n_variants, n_alleles)
Allele counts array from the second population.
fill : float
Use this value where there are no pairs to compare (e.g., all allele calls are missing).
Returns: num : ndarray, float, shape (n_variants,)
Divergence between the two populations minus average of diversity within each population.
den : ndarray, float, shape (n_variants,)
Divergence between the two populations.
Examples
Calculate numerator and denominator for Fst estimation:
>>> import allel >>> g = allel.GenotypeArray([[[0, 0], [0, 0], [1, 1], [1, 1]], ... [[0, 1], [0, 1], [0, 1], [0, 1]], ... [[0, 0], [0, 0], [0, 0], [0, 0]], ... [[0, 1], [1, 2], [1, 1], [2, 2]], ... [[0, 0], [1, 1], [0, 1], [-1, -1]]]) >>> subpops = [[0, 1], [2, 3]] >>> ac1 = g.count_alleles(subpop=subpops[0]) >>> ac2 = g.count_alleles(subpop=subpops[1]) >>> num, den = allel.stats.hudson_fst(ac1, ac2) >>> num array([ 1. , -0.16666667, 0. , -0.125 , -0.33333333]) >>> den array([ 1. , 0.5 , 0. , 0.625, 0.5 ])
Estimate Fst for each variant individually:
>>> fst = num / den >>> fst array([ 1. , -0.33333333, nan, -0.2 , -0.66666667])
Estimate Fst averaging over variants:
>>> fst = np.sum(num) / np.sum(den) >>> fst 0.1428571428571429
-
allel.stats.fst.
patterson_fst
(aca, acb)[source]¶ Estimator of differentiation between populations A and B based on the F2 parameter.
Parameters: aca : array_like, int, shape (n_variants, 2)
Allele counts for population A.
acb : array_like, int, shape (n_variants, 2)
Allele counts for population B.
Returns: num : ndarray, shape (n_variants,), float
Numerator.
den : ndarray, shape (n_variants,), float
Denominator.
Notes
See Patterson (2012), Appendix A.
TODO check if this is numerically equivalent to Hudson’s estimator.
-
allel.stats.fst.
windowed_weir_cockerham_fst
(pos, g, subpops, size=None, start=None, stop=None, step=None, windows=None, fill=nan, max_allele=None)[source]¶ Estimate average Fst in windows over a single chromosome/contig, following the method of Weir and Cockerham (1984).
Parameters: pos : array_like, int, shape (n_items,)
Variant positions, using 1-based coordinates, in ascending order.
g : array_like, int, shape (n_variants, n_samples, ploidy)
Genotype array.
subpops : sequence of sequences of ints
Sample indices for each subpopulation.
size : int
The window size (number of bases).
start : int, optional
The position at which to start (1-based).
stop : int, optional
The position at which to stop (1-based).
step : int, optional
The distance between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows.
windows : array_like, int, shape (n_windows, 2), optional
Manually specify the windows to use as a sequence of (window_start, window_stop) positions, using 1-based coordinates. Overrides the size/start/stop/step parameters.
fill : object, optional
The value to use where there are no variants within a window.
max_allele : int, optional
The highest allele index to consider.
Returns: fst : ndarray, float, shape (n_windows,)
Average Fst in each window.
windows : ndarray, int, shape (n_windows, 2)
The windows used, as an array of (window_start, window_stop) positions, using 1-based coordinates.
counts : ndarray, int, shape (n_windows,)
Number of variants in each window.
-
allel.stats.fst.
windowed_hudson_fst
(pos, ac1, ac2, size=None, start=None, stop=None, step=None, windows=None, fill=nan)[source]¶ Estimate average Fst in windows over a single chromosome/contig, following the method of Hudson (1992) elaborated by Bhatia et al. (2013).
Parameters: pos : array_like, int, shape (n_items,)
Variant positions, using 1-based coordinates, in ascending order.
ac1 : array_like, int, shape (n_variants, n_alleles)
Allele counts array from the first population.
ac2 : array_like, int, shape (n_variants, n_alleles)
Allele counts array from the second population.
size : int, optional
The window size (number of bases).
start : int, optional
The position at which to start (1-based).
stop : int, optional
The position at which to stop (1-based).
step : int, optional
The distance between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows.
windows : array_like, int, shape (n_windows, 2), optional
Manually specify the windows to use as a sequence of (window_start, window_stop) positions, using 1-based coordinates. Overrides the size/start/stop/step parameters.
fill : object, optional
The value to use where there are no variants within a window.
Returns: fst : ndarray, float, shape (n_windows,)
Average Fst in each window.
windows : ndarray, int, shape (n_windows, 2)
The windows used, as an array of (window_start, window_stop) positions, using 1-based coordinates.
counts : ndarray, int, shape (n_windows,)
Number of variants in each window.
-
allel.stats.fst.
windowed_patterson_fst
(pos, ac1, ac2, size=None, start=None, stop=None, step=None, windows=None, fill=nan)[source]¶ Estimate average Fst in windows over a single chromosome/contig, following the method of Patterson (2012).
Parameters: pos : array_like, int, shape (n_items,)
Variant positions, using 1-based coordinates, in ascending order.
ac1 : array_like, int, shape (n_variants, n_alleles)
Allele counts array from the first population.
ac2 : array_like, int, shape (n_variants, n_alleles)
Allele counts array from the second population.
size : int, optional
The window size (number of bases).
start : int, optional
The position at which to start (1-based).
stop : int, optional
The position at which to stop (1-based).
step : int, optional
The distance between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows.
windows : array_like, int, shape (n_windows, 2), optional
Manually specify the windows to use as a sequence of (window_start, window_stop) positions, using 1-based coordinates. Overrides the size/start/stop/step parameters.
fill : object, optional
The value to use where there are no variants within a window.
Returns: fst : ndarray, float, shape (n_windows,)
Average Fst in each window.
windows : ndarray, int, shape (n_windows, 2)
The windows used, as an array of (window_start, window_stop) positions, using 1-based coordinates.
counts : ndarray, int, shape (n_windows,)
Number of variants in each window.
-
allel.stats.fst.
blockwise_weir_cockerham_fst
(g, subpops, blen, max_allele=None)[source]¶ Estimate average Fst and standard error using the block-jackknife.
Parameters: g : array_like, int, shape (n_variants, n_samples, ploidy)
Genotype array.
subpops : sequence of sequences of ints
Sample indices for each subpopulation.
blen : int
Block size (number of variants).
max_allele : int, optional
The highest allele index to consider.
Returns: fst : float
Estimated value of the statistic using all data.
se : float
Estimated standard error.
vb : ndarray, float, shape (n_blocks,)
Value of the statistic in each block.
vj : ndarray, float, shape (n_blocks,)
Values of the statistic from block-jackknife resampling.
-
allel.stats.fst.
blockwise_hudson_fst
(ac1, ac2, blen)[source]¶ Estimate average Fst between two populations and standard error using the block-jackknife.
Parameters: ac1 : array_like, int, shape (n_variants, n_alleles)
Allele counts array from the first population.
ac2 : array_like, int, shape (n_variants, n_alleles)
Allele counts array from the second population.
blen : int
Block size (number of variants).
Returns: fst : float
Estimated value of the statistic using all data.
se : float
Estimated standard error.
vb : ndarray, float, shape (n_blocks,)
Value of the statistic in each block.
vj : ndarray, float, shape (n_blocks,)
Values of the statistic from block-jackknife resampling.
-
allel.stats.fst.
blockwise_patterson_fst
(ac1, ac2, blen)[source]¶ Estimate average Fst between two populations and standard error using the block-jackknife.
Parameters: ac1 : array_like, int, shape (n_variants, n_alleles)
Allele counts array from the first population.
ac2 : array_like, int, shape (n_variants, n_alleles)
Allele counts array from the second population.
blen : int
Block size (number of variants).
Returns: fst : float
Estimated value of the statistic using all data.
se : float
Estimated standard error.
vb : ndarray, float, shape (n_blocks,)
Value of the statistic in each block.
vj : ndarray, float, shape (n_blocks,)
Values of the statistic from block-jackknife resampling.