Principal components analysis
allel.pca(gn, n_components=10, copy=True, scaler='patterson', ploidy=2)
Perform principal components analysis of genotype data, via singular value decomposition.
Parameters:
gn : array_like, float, shape (n_variants, n_samples)
Genotypes at biallelic variants, coded as the number of alternate alleles per call (i.e., 0 = hom ref, 1 = het, 2 = hom alt).
n_components : int, optional
Number of components to keep.
copy : bool, optional
If False, the input array gn may be overwritten in place instead of being copied.
scaler : {'patterson', 'standard', None}
Scaling method; 'patterson' applies the method of Patterson et al. (2006); 'standard' scales to unit variance; None centers the data only.
ploidy : int, optional
Sample ploidy, only relevant if the 'patterson' scaler is used.
Returns:
coords : ndarray, float, shape (n_samples, n_components)
Transformed coordinates for the samples.
model : GenotypePCA
Model instance containing the variance ratio explained and the stored components (a.k.a., loadings). Can be used to project further data into the same principal components space via the transform() method.
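The scaling options and the SVD step described above can be sketched in plain NumPy. This is an illustration of the documented behaviour, not the library's source: the helper names scale_genotypes and pca_sketch are hypothetical, and the Patterson scaling shown (centering each variant and dividing by sqrt(p * (1 - p)), with p the mean genotype divided by ploidy) is an assumption based on Patterson et al. (2006).

```python
import numpy as np


def scale_genotypes(gn, scaler="patterson", ploidy=2):
    """Center each variant (row) and optionally rescale it (hypothetical helper)."""
    gn = np.asarray(gn, dtype=float)
    mean = gn.mean(axis=1, keepdims=True)
    centered = gn - mean
    if scaler is None:
        # Centering only.
        return centered
    if scaler == "standard":
        # Unit variance per variant.
        return centered / gn.std(axis=1, keepdims=True)
    if scaler == "patterson":
        # Assumed Patterson-style scaling: p estimates the alternate allele frequency.
        p = mean / ploidy
        return centered / np.sqrt(p * (1 - p))
    raise ValueError("unknown scaler: %r" % (scaler,))


def pca_sketch(gn, n_components=10, scaler="patterson", ploidy=2):
    """PCA via full SVD; returns sample coordinates, loadings, variance ratio."""
    # Transpose so that rows are samples and columns are variants.
    x = scale_genotypes(gn, scaler, ploidy).T
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    variance_ratio = (s ** 2 / np.sum(s ** 2))[:n_components]
    return coords, vt[:n_components], variance_ratio


# Simulated genotypes: 200 variants x 12 diploid samples.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.2, 0.8, size=200)
gn_demo = rng.binomial(2, freqs[:, None], size=(200, 12)).astype(float)
# Drop monomorphic variants, which would have zero variance.
gn_demo = gn_demo[gn_demo.min(axis=1) != gn_demo.max(axis=1)]

coords, components, variance_ratio = pca_sketch(gn_demo, n_components=3)
```

Here coords has one row per sample, matching the documented (n_samples, n_components) shape. Projecting further samples into the same space would amount to scaling them with the statistics of the original data and multiplying by the transpose of components.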
See also
randomized_pca, allel.stats.ld.locate_unlinked
Notes
Genotype data should be filtered prior to using this function to remove variants in linkage disequilibrium.
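To illustrate what filtering for linkage disequilibrium means, the following is a naive NumPy sketch that greedily keeps a variant only if its squared correlation with every previously kept variant is at or below a threshold. It is not the algorithm used by allel.stats.ld.locate_unlinked; the function name prune_correlated_variants and the threshold parameter are hypothetical, and computing the full correlation matrix is only practical for small inputs.

```python
import numpy as np


def prune_correlated_variants(gn, threshold=0.1):
    """Return indices of variants kept by a naive greedy r**2 filter."""
    gn = np.asarray(gn, dtype=float)
    # Pairwise squared Pearson correlation between variants (rows).
    # Monomorphic variants would produce NaN here and are therefore dropped.
    r2 = np.corrcoef(gn) ** 2
    kept = []
    for i in range(gn.shape[0]):
        if all(r2[i, j] <= threshold for j in kept):
            kept.append(i)
    return np.array(kept, dtype=int)


# Three variants over six samples: the second duplicates the first,
# the third is uncorrelated with the first.
gn_ld = np.array([
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [1, 0, 1, 1, 2, 1],
])
kept = prune_correlated_variants(gn_ld, threshold=0.1)
gn_unlinked = gn_ld[kept]
```

The duplicated variant is removed, so only uncorrelated rows are passed on to the PCA.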
allel.randomized_pca(gn, n_components=10, copy=True, iterated_power=3, random_state=None, scaler='patterson', ploidy=2)
Perform principal components analysis of genotype data, via an approximate truncated singular value decomposition using randomization to speed up the computation.
Parameters:
gn : array_like, float, shape (n_variants, n_samples)
Genotypes at biallelic variants, coded as the number of alternate alleles per call (i.e., 0 = hom ref, 1 = het, 2 = hom alt).
n_components : int, optional
Number of components to keep.
copy : bool, optional
If False, the input array gn may be overwritten in place instead of being copied.
iterated_power : int, optional
Number of iterations for the power method.
random_state : int or RandomState instance or None (default)
Controls the pseudo-random number generator: an int is used as the seed, a RandomState instance is used directly, and None uses the numpy.random singleton.
scaler : {'patterson', 'standard', None}
Scaling method; 'patterson' applies the method of Patterson et al. (2006); 'standard' scales to unit variance; None centers the data only.
ploidy : int, optional
Sample ploidy, only relevant if the 'patterson' scaler is used.
Returns:
coords : ndarray, float, shape (n_samples, n_components)
Transformed coordinates for the samples.
model : GenotypeRandomizedPCA
Model instance containing the variance ratio explained and the stored components (a.k.a., loadings). Can be used to project further data into the same principal components space via the transform() method.
See also
pca, allel.stats.ld.locate_unlinked
Notes
Genotype data should be filtered prior to using this function to remove variants in linkage disequilibrium.
Based on the sklearn.decomposition.RandomizedPCA implementation.
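The idea behind an approximate truncated SVD using randomization can be sketched in NumPy as follows. This is a generic randomized range finder with power iterations, offered as an illustration rather than the code used by this function or by scikit-learn. The function name randomized_svd_sketch and the oversample parameter are hypothetical, and the sketch seeds numpy.random.default_rng rather than a RandomState instance.

```python
import numpy as np


def randomized_svd_sketch(x, n_components, iterated_power=3, oversample=10,
                          random_state=None):
    """Approximate the leading singular triplets of x (hypothetical helper)."""
    rng = np.random.default_rng(random_state)
    n_rows, n_cols = x.shape
    k = min(n_components + oversample, n_rows, n_cols)
    # Project x onto a random low-dimensional subspace.
    y = x @ rng.standard_normal((n_cols, k))
    # Power iterations sharpen the estimate of the dominant subspace;
    # re-orthonormalizing at each step keeps the computation stable.
    for _ in range(iterated_power):
        q = np.linalg.qr(y)[0]
        z = np.linalg.qr(x.T @ q)[0]
        y = x @ z
    q = np.linalg.qr(y)[0]
    # Exact SVD of the small projected matrix.
    ub, s, vt = np.linalg.svd(q.T @ x, full_matrices=False)
    u = q @ ub
    return u[:, :n_components], s[:n_components], vt[:n_components]


# Rank-3 test matrix: 30 rows (samples) x 200 columns (variants).
rng = np.random.default_rng(1)
x_demo = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 200))

u, s, vt = randomized_svd_sketch(x_demo, n_components=3, random_state=42)
s_exact = np.linalg.svd(x_demo, compute_uv=False)[:3]
s_repeat = randomized_svd_sketch(x_demo, n_components=3, random_state=42)[1]
```

Fixing the seed makes the result reproducible, and more power iterations generally improve accuracy when the singular values decay slowly, at the cost of extra matrix products.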