Hi,
I'm trying to perform ancestry estimation for a study sample of size one, using 1000 Genomes (g1k) as the reference. That is, ancestry estimation for a single individual. I have performed variant calling against GRCh38.
However, all the guides I can find for doing this with PLINK and g1k assume that the study data contains more than one individual. An example of such a guide is here. For example, it's not possible to prune variants in high LD if the data contains only one individual (too few founders).
I'm not sure how to approach this. Should I be merging the g1k data with the study sample?
Any guidance would be appreciated! Thanks :)
I can't imagine the LD calculations derived from the query data would be helpful until you had several hundred individuals. Most people use LD measurements from reference data, which is what --indep-pairwise will do anyway once the reference samples are in the dataset.
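For what it's worth, here is a rough sketch of how the reference-based pruning could be wired up. It only builds and prints the PLINK 1.9 command lines; it does not run them. The file prefixes (g1k, sample, work) and the pruning window (50 5 0.2) are placeholders I picked for illustration, not values from this thread. It also assumes both filesets are already in PLINK binary format, on GRCh38, and use matching variant IDs.

```python
import shlex


def build_plink_commands(ref="g1k", sample="sample", out="work"):
    """Return PLINK 1.9 command lines (argv lists) for reference-based LD pruning.

    All prefixes are hypothetical; the window/step/r^2 values are a common
    choice, not a recommendation from the thread.
    """
    ld_prefix = f"{out}/ref_ld"
    prune_in = f"{ld_prefix}.prune.in"  # written by --indep-pairwise
    ref_pruned = f"{out}/ref_pruned"
    merged = f"{out}/merged"
    return [
        # 1. Estimate LD and choose variants to keep using the reference only.
        ["plink", "--bfile", ref, "--indep-pairwise", "50", "5", "0.2",
         "--out", ld_prefix],
        # 2. Reduce the reference to the retained variants.
        ["plink", "--bfile", ref, "--extract", prune_in,
         "--make-bed", "--out", ref_pruned],
        # 3. Merge the single study sample into the pruned reference.
        ["plink", "--bfile", ref_pruned, "--bmerge", sample,
         "--make-bed", "--out", merged],
        # 4. Drop any sample-only variants again and run PCA on the merged data.
        ["plink", "--bfile", merged, "--extract", prune_in,
         "--pca", "--out", f"{out}/pca"],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running anything.
    for argv in build_plink_commands():
        print(shlex.join(argv))
```

The point of the ordering is that LD is only ever estimated on the reference panel, so the "too few founders" problem never comes up for the single sample. The merge step may still need strand or allele clean-up in practice, which this sketch leaves out.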