Data Processing#
First steps#
A list of individuals present in the local ancestry calls downloaded from 1KGP was first compiled.
Sex information for individuals was obtained from the file:
integrated_call_samples_v3.20130502.ALL.panel
available at:
https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/.
We then identified individuals present in the local ancestry data but missing from this panel.
Individuals Not Included in the Panel#
The following command identifies individuals present in the ancestry calls but absent from the panel:
awk '
NR==FNR { seen[$1]=1; next }
!($1 in seen) { print $1 }
' integrated_call_samples_v3.20130502.ALL.panel ./ASW/TrioPhased/individuals.txt
Output:
NA19985
NA20322
NA20336
NA20341
NA20344
Output Formatting#
To format the list of individuals as a CSV string (e.g., for inclusion in a driver file):
awk '
NR==FNR { seen[$1]=1; next }
($1 in seen) { printf "\"%s\"," , $1 }
' 20140625_related_individuals.txt ./ASW/TrioPhased/individuals.txt
Male Individuals#
To extract male individuals in the same format:
awk '
NR==FNR { seen[$1]=1; next }
($1 in seen) { printf "\"%s\"," , $1 }
' ./males_ASW.panel ./ASW/TrioPhased/individuals_unrelated.txt