The 1000 Genomes Phase 3 reference data uses hg19 (GRCh37) coordinates, whereas our Uzbek genotyping data was called on hg38 (GRCh38). Both datasets must share one coordinate system before they can be merged, so we lift the Uzbek positions from hg38 to hg19.
3.1 Convert PED/MAP to Binary PLINK
cd /staging/ALSU-analysis/Fst_analysis
# Convert PED/MAP to binary format
plink --file uzbek_data --make-bed --out uzbek_data_hg38
650181 variants loaded from .bim file.
1199 people (0 males, 0 females, 1199 ambiguous) loaded from .fam.
Warning: Variant 1 (post-sort) triallelic; setting rarest alleles missing.
[multiple triallelic warnings omitted]
Total genotyping rate is 0.96741.
--make-bed to uzbek_data_hg38.bed + uzbek_data_hg38.bim + uzbek_data_hg38.fam ... done.
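The later steps (--extract, --update-map) match variants by ID, so duplicated IDs in the .bim can make those matches ambiguous. A minimal sketch of a pre-check follows, run on a small made-up BIM; the file toy.bim, its contents, and the workdir variable are illustrative assumptions, not part of the pipeline:

```shell
# Toy BIM (hypothetical data): chr, snpID, cM, bp, A1, A2
workdir=$(mktemp -d)
printf '1\trs1\t0\t1000\tA\tG\n1\trs2\t0\t2000\tC\tT\n2\trs1\t0\t500\tA\tC\n1\t.\t0\t3000\tG\tT\n' > "$workdir/toy.bim"

# Count occurrences of each variant ID (column 2) and keep IDs seen more than once
dup_ids=$(awk '{n[$2]++} END {for (id in n) if (n[id] > 1) print id}' "$workdir/toy.bim" | sort)
echo "duplicate IDs: $dup_ids"
```

On the real data the same awk command would be pointed at uzbek_data_hg38.bim; an empty result suggests that ID-based matching is unambiguous.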
3.2 Run UCSC LiftOver
Extract BED-format positions from the BIM file and convert coordinates using the UCSC liftOver tool:
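# Create BED file from BIM (chr, start, end, SNP_ID)
# BIM columns: $1=chr $2=snpID $3=cM $4=bp_position $5=A1 $6=A2
# BED is 0-based, half-open: start = $4-1; end is exclusive, so it equals the 1-based position $4
# Note: "chr"$1 assumes chromosome codes 1-22; PLINK's numeric codes 23-26 (X, Y, XY, MT)
# would yield names such as chr23 that do not occur in the UCSC chain file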
awk '{print "chr"$1, $4-1, $4, $2}' OFS='\t' uzbek_data_hg38.bim > uzbek_hg38_positions.bed
# liftOver args: input.bed chain_file mapped_output.bed unmapped_output.bed
liftOver uzbek_hg38_positions.bed hg38ToHg19.over.chain uzbek_hg19_lifted.bed unlifted.bed
# Create position update file (old_ID → new_position)
# BED $4=SNP_ID, $3=end coord (= 1-based position for single-bp SNPs)
awk '{print $4, $3}' uzbek_hg19_lifted.bed > update_positions.txt
# List of SNPs that successfully lifted
awk '{print $4}' uzbek_hg19_lifted.bed > snps_to_keep.txt
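PLINK writes the sex and mitochondrial chromosomes as numeric codes by default (23 = X, 24 = Y, 25 = XY, 26 = MT), while UCSC chain files use chrX, chrY, and chrM. If the Uzbek .bim contains such codes, the BED conversion could translate them first. The sketch below shows one way to do that on a made-up three-variant BIM; the file names, the workdir variable, and the choice to map code 25 (pseudo-autosomal) to X are assumptions for illustration:

```shell
workdir=$(mktemp -d)
# Toy BIM (hypothetical data) with PLINK numeric codes 23 (X) and 26 (MT)
printf '1\trsA\t0\t1000\tA\tG\n23\trsB\t0\t2000\tC\tT\n26\trsC\t0\t300\tA\tC\n' > "$workdir/toy.bim"

# Translate PLINK chromosome codes to UCSC names, then emit 0-based, half-open BED
awk 'BEGIN {OFS = "\t"; map["23"] = "X"; map["24"] = "Y"; map["25"] = "X"; map["26"] = "M"}
     {chr = ($1 in map) ? map[$1] : $1; print "chr" chr, $4 - 1, $4, $2}' \
    "$workdir/toy.bim" > "$workdir/toy.bed"
cat "$workdir/toy.bed"
```

Autosomal codes pass through unchanged, so the output for chromosomes 1-22 is identical to that of the simpler one-liner.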
3.3 Apply New Coordinates
# Extract only liftable SNPs
plink --bfile uzbek_data_hg38 \
--extract snps_to_keep.txt \
--make-bed \
--out uzbek_data_hg19_temp
650181 variants loaded from .bim file.
1199 people loaded from .fam.
--extract: 647854 variants remaining.
--make-bed to uzbek_data_hg19_temp.bed + uzbek_data_hg19_temp.bim + uzbek_data_hg19_temp.fam ... done.
# --update-map FILE: 2-column file (SNP_ID new_bp_position)
# Replaces the base-pair position (.bim column 4) of each matched variant; the chromosome column is left as-is
plink --bfile uzbek_data_hg19_temp \
--update-map update_positions.txt \
--make-bed \
--out uzbek_data_hg19
647854 variants loaded from .bim file.
1199 people loaded from .fam.
--update-map: 647854 values updated.
Warning: Base-pair positions are now unsorted!
--make-bed to uzbek_data_hg19.bed + uzbek_data_hg19.bim + uzbek_data_hg19.fam ... done.
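Because --update-map rewrites positions only, a variant that liftOver placed on a different chromosome or on an hg19 unplaced contig would keep its hg38 chromosome with an hg19 position. A minimal sketch of a check for this, using made-up inputs (the toy file names, their contents, and the workdir variable are assumptions; the comparison also assumes autosomal codes, since numeric codes 23-26 would need translating before comparing):

```shell
workdir=$(mktemp -d)
# Toy hg38 BIM and toy lifted hg19 BED (hypothetical data)
printf '1\trsA\t0\t1000\tA\tG\n2\trsB\t0\t2000\tC\tT\n3\trsC\t0\t300\tA\tC\n' > "$workdir/toy_hg38.bim"
printf 'chr1\t1099\t1100\trsA\nchr5\t4999\t5000\trsB\nchr4_gl000193_random\t49\t50\trsC\n' > "$workdir/toy_hg19_lifted.bed"

# First pass: remember each variant's original chromosome (with "chr" prefix).
# Second pass: print variants whose lifted chromosome differs (ID, old chr, new chr).
awk 'BEGIN {OFS = "\t"}
     NR == FNR {old[$2] = "chr" $1; next}
     ($4 in old) && old[$4] != $1 {print $4, old[$4], $1}' \
    "$workdir/toy_hg38.bim" "$workdir/toy_hg19_lifted.bed" > "$workdir/moved.txt"

cat "$workdir/moved.txt"
n_moved=$(wc -l < "$workdir/moved.txt" | tr -d ' ')
echo "variants with changed chromosome: $n_moved"
```

Variants flagged this way could either be excluded from the keep list or have their chromosome corrected with PLINK's --update-chr; which option suits this dataset is a judgment call not settled by the log above.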
Variants Lifted: 647,854 / 650,181 (99.6% success rate)
Variants Lost: 2,327 (unmappable between assemblies)
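The summary figures can be cross-checked from the counts in the PLINK logs, and the unmapped file can be tallied by failure reason, since liftOver precedes each unmapped record with a comment line starting with "#". The sketch below uses a made-up unmapped file; the toy file name, the workdir variable, and the specific reason strings shown are assumptions for illustration:

```shell
workdir=$(mktemp -d)
# Toy unmapped file (hypothetical data): a "#reason" line before each BED record
printf '#Deleted in new\nchr1\t999\t1000\trsA\n#Deleted in new\nchr2\t1999\t2000\trsB\n#Duplicated in new\nchr3\t299\t300\trsC\n' > "$workdir/toy_unlifted.bed"

# Tally unmapped variants by the reason liftOver recorded
grep '^#' "$workdir/toy_unlifted.bed" | sort | uniq -c

# Count the unmapped records themselves (non-comment lines)
n_lost=$(grep -vc '^#' "$workdir/toy_unlifted.bed")

# Recompute the success rate from the counts reported in the logs above
rate=$(awk -v lifted=647854 -v total=650181 'BEGIN {printf "%.1f", 100 * lifted / total}')
echo "toy records lost: $n_lost; lift rate from logged counts: $rate%"
```

Run against the real unlifted.bed, the non-comment line count should equal the 2,327 lost variants if every variant missing from the lifted file was written to the unmapped file.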