1. Overview
The Population Branch Statistic (PBS) measures allele frequency divergence along a specific
population lineage relative to two reference populations. By computing PBS for the Uzbek branch
of a three-population tree (UZB–EUR–EAS), we identify SNPs where the Uzbek population
has experienced unusually large frequency shifts, which are potential signatures of local adaptation or
genetic drift.
Spring 2026 PBS Results
79,767 SNPs analyzed (vs 77,111 in winter). Pop sizes: UZB=1,047, EUR=522, EAS=515, SAS=492, AFR=671.
| Metric | Spring 2026 | Winter 2025 |
| SNPs analyzed | 79,767 | 77,111 |
| Mean PBS_UZB | −0.01001 | −0.00979 |
| Median PBS_UZB | −0.00604 | −0.00601 |
| Stdev | 0.02677 | 0.02648 |
| Tier 1 (PBS≥0.3) | 8 | 8 |
| Tier 2 (significant) | 4,995 | 1 |
Spring vs Winter: Core PBS statistics are nearly identical. Same 8 Tier 1 SNPs detected.
Tier 2 difference reflects different tier criteria between runs (spring uses ΔAF≥0.3 threshold).
Goal: Identify SNPs with elevated PBS scores (Uzbek-specific allele frequency
changes) and classify them by tier: high PBS (≥0.3), large absolute frequency difference
(ΔAF ≥0.3 vs all populations), or near-private alleles (UZB MAF ≥5%, all others ≤1%).
2. Prerequisites
| Source | File(s) | Description |
| Merged Dataset | UZB_1kG_merged.{bed,bim,fam} | 3,595 samples × 77,111 LD-pruned SNPs (from Step 8) |
| Population Mapping | pop_mapping.txt | Sample-to-superpopulation assignments (from Step 8) |
Population Panel
| Population | Code | N | Source |
| Uzbek cohort | UZB | 1,047 | ALSU QC-passed set |
| European | EUR | 522 | 1000 Genomes Phase 3 |
| East Asian | EAS | 515 | 1000 Genomes Phase 3 |
| South Asian | SAS | 492 | 1000 Genomes Phase 3 |
| African | AFR | 671 | 1000 Genomes Phase 3 |
3. Pipeline
3.1 Build Population Cluster File
Create PLINK cluster assignments from the population mapping and the merged FAM file:
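# Build PLINK cluster file: FID IID CLUSTER
# First pass (NR==FNR) loads pop_mapping.txt; second pass labels each FAM row.
awk 'NR==FNR {pop[$1]=$2; next} {
    fid=$1; iid=$2;
    if (iid in pop) p=pop[iid];
    else if (fid in pop) p=pop[fid];
    else p="UZB";   # samples absent from the mapping default to UZB
    print fid, iid, p
}' pop_mapping.txt UZB_1kG_merged.fam > clusters.txt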
# Verify population counts
awk '{print $3}' clusters.txt | sort | uniq -c | sort -rn
1047 UZB
671 AFR
522 EUR
515 EAS
492 SAS
348 AMR
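Because the awk rule above labels any sample missing from pop_mapping.txt as UZB, a misspelled reference-sample ID would silently be counted as Uzbek. The following is a minimal sketch of a cross-check, assuming pop_mapping.txt has two whitespace-delimited columns (ID, population) as the awk script implies. The function names `fallback_samples` and `cluster_counts` are hypothetical and not part of the pipeline, and the demo IDs are made up.

```python
from collections import Counter

def fallback_samples(mapping_lines, fam_lines):
    """Return (FID, IID) pairs that match no ID in the mapping.

    These are the samples the awk rule would label UZB by default.
    """
    mapped = set()
    for line in mapping_lines:
        fields = line.split()
        if fields:
            mapped.add(fields[0])
    missing = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        fid, iid = fields[0], fields[1]
        if iid not in mapped and fid not in mapped:
            missing.append((fid, iid))
    return missing

def cluster_counts(cluster_lines):
    """Count samples per population in FID IID CLUSTER lines."""
    counts = Counter()
    for line in cluster_lines:
        fields = line.split()
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts

# Tiny in-memory demo with made-up sample IDs.
demo_mapping = ["REF001 EUR", "REF002 EAS"]
demo_fam = [
    "REF001 REF001 0 0 1 -9",
    "REF002 REF002 0 0 2 -9",
    "U001 U001 0 0 1 -9",
]
demo_missing = fallback_samples(demo_mapping, demo_fam)
demo_counts = cluster_counts(["REF001 REF001 EUR", "REF002 REF002 EAS", "U001 U001 UZB"])
print(demo_missing)
print(dict(demo_counts))
```

On the real files, the lines of pop_mapping.txt and UZB_1kG_merged.fam would be passed instead of the demo lists. If the Uzbek samples are not listed in the mapping, the number of fallback samples should equal the UZB count of 1,047; if they are listed, it should be zero.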
3.2 Extract Per-Population Sample Lists
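# AMR samples appear in clusters.txt but are not part of the population panel,
# so no keep file is written for them.
for POP in UZB EUR EAS SAS AFR; do
    awk -v p="$POP" '$3==p {print $1, $2}' clusters.txt > "keep_${POP}.txt"
    echo "$POP: $(wc -l < "keep_${POP}.txt") samples"
done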
3.3 Compute Per-Population Allele Frequencies
for POP in UZB EUR EAS SAS AFR; do
    plink --bfile UZB_1kG_merged \
        --keep "keep_${POP}.txt" \
        --freq \
        --out "freq_${POP}" \
        --allow-no-sex --silent
done
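The per-population .frq files feed the ΔAF and near-private tiers, so it may help to see how they could be read. This is a sketch, assuming the standard PLINK 1.9 .frq layout (CHR, SNP, A1, A2, MAF, NCHROBS), where the MAF column is the frequency of A1. Since A1 is by default the minor allele among the retained samples, A1 can differ between populations for the same SNP, and frequencies should be compared for one fixed allele. The names `parse_frq` and `freq_of_allele` are hypothetical, and the demo rows are invented.

```python
def parse_frq(lines):
    """Parse .frq text lines into {snp: (a1, a2, a1_freq)}."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or fields[0] == "CHR":
            continue  # skip header and malformed lines
        snp, a1, a2, maf = fields[1], fields[2], fields[3], fields[4]
        try:
            table[snp] = (a1, a2, float(maf))
        except ValueError:
            continue  # e.g. "NA" when no genotypes were observed
    return table

def freq_of_allele(entry, allele):
    """Frequency of a chosen allele, whichever of A1/A2 it is."""
    a1, a2, a1_freq = entry
    if allele == a1:
        return a1_freq
    if allele == a2:
        return 1.0 - a1_freq
    return None  # allele not present at this SNP

# Invented rows: the same SNP with A1 flipped between two populations.
frq_uzb = parse_frq([
    " CHR   SNP  A1  A2   MAF  NCHROBS",
    "  12  rsX    G   A  0.48     2094",
])
frq_eur = parse_frq([
    " CHR   SNP  A1  A2   MAF  NCHROBS",
    "  12  rsX    A   G  0.10     1044",
])
ref_allele = frq_eur["rsX"][0]                      # compare on EUR's A1 ("A")
af_uzb = freq_of_allele(frq_uzb["rsX"], ref_allele)
af_eur = freq_of_allele(frq_eur["rsX"], ref_allele)
delta_af = abs(af_uzb - af_eur)
print(round(delta_af, 2))
```

Comparing the raw MAF columns in this invented case would give 0.38, while the allele-matched difference is 0.42. Whether this matters for the real data depends on how 02_calculate_pbs.py aligns alleles, which the source does not show.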
3.4 Pairwise Per-SNP FST
Compute per-SNP Weir & Cockerham FST for the PBS triangle (UZB–EUR, UZB–EAS, EUR–EAS)
plus two additional pairs (UZB–SAS, UZB–AFR) for ΔAF context:
# Function to compute pairwise FST
compute_fst() {
    POP1=$1; POP2=$2
    cat "keep_${POP1}.txt" "keep_${POP2}.txt" > "keep_${POP1}_${POP2}.txt"
    awk -v p="$POP1" '{print $1, $2, p}' "keep_${POP1}.txt" > "within_${POP1}_${POP2}.txt"
    awk -v p="$POP2" '{print $1, $2, p}' "keep_${POP2}.txt" >> "within_${POP1}_${POP2}.txt"
    plink --bfile UZB_1kG_merged \
        --keep "keep_${POP1}_${POP2}.txt" \
        --fst \
        --within "within_${POP1}_${POP2}.txt" \
        --out "fst_${POP1}_${POP2}" \
        --allow-no-sex --silent
}
# PBS triangle
compute_fst UZB EUR
compute_fst UZB EAS
compute_fst EUR EAS
# Extra pairs for delta-AF context
compute_fst UZB SAS
compute_fst UZB AFR
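The PBS step only uses SNPs that have an FST value in all three triangle files. Below is a sketch of loading and intersecting them, assuming the PLINK 1.9 .fst layout (CHR, SNP, POS, NMISS, FST) and that undefined estimates are written as `nan`. The helper names `parse_fst` and `shared_snps` are hypothetical, and the demo rows are invented.

```python
import math

def parse_fst(lines):
    """Parse .fst text lines into {snp: fst}, dropping undefined values."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or fields[0] == "CHR":
            continue  # skip header and malformed lines
        try:
            value = float(fields[4])
        except ValueError:
            continue
        if not math.isnan(value):
            table[fields[1]] = value  # negative estimates are kept here
    return table

def shared_snps(*tables):
    """SNP IDs present in every table, sorted for reproducible output."""
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    return sorted(common)

# Invented rows for illustration only.
header = "CHR SNP POS NMISS FST"
fst_uzb_eur = parse_fst([header, "1 rsA 1000 1569 0.012", "1 rsB 2000 1569 nan"])
fst_uzb_eas = parse_fst([header, "1 rsA 1000 1562 0.040", "1 rsB 2000 1562 0.020"])
fst_eur_eas = parse_fst([header, "1 rsA 1000 1037 -0.001", "1 rsB 2000 1037 0.050"])
usable = shared_snps(fst_uzb_eur, fst_uzb_eas, fst_eur_eas)
print(usable)
```

Negative Weir & Cockerham estimates are deliberately kept at this stage; the clamp to zero happens later, inside the FST-to-T conversion.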
3.5 PBS Calculation (Python)
PBS is derived from pairwise FST values by converting each to a divergence time
T = −ln(1 − FST), then computing the branch length for the target (UZB) population:
PBS formula:
$\mathrm{PBS}_{\mathrm{UZB}} = \left(T_{\mathrm{UZB\text{-}EUR}} + T_{\mathrm{UZB\text{-}EAS}} - T_{\mathrm{EUR\text{-}EAS}}\right) / 2$
# Core PBS computation (from 02_calculate_pbs.py)
import math

def fst_to_T(fst_val):
    """Convert FST to divergence time, clamping FST to [0, 0.999]."""
    f = max(0.0, min(fst_val, 0.999))
    return -math.log(1.0 - f)

# For each SNP present in all three FST files:
T_ue = fst_to_T(fst_UZB_EUR[snp])
T_ua = fst_to_T(fst_UZB_EAS[snp])
T_ea = fst_to_T(fst_EUR_EAS[snp])
pbs_uzb = (T_ue + T_ua - T_ea) / 2.0

# Tier classification
tier1 = pbs_uzb >= 0.3                          # Strong PBS
tier2 = min_delta_af >= 0.3                     # High delta-AF vs ALL populations
tier3 = (maf_uzb >= 0.05 and                    # Near-private: common in UZB,
         all(m <= 0.01 for m in maf_other))     # rare everywhere else
python3 02_calculate_pbs.py --outdir ./pbs_results
Loading Fst data...
UZB-EUR: 77,111 SNPs
UZB-EAS: 77,111 SNPs
EUR-EAS: 77,111 SNPs
Computing PBS...
Total SNPs analyzed: 77,111
=== PBS SUMMARY ===
n_snps: 77111
mean: -0.009794
median: -0.006012
stdev: 0.026484
min: -0.362111
max: 2.988450
n_pbs_ge_03: 8
n_pbs_ge_01: 13
n_negative: 63414
n_tier1: 8
n_tier2: 1
n_tier3: 0
n_candidates: 8
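The three tier rules can be collected into one function. This is a sketch under stated assumptions rather than the code in 02_calculate_pbs.py: the name `classify_tiers` is hypothetical, the frequencies are assumed to refer to the same allele in every population, and "ΔAF vs all populations" is read as the smallest absolute difference between UZB and any other population. The input values in the demo are invented.

```python
def classify_tiers(pbs, maf_uzb, maf_others):
    """Apply the three tier thresholds to a single SNP.

    pbs        : PBS value for the UZB branch
    maf_uzb    : allele frequency in UZB
    maf_others : {population: allele frequency} for the other populations
    """
    others = list(maf_others.values())
    # The smallest gap decides tier 2: UZB must differ from EVERY population.
    min_delta_af = min(abs(maf_uzb - m) for m in others)
    return {
        "tier1": pbs >= 0.3,
        "tier2": min_delta_af >= 0.3,
        "tier3": maf_uzb >= 0.05 and all(m <= 0.01 for m in others),
    }

# Invented SNP: high PBS and a large frequency gap to every population.
high_pbs = classify_tiers(
    0.40, 0.48, {"EUR": 0.10, "EAS": 0.12, "SAS": 0.15, "AFR": 0.05}
)
# Invented SNP: common in UZB, nearly absent elsewhere.
near_private = classify_tiers(
    0.02, 0.06, {"EUR": 0.0, "EAS": 0.005, "SAS": 0.0, "AFR": 0.01}
)
print(high_pbs)
print(near_private)
```

The first demo SNP satisfies tiers 1 and 2 but not tier 3; the second satisfies only tier 3. The log above (8 tier 1, 1 tier 2, 8 candidates) is consistent with a candidate being any SNP that meets at least one tier, although the source does not state that rule explicitly.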
4. Results
4.1 Pairwise FST (Weighted)
| Comparison | Weighted FST | Mean FST |
| UZB vs EUR | 0.01448 | 0.00997 |
| UZB vs EAS | 0.03929 | 0.02357 |
| EUR vs EAS | 0.08448 | 0.05023 |
| UZB vs SAS | 0.01441 | 0.00980 |
| UZB vs AFR | 0.12930 | 0.06096 |
Interpretation: UZB is genetically closest to EUR (FST=0.014) and SAS (FST=0.014),
moderately distant from EAS (0.039), and most distant from AFR (0.129). This is consistent with the PCA
positioning (Step 8) showing Uzbeks intermediate between EUR and SAS.
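As a worked example, the PBS formula can be applied to the weighted FST values in the table above to get a single genome-wide figure for the UZB branch. This is an illustrative calculation, not an output of the pipeline; `pbs_from_fst` is a hypothetical helper, and `fst_to_T` follows the clamped conversion T = −ln(1 − FST) described in Section 3.5.

```python
import math

def fst_to_T(fst_val):
    """Convert FST to divergence time, clamping FST to [0, 0.999]."""
    f = max(0.0, min(fst_val, 0.999))
    return -math.log(1.0 - f)

def pbs_from_fst(fst_target_a, fst_target_b, fst_a_b):
    """Branch length of the target population in a three-population tree."""
    return (fst_to_T(fst_target_a) + fst_to_T(fst_target_b) - fst_to_T(fst_a_b)) / 2.0

# Weighted FST values from the table in Section 4.1.
pbs_genomewide = pbs_from_fst(0.01448, 0.03929, 0.08448)  # UZB-EUR, UZB-EAS, EUR-EAS
print(f"{pbs_genomewide:.4f}")  # -0.0168
```

The result is negative because the EUR–EAS distance (T ≈ 0.088) exceeds the sum of the two UZB distances (T ≈ 0.015 + 0.040). A strictly tree-like history cannot produce that pattern, so one plausible reading is that it reflects the intermediate position of UZB between the two reference populations, which would also fit the negative mean and median per-SNP PBS.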
4.2 PBS Summary Statistics
| Metric | Value |
| SNPs analyzed | 77,111 |
| Mean PBS_UZB | −0.00979 |
| Median PBS_UZB | −0.00601 |
| Standard deviation | 0.02648 |
| Min | −0.36211 |
| Max | 2.98845 |
| PBS ≥ 0.3 (Tier 1) | 8 SNPs |
| PBS ≥ 0.1 | 13 SNPs |
| Negative PBS | 63,414 (82.2%) |
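The summary fields in this table could be computed from the list of per-SNP PBS values roughly as follows. This is a sketch: the key names mirror the log printed by 02_calculate_pbs.py, but the function `summarize_pbs` is hypothetical, and using the sample standard deviation (`statistics.stdev`) is an assumption, since the source does not say which estimator was used. The demo values are invented.

```python
import statistics

def summarize_pbs(values):
    """Summary statistics for a list of per-SNP PBS values."""
    n = len(values)
    n_negative = sum(v < 0 for v in values)
    return {
        "n_snps": n,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample SD (assumption)
        "min": min(values),
        "max": max(values),
        "n_pbs_ge_03": sum(v >= 0.3 for v in values),
        "n_pbs_ge_01": sum(v >= 0.1 for v in values),  # includes the >= 0.3 SNPs
        "n_negative": n_negative,
        "pct_negative": 100.0 * n_negative / n,
    }

# Invented values for illustration only.
demo_stats = summarize_pbs([-0.02, -0.01, 0.0, 0.15, 0.5])
print(demo_stats["n_pbs_ge_01"], demo_stats["n_pbs_ge_03"], demo_stats["n_negative"])
```

Note that the PBS ≥ 0.1 count is cumulative, so the 13 SNPs reported above include the 8 Tier 1 SNPs, and the 82.2% figure is 63,414 / 77,111.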
4.3 Top PBS Candidates
| # | SNP | Chr | Position | PBS_UZB | MAF_UZB | MAF_EUR | MAF_EAS | MAF_SAS | MAF_AFR |
Observation: The top 3 SNPs (all on chromosome 12) show extremely high PBS values
(>2.5) with very low UZB MAF (~2–3%) compared to high AFR MAF (~49%). These likely reflect
ancestral allele retention rather than positive selection. The more biologically interesting candidates
are SNPs 5–8 (PBS 0.32–0.53), where UZB shows high MAF (~48%) diverging from other populations.
5. Output Files
| File | Description |
| clusters.txt | Population cluster file (FID IID POP) |
| freq_{UZB,EUR,EAS,SAS,AFR}.frq | Per-population allele frequencies |
| fst_{POP1}_{POP2}.fst | Per-SNP pairwise FST (5 pairs) |
| pbs_all.tsv | PBS scores for all 77,111 SNPs |
| pbs_candidates.json | Filtered candidate SNPs (Tier 1/2/3) |
| pbs_stats.json | Summary statistics |
| pbs_histogram.json | PBS distribution for plotting |
6. Next Steps