Step 15: Runs of Homozygosity & IBD
Autozygosity mapping and identity-by-descent analysis — consanguinity, founder effects, and cryptic relatedness in the Uzbek cohort
🧬 Step 15 of 15 • 1,047 samples • 5,405,898 SNPs 📅 March 2026
1. Overview
Runs of Homozygosity (ROH) are contiguous stretches of homozygous genotypes that arise when an individual inherits two copies of the same ancestral haplotype — a hallmark of parental relatedness (consanguinity) or population-level founder effects. Identity-by-Descent (IBD) analysis detects pairs of individuals sharing long haplotype segments, revealing cryptic relatedness not apparent from pedigree records.
Both analyses are critical for understanding the Uzbek population's demographic history and for designing association studies (e.g., pregnancy loss GWAS) where cryptic relatedness inflates test statistics if unaccounted for.
2. ROH Summary Statistics
PLINK 1.9 --homozyg was run on the Uzbek post-QC dataset (1,047 × 5,405,898 SNPs)
with parameters tuned for detecting long ROH segments reflecting recent consanguinity:
Parameter rationale
| Parameter | Value | Meaning |
|---|---|---|
| --homozyg-kb | 1000 | Minimum ROH length 1,000 kb; focuses on long ROH reflecting recent consanguinity |
| --homozyg-snp | 50 | Minimum 50 SNPs per ROH; avoids sparse-coverage artefacts |
| --homozyg-window-snp | 50 | Scanning window of 50 SNPs |
| --homozyg-density | 50 | At least 1 SNP per 50 kb on average within a ROH; ensures ROH are in SNP-dense regions |
| --homozyg-gap | 1000 | Max 1 Mb gap between consecutive SNPs within a ROH |
| --homozyg-window-het | 1 | Max 1 heterozygous call per scanning window (allows for genotyping error) |
| --homozyg-window-missing | 5 | Max 5 missing genotypes per scanning window |
| --homozyg-window-threshold | 0.05 | Min proportion (5%) of overlapping windows that must be homozygous for a SNP to be included in a ROH |
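The parameter table can be turned into a single PLINK invocation. The sketch below only assembles the argument list and does not run anything. The input prefix `UZB_v2_qc` and output prefix `UZB_v2_ROH` are taken from the Methods and output-file listings of this report, but the exact command the pipeline executed is not shown here, so the function name, the `plink` executable name and the argument order are assumptions.

```python
# Sketch: assemble a PLINK 1.9 --homozyg command from the parameter table.
# The command actually run by the pipeline is not reproduced in this report;
# the prefixes below are inferred from the Methods and output-file tables.
ROH_PARAMS = {
    "--homozyg-kb": 1000,
    "--homozyg-snp": 50,
    "--homozyg-window-snp": 50,
    "--homozyg-density": 50,
    "--homozyg-gap": 1000,
    "--homozyg-window-het": 1,
    "--homozyg-window-missing": 5,
    "--homozyg-window-threshold": 0.05,
}


def build_roh_command(bfile="UZB_v2_qc", out="UZB_v2_ROH", plink="plink"):
    """Return the argument list for a PLINK 1.9 ROH scan (not executed here)."""
    cmd = [plink, "--bfile", bfile, "--homozyg"]
    for flag, value in ROH_PARAMS.items():
        cmd += [flag, str(value)]
    cmd += ["--out", out]
    return cmd


roh_cmd = build_roh_command()
print(" ".join(roh_cmd))
```

Keeping the parameters in one dictionary makes it easy to rerun the scan with, for example, a shorter `--homozyg-kb` when older, shorter ROH are of interest.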
Distribution of ROH per individual
| Statistic | ROH Count | Total ROH (Mb) | FROH |
|---|---|---|---|
| Minimum | 15 | 22.07 | 0.0077 |
| 25th percentile | 27 | 37.08 | 0.0129 |
| Median | 31 | 42.55 | 0.0148 |
| Mean | 35.1 | 54.19 | 0.0188 |
| 75th percentile | 37 | 52.77 | 0.0183 |
| Maximum | 145 | 320.20 | 0.1111 |
Extreme individuals
🔴 Highest FROH
| Individual | NROH | Total Mb | FROH |
|---|---|---|---|
| 705_14-79m | 145 | 320.20 | 0.1111 |
| 548_14-20m | 123 | 299.87 | 0.1041 |
| 629_03-160 | 128 | 298.83 | 0.1037 |
| 70_02-35 | 121 | 280.22 | 0.0973 |
| 927_13-118 | 114 | 274.11 | 0.0951 |
FROH > 0.0625 matches or exceeds the value expected for offspring of 1st cousins (1/16), so these parents are likely 1st cousins or closer.
🟢 Lowest FROH
| Individual | NROH | Total Mb | FROH |
|---|---|---|---|
| 904_09-129 | 16 | 22.07 | 0.0077 |
| 463_08-442 | 19 | 22.11 | 0.0077 |
As few as 16 ROH totalling about 22 Mb: compatible with outbred ancestry and no recent consanguinity.
3. FROH Distribution
The histogram below shows the distribution of genomic inbreeding coefficients across all 1,047 individuals. Reference thresholds are marked for clinical interpretation:
• FROH < 0.0156: no evidence of recent parental relatedness (below the 1/64 expected for offspring of 2nd cousins)
• 0.0156 – 0.0625: background consanguinity, between the values expected for 2nd-cousin (1/64) and 1st-cousin (1/16) parents
• 0.0625 – 0.125: parents likely 1st cousins or equivalent (about 6.25% of the genome expected to be autozygous)
• FROH > 0.125: parents closer than 1st cousins (e.g., half-siblings or double 1st cousins, expected value 12.5%)
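The cut-offs in this list coincide with pedigree expectations: offspring of n-th cousins have an expected inbreeding coefficient of (1/4)^(n+1), which gives 1/16 = 0.0625 for 1st cousins and 1/64 ≈ 0.0156 for 2nd cousins. A minimal sketch of the classification follows. The function names and class labels are illustrative, and treating each lower bound as inclusive is an assumption, since the report does not say how boundary values are assigned.

```python
def expected_f_cousins(degree):
    """Expected inbreeding coefficient for offspring of n-th cousins."""
    return 0.25 ** (degree + 1)


def froh_class(froh):
    """Assign an FROH value to one of the four reference classes.

    Lower bounds are treated as inclusive (an assumption).
    """
    if froh < 0.0156:
        return "<0.0156"
    if froh < 0.0625:
        return "0.0156-0.0625"
    if froh < 0.125:
        return "0.0625-0.125"
    return ">=0.125"


# Values taken from the summary tables in this report.
for label, value in [("maximum", 0.1111), ("median", 0.0148), ("minimum", 0.0077)]:
    print(label, value, froh_class(value))
```

Real FROH values scatter widely around the pedigree expectation because of recombination variance and background relatedness, so the classes are best read as rough guides rather than pedigree reconstructions.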
FROH class breakdown
4. ROH by Chromosome
ROH frequency varies by chromosome, reflecting both chromosome length and regional recombination rate differences. Longer chromosomes accumulate more ROH simply because they contain more physical sequence, but recombination coldspots (e.g., pericentromeric regions) can elevate local ROH density beyond length expectations.
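A per-chromosome tally can be derived directly from the segment file. The sketch below assumes the standard PLINK 1.9 `.hom` layout, a whitespace-delimited file with a header containing `CHR` and `KB` columns. The function name and the three toy segments are hypothetical and are not cohort data.

```python
from collections import defaultdict


def roh_by_chromosome(hom_lines):
    """Return {chromosome: (n_segments, total_kb)} from PLINK .hom lines."""
    lines = iter(hom_lines)
    header = next(lines).split()
    chr_i, kb_i = header.index("CHR"), header.index("KB")
    tally = defaultdict(lambda: [0, 0.0])
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        entry = tally[fields[chr_i]]
        entry[0] += 1
        entry[1] += float(fields[kb_i])
    return {chrom: (n, kb) for chrom, (n, kb) in tally.items()}


# Toy input in .hom layout (made-up segments, for illustration only).
toy_hom = [
    "FID IID PHE CHR SNP1 SNP2 POS1 POS2 KB NSNP DENSITY PHOM PHET",
    "F1 S1 -9 1 rsA rsB 1000000 2500000 1500.0 300 5.0 0.99 0.01",
    "F1 S1 -9 1 rsC rsD 9000000 10200000 1200.0 250 4.8 1.00 0.00",
    "F2 S2 -9 2 rsE rsF 500000 1600000 1100.0 210 5.2 0.99 0.01",
]
per_chrom = roh_by_chromosome(toy_hom)
print(per_chrom)
```

Dividing each total by the chromosome's physical length would separate the length effect from regional enrichment. Chromosome lengths are not listed in this report and would have to come from the genome build used.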
5. Identity-by-Descent Analysis
IBD was estimated using PLINK 1.9 --genome on the LD-pruned dataset
(1,047 samples, 88,722 SNPs) with --min 0.05. The method-of-moments estimator produces PI_HAT (proportion
of genome shared IBD) for every pair, along with Z0, Z1, Z2 (probabilities of sharing 0, 1,
or 2 alleles IBD).
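PI_HAT is a simple function of the IBD-state probabilities: PI_HAT = Z2 + 0.5 × Z1. The short sketch below evaluates it for the textbook Z-vectors of common relationships. The function name and the dictionary of idealised relationships are illustrative, and real estimates from the method-of-moments procedure deviate from these ideals.

```python
def pi_hat(z1, z2):
    """Proportion of the genome shared IBD, given P(IBD=1) and P(IBD=2)."""
    return z2 + 0.5 * z1


# Idealised (Z0, Z1, Z2) for common relationships in an outbred population.
IDEAL_Z = {
    "parent-child": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "half siblings": (0.5, 0.5, 0.0),
    "first cousins": (0.75, 0.25, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}

for name, (z0, z1, z2) in IDEAL_Z.items():
    print(f"{name}: PI_HAT = {pi_hat(z1, z2)}")
```

Parent-child and full-sibling pairs both give PI_HAT = 0.5, so Z1 and Z2 are needed to tell them apart: parent-child pairs have Z1 near 1, full siblings have Z2 near 0.25.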
Relatedness by degree
| Category | PI_HAT threshold | Expected relationship | Pairs | % of total |
|---|---|---|---|---|
| Duplicates / MZ | > 0.98 | Identical genotypes (lab duplicates or MZ twins) | 0 | 0% |
| 1st degree | > 0.354 | Parent–child or full siblings | 3 | 0.0005% |
| 2nd degree | > 0.177 | Half-siblings, avuncular, grandparent | 2 | 0.0004% |
| 3rd degree | > 0.0884 | 1st cousins or equivalent | 423 | 0.077% |
| Total related | > 0.0884 | All degrees combined | 428 | 0.078% |
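The degree cut-offs in the table coincide with the geometric means of adjacent expected PI_HAT values (0.5, 0.25, 0.125, 0.0625). The sketch below checks that numerically and applies the cut-offs to a PI_HAT value. Whether the thresholds were originally derived this way is an assumption, and the function and label names are illustrative.

```python
import math

# Cut-offs from the relatedness table, ordered from closest to most distant.
DEGREE_THRESHOLDS = [
    (0.98, "duplicate/MZ"),
    (0.354, "1st degree"),
    (0.177, "2nd degree"),
    (0.0884, "3rd degree"),
]


def relatedness_degree(pi_hat_value):
    """Return the first category whose cut-off the PI_HAT value exceeds."""
    for cutoff, label in DEGREE_THRESHOLDS:
        if pi_hat_value > cutoff:
            return label
    return "unrelated"


# Geometric means of neighbouring expected PI_HAT values.
boundaries = [math.sqrt(a * b) for a, b in [(0.5, 0.25), (0.25, 0.125), (0.125, 0.0625)]]
print([round(b, 4) for b in boundaries])
print(relatedness_degree(0.5), relatedness_degree(0.25), relatedness_degree(0.125))
```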
6. PI_HAT Distribution
With 1,047 samples there are 547,581 possible pairs. PLINK's --min 0.05 filter on the LD-pruned variants (88.7K SNPs) retained 6,368 of them, so the vast majority of pairs (98.8%) share < 5% of their genome IBD, as expected for unrelated individuals.
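The percentages quoted for the pairwise analysis follow from the number of distinct pairs, n(n − 1)/2. A small sketch of the arithmetic, using only the counts reported in this section (the helper name is illustrative):

```python
def n_pairs(n_samples):
    """Number of distinct unordered pairs among n samples."""
    return n_samples * (n_samples - 1) // 2


total_pairs = n_pairs(1047)
reported_pairs = 6368  # pairs retained by --min 0.05
pct_below = 100 * (1 - reported_pairs / total_pairs)

print(f"total pairs: {total_pairs}")
print(f"pairs below PI_HAT 0.05: {pct_below:.1f}%")
print(f"3rd-degree pairs: {100 * 423 / total_pairs:.3f}% of all pairs")
```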
PI_HAT distribution bins
7. ADMIXTURE × ROH Cross-Reference
Using ADMIXTURE K=2 ancestry proportions (Q1: European-like component, mean 0.650; Q2: East Asian-like component, mean 0.350), we can examine whether autozygosity varies by ancestry proportion. In admixed populations, individuals with more homogeneous ancestry (one dominant component) may show elevated homozygosity due to assortative mating within subgroups.
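One way to test this is to correlate each individual's dominant ancestry fraction, max(Q1, Q2), with FROH. The sketch below shows the calculation on a handful of made-up values that are not cohort data. The function names are hypothetical, and Pearson correlation is only one possible choice; a rank-based measure may be preferable given the skewed FROH distribution.

```python
import math


def dominant_fraction(q1):
    """Largest ancestry proportion at K=2, where Q2 = 1 - Q1."""
    return max(q1, 1.0 - q1)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy values for illustration only (not taken from the cohort).
toy_q1 = [0.55, 0.62, 0.70, 0.81, 0.35]
toy_froh = [0.012, 0.015, 0.014, 0.030, 0.016]

homogeneity = [dominant_fraction(q) for q in toy_q1]
print(round(pearson_r(homogeneity, toy_froh), 3))
```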
8. Clinical Relevance for Pregnancy Loss
Elevated autozygosity may contribute to pregnancy loss through several mechanisms:
- Increased homozygosity for recessive lethal or sub-lethal alleles
- Reduced heterozygosity at HLA genes → impaired maternal–fetal immune tolerance
- Increased burden of damaging homozygous variants in developmentally critical genes
Key takeaways for the Uzbek cohort
Population-level
- Median FROH = 0.015: at the low end of the South Asian range and modestly above outbred European values
- ~2.7% of individuals (≈ 28) have FROH > 0.0625, suggesting 1st-cousin-level parental consanguinity
- Minimal cryptic relatedness (5 pairs at 2nd degree or closer in 1,047 samples)
- Consistent with historical preference for endogamous marriages in Uzbek communities
GWAS design implications
- Covariates required: FROH should be included as a covariate in pregnancy loss GWAS to control for genome-wide recessive burden
- Kinship filter: Remove one individual from each pair with PI_HAT > 0.125, or use GRM-based mixed models
- ROH-enriched genes: Loci consistently within ROH across affected women may harbor recessive pregnancy loss genes
- Stratification: Consider analyzing high-FROH and low-FROH groups separately
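The kinship filter can be sketched as a greedy pruning step: build a graph of pairs with PI_HAT > 0.125 and repeatedly drop the individual with the most remaining related partners. This is one common heuristic and not necessarily the procedure this pipeline will use. The function name and sample IDs are hypothetical, and ties are broken by ID only to make the result deterministic.

```python
from collections import defaultdict


def greedy_unrelated_filter(pairs, threshold=0.125):
    """Return the set of sample IDs to remove so no related pair remains.

    pairs: iterable of (id1, id2, pi_hat) tuples.
    """
    neighbours = defaultdict(set)
    for a, b, pi_hat_value in pairs:
        if pi_hat_value > threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    removed = set()
    while True:
        # Pick the sample with the most remaining related partners.
        candidate = max(
            neighbours, key=lambda s: (len(neighbours[s]), s), default=None
        )
        if candidate is None or not neighbours[candidate]:
            break  # no related pairs left
        for other in neighbours.pop(candidate):
            neighbours[other].discard(candidate)
        removed.add(candidate)
    return removed


# Hypothetical pairs: B is related to both A and C, D-E is below threshold.
toy_pairs = [("A", "B", 0.50), ("B", "C", 0.26), ("D", "E", 0.05)]
to_remove = greedy_unrelated_filter(toy_pairs)
print(to_remove)
```

Dropping the most connected sample first keeps more individuals than removing one member of every pair independently. A GRM-based mixed model avoids the sample loss altogether.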
FROH vs. published populations
| Population | Typical median FROH | Relative to UZB |
|---|---|---|
| UK Biobank (British) | 0.008–0.012 | 1.3–1.9× lower |
| South Asian (1000G SAS) | 0.015–0.025 | Similar to 1.7× higher |
| Uzbek (this study) | 0.015 | Reference |
| Qatar / Saudi populations | 0.040–0.060 | 2.7–4× higher |
| Isolated populations (e.g., Amish) | 0.060–0.120 | 4–8× higher |
9. Methods
Input (ROH): UZB_v2_qc BED/BIM/FAM (1,047 × 5,405,898 SNPs, post-QC)
Input (IBD): UZB_v2_ldpruned BED/BIM/FAM (1,047 × 88,722 SNPs, LD-pruned)
FROH formula: Σ(ROH length in kb) / 2,881,033 kb
IBD method: PLINK method-of-moments estimator (--genome flag)
Degree thresholds: Duplicates > 0.98, 1st > 0.354, 2nd > 0.177, 3rd > 0.0884
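The FROH formula can be applied directly to the per-individual summary file. The sketch below assumes the standard PLINK 1.9 `.hom.indiv` layout, whose header includes `IID` and `KB` columns. The two example rows reuse the NROH and total-length values from the extreme-individual tables (converted from Mb to kb), while the `FID` and `PHE` fields are placeholders. Function names are illustrative.

```python
GENOME_KB = 2_881_033  # denominator given in the Methods


def froh_from_kb(total_kb, genome_kb=GENOME_KB):
    """FROH = summed ROH length (kb) / genome length (kb)."""
    return total_kb / genome_kb


def froh_table(indiv_lines):
    """Return {IID: FROH} from the lines of a PLINK .hom.indiv file."""
    lines = iter(indiv_lines)
    header = next(lines).split()
    iid_i, kb_i = header.index("IID"), header.index("KB")
    result = {}
    for line in lines:
        fields = line.split()
        if fields:
            result[fields[iid_i]] = froh_from_kb(float(fields[kb_i]))
    return result


# FID and PHE are placeholders; NSEG and KB come from the tables in this report.
sample_indiv = [
    "FID IID PHE NSEG KB KBAVG",
    "F1 705_14-79m -9 145 320200 2208.28",
    "F2 904_09-129 -9 16 22070 1379.38",
]
froh = froh_table(sample_indiv)
for iid, value in froh.items():
    print(iid, round(value, 4))
```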
Command log
Output files (on server)
| File | Description | Size |
|---|---|---|
| UZB_v2_ROH.hom | All 36,702 individual ROH segments with coordinates | ~4 MB |
| UZB_v2_ROH.hom.indiv | Per-individual ROH summary (N_ROH, total_KB, avg_KB) | ~77 KB |
| UZB_v2_ROH.hom.summary | Per-SNP ROH frequency across individuals | ~18 MB |
| ConvSK_mind20_ibd.genome | All pairwise IBD estimates | ~90 MB |
Step 15 of 15 • ALSU Genotyping Analysis Pipeline • March 2026