1. Overview

Runs of Homozygosity (ROH) are contiguous stretches of homozygous genotypes that arise when an individual inherits two copies of the same ancestral haplotype — a hallmark of parental relatedness (consanguinity) or population-level founder effects. Identity-by-Descent (IBD) analysis detects pairs of individuals sharing long haplotype segments, revealing cryptic relatedness not apparent from pedigree records.

Both analyses are critical for understanding the Uzbek population's demographic history and for designing association studies (e.g., pregnancy loss GWAS) where cryptic relatedness inflates test statistics if unaccounted for.

• Individuals: 1,047
• ROH segments: 36,702
• Median FROH: 0.015
• Pairs with PI_HAT ≥ 0.05 (IBD): 6,368
• Duplicate samples: 0
Key finding: Median FROH = 0.015 is comparable to outbred European populations (~0.01–0.02). However, a substantial tail of consanguineous individuals (28 with FROH > 0.0625) indicates historical endogamy patterns in a subset of the Uzbek cohort. This is directly relevant to pregnancy loss genetics, as elevated autozygosity increases exposure to recessive disease alleles.

2. ROH Summary Statistics

PLINK 1.9 --homozyg was run on the Uzbek post-QC dataset (1,047 × 5,405,898 SNPs) with parameters tuned for detecting long ROH segments reflecting recent consanguinity:

plink --bfile UZB_v2_for_roh \
  --homozyg \
  --homozyg-window-snp 50 \
  --homozyg-snp 50 \
  --homozyg-kb 1000 \
  --homozyg-density 50 \
  --homozyg-gap 1000 \
  --homozyg-window-het 1 \
  --homozyg-window-missing 5 \
  --homozyg-window-threshold 0.05 \
  --out UZB_v2_ROH

Parameter rationale

Parameter | Value | Meaning
--homozyg-kb | 1000 | Minimum ROH length of 1,000 kb; focuses on long ROH reflecting recent consanguinity
--homozyg-snp | 50 | Minimum of 50 SNPs per ROH; avoids sparse-coverage artefacts
--homozyg-window-snp | 50 | Scanning window of 50 SNPs
--homozyg-density | 50 | At most 50 kb per SNP on average within a ROH; ensures ROH lie in SNP-dense regions
--homozyg-gap | 1000 | Maximum gap of 1,000 kb (1 Mb) between consecutive SNPs within a ROH
--homozyg-window-het | 1 | At most 1 heterozygous call per scanning window (allows for genotyping error)
--homozyg-window-missing | 5 | At most 5 missing genotypes per scanning window
--homozyg-window-threshold | 0.05 | A SNP is counted as part of a ROH when at least 5% of the scanning windows overlapping it are homozygous

Distribution of ROH per individual

Statistic | ROH count | Total ROH (Mb) | FROH
Minimum | 15 | 22.07 | 0.0077
25th percentile | 27 | 37.08 | 0.0129
Median | 31 | 42.55 | 0.0148
Mean | 35.1 | 54.19 | 0.0188
75th percentile | 37 | 52.77 | 0.0183
Maximum | 145 | 320.20 | 0.1111
FROH calculation: FROH = Σ(ROH length) / Lgenome, where Lgenome = 2,881 Mb (autosomal genome covered by the SNP array). This genomic inbreeding coefficient is more informative than pedigree-based F because it captures ancient as well as recent consanguinity.
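As a minimal sketch of the calculation above, the snippet below derives FROH from PLINK's per-individual summary. It assumes the standard .hom.indiv header (FID IID PHE NSEG KB KBAVG) and uses the 2,881,033 kb denominator quoted in the Methods; the function names and the sample ID "S1" are illustrative and not part of the pipeline.

```python
# Sketch: derive F_ROH from PLINK's per-individual ROH summary (.hom.indiv).
# Assumption: header columns are FID IID PHE NSEG KB KBAVG.
AUTOSOMAL_KB = 2_881_033  # denominator quoted in this report's Methods


def froh(total_roh_kb, genome_kb=AUTOSOMAL_KB):
    """Fraction of the autosomal genome covered by ROH."""
    return total_roh_kb / genome_kb


def froh_from_hom_indiv(lines):
    """Map IID -> F_ROH from an iterable of .hom.indiv text lines."""
    rows = (line.split() for line in lines if line.strip())
    header = next(rows)
    iid_col, kb_col = header.index("IID"), header.index("KB")
    return {row[iid_col]: froh(float(row[kb_col])) for row in rows}


# Hypothetical one-sample excerpt; 320,200 kb matches the cohort maximum.
example = [
    "FID IID PHE NSEG KB KBAVG",
    "S1 S1 -9 145 320200 2208.3",
]
print(froh_from_hom_indiv(example))
```

Reading the column positions from the header rather than hard-coding them keeps the sketch tolerant of PLINK's variable whitespace padding.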

Extreme individuals

🔴 Highest FROH

Individual | NROH | Total Mb | FROH
705_14-79m | 145 | 320.20 | 0.1111
548_14-20m | 123 | 299.87 | 0.1041
629_03-160 | 128 | 298.83 | 0.1037
70_02-35 | 121 | 280.22 | 0.0973
927_13-118 | 114 | 274.11 | 0.0951

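FROH > 0.0625 is the level expected when parents are 1st cousins or more closely related (the expected inbreeding coefficient for offspring of 1st cousins is 1/16 = 0.0625).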

🟢 Lowest FROH

Individual | NROH | Total Mb | FROH
904_09-129 | 16 | 22.07 | 0.0077
463_08-442 | 19 | 22.11 | 0.0077

Even the lowest-FROH individuals carry about 22 Mb of ROH in total (FROH ≈ 0.008), which is compatible with outbred ancestry and no recent consanguinity.


3. FROH Distribution

The histogram below shows the distribution of genomic inbreeding coefficients across all 1,047 individuals. Reference thresholds are marked for clinical interpretation:

Interpreting FROH:
• FROH < 0.0156: no evidence of recent parental relatedness (outbred)
• 0.0156–0.0625: background consanguinity, consistent with parents related at roughly the 2nd-cousin level (expected F = 1/64 ≈ 0.0156) up to, but below, the 1st-cousin level
• 0.0625–0.125: parents likely 1st cousins or equivalent (expected F = 1/16, i.e. 6.25% of the genome autozygous)
• FROH > 0.125: parents closer than 1st cousins (e.g., half-siblings or double 1st cousins, expected F = 1/8)

FROH class breakdown

• Outbred (FROH < 0.0156): 606
• Background (0.0156–0.0625): 413
• Consanguineous (FROH > 0.0625): 28
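The class breakdown can be reproduced from per-individual FROH values with a few lines of Python. This is a sketch only: the cut-offs are the ones listed in this section, while the handling of values that fall exactly on a boundary and the label strings are assumptions.

```python
from collections import Counter

# Cut-offs taken from the interpretation list in this section.
OUTBRED_MAX = 0.0156
FIRST_COUSIN = 0.0625
CLOSER_THAN_FIRST_COUSIN = 0.125


def froh_class(f):
    """Assign an F_ROH value to one of the report's interpretation tiers.

    Assumption: a value exactly on a boundary goes to the lower tier,
    except 0.0156 itself, which is treated as 'background'.
    """
    if f < OUTBRED_MAX:
        return "outbred"
    if f <= FIRST_COUSIN:
        return "background"
    if f <= CLOSER_THAN_FIRST_COUSIN:
        return "consanguineous (1st-cousin level)"
    return "consanguineous (closer than 1st cousins)"


def class_breakdown(froh_values):
    """Count individuals per tier."""
    return Counter(froh_class(f) for f in froh_values)


# Values taken from the summary table: minimum, median, mean, maximum.
print(class_breakdown([0.0077, 0.0148, 0.0188, 0.1111]))
```

The three-way breakdown reported here merges the two consanguineous tiers; the cohort maximum of 0.1111 means the top tier would be empty in any case.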

4. ROH by Chromosome

ROH frequency varies by chromosome, reflecting both chromosome length and regional recombination rate differences. Longer chromosomes accumulate more ROH simply because they contain more physical sequence, but recombination coldspots (e.g., pericentromeric regions) can elevate local ROH density beyond length expectations.

Notable: Chromosome 2 has the highest ROH count (3,590), consistent with its large physical size (243 Mb) and known low-recombination pericentromeric block. Chromosome 21 has the fewest ROH (237), reflecting its small size (46.7 Mb).
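Per-chromosome counts like those quoted above can be tallied directly from the segment file. The sketch below assumes the standard PLINK .hom header (FID IID PHE CHR SNP1 SNP2 POS1 POS2 KB NSNP DENSITY PHOM PHET); the function name and the example rows are hypothetical.

```python
from collections import Counter


def roh_per_chromosome(hom_lines):
    """Return (segment count, total ROH kb) per chromosome from .hom lines."""
    rows = (line.split() for line in hom_lines if line.strip())
    header = next(rows)
    chr_col, kb_col = header.index("CHR"), header.index("KB")
    counts, total_kb = Counter(), Counter()
    for row in rows:
        chrom = int(row[chr_col])
        counts[chrom] += 1
        total_kb[chrom] += float(row[kb_col])
    return counts, total_kb


# Hypothetical three-segment excerpt, for illustration only.
example = [
    "FID IID PHE CHR SNP1 SNP2 POS1 POS2 KB NSNP DENSITY PHOM PHET",
    "S1 S1 -9 2 rs1 rs2 1000000 2500000 1500.0 300 5.0 0.99 0.01",
    "S1 S1 -9 2 rs3 rs4 9000000 10200000 1200.0 250 4.8 1.00 0.00",
    "S2 S2 -9 21 rs5 rs6 20000000 21100000 1100.0 210 5.2 0.99 0.01",
]
counts, total_kb = roh_per_chromosome(example)
for chrom in sorted(counts):
    print(chrom, counts[chrom], total_kb[chrom])
```

Dividing each count by the chromosome's length in Mb gives a density that separates the length effect from regional recombination effects.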

5. Identity-by-Descent Analysis

IBD was estimated using PLINK 1.9 --genome on the LD-pruned dataset (1,047 samples, 88,722 SNPs) with --min 0.05. The method-of-moments estimator produces PI_HAT (proportion of genome shared IBD) for every pair, along with Z0, Z1, Z2 (probabilities of sharing 0, 1, or 2 alleles IBD).
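PI_HAT is tied to the three sharing probabilities by PI_HAT = Z2 + 0.5 × Z1. The sketch below evaluates that relation at the theoretical expectations for a few textbook relationships; the dictionary of relationships and the function name are illustrative, and real estimates scatter around these values.

```python
def pi_hat(z0, z1, z2):
    """Proportion of the genome shared IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    if abs(z0 + z1 + z2 - 1.0) > 1e-6:
        raise ValueError("Z0 + Z1 + Z2 must sum to 1")
    return z2 + 0.5 * z1


# Theoretical (Z0, Z1, Z2) expectations for outbred pedigrees.
EXPECTED_Z = {
    "MZ twins / duplicates": (0.0, 0.0, 1.0),
    "parent-child": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "half-siblings": (0.5, 0.5, 0.0),
    "1st cousins": (0.75, 0.25, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}

for relationship, z in EXPECTED_Z.items():
    print(f"{relationship}: PI_HAT = {pi_hat(*z)}")
```

Parent-child and full-sibling pairs share the same expected PI_HAT (0.5) and are told apart only by Z0 and Z2, which is why the Z values are reported alongside PI_HAT.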

• Total pairs tested: 547,581
• Related (π̂ > 0.125): 186
• Duplicates / MZ twins: 0
• 1st-degree relatives: 3

Relatedness by degree

Category | PI_HAT threshold | Expected relationship | Pairs | % of total
Duplicates / MZ | > 0.98 | Identical genotypes (lab duplicates or MZ twins) | 0 | 0%
1st degree | > 0.354 | Parent–child or full siblings | 3 | 0.0005%
2nd degree | > 0.177 | Half-siblings, avuncular, grandparent | 2 | 0.0004%
3rd degree | > 0.0884 | 1st cousins or equivalent | 423 | 0.077%
Total related | > 0.0884 | All degrees combined | 428 | 0.078%
Note on duplicates: The post-QC dataset has 0 duplicate/MZ pairs, as duplicates were already removed during quality control (Step 7, --mind 0.05). Post-QC sample count: N = 1,047.
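A sketch of how pairs could be assigned to the degree categories above, using the report's PI_HAT thresholds. It assumes the standard PLINK .genome header (FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO); the function names, labels, and example rows are hypothetical.

```python
from collections import Counter

# Thresholds from the relatedness table, checked from closest to most distant.
THRESHOLDS = [
    (0.98, "duplicate/MZ"),
    (0.354, "1st degree"),
    (0.177, "2nd degree"),
    (0.0884, "3rd degree"),
]


def relatedness_degree(pi_hat):
    """Return the first (closest) category whose threshold PI_HAT exceeds."""
    for cutoff, label in THRESHOLDS:
        if pi_hat > cutoff:
            return label
    return "below 3rd degree"


def count_degrees(genome_lines):
    """Tally categories over the lines of a .genome file."""
    rows = (line.split() for line in genome_lines if line.strip())
    header = next(rows)
    col = header.index("PI_HAT")
    return Counter(relatedness_degree(float(row[col])) for row in rows)


# Hypothetical excerpt, for illustration only.
example = [
    "FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO",
    "A A B B UN NA 0.0000 1.0000 0.0000 0.5000 -1 0.85 1.0 NA",
    "C C D D UN NA 0.7800 0.2200 0.0000 0.1100 -1 0.76 0.9 2.3",
    "E E F F UN NA 0.8800 0.1200 0.0000 0.0600 -1 0.75 0.6 2.0",
]
print(count_degrees(example))
```

Because each pair is assigned to the closest category it qualifies for, the per-category counts are mutually exclusive and add up to the "Total related" row.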

6. PI_HAT Distribution

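The vast majority of pairs (98.8%) share < 5% of their genome IBD, as expected for unrelated individuals. Because --min 0.05 restricts PLINK's output to pairs with PI_HAT ≥ 0.05, the run on LD-pruned variants (88.7K SNPs) reported 6,368 pairs; the remaining 541,213 of the 547,581 possible pairs fall below that threshold: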

PI_HAT distribution bins

PI_HAT range | Pairs | % of total
< 0.05 | 541,213 | 98.8%
0.05–0.10 | 6,128 | 1.12%
0.10–0.20 | 236 | 0.043%
0.20–0.50 | 3 | 0.0005%
> 0.50 | 1 | 0.0002%
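The bin totals can be cross-checked with a short calculation. This sketch takes the four reported bins at or above 0.05 from this section and infers the lowest bin by subtraction from the number of possible pairs, n(n − 1)/2; the function and key names are illustrative.

```python
from math import comb


def summarise_bins(n_samples, reported_bins):
    """Return total pair count and {bin: (count, percent)} including the
    pairs that fall below the reporting threshold."""
    total = comb(n_samples, 2)  # n * (n - 1) / 2 unordered pairs
    below = total - sum(reported_bins.values())
    bins = {"< 0.05": below, **reported_bins}
    return total, {k: (v, 100 * v / total) for k, v in bins.items()}


# Counts of pairs with PI_HAT >= 0.05, as listed in this section.
REPORTED = {"0.05-0.10": 6128, "0.10-0.20": 236, "0.20-0.50": 3, "> 0.50": 1}

total, table = summarise_bins(1047, REPORTED)
print("total pairs:", total)
for label, (count, pct) in table.items():
    print(f"{label}: {count} ({pct:.4f}%)")
```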
Implications for GWAS: With only 5 pairs at 2nd degree or closer and 423 3rd-degree pairs, cryptic relatedness is minimal in the post-QC dataset. Standard GRM-based mixed models (BOLT-LMM, SAIGE) remain recommended for the planned pregnancy loss GWAS, but aggressive kinship filtering is unlikely to be necessary.

7. ADMIXTURE × ROH Cross-Reference

Using ADMIXTURE K=2 ancestry proportions (Q1: European-like component, mean 0.650; Q2: East Asian-like component, mean 0.350), we can examine whether autozygosity varies by ancestry proportion. In admixed populations, individuals with more homogeneous ancestry (one dominant component) may show elevated homozygosity due to assortative mating within subgroups.

Expected pattern: In a single-pulse admixture, FROH should be somewhat elevated at both extremes of the ancestry distribution (Q1 < 0.3 or Q1 > 0.8) where individuals are genetically more homogeneous, and lower in the middle where recombination between ancestral haplotypes breaks up long homozygous blocks. Deviations suggest ongoing substructure or community-level endogamy.
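One way to check the expected pattern is to compare median FROH across ancestry strata. The sketch below uses the Q1 cut-offs mentioned above (0.3 and 0.8); the input format, function name, stratum labels, and example values are all hypothetical and would need to be replaced by the merged ADMIXTURE and ROH tables.

```python
from statistics import median


def froh_by_q1_stratum(samples, low=0.3, high=0.8):
    """Group (Q1, F_ROH) tuples into low / middle / high Q1 strata and
    return {stratum: (n, median F_ROH or None if the stratum is empty)}."""
    strata = {"low_Q1": [], "mid_Q1": [], "high_Q1": []}
    for q1, f in samples:
        if q1 < low:
            strata["low_Q1"].append(f)
        elif q1 > high:
            strata["high_Q1"].append(f)
        else:
            strata["mid_Q1"].append(f)
    return {k: (len(v), median(v) if v else None) for k, v in strata.items()}


# Hypothetical (Q1, F_ROH) values, for illustration only.
example = [(0.20, 0.030), (0.55, 0.012), (0.65, 0.015), (0.70, 0.014), (0.90, 0.040)]
print(froh_by_q1_stratum(example))
```

Medians are used rather than means because the FROH distribution has a long right tail driven by the consanguineous subset.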

8. Clinical Relevance for Pregnancy Loss

Clinical context: Elevated autozygosity (FROH) has been consistently associated with adverse reproductive outcomes including recurrent pregnancy loss (RPL), stillbirth, and congenital anomalies. The mechanisms include:
  • Increased homozygosity for recessive lethal or sub-lethal alleles
  • Reduced heterozygosity at HLA genes → impaired maternal–fetal immune tolerance
  • Increased burden of damaging homozygous variants in developmentally critical genes

Key takeaways for the Uzbek cohort

Population-level

  • Median FROH = 0.015 — comparable to outbred European and South Asian populations
  • ~2.7% of individuals (≈ 28) have FROH > 0.0625, suggesting 1st-cousin-level parental consanguinity
  • Minimal cryptic relatedness (5 pairs at 2nd degree or closer in 1,047 samples)
  • Consistent with historical preference for endogamous marriages in Uzbek communities

GWAS design implications

  • Covariates required: FROH should be included as a covariate in pregnancy loss GWAS to control for genome-wide recessive burden
  • Kinship filter: Remove one individual from each pair with PI_HAT > 0.125, or use GRM-based mixed models
  • ROH-enriched genes: Loci consistently within ROH across affected women may harbor recessive pregnancy loss genes
  • Stratification: Consider analyzing high-FROH and low-FROH groups separately
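The kinship filter in the list above can be sketched as a greedy pruning step: repeatedly drop the sample with the most relatives above the threshold until no related pair remains. This is one common heuristic, not necessarily the procedure used in this pipeline; the function name and the example pairs are hypothetical.

```python
from collections import defaultdict


def samples_to_remove(pairs, threshold=0.125):
    """Greedy relatedness pruning.

    pairs: iterable of (sample_a, sample_b, pi_hat).
    Returns the set of samples to drop so that no remaining pair has
    PI_HAT above the threshold.
    """
    graph = defaultdict(set)
    for a, b, pi_hat in pairs:
        if pi_hat > threshold:
            graph[a].add(b)
            graph[b].add(a)

    removed = set()
    while graph:
        # Drop the sample with the most relatives; ties broken by sample ID.
        worst = max(graph, key=lambda s: (len(graph[s]), s))
        for other in graph.pop(worst):
            graph[other].discard(worst)
            if not graph[other]:
                del graph[other]  # no related partners left, so keep it
        removed.add(worst)
    return removed


# Hypothetical pairs, for illustration only.
example = [("A", "B", 0.50), ("A", "C", 0.26), ("D", "E", 0.13), ("F", "G", 0.06)]
print(sorted(samples_to_remove(example)))
```

Removing the most-connected sample first tends to keep more individuals than dropping one member of each pair independently. In a case-control design, the tie-break could instead favour retaining cases.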

FROH vs. published populations

Population | Typical median FROH | Relative to UZB (0.015)
UK Biobank (British) | 0.008–0.012 | 1.3–1.9× lower
South Asian (1000G SAS) | 0.015–0.025 | Equal to 1.7× higher
Uzbek (this study) | 0.015 | Reference
Qatar / Saudi populations | 0.040–0.060 | 2.7–4× higher
Isolated populations (e.g., Amish) | 0.060–0.120 | 4–8× higher

9. Methods

Software: PLINK v1.9.0-b.7.7 (64-bit, 515 GB RAM workstation)
Input (ROH): UZB_v2_qc BED/BIM/FAM (1,047 × 5,405,898 SNPs, post-QC)
Input (IBD): UZB_v2_ldpruned BED/BIM/FAM (1,047 × 88,722 SNPs, LD-pruned)
FROH formula: Σ(ROH length in kb) / 2,881,033 kb
IBD method: PLINK method-of-moments estimator (--genome flag)
Degree thresholds: Duplicates > 0.98, 1st > 0.354, 2nd > 0.177, 3rd > 0.0884

Command log

# ROH analysis
plink --bfile ~/v2/roh/UZB_v2_for_roh \
  --homozyg \
  --homozyg-window-snp 50 \
  --homozyg-snp 50 \
  --homozyg-kb 1000 \
  --homozyg-density 50 \
  --homozyg-gap 1000 \
  --homozyg-window-het 1 \
  --homozyg-window-missing 5 \
  --homozyg-window-threshold 0.05 \
  --out ~/v2/roh/UZB_v2_ROH
# Output: 36,702 ROH segments across 1,047 individuals
# Files: UZB_v2_ROH.hom, UZB_v2_ROH.hom.indiv, UZB_v2_ROH.hom.summary

# IBD analysis
plink --bfile ~/v2/plink/UZB_v2_ldpruned \
  --genome \
  --min 0.05 \
  --out ~/v2/ibd/UZB_v2_IBD
# Output: 6,368 pairs with PI_HAT >= 0.05

Output files (on server)

File | Description | Size
UZB_v2_ROH.hom | All 36,702 individual ROH segments with coordinates | ~4 MB
UZB_v2_ROH.hom.indiv | Per-individual ROH summary (N_ROH, total_KB, avg_KB) | ~77 KB
UZB_v2_ROH.hom.summary | Per-SNP ROH frequency across individuals | ~18 MB
ConvSK_mind20_ibd.genome | All pairwise IBD estimates | ~90 MB

Step 15 of 15 • ALSU Genotyping Analysis Pipeline • March 2026