Step 12: PBS Functional Annotation
Annotate PBS candidate SNPs using Ensembl VEP, GWAS Catalog, ClinVar, and GTEx
Completed: March 2026
1. Overview
The 8 PBS candidate SNPs identified in Step 10 are annotated using four complementary resources to assess their functional significance: Ensembl VEP for variant consequence prediction, GWAS Catalog for known disease associations, ClinVar for clinical pathogenicity, and GTEx for tissue-specific expression effects (eQTL).
2. Prerequisites
| Input | Source | Description |
|---|---|---|
| pbs_candidates.json | Step 10 | 8 candidate SNPs with PBS scores, MAF, and tier classification |
3. Annotation Pipeline
Annotations are retrieved via REST APIs using the Python script annotate_uzb_snps.py.
Each API is queried sequentially with rate limiting to avoid throttling.
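The sequential, rate-limited querying described above can be sketched as follows. This is a minimal illustration, not the code in annotate_uzb_snps.py: the `RateLimiter` class, the `query_sequentially` helper, and the interval value are all hypothetical names and choices.

```python
import time


class RateLimiter:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable so the limiter can be tested offline.
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            if elapsed < self.min_interval_s:
                self._sleep(self.min_interval_s - elapsed)
        self._last_call = self._clock()


def query_sequentially(snps, fetchers, limiter):
    """Call every fetcher for every SNP, one request at a time.

    fetchers maps a resource name (e.g. "vep") to a function taking a SNP ID.
    """
    results = {}
    for snp in snps:
        results[snp] = {}
        for name, fetch in fetchers.items():
            limiter.wait()  # throttle before each outgoing request
            results[snp][name] = fetch(snp)
    return results
```

A single shared limiter is the simplest choice; separate limiters per API would allow different intervals if the services publish different request limits.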
3.1 Ensembl VEP (Variant Effect Predictor)
Queries the Ensembl REST API to obtain rsID mapping, gene context, and predicted variant consequence.
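A minimal sketch of such a query, assuming the candidate is looked up by rsID through the `vep/human/id` endpoint of rest.ensembl.org (which serves GRCh38). The helper names are hypothetical, and the actual script may query by genomic position instead. The parsed fields (`most_severe_consequence`, `transcript_consequences`, `colocated_variants`) follow the VEP JSON output format.

```python
import json
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def vep_url(rsid, species="human"):
    """Build the VEP lookup URL for a variant identifier."""
    return f"{ENSEMBL_REST}/vep/{species}/id/{rsid}"


def summarize_vep(records):
    """Reduce a VEP JSON response (a list of records) to a few fields."""
    if not records:
        return None
    rec = records[0]
    # Unique gene symbols across all overlapping transcripts.
    genes = sorted(
        {
            tc["gene_symbol"]
            for tc in rec.get("transcript_consequences", [])
            if "gene_symbol" in tc
        }
    )
    # Known variants at the same position; keep only dbSNP-style IDs.
    rsids = [
        cv["id"]
        for cv in rec.get("colocated_variants", [])
        if cv.get("id", "").startswith("rs")
    ]
    return {
        "most_severe_consequence": rec.get("most_severe_consequence"),
        "genes": genes,
        "rsids": rsids,
    }


def fetch_vep(rsid, timeout=30):
    """Query the live API (needs network access) and summarize the result."""
    req = urllib.request.Request(
        vep_url(rsid), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return summarize_vep(json.load(resp))
```

Keeping the parsing in `summarize_vep` separate from the network call lets the extraction logic be checked against a saved response.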
3.2 GWAS Catalog
Queries the EBI GWAS Catalog REST API by rsID to find known genome-wide significant associations.
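One way to sketch this lookup, assuming the `singleNucleotidePolymorphisms/{rsID}/associations` endpoint with the `associationBySnp` projection and the conventional threshold of p ≤ 5e-8 for genome-wide significance. The helper names are hypothetical, and the response field names used here (`pvalue`, `pvalueMantissa`, `pvalueExponent`, `efoTraits`) should be checked against the current API documentation.

```python
import json
import urllib.request

GWAS_REST = "https://www.ebi.ac.uk/gwas/rest/api"
GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def gwas_url(rsid):
    """Build the associations URL for one rsID."""
    return (
        f"{GWAS_REST}/singleNucleotidePolymorphisms/{rsid}"
        "/associations?projection=associationBySnp"
    )


def association_pvalue(assoc):
    """Return the p-value of one association record, or None if absent."""
    if assoc.get("pvalue") is not None:
        return float(assoc["pvalue"])
    mantissa = assoc.get("pvalueMantissa")
    exponent = assoc.get("pvalueExponent")
    if mantissa is None or exponent is None:
        return None
    return mantissa * 10.0 ** exponent


def significant_traits(payload, threshold=GENOME_WIDE_P):
    """List the unique trait names with p-value at or below the threshold."""
    traits = set()
    for assoc in payload.get("_embedded", {}).get("associations", []):
        p = association_pvalue(assoc)
        if p is not None and p <= threshold:
            traits.update(t["trait"] for t in assoc.get("efoTraits", []))
    return sorted(traits)


def fetch_gwas(rsid, timeout=30):
    """Query the live API (needs network access)."""
    with urllib.request.urlopen(gwas_url(rsid), timeout=timeout) as resp:
        return significant_traits(json.load(resp))
```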
3.3 NCBI ClinVar
Queries NCBI E-utilities to check clinical pathogenicity status.
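A sketch of the two-step E-utilities pattern, under the assumption that the script first runs `esearch` against the `clinvar` database to get record IDs and then `esummary` to read the classification. The helper names are hypothetical. The summary field holding the classification has changed between ClinVar releases, so the code checks both `germline_classification` and the older `clinical_significance`; treat both keys as assumptions to verify against a live response.

```python
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(tool, **params):
    """Build an E-utilities URL for the ClinVar database with JSON output."""
    query = urllib.parse.urlencode({"db": "clinvar", "retmode": "json", **params})
    return f"{EUTILS}/{tool}.fcgi?{query}"


def clinvar_ids(esearch_json):
    """Extract the list of ClinVar record IDs from an esearch response."""
    return esearch_json.get("esearchresult", {}).get("idlist", [])


def clinvar_significance(esummary_json):
    """Map each record ID to its classification description (or None)."""
    result = esummary_json.get("result", {})
    out = {}
    for uid in result.get("uids", []):
        rec = result.get(uid, {})
        field = (
            rec.get("germline_classification")
            or rec.get("clinical_significance")
            or {}
        )
        out[uid] = field.get("description")
    return out


def fetch_json(url, timeout=30):
    """Fetch and decode one JSON document (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def fetch_clinvar(rsid):
    """Search ClinVar for an rsID and summarize any matching records."""
    ids = clinvar_ids(fetch_json(eutils_url("esearch", term=rsid)))
    if not ids:
        return {}  # no ClinVar record for this variant
    summary = fetch_json(eutils_url("esummary", id=",".join(ids)))
    return clinvar_significance(summary)
```

An empty result means only that no ClinVar record matched the search term; it is not evidence that the variant is benign.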
3.4 GTEx eQTL
Queries the GTEx Portal API to check whether PBS candidates are expression quantitative trait loci (eQTL) in relevant tissues.
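A hedged sketch of the eQTL check, assuming the GTEx Portal API v2 `association/singleTissueEqtl` endpoint and GTEx-style variant IDs of the form `chr1_1000_A_G_b38`. The helper names are hypothetical, the response fields (`data`, `geneSymbol`, `tissueSiteDetailId`, `pValue`, `nes`) should be confirmed against the portal's API documentation, and the set of "relevant tissues" is left to the caller because the source does not list them.

```python
import json
import urllib.parse
import urllib.request

GTEX_API = "https://gtexportal.org/api/v2"


def gtex_eqtl_url(variant_id):
    """Build the single-tissue eQTL URL for one GTEx variant ID."""
    query = urllib.parse.urlencode({"variantId": variant_id})
    return f"{GTEX_API}/association/singleTissueEqtl?{query}"


def eqtls_in_tissues(payload, tissues):
    """Keep eQTL rows whose tissue is in `tissues`, sorted by p-value."""
    hits = []
    for row in payload.get("data", []):
        if row.get("tissueSiteDetailId") in tissues:
            hits.append(
                {
                    "gene": row.get("geneSymbol"),
                    "tissue": row.get("tissueSiteDetailId"),
                    "p_value": row.get("pValue"),
                    "nes": row.get("nes"),  # normalized effect size
                }
            )
    # Rows without a p-value sort last.
    return sorted(
        hits,
        key=lambda h: float("inf") if h["p_value"] is None else h["p_value"],
    )


def fetch_eqtls(variant_id, tissues, timeout=30):
    """Query the live API (needs network access)."""
    with urllib.request.urlopen(gtex_eqtl_url(variant_id), timeout=timeout) as resp:
        return eqtls_in_tissues(json.load(resp), tissues)
```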
3.5 Running the Annotation Script
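The script's command-line options are not recorded in this write-up, so the following is only a sketch of a wrapper that runs scripts/annotate_uzb_snps.py and then checks for the two output files listed in Section 5. It assumes the script needs no required arguments and writes its outputs to the directory passed as `out_dir`; both are assumptions, and `run_annotation` and `missing_outputs` are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path

SCRIPT = Path("scripts/annotate_uzb_snps.py")
EXPECTED_OUTPUTS = ("uzbek_snp_annotations.json", "uzbek_snp_annotations.tsv")


def missing_outputs(out_dir="."):
    """Return the expected output files that are absent from out_dir."""
    out_dir = Path(out_dir)
    return [name for name in EXPECTED_OUTPUTS if not (out_dir / name).is_file()]


def run_annotation(out_dir="."):
    """Run the annotation script and verify that both outputs were written."""
    # Assumption: the script takes no required command-line arguments.
    subprocess.run([sys.executable, str(SCRIPT)], check=True)
    missing = missing_outputs(out_dir)
    if missing:
        raise FileNotFoundError(f"missing output files: {', '.join(missing)}")
```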
4. Results
4.1 Candidate Variants
| # | SNP | Chr | Position | PBS_UZB | MAF_UZB | MAF_EUR | MAF_EAS | MAF_SAS | MAF_AFR | ΔAF_min |
|---|---|---|---|---|---|---|---|---|---|---|
4.2 Annotation Summary
4.3 Method Notes
- PBS triangle: UZB–EUR–EAS (77,111 LD-pruned SNPs; 3,595 samples).
- Population counts: UZB=1,047, EUR=522, EAS=515, SAS=492, AFR=671.
- Candidate filter: Tier 1 (PBS ≥ 0.3) OR Tier 2 (|ΔAF| ≥ 0.3) OR Tier 3 (near-private).
- Genome build: GRCh38.
- Annotation script: scripts/annotate_uzb_snps.py
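The candidate filter in the notes above can be sketched as a small tiering function. Two parts are assumptions rather than facts from the source: ΔAF is taken here as the smallest absolute allele-frequency difference between UZB and any reference population, and the "near-private" rule for Tier 3 is given placeholder thresholds (`private_uzb_min`, `private_ref_max`) because the source does not define it. The function names are hypothetical.

```python
def delta_af_min(maf_uzb, maf_refs):
    """Smallest absolute frequency difference between UZB and any reference.

    Assumption: this is how the ΔAF value used for Tier 2 is defined.
    """
    return min(abs(maf_uzb - maf) for maf in maf_refs.values())


def assign_tier(
    pbs,
    maf_uzb,
    maf_refs,
    pbs_min=0.3,
    daf_min=0.3,
    private_uzb_min=0.05,  # hypothetical Tier 3 threshold
    private_ref_max=0.01,  # hypothetical Tier 3 threshold
):
    """Return 1, 2, or 3 for the first tier a SNP satisfies, else None."""
    if pbs >= pbs_min:
        return 1
    if delta_af_min(maf_uzb, maf_refs) >= daf_min:
        return 2
    # Placeholder "near-private" rule: present in UZB, rare everywhere else.
    if maf_uzb >= private_uzb_min and max(maf_refs.values()) <= private_ref_max:
        return 3
    return None
```

For example, `assign_tier(0.1, 0.6, {"EUR": 0.1, "EAS": 0.2})` falls below the PBS cutoff but differs from every reference population by at least 0.3, so it lands in Tier 2.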
5. Output Files
| File | Description |
|---|---|
| uzbek_snp_annotations.json | Full annotation for all candidates (27 fields per SNP) |
| uzbek_snp_annotations.tsv | Tab-delimited version for downstream use |
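The relationship between the two files can be illustrated with a short conversion sketch. It assumes the JSON file holds a list of per-SNP objects; the actual layout and the names of the 27 fields are not documented here, so nested values are simply serialized as JSON strings. `records_to_tsv` and `json_file_to_tsv` are hypothetical helpers.

```python
import csv
import io
import json


def records_to_tsv(records):
    """Render a list of dicts as tab-delimited text with a header row."""
    # Collect column names in first-seen order across all records.
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)

    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, delimiter="\t", lineterminator="\n", restval=""
    )
    writer.writeheader()
    for rec in records:
        # Lists and dicts are stored as JSON strings so each stays in one cell.
        writer.writerow(
            {
                key: json.dumps(value) if isinstance(value, (list, dict)) else value
                for key, value in rec.items()
            }
        )
    return buf.getvalue()


def json_file_to_tsv(json_path, tsv_path):
    """Convert a JSON annotation file to TSV on disk."""
    with open(json_path, encoding="utf-8") as fh:
        records = json.load(fh)
    with open(tsv_path, "w", encoding="utf-8", newline="") as fh:
        fh.write(records_to_tsv(records))
```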
6. Next Steps
- Step 13: LD Analysis — LD clumping and decay analysis of PBS candidates.
- Step 14: Fst & MDS — Pairwise FST matrix and multidimensional scaling.