
Step 12: PBS Functional Annotation

Annotate PBS candidate SNPs using Ensembl VEP, GWAS Catalog, ClinVar, and GTEx

Completed — March 2026

1. Overview

The 8 PBS candidate SNPs identified in Step 10 are annotated using four complementary resources to assess their functional significance: Ensembl VEP for variant consequence prediction, GWAS Catalog for known disease associations, ClinVar for clinical pathogenicity, and GTEx for tissue-specific expression effects (eQTL).

Goal: Determine whether the Uzbek-specific allele frequency shifts detected by PBS correspond to known functional variants, disease associations, or gene regulatory effects.

2. Prerequisites

Input | Source | Description
pbs_candidates.json | Step 10 | 8 candidate SNPs with PBS scores, MAF, and tier classification

3. Annotation Pipeline

Annotations are retrieved via REST APIs using the Python script annotate_uzb_snps.py. Each API is queried sequentially with rate limiting to avoid throttling.
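The sequential, rate-limited querying described above can be sketched as a small throttle that enforces a minimum gap between requests. This is an illustration only, not the code in annotate_uzb_snps.py: the `Throttle` class and the `INTERVALS` mapping are hypothetical names, and the delays are the per-request values quoted in sections 3.1 to 3.4.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between successive calls (hypothetical helper)."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous call, if any

    def wait(self) -> float:
        """Sleep just long enough to respect min_interval; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


# Seconds per request, as quoted for each API in this step.
INTERVALS = {"vep": 0.07, "gwas": 0.3, "clinvar": 0.35, "gtex": 0.2}
```

Calling `Throttle(INTERVALS["gwas"]).wait()` before each request keeps the client under the stated rate. The clock and sleep functions are injectable so the behaviour can be checked without real delays.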

3.1 Ensembl VEP (Variant Effect Predictor)

Queries the Ensembl REST API to obtain rsID mapping, gene context, and predicted variant consequence:

    # Ensembl VEP: POST batch request (up to 100 variants per request)
    POST https://rest.ensembl.org/vep/human/region
    Content-Type: application/json

    Payload:
    {
      "variants": ["9 104189856 . G C . . .", ...],
      "canonical": true
    }

    Returns: rsID, gene symbol, consequence type
             (missense_variant, synonymous_variant, intergenic_variant, etc.)
    Rate limit: ~15 requests/sec (0.07 s/request)
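A minimal Python sketch of this batch request follows, using only the standard library. The helper names (`to_vep_line`, `build_vep_request`, `summarize_vep`) are hypothetical, and the response fields read in `summarize_vep` (`colocated_variants`, `transcript_consequences`, `most_severe_consequence`) are assumptions about the VEP JSON layout that should be checked against a live response.

```python
import json
import urllib.request
from typing import Any, Dict, List

VEP_URL = "https://rest.ensembl.org/vep/human/region"


def to_vep_line(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Format one variant as the VCF-like string used in the payload above."""
    return f"{chrom} {pos} . {ref} {alt} . . ."


def build_vep_request(variants: List[str]) -> urllib.request.Request:
    """Build (but do not send) the POST request for one batch of variants."""
    body = json.dumps({"variants": variants, "canonical": True}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return urllib.request.Request(VEP_URL, data=body, headers=headers, method="POST")


def summarize_vep(record: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce one VEP record to rsID, canonical gene symbols, and consequence.

    The field names read here are assumed, not confirmed by the source.
    """
    rsids = [
        v["id"]
        for v in record.get("colocated_variants", [])
        if str(v.get("id", "")).startswith("rs")
    ]
    genes = sorted(
        {
            t["gene_symbol"]
            for t in record.get("transcript_consequences", [])
            if t.get("canonical") and "gene_symbol" in t
        }
    )
    return {
        "input": record.get("input"),
        "rsid": rsids[0] if rsids else None,
        "genes": genes,
        "consequence": record.get("most_severe_consequence"),
    }
```

Sending the request would be `urllib.request.urlopen(build_vep_request(lines))` followed by `json.load` on the response. Intergenic variants carry no transcript consequences, so `genes` comes back empty for them.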

3.2 GWAS Catalog

Queries the EBI GWAS Catalog REST API by rsID to find known genome-wide significant associations:

    # GWAS Catalog: lookup by rsID
    GET https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rsid}/associations

    Returns: trait name, p-value, PubMed ID (top 5 hits per SNP)
    Rate limit: ~3 requests/sec (0.3 s/request)
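The lookup and the "top 5 hits" selection can be sketched as below. The function names are hypothetical. Two points are assumptions rather than facts from this page: that association records sit under `_embedded.associations` with a `pvalue` field, and that an rsID absent from the catalog yields HTTP 404.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Dict, List

GWAS_BASE = "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms"


def gwas_url(rsid: str) -> str:
    """Build the associations URL for one rsID."""
    return f"{GWAS_BASE}/{urllib.parse.quote(rsid)}/associations"


def fetch_associations(rsid: str) -> List[Dict[str, Any]]:
    """Fetch raw association records for one rsID (needs network access)."""
    try:
        with urllib.request.urlopen(gwas_url(rsid), timeout=30) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumed: rsID not present in the catalog
            return []
        raise
    # Assumed response layout: records under _embedded.associations.
    return payload.get("_embedded", {}).get("associations", [])


def top_hits(records: List[Dict[str, Any]], n: int = 5) -> List[Dict[str, Any]]:
    """Keep the n associations with the smallest p-values."""
    usable = [r for r in records if r.get("pvalue") is not None]
    return sorted(usable, key=lambda r: float(r["pvalue"]))[:n]
```

Sorting by p-value before truncating makes the retained hits the strongest associations rather than the first five the API happens to return.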

3.3 NCBI ClinVar

Queries NCBI E-utilities to check clinical pathogenicity status:

    # ClinVar: two-step lookup

    # Step 1: Search for ClinVar records by rsID
    GET https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi
        ?db=clinvar&term={rsid}[rs]&retmode=json

    # Step 2: Retrieve clinical significance
    GET https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi
        ?db=clinvar&id={clinvar_id}&retmode=json

    Returns: clinical significance, associated condition, ClinVar accession
    Rate limit: 3 requests/sec without API key (0.35 s/request)
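The two-step lookup can be sketched as URL builders plus small parsers for each response. All function names are hypothetical. The JSON keys read here (`esearchresult.idlist`, `result.uids`, and a `germline_classification` or `clinical_significance` block with a `description`) are assumptions about the E-utilities output and may differ between ClinVar releases.

```python
import urllib.parse
from typing import Any, Dict, List

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(rsid: str) -> str:
    """Step 1: URL that searches ClinVar by rsID."""
    query = urllib.parse.urlencode(
        {"db": "clinvar", "term": f"{rsid}[rs]", "retmode": "json"}
    )
    return f"{EUTILS}/esearch.fcgi?{query}"


def esummary_url(ids: List[str]) -> str:
    """Step 2: URL that retrieves summaries for the ClinVar IDs from step 1."""
    query = urllib.parse.urlencode(
        {"db": "clinvar", "id": ",".join(ids), "retmode": "json"}
    )
    return f"{EUTILS}/esummary.fcgi?{query}"


def ids_from_esearch(payload: Dict[str, Any]) -> List[str]:
    """Extract ClinVar record IDs from an esearch JSON payload."""
    return list(payload.get("esearchresult", {}).get("idlist", []))


def significance_from_esummary(payload: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map each ClinVar ID to its accession and significance description."""
    result = payload.get("result", {})
    out: Dict[str, Dict[str, Any]] = {}
    for uid in result.get("uids", []):
        rec = result.get(uid, {})
        # Assumed: one of these two blocks carries the classification text.
        block = rec.get("germline_classification") or rec.get("clinical_significance") or {}
        out[uid] = {
            "accession": rec.get("accession"),
            "significance": block.get("description"),
        }
    return out
```

An empty ID list from step 1 means the rsID has no ClinVar record, in which case step 2 can be skipped entirely.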

3.4 GTEx eQTL

Queries the GTEx Portal API to check whether PBS candidates are expression quantitative trait loci (eQTL) in relevant tissues:

    # GTEx: tissue-specific eQTL lookup
    GET https://gtexportal.org/api/v2/association/singleTissueEqtl
        ?snpId={rsid}
        &tissueSiteDetailId={tissue}
        &datasetId=gtex_v8

    Tissues queried:
      - Whole_Blood
      - Liver
      - Artery_Aorta
      - Brain_Frontal_Cortex_Ba9
      - Adipose_Subcutaneous

    Returns: target gene, p-value, effect size (dosage slope)
    Rate limit: ~5 requests/sec (0.2 s/request)
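Because every rsID is queried once per tissue, 8 candidates across 5 tissues give 40 requests, or at least 8 seconds at 0.2 s per request. The sketch below enumerates those request URLs using the parameter names and tissue identifiers listed above. The function name `gtex_queries` is hypothetical, and response parsing is omitted because the response layout is not described here.

```python
import itertools
import urllib.parse
from typing import Iterable, List, Sequence

GTEX_URL = "https://gtexportal.org/api/v2/association/singleTissueEqtl"

# Tissue identifiers exactly as listed in this section.
TISSUES = (
    "Whole_Blood",
    "Liver",
    "Artery_Aorta",
    "Brain_Frontal_Cortex_Ba9",
    "Adipose_Subcutaneous",
)


def gtex_queries(
    rsids: Iterable[str],
    tissues: Sequence[str] = TISSUES,
    dataset: str = "gtex_v8",
) -> List[str]:
    """Return one request URL per (rsID, tissue) pair, rsID-major order."""
    urls = []
    for rsid, tissue in itertools.product(rsids, tissues):
        params = {"snpId": rsid, "tissueSiteDetailId": tissue, "datasetId": dataset}
        urls.append(f"{GTEX_URL}?{urllib.parse.urlencode(params)}")
    return urls


def min_runtime_seconds(n_requests: int, interval: float = 0.2) -> float:
    """Lower bound on wall-clock time imposed by the rate limit alone."""
    return n_requests * interval
```

Generating the full URL list up front makes the request count explicit and lets a failed run resume from a known position.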

3.5 Running the Annotation Script

    python3 annotate_uzb_snps.py

    Loading 8 PBS candidate SNPs...
    [1/4] Querying Ensembl VEP... 1 batch (8 variants)
    [2/4] Querying GWAS Catalog... 8 rsIDs
    [3/4] Querying ClinVar... 8 rsIDs
    [4/4] Querying GTEx eQTL... 8 rsIDs x 5 tissues
    Annotation complete. Output: uzbek_snp_annotations.json

4. Results

4.1 Candidate Variants

# | SNP | Chr | Position | PBS_UZB | MAF_UZB | MAF_EUR | MAF_EAS | MAF_SAS | MAF_AFR | ΔAF_min

4.2 Annotation Summary

Annotation status: External API re-annotation (VEP, GWAS Catalog, ClinVar, GTEx) has not yet been completed for the final 8-candidate set. The annotation fields will be populated once the API queries are run on the current reduced set. Functional interpretation should be deferred until annotations are available.

4.3 Method Notes

  • PBS triangle: UZB–EUR–EAS (77,111 LD-pruned SNPs; 3,595 samples).
  • Population counts: UZB=1,047, EUR=522, EAS=515, SAS=492, AFR=671.
  • Candidate filter: Tier 1 (PBS ≥ 0.3) OR Tier 2 (|ΔAF| ≥ 0.3) OR Tier 3 (near-private).
  • Genome build: GRCh38.
  • Annotation script: scripts/annotate_uzb_snps.py
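The candidate filter listed above can be written as a small function. This is a sketch, not the Step 10 implementation: the function name is hypothetical, and because the near-private criterion is not defined on this page it is passed in as a precomputed flag.

```python
from typing import List


def candidate_tiers(
    pbs: float,
    delta_af: float,
    near_private: bool = False,
    pbs_min: float = 0.3,
    delta_min: float = 0.3,
) -> List[int]:
    """Return every tier a SNP satisfies; an empty list means not a candidate."""
    tiers = []
    if pbs >= pbs_min:  # Tier 1: PBS >= 0.3
        tiers.append(1)
    if abs(delta_af) >= delta_min:  # Tier 2: |delta AF| >= 0.3
        tiers.append(2)
    if near_private:  # Tier 3: criterion defined elsewhere (assumed flag)
        tiers.append(3)
    return tiers


def is_candidate(pbs: float, delta_af: float, near_private: bool = False) -> bool:
    """A SNP is kept if it meets Tier 1 OR Tier 2 OR Tier 3."""
    return bool(candidate_tiers(pbs, delta_af, near_private))
```

Returning all matching tiers, rather than only the first, keeps SNPs that qualify under more than one criterion distinguishable downstream.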

5. Output Files

File | Description
uzbek_snp_annotations.json | Full annotation for all candidates (27 fields per SNP)
uzbek_snp_annotations.tsv | Tab-delimited version for downstream use
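One way to derive the tab-delimited file from the JSON output is sketched below. It assumes the JSON is a list of flat per-SNP objects, which this page does not confirm. The function name and the "NA" placeholder for missing values are illustrative choices.

```python
import csv
from typing import Any, Dict, List, TextIO


def annotations_to_tsv(records: List[Dict[str, Any]], out: TextIO) -> List[str]:
    """Write per-SNP records as TSV; return the column order used."""
    # Collect columns in first-seen order so sparse records still line up.
    fields: List[str] = []
    for rec in records:
        for key in rec:
            if key not in fields:
                fields.append(key)

    writer = csv.DictWriter(
        out, fieldnames=fields, delimiter="\t", restval="NA", lineterminator="\n"
    )
    writer.writeheader()
    for rec in records:
        row = {}
        for key, value in rec.items():
            if value is None:
                row[key] = "NA"  # explicit missing value
            elif isinstance(value, (list, tuple)):
                row[key] = ";".join(str(v) for v in value)  # flatten multi-valued fields
            else:
                row[key] = value
        writer.writerow(row)
    return fields
```

Typical use would be `json.load` on uzbek_snp_annotations.json followed by `annotations_to_tsv(records, open("uzbek_snp_annotations.tsv", "w", newline=""))`.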

6. Next Steps