ALSU Sample Investigation Report — v2 Chip-Level Forensics

654,027 variants × 1,247 samples across 52 physical chips — All data from real server computations

What is this report? This is a forensic investigation into quality problems discovered during genotyping of 1,247 Uzbek cohort DNA samples on Illumina GSA-24v3 BeadChip microarrays. Genotyping means measuring ~654,000 specific DNA positions (called variants or SNPs) for each person. The samples were loaded onto 52 physical silicon chips (each holding ~24 samples), scanned by a laser, and processed through Illumina GenomeStudio software. This report examines three major anomalies: failed chips, identity mismatches, and possible sample contamination.
F_MISS (missingness rate)
The fraction of the 654K DNA positions that could not be read for a sample. Under 5% is normal; over 20% means the genotyping essentially failed. It is like a barcode scanner that cannot read most of the barcode.
BeadChip / Chip
A silicon slide holding ~24 DNA samples. Each sample sits in a specific physical well (position like R01C01). Each chip has a unique 12-digit barcode.
PI_HAT (identity-by-descent)
A number from 0 to 1 estimating the proportion of the genome that two samples share identical by descent, that is, how genetically similar they are. PI_HAT ≥ 0.98 means the two samples are essentially the same person (or identical twins).
Heterozygosity (het rate)
The fraction of DNA positions where a person has two different versions (one from each parent). In a healthy population, ~19% of measured positions are heterozygous. Abnormally high het rates (>40%) are consistent with DNA contamination (two genomes mixed together).
F(X) / X-chromosome F
A measure of X-chromosome homozygosity. Males have one X (F near 1.0), females have two (F near 0). Extremely negative values (below -1) are consistent with sample contamination.
d/t suffix (дубль/трипликат)
"d" = дубль (duplicate), "t" = трипликат (triplicate). The sample sheet lists the same Sample_ID on two (or three) different chip positions. The d/t suffixes were added by the pipeline operator within GenomeStudio to disambiguate these duplicate Sample_IDs before PLINK export. Sample "08-25d" shares its Sample_ID with "08-25" but is on a different physical chip. Each d/t entry has its own unique barcode + position and its own idat files — these are independent experiments.
Control probes
Built-in test probes on every chip that check whether each step of the chemistry protocol worked. They act like calibration marks on a printed page: they should always give the same result regardless of the DNA loaded.
Hybridization
The 16-hour chemical step where sample DNA binds to matching probes on the chip. If this step fails, most probes give no signal, resulting in high missingness.
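The thresholds quoted in the glossary above can be collected into a small screening sketch. This is an illustration only, not the pipeline used for this report: the `SampleQC` record, the function names, the "borderline" label for the 5% to 20% missingness band, and the assumption that a chip position is written as `<12-digit barcode>_R01C01` are all hypothetical. The example barcode is made up.

```python
import re
from dataclasses import dataclass

# Thresholds taken from the glossary definitions.
F_MISS_NORMAL = 0.05      # under 5% missingness is normal
F_MISS_FAILED = 0.20      # over 20% means genotyping essentially failed
HET_CONTAMINATION = 0.40  # het rate above 40% is consistent with contamination
FX_CONTAMINATION = -1.0   # X-chromosome F below -1 is consistent with contamination
PI_HAT_IDENTICAL = 0.98   # PI_HAT >= 0.98: same person or identical twins


@dataclass
class SampleQC:
    """Hypothetical per-sample record; field names are assumptions."""
    sample_id: str
    f_miss: float
    het_rate: float
    f_x: float


def missingness_status(f_miss: float) -> str:
    """Classify a sample by its missingness rate."""
    if f_miss > F_MISS_FAILED:
        return "failed"
    if f_miss < F_MISS_NORMAL:
        return "normal"
    # The glossary does not name this band; "borderline" is our own label.
    return "borderline"


def contamination_flags(sample: SampleQC) -> list[str]:
    """Return the contamination indicators that a sample triggers."""
    flags = []
    if sample.het_rate > HET_CONTAMINATION:
        flags.append("high_het")
    if sample.f_x < FX_CONTAMINATION:
        flags.append("negative_fx")
    return flags


def same_individual(pi_hat: float) -> bool:
    """True when a pair meets the identity threshold."""
    return pi_hat >= PI_HAT_IDENTICAL


def parse_chip_position(chip_position: str) -> tuple[str, int, int]:
    """Split '<barcode>_R01C01' into (barcode, row, column).

    The underscore-joined format is an assumption, not taken from the report.
    """
    match = re.fullmatch(r"(\d{12})_R(\d{2})C(\d{2})", chip_position)
    if match is None:
        raise ValueError(f"unrecognized chip position: {chip_position!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


if __name__ == "__main__":
    sample = SampleQC("example", f_miss=0.35, het_rate=0.45, f_x=-1.5)
    print(missingness_status(sample.f_miss), contamination_flags(sample))
    print(parse_chip_position("123456789012_R01C01"))
```

Keeping each threshold as a named constant makes it easy to check the screening logic against the glossary and to adjust a cutoff without touching the functions.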
Table of Contents
1. Summary Statistics
2. Sample Quality vs Position (by Chip)
3. Chip Quality Ranking — The Smoking Gun
4. Chip Position Heatmaps (Worst Chips)
5. Missingness by ID Prefix
6. Relatedness Network (PI_HAT ≥ 0.98)
7. Identity Pairs & d/t Verification
8. Sex Check Results
9. Heterozygosity Analysis
10. Hyper-Connected Samples — Allele Dropout
11. Root Cause Analysis & Recommendations
12. Practical Action Plan ★
13. Scanner QC & Control Probe Analysis ★★
14. Full Sample Verdict Table ★
15. Identity Audit — Unverifiable Samples ★★★
16. KING-Robust Independent Verification ★★
17. Resolution Plan: Targeted Fingerprinting ★★★