Estimating statistical significance of exome sequencing data for rare mendelian disorders using population-wide linkage analysis

Duration: 11 mins 38 secs

About this item
Description: Albers, K (Haematology)
Monday 26 September 2011, 10:10-10:20
 
Created: 2011-09-30 15:29
Collection: Design and Analysis of Experiments
Cambridge Statistics Initiative 2011
Publisher: Isaac Newton Institute
Copyright: Albers, K
Language: eng (English)
 
Abstract: Exome sequencing of a small number of unrelated affected individuals has proved to be a highly effective approach for identifying causative genes of rare mendelian diseases. A widely used strategy is to consider as candidate causative mutations only those variants that have not been seen previously in other individuals and that are predicted to affect protein sequence, e.g. non-synonymous variants or stop codons.
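
The filtering strategy described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the talk: the `Variant` fields, the consequence labels in `PROTEIN_ALTERING`, and both function names are hypothetical, and it assumes a variant must pass both criteria (novel and protein-altering) to remain a candidate.

```python
from dataclasses import dataclass

# Hypothetical consequence labels; real annotation tools use their own vocabularies.
PROTEIN_ALTERING = {"nonsynonymous", "stop_gained"}


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str
    consequence: str

    def key(self):
        # Identity of a variant, ignoring annotation.
        return (self.chrom, self.pos, self.ref, self.alt)


def candidate_variants(case_variants, seen_keys):
    """Keep variants not previously seen and predicted to alter protein sequence."""
    return [
        v for v in case_variants
        if v.key() not in seen_keys and v.consequence in PROTEIN_ALTERING
    ]


def genes_by_case_count(per_case_candidates):
    """For each gene, count how many affected individuals carry a candidate in it."""
    counts = {}
    for candidates in per_case_candidates:
        # Use a set so one individual contributes at most once per gene.
        for gene in {v.gene for v in candidates}:
            counts[gene] = counts.get(gene, 0) + 1
    return counts
```

Here `seen_keys` would be built from a reference panel (in the abstract, individuals from the 1000 Genomes project), and a gene in which every affected individual carries a candidate is the kind of finding whose significance the talk addresses.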

For the recessive disorder Gray Platelet Syndrome we identified 7 novel coding mutations in 4 affected individuals, all in different locations in one gene and absent from 994 individuals from the 1000 Genomes project; intuitively a highly significant result (Albers et al. Nat Genet 2011). However, where the candidate causative mutations segregate at low frequency in the general population, the significance may be less obvious. This raises a number of questions: what is the statistical significance of such findings in small numbers of affected individuals? If we assume that the causative mutations are not necessarily in coding sequence, would these results be genome-wide significant? Motivated by these issues, we are developing a statistical model based on the idea that filtering out previously seen variants can be thought of as performing a whole-population parametric linkage analysis, whereby the individuals carrying previously seen variants represent the unaffected individuals.

We use the coalescent, a mathematical description of the notion that ultimately all individuals in a population are descendants of a single common ancestor, to model the unknown pedigree shared by the affected individuals and the unaffected individuals. I will discuss implications of population stratification, false positive variant calls and variation in coverage for singleton rates and significance estimates.
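
To give a feel for how the coalescent relates to singleton rates, here is a minimal simulation sketch. It is not the model from the talk: it assumes the standard neutral Kingman coalescent with constant population size and no stratification, and the function names are illustrative. It measures the total length of external (leaf) branches, since mutations on those branches are the ones that appear as singletons in the sample.

```python
import random


def simulate_external_branch_length(n, rng):
    """Total external branch length of one Kingman coalescent tree for n samples.

    Time is measured in coalescent units, so that k lineages coalesce
    at rate k(k-1)/2.
    """
    # Each lineage is represented by the number of sampled leaves below it.
    lineages = [1] * n
    external = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time until the next coalescence among k lineages.
        t = rng.expovariate(k * (k - 1) / 2)
        # Lineages still subtending a single leaf are external branches.
        external += t * sum(1 for size in lineages if size == 1)
        # Merge two lineages chosen uniformly at random.
        i, j = rng.sample(range(k), 2)
        merged = lineages[i] + lineages[j]
        lineages = [s for idx, s in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return external


def mean_external_length(n, replicates, seed=0):
    """Average external branch length over independent simulated trees."""
    rng = random.Random(seed)
    total = sum(simulate_external_branch_length(n, rng) for _ in range(replicates))
    return total / replicates
```

Under these assumptions the expected total external branch length is 2 coalescent units regardless of sample size, so the expected number of singletons depends only on the mutation rate. Stratification, false positive calls and uneven coverage, the issues raised in the abstract, all break this idealised picture.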