GWAS

The questions below are due on Tuesday April 16, 2019; 11:00:00 PM.
 

Cheap sequencing that yields confident single-nucleotide readings over large pieces of DNA has only been achieved in the past ten years. One result of being able to read DNA at single-nucleotide resolution with high confidence is the ability to detect Single Nucleotide Polymorphisms (SNPs for short, pronounced "snips") in the human genome.

SNPs are common single-nucleotide variations (an A is instead a T, for example) that are known to exist in the genome. Researchers have begun to analyze the frequency of SNPs and their association with various diseases and medical disorders. These studies are called Genome Wide Association Studies (GWAS). Combined with Bayes' theorem, the presence (or absence) of certain SNPs can be used to estimate a patient's predisposition to a disease. Below, we look at the probability calculations that form the basis of a GWAS on a fictional heart disorder.

A heart disorder with a prevalence of 0.02 (2%) in the general population is investigated in a GWAS in a hospital study. The study includes 3000 subjects with the disorder and 7000 subjects without the disorder as controls. Note that this does not mean the disorder has a 30% prevalence: selection bias went into collecting the study participants, so the general population prevalence is still 2% (i.e. P(\text{Heart Disorder}) = 0.02).
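
Because the study over-samples affected subjects, the table's counts are best read as estimates of how common a SNP is within each group, with the 2% prevalence supplying the prior. A sketch of Bayes' theorem in that form, using the shorthand D for "has the heart disorder" and S for "has the SNP" (this notation is an assumption introduced here, not part of the original exercise):

```latex
P(D \mid S) = \frac{P(S \mid D)\,P(D)}
                   {P(S \mid D)\,P(D) + P(S \mid \neg D)\,\bigl(1 - P(D)\bigr)},
\qquad P(D) = 0.02
```

The denominator is the overall probability P(S) of carrying the SNP in the general population, obtained from the law of total probability.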

Phenotype                     Total   Has SNP1   Has SNP2   Has SNP3
Has Heart Disorder            3000    1600       920        1750
Control (no Heart Disorder)   7000    3250       2150       2100

Based on the data in the table above, answer the following questions with three digits after the decimal place. Consider each SNP independently (do not worry about combinations of SNPs). Also, carry full-precision values through all of your calculations: the checker compares only the first three decimal places, but an answer built from several already-rounded intermediate values can end up too far off to be accepted.

How likely is it that an individual has SNP3, given that they have a heart disorder?
How likely is it that an individual has SNP3, given that they do not have a heart disorder?
Overall, how likely is an individual to have SNP3?
How likely is an individual to have a heart disorder if they have SNP1?
How likely is an individual to have a heart disorder if they have SNP2?
How likely is an individual to have a heart disorder if they have SNP3?
How many times as likely (compared to the general population on average) is an individual to have the heart disorder if they have SNP1?
How many times as likely (compared to the general population on average) is an individual to have the heart disorder if they have SNP2?
How many times as likely (compared to the general population on average) is an individual to have the heart disorder if they have SNP3?
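
The questions above all follow one pattern, which can be sketched in Python as a way to check hand calculations. The function name `snp_stats` and its dictionary keys are hypothetical, chosen for illustration. The sketch assumes that the study counts estimate the per-group SNP frequencies and that the general population prevalence (0.02) is the prior.

```python
def snp_stats(cases_with_snp, n_cases, controls_with_snp, n_controls, prevalence):
    """Bayes' theorem for one SNP, treated independently of the others.

    All names here are illustrative assumptions, not part of the exercise.
    """
    # Conditional SNP frequencies estimated from each study group.
    p_snp_given_disorder = cases_with_snp / n_cases
    p_snp_given_control = controls_with_snp / n_controls

    # Law of total probability, weighted by the population prevalence
    # (not by the 30/70 split of the study itself).
    p_snp = (p_snp_given_disorder * prevalence
             + p_snp_given_control * (1 - prevalence))

    # Bayes' theorem: probability of the disorder given the SNP.
    p_disorder_given_snp = p_snp_given_disorder * prevalence / p_snp

    return {
        "p_snp_given_disorder": p_snp_given_disorder,
        "p_snp_given_control": p_snp_given_control,
        "p_snp": p_snp,
        "p_disorder_given_snp": p_disorder_given_snp,
        # How many times as likely compared to the general population.
        "fold_vs_population": p_disorder_given_snp / prevalence,
    }


# Counts taken from the table; full precision is kept until printing.
table = {"SNP1": (1600, 3250), "SNP2": (920, 2150), "SNP3": (1750, 2100)}
for name, (cases, controls) in table.items():
    stats = snp_stats(cases, 3000, controls, 7000, prevalence=0.02)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Rounding happens only at the final print, in line with the advice to carry full values through every intermediate step.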

The human genome has millions of SNPs, and humans suffer from tens of thousands of disorders and diseases. Add the fact that SNPs influence one another (they are not independent events), and GWAS becomes a very active area of research, both in developing ways to handle the large data sets and in making the actual discoveries from the data. Papers are being published every day on new discoveries from GWAS.