Wednesday 16 November 2016

Counting Disease Carriers

Once again we are looking at probability, this time regarding the occurrence of alleles in a population following the Hardy-Weinberg principle. For a population in genetic equilibrium for some given alleles, we are given the frequency of homozygous recessive individuals for each one and are asked to return the probabilities of any randomly selected individual having at least one recessive allele.

We can divide the population into three groups:

A - Homozygous recessive
B - Heterozygous
C - Homozygous dominant

We know that A + B + C = 1. The probability of picking individuals who are not homozygous dominants is P = 1 - C = A + B, which is what we are after.

If we denote the probability of a chromosome having a recessive allele q, and the probability of a chromosome having a dominant allele p, then since each individual carries two chromosomes a homozygous recessive can be described as A = q^2, a heterozygous individual is B = 2pq, and a homozygous dominant is C = p^2.

If we put this together we get that P = q^2 + 2pq. We also know that p + q = 1. So p = 1-q, which gives us P=q^2 + 2(1-q)q which in turn can be simplified to P = 2A^0.5-A.

Now we have our formula, so all we need to do is write a program that makes the calculation for all the given alleles. You can find my version here or below.

A = []
with open('rosalind_afrq.txt','r') as f:
    for nr in f.readline().split(' '):
        A.append(float(nr))

for i in A:
    print(round(2*i**0.5-i,3),end=' ')

No comments:

Post a Comment