Monday 22 August 2016

Matching Random Motifs

This problem is very similar to “Introduction to Random Strings”. In this case, however, we are given a string s and are asked to calculate the probability that at least one of N randomly generated strings, each of the same length as s and built with GC content x, equals s.

Sample Dataset
90000 0.6
ATAGCCGA

Sample Output
0.689

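To calculate the probability of randomly getting a string that equals s when constructing it with GC content x, we can use the same equation as in “Introduction to Random Strings”: each G or C has probability x/2, each A or T has probability (1 − x)/2, and the probabilities of all positions are multiplied together. Let's call this probability s_prob. Since the N strings are generated independently, the probability that none of them equals s is (1 − s_prob)^N, so the probability of getting at least one string equal to s is: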

P(at least 1 match of s) = 1 − P(no matches out of N strings) = 1 − (1 − s_prob)^N
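
As a rough worked check on the sample dataset (N = 90000, x = 0.6, s = ATAGCCGA, which contains four A/T and four G/C nucleotides), and assuming each A or T has probability (1 − x)/2 and each G or C has probability x/2, the numbers come out approximately as follows:

```latex
\begin{align*}
s_{\text{prob}} &= \left(\tfrac{1 - 0.6}{2}\right)^{4} \left(\tfrac{0.6}{2}\right)^{4}
                 = 0.2^{4} \cdot 0.3^{4}
                 = 1.296 \times 10^{-5} \\
P(\text{at least 1 match of } s) &= 1 - \left(1 - 1.296 \times 10^{-5}\right)^{90000}
                 \approx 1 - 0.3115
                 \approx 0.689
\end{align*}
```

This agrees with the sample output of 0.689.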

The following program makes the above calculations and outputs the answer rounded to three decimal places:

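N = 90000        # number of random strings generated
x = 0.6          # GC content
s = 'ATAGCCGA'   # string to match

# Count the A/T and G/C nucleotides in s.
AT = 0
GC = 0
for nt in s:
    if nt in 'AT':
        AT += 1
    elif nt in 'GC':
        GC += 1

# Probability that a single random string equals s.
s_prob = ((1 - x) / 2)**AT * (x / 2)**GC
# Probability that at least one of the N random strings equals s.
prob = 1 - (1 - s_prob)**N
print('%0.3f' % prob)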
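
The same calculation could also be wrapped in a reusable function. The following is only a sketch: the function name motif_match_probability and its parameter names are made up for illustration, and the use of math.log1p and math.expm1 is a swapped-in variant of the formula 1 − (1 − s_prob)^N, intended to keep precision when s_prob is very small. It assumes 0 < x < 1 and a string made up only of A, C, G and T.

```python
import math


def motif_match_probability(n, gc_content, motif):
    """Probability that at least one of n random strings equals motif.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Probability that a single random string equals the motif.
    s_prob = 1.0
    for nt in motif.upper():
        if nt in 'GC':
            s_prob *= gc_content / 2
        elif nt in 'AT':
            s_prob *= (1 - gc_content) / 2
        else:
            raise ValueError('unexpected nucleotide: %r' % nt)

    # An empty motif always matches; this also avoids log1p(-1) below.
    if s_prob >= 1.0:
        return 1.0

    # 1 - (1 - s_prob)**n, rewritten as -expm1(n * log1p(-s_prob)).
    # Assumption: this form is used here for numerical stability only;
    # it is mathematically the same quantity.
    return -math.expm1(n * math.log1p(-s_prob))


print('%0.3f' % motif_match_probability(90000, 0.6, 'ATAGCCGA'))
```

For the sample dataset this prints 0.689, the same value as the program above.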
