Monday, 19 September 2016

Comparing Spectra with the Spectral Convolution

In this task we are given two multisets, S1 and S2, each representing a simplified spectrum taken from a peptide. We are asked to find the largest multiplicity in the Minkowski difference S1⊖S2, along with the absolute value of the number x that maximizes (S1⊖S2)(x). It took me a while to work out what I was meant to find, but it comes down to this: form the multiset of all differences s1 - s2 (with s1 taken from S1 and s2 from S2), then report the number that occurs most frequently in it and how many times it occurs.
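To make the definition concrete, here is a minimal sketch on two toy multisets. The values are hypothetical and chosen only to keep the arithmetic easy; they are not part of the Rosalind problem.

```python
from collections import Counter

# Toy multisets (hypothetical values, not from the problem statement)
S1 = [5, 7, 9]
S2 = [2, 4, 6]

# Every pairwise difference s1 - s2, keeping repeats (this is S1 ⊖ S2)
differences = [s1 - s2 for s1 in S1 for s2 in S2]

# Count how often each difference occurs and take the most frequent one
value, multiplicity = Counter(differences).most_common(1)[0]

print(sorted(differences))  # [-1, 1, 1, 3, 3, 3, 5, 5, 7]
print(multiplicity)         # 3
print(value)                # 3
```

Here the difference 3 occurs three times (5 - 2, 7 - 4 and 9 - 6), so the largest multiplicity is 3 and the maximizing x is 3.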

Sample Dataset
186.07931 287.12699 548.20532 580.18077 681.22845 706.27446 782.27613 968.35544 968.35544
101.04768 158.06914 202.09536 318.09979 419.14747 463.17369

Expected Output
3
85.03163

Once I had realized what I was after, the programming itself became very easy. All I had to do was generate the multiset S1⊖S2 and pick out the most frequently occurring number. As it turns out, Python's built-in collections module has a really useful class for this called Counter. Its most_common method gave me precisely what I wanted. Have a look at the code below or on my GitHub.

from collections import Counter

data = []
with open('rosalind_conv.txt','r') as f:
    for line in f:
        data.append(line.strip())
S1 = [float(x) for x in data[0].split()]
S2 = [float(x) for x in data[1].split()]

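# calculate the Minkowski difference of S1 and S2;
# rounding to 5 decimal places (the precision of the sample data) lets
# differences that are equal on paper also compare equal as floats
min_diff = []
for i in S1:
    for j in S2:
        diff = round(i - j, 5)
        min_diff.append(diff)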

# find the most common value and its frequency
x = Counter(min_diff).most_common(1)

print(x[0][1])       # largest multiplicity
print(abs(x[0][0]))  # absolute value of the most common difference
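
As a quick check, the same idea can be wrapped in a function and run directly on the sample dataset instead of reading a file. This is only a sketch: the function name spectral_convolution_peak and its ndigits parameter are made up for illustration and are not part of the code above.

```python
from collections import Counter

def spectral_convolution_peak(s1, s2, ndigits=5):
    """Return (multiplicity, |x|) for the most common difference in s1 ⊖ s2.

    Hypothetical helper for illustration; ndigits is an assumed parameter
    that controls how differences are rounded before counting.
    """
    # Round each difference so tiny floating-point errors do not split
    # values that should be counted together
    counts = Counter(round(a - b, ndigits) for a in s1 for b in s2)
    value, multiplicity = counts.most_common(1)[0]
    return multiplicity, abs(value)

# Sample dataset from the problem statement
sample_s1 = [186.07931, 287.12699, 548.20532, 580.18077, 681.22845,
             706.27446, 782.27613, 968.35544, 968.35544]
sample_s2 = [101.04768, 158.06914, 202.09536, 318.09979, 419.14747,
             463.17369]

multiplicity, shift = spectral_convolution_peak(sample_s1, sample_s2)
print(multiplicity)  # 3
print(shift)         # 85.03163
```

The rounding step matters because subtracting floats can leave slightly different results for differences that are equal on paper, and Counter would then treat them as separate values.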
