Thursday 15 September 2016

Inferring Protein from Spectrum

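In this problem we are given a prefix spectrum and are asked to infer a protein sequence from it using the monoisotopic mass table. Each difference between two consecutive masses in the spectrum corresponds to the mass of one amino acid.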

Sample Dataset
3524.8542
3710.9335
3841.974
3970.0326
4057.0646

Expected Output
WMQS
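
To see where WMQS comes from, here is a quick sanity check on the sample data. It is only a small sketch (the variable names are mine, chosen for illustration) that prints the differences between consecutive masses, rounded to four decimals.

```python
# Sample prefix spectrum from the problem statement.
spectrum = [3524.8542, 3710.9335, 3841.974, 3970.0326, 4057.0646]

# Pair each mass with the next one and take the rounded difference.
diffs = [round(heavier - lighter, 4)
         for lighter, heavier in zip(spectrum, spectrum[1:])]

print(diffs)  # [186.0793, 131.0405, 128.0586, 87.032]
```

Looking these values up in the mass table gives W (186.07931), M (131.04049), Q (128.05858) and S (87.03203).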

This was a fairly simple problem to solve; the only tricky part was getting the rounding right. The differences between consecutive masses in the dataset do not exactly equal the values in the mass table, so both have to be rounded to four decimal places before they can be compared. The code below can also be found on my GitHub.

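# Read the prefix spectrum, one mass per line (blank lines are skipped).
with open('rosalind_spec.txt', 'r') as f:
    L = [float(line) for line in f if line.strip()]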

mass_table = {'A':71.03711,'C':103.00919,'D':115.02694,'E':129.04259,'F':147.06841,'G':57.02146,'H':137.05891,'I':113.08406,'K':128.09496,'L':113.08406,'M':131.04049,'N':114.04293,'P':97.05276,'Q':128.05858,'R':156.10111,'S':87.03203,'T':101.04768,'V':99.06841,'W':186.07931,'Y':163.06333}

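# The difference between consecutive prefix masses is the mass of one amino acid.
aa_masses = []
for i in range(len(L) - 1):
    aa_mass = round(L[i + 1] - L[i], 4)
    aa_masses.append(aa_mass)

# Reverse lookup: rounded mass -> amino acid.
# I and L have the same mass, so L overwrites I here.
rnd_mass_table = {}
for k, v in mass_table.items():
    rnd_mass_table[round(v, 4)] = k

# Translate each rounded mass difference into its amino acid.
prot = ''
for aa in aa_masses:
    prot += rnd_mass_table[aa]

print(prot)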
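
Exact matching on rounded values works for this dataset, but it depends on both sides rounding the same way. As an alternative, here is a minimal sketch that picks the amino acid whose mass is closest to each difference and rejects anything outside a tolerance. The function name infer_protein and the tolerance of 0.001 are my own assumptions, not part of the problem statement. Note that I and L share a mass, so this version returns I where the code above returns L; either should be acceptable for such a residue.

```python
MASS_TABLE = {
    'A': 71.03711, 'C': 103.00919, 'D': 115.02694, 'E': 129.04259,
    'F': 147.06841, 'G': 57.02146, 'H': 137.05891, 'I': 113.08406,
    'K': 128.09496, 'L': 113.08406, 'M': 131.04049, 'N': 114.04293,
    'P': 97.05276, 'Q': 128.05858, 'R': 156.10111, 'S': 87.03203,
    'T': 101.04768, 'V': 99.06841, 'W': 186.07931, 'Y': 163.06333,
}


def infer_protein(prefix_spectrum, tolerance=1e-3):
    """Infer a protein string from a prefix spectrum by nearest-mass lookup.

    `tolerance` is an assumed cutoff (in the same units as the masses)
    beyond which a difference is treated as not matching any amino acid.
    """
    protein = []
    for lighter, heavier in zip(prefix_spectrum, prefix_spectrum[1:]):
        diff = heavier - lighter
        # Closest table entry; on a tie (I vs L) the first one listed wins.
        residue, mass = min(MASS_TABLE.items(),
                            key=lambda item: abs(item[1] - diff))
        if abs(mass - diff) > tolerance:
            raise ValueError('no amino acid within tolerance of %.5f' % diff)
        protein.append(residue)
    return ''.join(protein)


sample = [3524.8542, 3710.9335, 3841.974, 3970.0326, 4057.0646]
print(infer_protein(sample))  # WMQS
```

The tolerance also separates Q (128.05858) from K (128.09496), which differ by about 0.036, so a cutoff much larger than that would no longer be safe.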
