I wrote two programs for this problem: the first uses Biopython and the second doesn't. Here is the Biopython version:
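from Bio.Seq import Seq
from Bio.SeqUtils import molecular_weight

# Bio.Alphabet (and generic_protein) was removed in Biopython 1.78, so on
# current releases the sequence type is passed to molecular_weight directly.
with open('sampledata.txt', 'r') as f:
    for line in f:
        prot_seq = line.strip('\n')
        # Subtract one water: the function reports the mass of the free peptide.
        print('%0.3f' % (molecular_weight(
            Seq(prot_seq),
            seq_type='protein',
            monoisotopic=True) - 18.01056))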
This program uses the Biopython function molecular_weight from SeqUtils. The function defaults to monoisotopic=False, but the problem asks for monoisotopic weights, so we need to pass monoisotopic=True. The function also includes the weight of one extra water molecule, so we have to remove that manually (hence the - 18.01056). Come to think of it, there is also an option called circular, which states that the sequence has no ends. Maybe that would work as well?
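To see why the water has to come off, it helps to run the arithmetic on a single residue. This is only a minimal sketch: it assumes the monoisotopic residue mass of glycine (57.02146) and the same water mass as in the program, and the variable names are made up for the illustration.

```python
# Monoisotopic masses in daltons (same rounding as used in this post).
WATER = 18.01056
GLY_RESIDUE = 57.02146  # glycine as it appears inside a chain

# A free peptide carries one extra water (H on the N-terminus, OH on the
# C-terminus), so its mass is the residue sum plus one water.
free_glycine = GLY_RESIDUE + WATER

# The problem only wants the residue masses, so the water is subtracted again.
residue_only = free_glycine - WATER

print('%0.5f' % free_glycine)
print('%0.3f' % residue_only)
```

For a longer chain the idea is the same: there is still only one water to remove, no matter how many residues there are.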
Anyhow, I thought I'd try to make this program without the use of Biopython as well, and the following is what I came up with:
weights = {'A': 71.03711,
           'C': 103.00919,
           'D': 115.02694,
           'E': 129.04259,
           'F': 147.06841,
           'G': 57.02146,
           'H': 137.05891,
           'I': 113.08406,
           'K': 128.09496,
           'L': 113.08406,
           'M': 131.04049,
           'N': 114.04293,
           'P': 97.05276,
           'Q': 128.05858,
           'R': 156.10111,
           'S': 87.03203,
           'T': 101.04768,
           'V': 99.06841,
           'W': 186.07931,
           'Y': 163.06333}

with open('sampledata.txt', 'r') as f:
    for line in f:
        prot_seq = line.strip('\n')
        weight = 0
        for aa in prot_seq:
            weight += weights[aa]
        print('%0.3f' % weight)
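The loop above can also be wrapped in a small function, which makes it easy to try on a string without going through a file. The following is just one possible sketch: the name protein_mass and the test sequence 'SKADYEK' are my own choices for the illustration, and the check for unknown letters is an addition the original loop does not have (it would fail with a KeyError instead).

```python
# Monoisotopic residue masses, same values as in the table above.
WEIGHTS = {'A': 71.03711, 'C': 103.00919, 'D': 115.02694, 'E': 129.04259,
           'F': 147.06841, 'G': 57.02146, 'H': 137.05891, 'I': 113.08406,
           'K': 128.09496, 'L': 113.08406, 'M': 131.04049, 'N': 114.04293,
           'P': 97.05276, 'Q': 128.05858, 'R': 156.10111, 'S': 87.03203,
           'T': 101.04768, 'V': 99.06841, 'W': 186.07931, 'Y': 163.06333}


def protein_mass(seq, table=WEIGHTS):
    """Return the sum of the residue masses in seq (no terminal water)."""
    # Report every letter that is missing from the table in one message.
    unknown = sorted(set(seq) - set(table))
    if unknown:
        raise ValueError('unknown residue(s): ' + ', '.join(unknown))
    return sum(table[aa] for aa in seq)


print('%0.3f' % protein_mass('SKADYEK'))  # prints 821.392
```

Because only the residue masses are summed, no water correction is needed here, unlike in the Biopython version.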