The first thing I wrote was a piece of code to load the two sequences into a list. Because the sequences are separated by a newline in the sample file, I used line.strip('\n') to remove the newline character from each entry of the list. To compare the two sequences position by position I used zip(), which pairs up the characters of the two strings. Then it was a simple matter of comparing the nucleotides in each pair and adding 1 to the variable called "hamming" each time they differed. The following is the final code that I came up with:
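# Read the two sequences, one per line, dropping the trailing newline
seq = [line.strip('\n') for line in open('sampledata.txt')]
hamming = 0
# Walk through both sequences in parallel and count the mismatches
for nt1, nt2 in zip(seq[0], seq[1]):
    if nt1 != nt2:
        hamming += 1
print(hamming)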
Note that I haven't added a check to make sure that the two sequences have the same length. Since zip() stops at the end of the shorter sequence, any extra nucleotides in the longer one would be silently ignored. For real applications a check would definitely be a good idea, but with the data set given by Rosalind I didn't feel it was necessary this time.
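For anyone who does want the length check, here is a minimal sketch of how it could look. The function name hamming_distance and the choice to raise a ValueError are my own assumptions for illustration, not part of the solution above, and the two strings in the example call are just made-up test input.

```python
def hamming_distance(s1, s2):
    """Count the positions at which two equal-length sequences differ."""
    # Hypothetical safeguard: refuse sequences of different lengths
    # instead of letting zip() silently truncate the longer one.
    if len(s1) != len(s2):
        raise ValueError("sequences must have the same length")
    # Each comparison is True (1) or False (0), so sum() counts the mismatches.
    return sum(nt1 != nt2 for nt1, nt2 in zip(s1, s2))


print(hamming_distance("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))  # prints 7
```

Wrapping the comparison in a function also makes it easy to reuse with sequences read from a file, for example hamming_distance(seq[0], seq[1]).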
A simpler way to write this program would be to use the existing Distance package, which includes a function for calculating the Hamming distance.