Sara does Bioinformatics: Counting Point Mutations

In this problem we are asked to calculate the Hamming distance between two DNA sequences, that is, the number of positions at which their nucleotides differ. For this task, my program needs to read the file containing the two sequences and then compare each position of the sequences to see whether the nucleotides are the same. Each time there is a difference, the program adds 1 to the Hamming distance.
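Written as a formula, for two sequences s and t of equal length n (symbols chosen here for illustration), the Hamming distance counts the positions i at which they disagree. The bracket [P] equals 1 when P is true and 0 otherwise:

```latex
d_H(s, t) = \sum_{i=1}^{n} [\, s_i \neq t_i \,]
```

For example, GATTACA and GACTATA differ only at positions 3 and 6, so their Hamming distance is 2.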

The first thing I wrote was a piece of code to load the two sequences into a list. Because each sequence is on its own line in the sample file, every line read from the file ends with a newline character, so I used line.strip('\n') to remove it before storing the sequence in the list. To compare the two sequences position by position, I used the built-in function zip(), which pairs up the characters at the same index in the two strings. Then it was just a simple matter of comparing the nucleotides and adding 1 to the variable called "hamming" each time they differed. The following is the final code that I came up with:

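```python
# Read both sequences, stripping the trailing newline from each line.
with open('sampledata.txt') as f:
    seq = [line.strip('\n') for line in f]

hamming = 0

# zip() pairs up the nucleotides at the same position in the two sequences.
for nt1, nt2 in zip(seq[0], seq[1]):
    if nt1 != nt2:
        hamming += 1  # one more point mutation

print(hamming)
```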
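To see what line.strip('\n') and zip() actually do, here is a small sketch that runs the same steps on in-memory data. It assumes a hypothetical two-line input (GATTACA and GACTATA) and uses io.StringIO in place of sampledata.txt so it runs without the file. The last step replaces the explicit loop with sum() over a generator expression, which counts the same mismatches because True is added as 1 and False as 0.

```python
import io

# Hypothetical stand-in for sampledata.txt: one sequence per line.
fake_file = io.StringIO("GATTACA\nGACTATA\n")

raw_lines = list(fake_file)
print(raw_lines)  # → ['GATTACA\n', 'GACTATA\n']  (newlines still attached)

# strip('\n') removes the trailing newline from each line.
seq = [line.strip('\n') for line in raw_lines]
print(seq)  # → ['GATTACA', 'GACTATA']

# zip() yields one (nt1, nt2) tuple per position.
pairs = list(zip(seq[0], seq[1]))
print(pairs[:3])  # → [('G', 'G'), ('A', 'A'), ('T', 'C')]

# Compact equivalent of the counting loop: each True adds 1, each False adds 0.
hamming = sum(nt1 != nt2 for nt1, nt2 in pairs)
print(hamming)  # → 2
```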

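Note that I haven't added a check to make sure that the two sequences have the same length. Since zip() silently stops at the end of the shorter sequence, unequal inputs would produce a misleading count rather than an error. For real applications such a check would definitely be a good idea, but with the data set given by Rosalind I didn't feel it was necessary this time.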
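A minimal sketch of such a check, wrapping the same loop in a function. The name hamming_checked and the choice to raise ValueError are my own assumptions, not part of the Rosalind task:

```python
def hamming_checked(seq1, seq2):
    """Hamming distance that refuses sequences of different lengths (hypothetical helper)."""
    # zip() would silently stop at the end of the shorter sequence,
    # so compare the lengths explicitly first.
    if len(seq1) != len(seq2):
        raise ValueError(f"sequences differ in length: {len(seq1)} vs {len(seq2)}")
    hamming = 0
    for nt1, nt2 in zip(seq1, seq2):
        if nt1 != nt2:
            hamming += 1
    return hamming


print(hamming_checked("GATTACA", "GACTATA"))  # → 2

try:
    hamming_checked("GATTACA", "GAT")
except ValueError as err:
    print(err)  # → sequences differ in length: 7 vs 3
```

On Python 3.10 and later, zip(seq1, seq2, strict=True) raises a ValueError by itself when the lengths differ, which is an alternative to the explicit comparison.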

A simpler way to write this program would be to use the existing Python package Distance, which includes a function for calculating the Hamming distance.
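A hedged sketch of that approach, assuming the package is installed with pip install Distance and exposes distance.hamming(seq1, seq2) returning the number of differing positions (check the package documentation to confirm). The wrapper name hamming_via_package is hypothetical, and it falls back to a plain count if the package is missing so the sketch still runs:

```python
try:
    import distance  # third-party "Distance" package (assumed: pip install Distance)
except ImportError:  # package not installed in this environment
    distance = None


def hamming_via_package(seq1, seq2):
    if distance is not None:
        # Assumed API: distance.hamming(seq1, seq2) counts differing positions.
        return distance.hamming(seq1, seq2)
    # Fallback: plain count, so the sketch runs without the package.
    return sum(nt1 != nt2 for nt1, nt2 in zip(seq1, seq2))


print(hamming_via_package("GATTACA", "GACTATA"))  # → 2
```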

Wednesday, 22 June 2016