Sample Dataset
>Rosalind_9499
TTTCCATTTA
>Rosalind_0942
GATTCATTTC
>Rosalind_6568
TTTCCATTTT
>Rosalind_1833
GTTCCATTTA
Expected Output
0.00000 0.40000 0.10000 0.10000
0.40000 0.00000 0.40000 0.30000
0.10000 0.40000 0.00000 0.20000
0.10000 0.30000 0.20000 0.00000
To write this program, I reused parts of my code from "Counting Point Mutations" and "Error Correction in Reads" and modified them to suit this problem. Each entry of the matrix is the Hamming distance between two sequences divided by the sequence length. The following is the final code, which took me only 20 minutes to write. Hurray!
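from Bio import SeqIO

# Read every sequence from the FASTA file into a list of strings.
reads = []
with open('sampledata.fasta', 'r') as f:
    for record in SeqIO.parse(f, 'fasta'):
        reads.append(str(record.seq))

# The code assumes all reads have the same length as the first one.
read_len = len(reads[0])

# Build and print the matrix one row at a time.
for curr_read in reads:
    distance = []
    for comp_read in reads:
        # Hamming distance: count the positions where the two reads differ.
        hamming = 0
        for nt1, nt2 in zip(curr_read, comp_read):
            if nt1 != nt2:
                hamming += 1
        # Proportion of differing positions, formatted to 5 decimal places.
        distance.append('{0:.5f}'.format(hamming / read_len))
    print(*distance, sep=' ')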
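For comparison, here is a minimal sketch of the same calculation split into two small functions, with no Biopython dependency. The names `p_distance` and `distance_matrix` are my own for this illustration and are not part of the original solution, and the sample sequences are hard-coded rather than read from a FASTA file.

```python
def p_distance(s1, s2):
    """Proportion of positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have the same length")
    # True counts as 1 when summed, so this is the Hamming distance.
    mismatches = sum(a != b for a, b in zip(s1, s2))
    return mismatches / len(s1)


def distance_matrix(seqs):
    """Square matrix of pairwise p-distances, as a list of rows."""
    return [[p_distance(a, b) for b in seqs] for a in seqs]


# Sequences from the sample dataset, hard-coded for this sketch.
sample = ["TTTCCATTTA", "GATTCATTTC", "TTTCCATTTT", "GTTCCATTTA"]

matrix = distance_matrix(sample)
rows = [" ".join(f"{d:.5f}" for d in row) for row in matrix]
for row in rows:
    print(row)
```

Keeping the distance in its own function makes it easy to check a single pair by hand: the first two sample sequences differ at 4 of their 10 positions, which gives 0.40000.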