Friday 29 July 2016

Transitions and Transversions

This problem was a really quick one. It took me less than 20 minutes to solve! Hurray!

We are asked to compare two sequences of equal length and classify each point mutation as either a transition (replacing a purine with another purine, A <-> G, or a pyrimidine with another pyrimidine, C <-> T) or a transversion (replacing a purine with a pyrimidine or vice versa). We should then return the transition/transversion ratio for the two sequences.
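As a minimal sketch of that classification rule, a single pair of bases could be labelled like this. The function name classify_substitution and the two set names are hypothetical, chosen only for illustration, and the sketch assumes the input contains only A, C, G and T:

```python
# Purines and pyrimidines for DNA; these names are illustrative only.
PURINES = {'A', 'G'}
PYRIMIDINES = {'C', 'T'}

def classify_substitution(nt1, nt2):
    """Return 'transition', 'transversion', or None when the bases match."""
    if nt1 == nt2:
        return None
    # A transition keeps both bases inside the same chemical class.
    same_class = ({nt1, nt2} <= PURINES) or ({nt1, nt2} <= PYRIMIDINES)
    return 'transition' if same_class else 'transversion'

print(classify_substitution('A', 'G'))  # transition
print(classify_substitution('C', 'T'))  # transition
print(classify_substitution('A', 'C'))  # transversion
```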

Sample dataset:
>Rosalind_0209
GCAACGCACAACGAAAACCCTTAGGGACTGGATTATTTCGTGATCGTTGTAGTTATTGGA
AGTACGGGCATCAACCCAGTT
>Rosalind_2200
TTATCTGACAAAGAAAGCCGTCAACGGCTGGATAATTTCGCGATCGTGCTGGTTACTGGC
GGTACGAGTGTTCCTTTGGGT

Expected output:
1.21428571429

The problem is very similar to Counting Point Mutations, in which we calculated the Hamming distance. I used my code from that problem as a starting point, and this is the altered version:

from Bio import SeqIO

# Read both sequences from the FASTA file; the with block closes the file.
sequences = []
with open('sampledata.fasta', 'r') as handle:
    for record in SeqIO.parse(handle, 'fasta'):
        sequences.append(str(record.seq))
s1 = sequences[0]
s2 = sequences[1]

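transition = 0
transversion = 0
AG = ['A', 'G']  # purines
CT = ['C', 'T']  # pyrimidines
for nt1, nt2 in zip(s1, s2):
    if nt1 != nt2:
        if nt1 in AG and nt2 in AG:
            transition += 1    # purine <-> purine
        elif nt1 in CT and nt2 in CT:
            transition += 1    # pyrimidine <-> pyrimidine
        else:
            transversion += 1  # otherwise purine <-> pyrimidine
# float() keeps this a true division under Python 2 as well as Python 3.
print('%0.11f' % (float(transition) / transversion))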
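
The same counting can also be wrapped in a function that does not need a FASTA file, which makes it easy to check against the sample dataset. This is only a sketch under a few assumptions: Python 3, input containing only A, C, G and T, and a hypothetical function name count_ts_tv that is not part of my original solution. Since every mismatch is either a transition or a transversion, the two counts add up to the Hamming distance from Counting Point Mutations.

```python
# Unordered base pairs that count as transitions (A <-> G and C <-> T).
TRANSITIONS = {frozenset('AG'), frozenset('CT')}

def count_ts_tv(s1, s2):
    """Count (transitions, transversions) between two equal-length DNA strings."""
    if len(s1) != len(s2):
        raise ValueError('sequences must have equal length')
    transitions = transversions = 0
    for nt1, nt2 in zip(s1, s2):
        if nt1 == nt2:
            continue
        if frozenset((nt1, nt2)) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions

# The sample dataset from the problem, with the line breaks joined.
s1 = ('GCAACGCACAACGAAAACCCTTAGGGACTGGATTATTTCGTGATCGTTGTAGTTATTGGA'
      'AGTACGGGCATCAACCCAGTT')
s2 = ('TTATCTGACAAAGAAAGCCGTCAACGGCTGGATAATTTCGCGATCGTGCTGGTTACTGGC'
      'GGTACGAGTGTTCCTTTGGGT')

ts, tv = count_ts_tv(s1, s2)
if tv == 0:
    # Guard against dividing by zero when there are no transversions.
    print('no transversions, ratio undefined')
else:
    print('%0.11f' % (ts / tv))  # 1.21428571429
```

For the sample data this gives 17 transitions and 14 transversions, so the ratio is 17/14, which matches the expected output.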
