Friday, 29 July 2016

Finding a Spliced Motif

This time we are once again asked to find the positions of a given motif in a given sequence. However, this time the motif can be spliced, i.e. its letters must appear in the sequence in the same order, but there can be other nucleotides between them. There may be several ways to find the motif in the sequence, but we only have to return one of them, as the 1-based positions each letter of the motif has in the sequence.

Sample dataset:
>Rosalind_14
ACGTACGTGACG
>Rosalind_18
GTA

Expected output:
3 8 10
(or any of the other possible combinations)
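Because any valid combination is accepted, it can help to check a candidate answer directly. Here is a minimal sketch of such a check, assuming 1-based positions as in the expected output. The function name is_valid_answer is made up for this illustration and is not part of Rosalind or Biopython:

```python
def is_valid_answer(s, t, positions):
    """Check that the 1-based positions spell out t in s, left to right."""
    if len(positions) != len(t):
        return False
    # Positions must be strictly increasing and lie inside the sequence.
    increasing = all(a < b for a, b in zip(positions, positions[1:]))
    in_range = all(1 <= p <= len(s) for p in positions)
    if not (increasing and in_range):
        return False
    # Each position must carry the matching letter of the motif.
    return all(s[p - 1] == c for p, c in zip(positions, t))


print(is_valid_answer('ACGTACGTGACG', 'GTA', [3, 8, 10]))  # True
print(is_valid_answer('ACGTACGTGACG', 'GTA', [3, 4, 5]))   # True
print(is_valid_answer('ACGTACGTGACG', 'GTA', [8, 3, 10]))  # False
```

Both 3 8 10 and 3 4 5 pass, which shows that different answers to the sample dataset are equally acceptable.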

My first thought was to look at my solution for Finding a Motif in DNA, but in that problem I used Biopython to find the motifs, and I couldn't find a way to adapt it to spliced motifs. Another thought was to use regular expressions. However, I quite quickly came up with this rather simple solution instead, which scans the sequence once from left to right and takes the first match it finds for each letter of the motif:

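from Bio import SeqIO

# Read both FASTA records: the first is the sequence s, the second the motif t.
with open('sampledata.fasta', 'r') as handle:
    sequences = [str(record.seq) for record in SeqIO.parse(handle, 'fasta')]
s, t = sequences[0], sequences[1]

pos = 0         # number of letters of s scanned so far
positions = []  # 1-based position in s of each matched letter of t
for letter in t:
    # Resume scanning right after the previous match.
    for j in range(pos, len(s)):
        pos += 1
        if s[j] == letter:
            positions.append(pos)
            break
print(*positions, sep=' ')  # prints "3 4 5" for the sample dataset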
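The regular-expression idea mentioned earlier can be made to work as well. The following is only a sketch of one way to do it, not the approach used in this post: it builds a pattern with one capture group per letter of the motif and lazy gaps in between, and reads the positions off the match object. The function name spliced_positions_regex is hypothetical, and the sequences are assumed to be plain strings without line breaks.

```python
import re


def spliced_positions_regex(s, t):
    """Return 1-based positions of t's letters as a subsequence of s, or None."""
    # One capture group per letter with lazy gaps between them,
    # e.g. 'GTA' becomes '(G).*?(T).*?(A)'.
    pattern = '.*?'.join('({})'.format(re.escape(c)) for c in t)
    match = re.search(pattern, s)
    if match is None:
        return None
    # match.start(k) is the 0-based index of group k; add 1 for 1-based positions.
    return [match.start(k) + 1 for k in range(1, len(t) + 1)]


print(spliced_positions_regex('ACGTACGTGACG', 'GTA'))  # [3, 4, 5]
```

For the sample dataset this gives the same answer as the loop above. On long sequences where the motif does not occur, the regular expression may backtrack heavily, so the plain loop is probably the safer choice.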

2 comments:

  1. After 3 days of coding and trying different things to no avail, seeing your solutions makes me want to cry

    1. I take my words back, I thought this algorithm would be able to locate every spliced motif :/
