Sample dataset:
>Rosalind_14
ACGTACGTGACG
>Rosalind_18
GTA
Expected output:
3 8 10
(or any of the other possible combinations)
My first thought was to look at my solution for Finding a Motif in DNA, but in that problem I used Biopython to find the motifs and I wasn't able to find a way to adapt it to finding spliced motifs. Another thought was to use regular expressions. However, I quite quickly managed to come up with this rather simple solution instead:
from Bio import SeqIO
sequences = []
handle = open('sampledata.fasta', 'r')
for record in SeqIO.parse(handle, 'fasta'):
    sequences.append(str(record.seq))
handle.close()
s = sequences[0]
t = sequences[1]
pos = 0
positions = []
for i in range(len(t)):
    for j in range(pos, len(s)):
        pos += 1
        if len(positions) < len(t):
            if t[i] == s[j]:
                positions.append(pos)
                break
print(*positions, sep=' ')
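For comparison, here is a minimal sketch of the same greedy idea written as a function, without Biopython and with the sequences passed in directly. The function name `spliced_motif_positions` is my own for illustration, and the sketch assumes that returning the leftmost match is acceptable, since any valid set of positions is accepted. It uses `str.find` with a start offset to jump to the next occurrence of each symbol.

```python
def spliced_motif_positions(s, t):
    """Return 1-based positions in s that spell out t as a subsequence.

    Greedy: each symbol of t is matched at its first occurrence in s
    after the previously matched position.
    """
    positions = []
    start = 0  # 0-based index in s where the search for the next symbol begins
    for symbol in t:
        idx = s.find(symbol, start)
        if idx == -1:
            # t cannot be read off s in order
            raise ValueError(f"{t!r} is not a subsequence of {s!r}")
        positions.append(idx + 1)  # convert to 1-based positions
        start = idx + 1            # the next symbol must come strictly later
    return positions


print(*spliced_motif_positions("ACGTACGTGACG", "GTA"))  # prints: 3 4 5
```

On the sample dataset this gives 3 4 5 rather than 3 8 10. Both are valid, because only one collection of indices has to be reported.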
After 3 days of coding and trying different things to no avail, seeing your solutions makes me want to cry
I take my words back, I thought this algorithm would be able to locate every spliced motif :/