- Make the reverse complements and combine with the initial reads.
- Add all unique reads (whether they are reverse complements or the original reads) to a set.
- For each of the reads in the set, print the prefix followed by the suffix.
I set about to write a program that performs these three steps and the following is what I managed to come up with:
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
def rev_comp(read):
seq = Seq(read, IUPAC.unambiguous_dna)
rev_seq = seq.reverse_complement()
return(str(rev_seq))
def mk_set(seqs):
S = set()
for i in seqs:
if i not in S:
S.add(i)
C = rev_comp(i)
if C not in S:
S.add(C)
return(S)
data = []
with open('rosalind_dbru.txt','r') as f:
for line in f:
data.append(line.strip())
s = mk_set(data)
for j in s:
print("(" + j[:-1] + ", " + j[1:] + ")")
The code can also be found on my GitHub along with the sample data and problem description.
No comments:
Post a Comment