Sara does Bioinformatics: Constructing a De Bruijn Graph

In this problem we are asked to find the adjacency list corresponding to the De Bruijn graph constructed from a set of short reads and their reverse complements. To do this was actually a lot easier than I initially thought. After a bit of research on De Bruijn graphs and a bit of doodling, I figured out the following three steps to complete the task:

Make the reverse complements and combine with the initial reads.
Add all unique reads (whether they are reverse complements or the original reads) to a set.
For each of the reads in the set, print the prefix followed by the suffix.

I set about to write a program that performs these three steps and the following is what I managed to come up with:

from Bio.Seq import Seq

from Bio.Alphabet import IUPAC

def rev_comp(read):

seq = Seq(read, IUPAC.unambiguous_dna)

rev_seq = seq.reverse_complement()

return(str(rev_seq))

def mk_set(seqs):

S = set()

for i in seqs:

if i not in S:

S.add(i)

C = rev_comp(i)

if C not in S:

S.add(C)

return(S)

data = []

with open('rosalind_dbru.txt','r') as f:

for line in f:

data.append(line.strip())

s = mk_set(data)

for j in s:

print("(" + j[:-1] + ", " + j[1:] + ")")

The code can also be found on my GitHub along with the sample data and problem description.

Sara does Bioinformatics

Monday, 7 November 2016

Constructing a De Bruijn Graph

No comments:

Post a Comment