Monday 7 November 2016

Constructing a De Bruijn Graph

In this problem we are asked to find the  adjacency list corresponding to the De Bruijn graph constructed from a set of short reads and their reverse complements. To do this was actually a lot easier than I initially thought. After a bit of research on De Bruijn graphs and a bit of doodling, I figured out the following three steps to complete the task:


  1. Make the reverse complements and combine with the initial reads.
  2. Add all unique reads (whether they are reverse complements or the original reads) to a set.
  3. For each of the reads in the set, print the prefix followed by the suffix.
I set about to write a program that performs these three steps and the following is what I managed to come up with:


from Bio.Seq import Seq
from Bio.Alphabet import IUPAC

def rev_comp(read):
    seq = Seq(read, IUPAC.unambiguous_dna)
    rev_seq = seq.reverse_complement()
    return(str(rev_seq))

def mk_set(seqs):
    S = set()
    for i in seqs:
        if i not in S:
            S.add(i)
        C = rev_comp(i)
        if C not in S:
            S.add(C)
    return(S)

data = []
with open('rosalind_dbru.txt','r') as f:
    for line in f:
        data.append(line.strip())

s = mk_set(data)
for j in s:
    print("(" + j[:-1] + ", " + j[1:] + ")")

The code can also be found on my GitHub along with the sample data and problem description.

No comments:

Post a Comment