The problem is linked to RNA folding and we are assuming that in the given RNA string, every nucleotide forms part of a base pair in the RNA molecule. We then look at the n long molecule as a graph containing n nodes where the A-nodes can form an edge (i.e base pair) with the U-nodes and the G-nodes can form an edge with the C-nodes. From this scenario we are asked to find all possible perfect matchings, where a matching is a set of edges in a graph where none of the edges include the same node. A matching is said to be perfect if for a graph of 2n nodes it contains n edges.
Given dataset:
>Rosalind_23
AGCUAGUCAU
(The sequence always contains equally many A's as U's and equally many G's as C's)
Expected Output:
12
What we need to realise is that what we are looking at can be viewed as two separate graphs, one for AU bonding and one for GC bonding. These two are complete bipartite graphs and we can describe the number of perfect matchings for each graph, if n1 = nr of A's and n2 = nr of G's, as n1! and n2!. To get the total amount of perfect matchings for the two graphs combined, we simply multiply these numbers. We thereby get:
matchings = n1! * n2!
All that is left to do now is to write a simple program to run the calculations:
from Bio import SeqIO
from math import factorial
sequence = ''
handle = open('sampledata.fasta', 'r')
for record in SeqIO.parse(handle, 'fasta'):
sequence = str(record.seq)
handle.close()
AU = 0
GC = 0
for nt in sequence:
if nt == 'A':
AU += 1
elif nt == 'G':
GC += 1
matchings = factorial(AU) * factorial(GC)
print(matchings)
No comments:
Post a Comment