Sara does Bioinformatics: October 2016

Last week I finished going through the interactive edition of the book "How to Think Like a Computer Scientist". Even though I knew quite a bit of what was mentioned in the book, I learned quite a bit about Python programming and computer science in general. I would definitely recommend people with a similar background as me (coming from the biological sciences) to have a go at this book, and I especially found the parts on classes and objects useful.

When I was finished with the book I went back to working through the remaining Rosalind problems. I am currently working on "Creating a Character Table". It took me a while just to understand what they are after in this problem, but I finally figured it out and I have been able to make some progress on my program.

So, what they are after is the so called character table of an unrooted binary tree in Newick format. This table is formed of rows representing the character arrays of the tree. A character array is formed as follows:

Remove one edge of the tree so that two trees are formed (the new trees can't consist of only one taxon, i.e. the split must be nontrivial).
Assign one tree as the 0 tree (representing its taxons not having a specific trait), and the other tree as the 1 tree (representing its taxons having a specific trait). This is done arbitrarily.
Sort the taxons lexicographically and print their corresponding number (0 or 1).

The following code is the beginning of my solution to the problem. So far it opens and reads the data file. It then iterates over the tree (in string format) and finds where the splits are, i.e. where all '),' are located in the tree. I have also started writing code for a class that can couple the name of a taxon with its assigned value and then sort these lexicograpically.

class tree_index:

def __init__(self, n, v):

self.name = n

self.value = v

def read_tree(s):

for i in range(len(s)-1):

if s[i] + s[i+1] == '),':

count = 0

for j in range(len(s)):

if s[j] == '(':

count += 1

if s[j] == ')':

count -= 1

print(count)

with open('sampledata.txt', 'r') as f:

tree = f.read().strip()

read_tree(tree)

What I want to do next is to find a way to split the tree up into two trees each time '),' appears. I then want to couple the names of the taxons to the number of the tree they belong to by using the class tree_index. If I can then add a sort functionality to the class I think the problem would be solved. I am currently working on this and will return here when I manage to find a solution.

Until then!

Sara does Bioinformatics

Thursday 13 October 2016

Creating a Character Table (Finished!)

Wednesday 12 October 2016

Creating a Character Table