Thursday, 14 July 2016

Enumerating k-mers Lexicographically

In this problem we  are given a set of characters and a positive integer n and we are asked to return all the strings of length n that can be formed from the characters (including duplicates). The answer should be sorted lexicographically according to the order that the characters are presented in the sample data (i.e. not alphabetically). Once again the built-in Python library itertools comes in handy. This time I used the function products because the problem asks for duplicates as well (that is, each letter may occur up to n times in a string an not only once). The following is the code I came up with:


import itertools                                  
n = 4                                             
s = ['H', 'U', 'P', 'M']                          
perm = itertools.product(s, repeat=n)             
for i, j in enumerate(list(perm)):                
    permutation = ''                              
    for item in j:                                
        permutation += str(item)                  
    with open('answer.txt', 'a') as text_file:    
        print(permutation.strip(), file=text_file)

Now, this code works just fine, and when I look at the results they match what is described in the problem description, but I can't seem to get my answers to pass. I have tried seven different datasets now, either pasting the answer into the browser or uploading the answer file. I have made sure there are no additional blank spaces or new lines and even tried concatenating the answer into one long string, but nothing I do seems to work. I have noticed that several people have had the same problem rather recently, which has lead me to believe that the problem is probably not from my side, but rather from Rosalind's side. Either way, I think I will just let this problem rest for a while and press on with the other problems instead.

EDIT:

The problem has finally been solved! Apparently, the answer SHOULD be sorted alphabetically, even though it says not to in the note below the problem statement. Also, according to someone in the questions section, the permutations should not be separated with new lines but instead with blank spaces. This is rather annoying because this is exactly what my first attempt at the program did, but then I read the note and changed it to the code above... Anyhow, I can now move on with my life. Below is the new code that produces results that are accepted by Rosalind, although that isn't what they asked for...

import itertools                                  
n = 4                                             
s = ['H''U', 'P', 'M']                          
perm = itertools.product(s, repeat=n)
answer = []                                       
for i, j in enumerate(list(perm)):                
    permutation = ''                              
    for item in j:                                
        permutation += str(item)                  
    answer.append(permutation)                    

sorted_answer = sorted(answer)                    
with open('answer.txt', 'a') as text_file:        
    print(*sorted_answer, sep=' ', file=text_file)

1 comment:

  1. Here is a much simpler solution.

    import itertools
    n = 2
    s='A C G T'
    lst = list(map(str,s.split()))
    x = list(itertools.product(lst, repeat=n))
    #print(x)
    #print(len(x))
    print(*[''.join(i) for i in x],sep='\n')

    ReplyDelete