Monday 15 August 2016

Ordering Strings of Varying Length Lexicographically

In this problem we are given a string representing an alphabet. We are also given an integer n ≤ 4. We are to return a list of all strings of length at most n that can be formed from the alphabet string. Repeats should be included and the strings should be sorted lexicographically according to the given alphabet (and this time they really should, as opposed to in “Enumerating k-mers Lexicographically”).

Sample Dataset
D N A
3

Expected Output
D
DD
DDD
DDN
DDA
DN
DND
DNN
DNA
DA
DAD
DAN
DAA
N
ND
NDD
NDN
NDA
NN
NND
NNN
NNA
NA
NAD
NAN
NAA
A
AD
ADD
ADN
ADA
AN
AND
ANN
ANA
AA
AAD
AAN
AAA

To begin with this problem is fairly similar to “Enumerating k-mers Lexicographically”, so I started by having a look at what I wrote there. As in that problem, I decided to use itertools, but this time, since the answer should include repeats, I used product instead. I also put the creation of the permutations into a for-loop in order to get all the different lengths, as you can see below. This generated a nested list, which I then flattened into a single list using the itertools function chain(). Then all that was left to do was to sort and print the permutations to a file. In order to sort them, I made use of the function sorted() and used a lambda function as key. Here is an explanation of the lambda function in Python. The final code can be seen below:

import itertools
n = 3
alphabet = 'DNA'
perm = []
for i in range(1, n + 1):
    perm.append(list(map(''.join, (itertools.product(alphabet, repeat=i)))))
permutations = list(itertools.chain(*perm))
srt_perm = sorted(permutations,
                  key=lambda word: [alphabet.index(c) for c in word])
with open('answer.txt', 'a') as f:
    for j in srt_perm:
        f.write('%s\n' % j)

No comments:

Post a Comment