Sample Dataset
>Rosalind_6431
CTTCGAAAGTTTGGGCCGAGTCTTACAGTCGGTCTTGAAGCAAAGTAACGAACTCCACGGCCCTGACTACCGAACCAGTTGTGAGTACTCAACTGGGTGAGAGTGCAGTCCCTATTGAGTTTCCGAGACTCACCGGGATTTTCGATCCAGCCTCAGTCCAGTCTTGTGGCCAACTCACCAAATGACGTTGGAATATCCCTGTCTAGCTCACGCAGTACTTAGTAAGAGGTCGCTGCAGCGGGGCAAGGAGATCGGAAAATGTGCTCTATATGCGACTAAAGCTCCTAACTTACACGTAGACTTGCCCGTGTTAAAAACTCGGCTCACATGCTGTCTGCGGCTGGCTGTATACAGTATCTA
CCTAATACCCTTCAGTTCGCCGCACAAAAGCTGGGAGTTACCGCGGAAATCACAG
Expected Output
4 1 4 3 0 1 1 5 1 3 1 2 2 1 2 0 1 1 3 1 2 1 3 1 1 1 1 2 2 5 1 3 0 2 2 1 1 1 1 3 1 0 0 1 5 5 1 5 0 2 0 2 1 2 1 1 1 2 0 1 0 0 1 1 3 2 1 0 3 2 3 0 0 2 0 8 0 0 1 0 2 1 3 0 0 0 1 4 3 2 1 1 3 1 2 1 3 1 2 1 2 1 1 1 2 3 2 1 1 0 1 1 3 2 1 2 6 2 1 1 1 2 3 3 3 2 3 0 3 2 1 1 0 0 1 4 3 0 1 5 0 2 0 1 2 1 3 0 1 2 2 1 1 0 3 0 0 4 5 0 3 0 2 1 1 3 0 3 2 2 1 1 0 2 1 0 2 2 1 2 0 2 2 5 2 2 1 1 2 1 2 2 2 2 1 1 3 4 0 2 1 1 0 1 2 2 1 1 1 5 2 0 3 2 1 1 2 2 3 0 3 0 1 3 1 2 3 0 2 1 2 2 1 2 3 0 1 2 3 1 1 3 1 0 1 1 3 0 2 1 2 2 0 2 1 1
Regardless of the given sequence, the possible 4-mers are always the same and we can generate them with the following code:
import itertools
nt = 'ACGT' #Use this order of nt to get correct order later without sorting
permutations = itertools.product(nt, repeat=4)
kmers = []
for i, j in enumerate(list(permutations)):
kmer = ''
for item in j:
kmer += str(item)
kmers.append(kmer)
This gives us a list of all the possible 4-mers in alphabetical order (note that the funktion sorts the permutations in the order the letters are listed in. If nt is not entered alphabetically you will need to sort kmers).
Now we can extrakt the sequence from the FASTA file and use regex to find all the occurrences of the k-mers. Remember to use ?= in the pattern to include overlapping k-mers:
import re
from Bio import SeqIO
record = SeqIO.read('sampledata.fasta', 'fasta')
sequence = record.seq
A = []
for k in kmers:
occurence = 0
pattern = re.compile(r'(?=(' + k + '))')
for l in re.findall(pattern, str(sequence)):
occurence += 1
A.append(occurence)
print(*A, sep=' ')
No comments:
Post a Comment