Problem 1 - Counting DNA Nucleotides
This problem was fairly straightforward. You are to count how many of each nucleotide a given DNA sequence contains. I managed to solve it quickly and the solution was accepted at my first attempt with a real dataset.The first thing I needed my program to do was to read the input file containing the sequence. I remembered that in the section on code style at the Hitchhiker's Guide to Python they recommend using with open as this automatically closes the file for you. When the program had opened the file, I needed it to read each letter of the string in the file separately and then count how many of each nucleotide the string contained. The program was then to print these four numbers with a blank space in between. The following is the code I came up with:
Acount = 0
Ccount = 0
Gcount = 0
Tcount = 0
with open('sampledata2.txt') as f:
for line in f:
for nt in line:
if nt == 'A':
Acount = Acount + 1
if nt == 'C':
Ccount = Ccount + 1
if nt == 'G':
Gcount = Gcount + 1
if nt == 'T':
Tcount = Tcount + 1
print(Acount, Ccount, Gcount, Tcount)
In the first section of the code I define my variables that I then later use to count each of the different nucleotides. The program opens the file named sampledata2.txt, which is the file containing the actual dataset from Rosalind. It then iterates first over each line in the file and then over each letter in the lines. For each letter (named nt in my program) it then checks it the letter is an A and if that's the case it adds 1 to the variable Acount. If the letter is not A it checks if it's a C, and so on. When the iteration is done the program prints the value of the counter variables with a space between them.
Looking at the program once more, I think it might become more efficient if I changed the if statements for C and G to elif and the if statement for T to else. This doesn't feel necessary for the small datasets given for the problem, but it might be worth remembering in larger projects in the future.
Another improvement that could be made to the program is to wright the answer to a file. This didn't feel that necessary for such a simple answer, but I chose to do this in problem 2, which I will describe in my next post.
If you have any questions about what you have just read, or have any suggestions or tips on how I can improve my code, please don't hesitate to leave a comment below!
No comments:
Post a Comment