Wednesday 24 August 2016

Introduction to Alternative Splicing

In this problem we continue looking at subsets. This time we imagine them as exons and want to know how many different combinations we can form from subsets of the set. We are given two integers n and m, which are the length of the set ([1,2,3,4,5,6] in the sample case) and the minimum length of the subsets to be formed. The order of the exons cannot be altered, so for n=3 and m=2 we would get the four subsets [1,2], [1,3], [2,3] and [1,2,3]. The answer should be given modulo 1000000.

Sample Dataset
6 3

Expected Output
42

The problem can quite easily be solved by counting all the different combinations of each length from m to  n, and then summarising them to form the final answer. The number of combinations can be calculated as n!/(k!(n-k)!), where k is the length of the subset. In python3 code this can look like:

from math import factorial
n, m = 6, 3
subsets = 0
for k in range(m, n + 1):
    subsets += (factorial(n) // (factorial(k) * factorial(n - k)))
print(subsets % 1000000)

At first I used / instead of //, which worked fine for the sample dataset, but when I downloaded another dataset the numbers were bigger and therefore they got too big for float division and I got the error message "OverflowError: integer division result too large for a float". A quick Google search took me to this thread on Stack Overflow that solved my problem.

No comments:

Post a Comment