Toy example showing how the rank-cutoff optimizer works. (A) Ranks of each strain in two virtual yeast fitness profiles to be compared, a query and a target, as they would be deposited in Fit-RankDB. Both profiles are assumed to be generated with a virtual yeast deletion library comprising strains a to j. (B) Efficient calculation, by dynamic programming, of the match number (the number of overlapping strains) accumulated under all possible rank-cutoffs of the query and the target (see Methods for details). First, the rank matches of each strain are expressed as the match matrix (M). In the M matrix, rows represent ranks in the query, columns represent ranks in the target, and each value is the number of strains that have the corresponding rank in the query and in the target. Then, the current accumulated match number (red cell in the A matrix) is calculated by adding the current match number (orange cell in the M matrix) to the previous accumulated match number (sky-blue cell plus purple cell minus gray cell in the A matrix). In this way, the accumulated match numbers for all possible rank-cutoffs are efficiently calculated and stored in the A matrix. (C) The matrix of cumulative hypergeometric p-values (P) is filled by evaluating equation (2), the objective function (Hp), for all possible rank-cutoffs. It is then used to find the rank-cutoffs that minimize the p-value, as described in equation (3); these are called the optimal rank-cutoffs. The A matrix provides all of the parameters needed for equations (2) and (3): its values give the number of overlapping strains in equation (2); its row names, the number of query strains at the respective rank-cutoff; its column names, the number of target strains at the respective rank-cutoff; and its row or column length, the population size.
When the maximal rank-cutoff is set to 10 in the toy example, the query rank-cutoff of 5 and the target rank-cutoff of 5 give the minimal p-value, 0.004. At these optimal rank-cutoffs, the overlap significance (hypergeometric p-value) and the overlap score (Tanimoto coefficient) can be used to express the similarity between the query and the target.
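The procedure in panels (B) and (C) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The ranks below are hypothetical (the actual ranks in the figure are not reproduced here) and are chosen so that the same five strains occupy the top five ranks of both profiles. The p-value is assumed to be the upper tail of the hypergeometric distribution, P(X >= overlap), which is one common reading of "cumulative hypergeometric p-value"; equation (2) in the paper may differ in detail. All function names are illustrative.

```python
from math import comb

def match_matrix(query_rank, target_rank, n):
    """M[i][j] = number of strains with query rank i and target rank j (1-based)."""
    M = [[0] * (n + 1) for _ in range(n + 1)]
    for strain, qr in query_rank.items():
        M[qr][target_rank[strain]] += 1
    return M

def accumulate(M):
    """A[i][j] = number of strains with query rank <= i and target rank <= j."""
    n = len(M) - 1
    A = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            # current match + cell above + cell to the left - diagonal cell
            A[i][j] = M[i][j] + A[i - 1][j] + A[i][j - 1] - A[i - 1][j - 1]
    return A

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K are 'successes'."""
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return hits / comb(N, n)

def optimal_cutoffs(A, max_cutoff):
    """Scan all (query, target) cutoffs and return (p, q, t) with minimal p."""
    N = len(A) - 1  # population size = number of strains
    best = None
    for q in range(1, max_cutoff + 1):
        for t in range(1, max_cutoff + 1):
            p = hypergeom_upper_tail(A[q][t], N, t, q)
            if best is None or p < best[0]:
                best = (p, q, t)
    return best

# Hypothetical ranks for strains a to j (assumption, not the figure's data).
strains = "abcdefghij"
query = {s: r for r, s in enumerate(strains, start=1)}
target = dict(zip(strains, [3, 1, 5, 2, 4, 8, 6, 10, 7, 9]))

M = match_matrix(query, target, len(strains))
A = accumulate(M)
p, q, t = optimal_cutoffs(A, max_cutoff=10)

overlap = A[q][t]
tanimoto = overlap / (q + t - overlap)  # overlap score at the optimal cutoffs
print(q, t, round(p, 3), tanimoto)
```

With these hypothetical ranks the scan selects cutoffs of 5 for both profiles with p = 1/252, about 0.004, which agrees with the value quoted in the caption. Each cell of A is filled in constant time from three neighbouring cells, so all cutoff pairs are covered in time proportional to the size of the matrix rather than by recounting the overlap for every pair.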
Lee et al. BMC Genomics 2013 14(Suppl 1):S6 doi:10.1186/1471-2164-14-S1-S6