|
Figure 2.
Toy example showing how the rank-cutoff optimizer works. (A) Ranks of each strain in two virtual yeast fitness profiles, a query and a target,
that are to be compared; both are assumed to be deposited in Fit-RankDB. These profiles are also
assumed to have been generated using a virtual yeast deletion library comprising strains
a to j. (B) Efficient calculation, by dynamic programming, of the match number (the number of
overlapping strains) accumulated under all possible rank-cutoffs of the query and the target
(see Methods for details). For this calculation, the rank matches
of each strain are first expressed as the match matrix (M). In the M matrix, each row
represents a rank in the query, each column a rank in the target, and each value the
number of strains that hold that pair of ranks in the query and the target. Then, the current accumulated
match number (the red cell in the A matrix) is calculated by adding the current
match number (the orange cell in the M matrix) to the previous accumulated
match number (the sky-blue cell plus the purple cell minus the gray cell in
the A matrix). In this way, the accumulated match numbers for all possible
rank-cutoffs are efficiently calculated and stored in the A matrix. (C) The matrix
of cumulative hypergeometric p-values (P) is filled by evaluating equation (2),
the objective function (Hp), at all possible rank-cutoffs, and is then used to
find the rank-cutoffs with the minimal p-value as described in equation (3),
called the optimal rank-cutoffs. The A matrix provides all of the parameters needed for
equations (2) and (3): its values give the number of overlapping strains
in equation (2); its row names, the number of query strains at each query rank-cutoff; its column names, the
number of target strains at each target rank-cutoff; and its row or column length,
the population size. When the maximal rank-cutoff is set to 10 in the toy example,
the query rank-cutoff of 5 and the target rank-cutoff of 5 give the minimal p-value, 0.004.
At these optimal rank-cutoffs, the overlap significance (hypergeometric p-value)
and the overlap score (Tanimoto coefficient) can be used to express the similarity between
the query and the target.
Lee et al. BMC Genomics 2013 14(Suppl 1):S6 doi:10.1186/1471-2164-14-S1-S6 |
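
The procedure in panels (B) and (C) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the ranks in `QUERY` and `TARGET` are invented and are not the values drawn in the figure, ranks are assumed to be tie-free (so a cutoff of i selects exactly i strains), and equation (2) is assumed to be the upper-tail hypergeometric probability P(X >= overlap). All function names are hypothetical.

```python
from math import comb

# Hypothetical ranks for strains a..j (NOT the values shown in the figure).
QUERY = dict(zip("abcdefghij", [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
TARGET = dict(zip("abcdefghij", [2, 1, 4, 5, 3, 10, 9, 8, 7, 6]))


def match_matrix(query_rank, target_rank, n):
    """M[i][j] = number of strains with rank i in the query and rank j in the target."""
    M = [[0] * (n + 1) for _ in range(n + 1)]  # row/column 0 are padding
    for strain, q in query_rank.items():
        M[q][target_rank[strain]] += 1
    return M


def accumulate(M):
    """A[i][j] = number of strains with query rank <= i and target rank <= j.

    Recurrence from the caption: current match + upper cell + left cell
    - diagonal cell (a 2-D prefix sum).
    """
    n = len(M) - 1
    A = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            A[i][j] = M[i][j] + A[i - 1][j] + A[i][j - 1] - A[i - 1][j - 1]
    return A


def hypergeom_upper_tail(k, n_query, n_target, n_pop):
    """P(X >= k) when n_query strains are drawn from n_pop, n_target of which are 'hits'.

    Assumed form of the cumulative hypergeometric p-value.
    """
    favourable = sum(
        comb(n_target, x) * comb(n_pop - n_target, n_query - x)
        for x in range(k, min(n_query, n_target) + 1)
    )
    return favourable / comb(n_pop, n_query)


def optimal_cutoffs(A, max_cutoff):
    """Scan all cutoff pairs and return (query_cutoff, target_cutoff, p_value, tanimoto)."""
    n_pop = len(A) - 1
    best = None
    for i in range(1, max_cutoff + 1):
        for j in range(1, max_cutoff + 1):
            k = A[i][j]
            p = hypergeom_upper_tail(k, i, j, n_pop)
            if best is None or p < best[2]:
                tanimoto = k / (i + j - k)  # overlap / union of the two top sets
                best = (i, j, p, tanimoto)
    return best


A = accumulate(match_matrix(QUERY, TARGET, 10))
qi, tj, p_min, score = optimal_cutoffs(A, 10)
print(qi, tj, round(p_min, 3), score)
```

With these invented ranks the top five strains of the query and the target coincide, so the scan selects cutoffs (5, 5) with p = 1/252, about 0.004, and a Tanimoto coefficient of 1.0. Filling A costs one pass over the matrix, which is what makes evaluating every cutoff pair affordable.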