|
Figure 4.
Mutual information analysis in homolmapper. The process used to evaluate and report mutual information in homolmapper is shown
from the MSA (1) to final analysis (6) using an alignment of 75 heme oxygenases and
the crystal structure of rat heme oxygenase (1DVE, [55]) for illustrative purposes.
A portion of the MSA is shown in (1), with residues 136 (blue) and 140 (red) highlighted.
The matched sequence is ho1rat, and the total alignment is 684 positions long. Calculation
of mutual information begins with calculation of the Shannon entropies H_i, H_j, H_k for all single positions i, j, k in the alignment [23]. Next, following the method of Gloor and co-workers [36], joint
entropies H_ij, H_ik for all pairs of positions are calculated from the distribution of paired outcomes (2). Diagonal
elements in this joint-entropy matrix are set to zero. The raw mutual information
values are then calculated (3) by subtracting the joint entropy at each pair of positions
from the sum of the single-position entropies (H_i + H_j - H_ij), with the diagonal elements kept at zero. Next, the raw mutual information
scores can be normalized (4) by dividing by the joint entropy [36] or by the sum of the
position entropies (redundancy), or they can be left unnormalized. The resulting scores are converted to
Z-scores (distance from the mean in standard deviations) for analysis. The maximum Z-score
for each residue is reported to the B-factor field of the output PDB file (5). If
this maximum Z-score is below a threshold value (5 by default, but user-controllable),
a SegID of 'nast' (nothing above significance threshold) is assigned, as seen for residues 137-139 in the example. Residues that
exhibit a maximum Z-score above the cutoff value have the residue number associated
with that score reported in SegID. Such residues are considered to belong to mutually
informative groups, and the remaining homolmapper output fields (element and occupancy)
are used to provide information about the group. The number of residues in the group
is reported to element, and the sum of their residue numbers is reported to occupancy.
Thus, in this example, residues 136 and 140 are mutually informative and are the only
members of the group. The Z-score is reported to B-factor (5.29), and each residue
has the other residue number reported to SegID. The element field for these two residues
is 2, because there are two residues in the mutually informative group, and the occupancy
field is 276 (= 136 + 140). This reporting scheme allows mutually informative
positions in the alignment that fall outside of the structure to be reported
as well. It is also possible to write out the final matrix of Z-scores and the
normalized matrix of mutual information values for the full alignment for further
analysis. The joint-entropy matrix is written out by default to permit rapid reruns
with different threshold values or different normalizations. In (6), the output PDB
file is shown at a cutoff of 5 (left). Residues 136 (blue) and 140 (red) are colored
by SegID and are immediately adjacent. If the threshold is lowered to 3.75 (center),
additional residues are detected. The mutually informative residues in this case are
colored by occupancy. By examining the significant interactions in the structure or
in the text file that details all significant hits, one can construct a diagram of
the interactions and their Z-scores (right). Residues 136 and 140 are part of a larger
network at the lower threshold. VMD [16], Stride [53], and homolmapper were used to
prepare the structural panels.
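Steps (2) through (5) can be illustrated with a minimal Python sketch. It is not homolmapper's own code. The function names, the use of log base 2, the treatment of every character (including gaps) as an ordinary symbol, and the calculation of the mean and standard deviation over all off-diagonal pairs are assumptions made for illustration. Positions are 0-based alignment columns, and the mapping to structure residue numbers and to the element and occupancy fields is omitted.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def entropy(outcomes):
    """Shannon entropy in bits of a sequence of hashable outcomes."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(msa, normalize=None):
    """Score every column pair i < j of an alignment (list of equal-length strings).

    normalize: None (raw H_i + H_j - H_ij), "joint" (divide by H_ij),
    or "sum" (divide by H_i + H_j). These option names are hypothetical.
    """
    columns = list(zip(*msa))
    h = [entropy(col) for col in columns]
    scores = {}
    for i, j in combinations(range(len(columns)), 2):
        h_ij = entropy(list(zip(columns[i], columns[j])))  # joint entropy
        mi = h[i] + h[j] - h_ij
        if normalize == "joint":
            denom = h_ij
        elif normalize == "sum":
            denom = h[i] + h[j]
        else:
            denom = 1.0
        scores[(i, j)] = mi / denom if denom > 0 else 0.0
    return scores


def z_scores(scores):
    """Convert pair scores to distances from the mean in standard deviations."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    return {pair: (v - mu) / sigma if sigma > 0 else 0.0
            for pair, v in scores.items()}


def report(z, threshold=5.0):
    """Per position: (maximum Z-score, partner position or 'nast')."""
    best = {}
    for (i, j), score in z.items():
        for pos, partner in ((i, j), (j, i)):
            if pos not in best or score > best[pos][0]:
                best[pos] = (score, partner)
    return {pos: (score, partner if score >= threshold else "nast")
            for pos, (score, partner) in best.items()}


# Toy alignment: columns 0 and 1 covary perfectly; column 2 is independent.
toy_msa = ["AAC", "AAD", "GGC", "GGD"]
toy_mi = mutual_information(toy_msa)
toy_z = z_scores(toy_mi)
toy_hits = report(toy_z, threshold=1.0)  # low cutoff chosen for the toy data
```

In the toy alignment, columns 0 and 1 name each other as partners, while column 2 receives 'nast' because its maximum Z-score falls below the cutoff. This mirrors the SegID assignment described for residues 136, 140, and 137-139.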
Rockwell and Lagarias BMC Bioinformatics 2007 8:123 doi:10.1186/1471-2105-8-123 |