Department of Ecology and Evolutionary Biology, University of Michigan, 2019 Kraus Nat. Sci. Bldg., 830 North University Ave, Ann Arbor 48109-1048, Michigan, USA

Howard Hughes Medical Institute, Ann Arbor, Michigan, USA

Department of Microbiology, Division of Medical Parasitology, New York University School of Medicine, New York City, New York, USA

Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

Abstract

Background

The primary target of the human immune response to the malaria parasite

Results

Conclusions

The distinction between rosetting versus impaired consciousness associated

Background

The main target of the human immune response to

Thousands of distinct

The most comprehensive DBLα tag dataset currently available was previously analyzed by Warimwe et al.

Here we address whether it is possible to capture more of the phenotypically relevant genetic diversity within a

Methods

Homology block nomenclature

The DBLα homology blocks discussed here are those described in Rask et al.

Data and HB assessment of sequences

The expressed sequences and the clinical data for 250 isolates (217 symptomatic, 33 asymptomatic) were obtained from the online supplementary information of

Linkage analysis of HBs in genomic sequences

Linkage analysis was based on the linkage disequilibrium coefficient, D, among HBs within the 53 genomic isolates. The statistical significance for D values is determined by the method described in
^{2}, where p and q are the frequencies of the two HBs being analyzed for linkage.
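As a rough illustration of the quantity being tested, the sketch below computes D for one pair of HBs from presence/absence vectors. It assumes the standard definition D = p_AB - p·q, where p_AB is the fraction of sequences carrying both HBs. The function name and the toy data are hypothetical, and the significance test from the cited method is not reproduced here.

```python
import numpy as np

def linkage_disequilibrium(has_a, has_b):
    """Return D = P(A and B) - p*q for two presence/absence vectors.

    Assumption: the standard LD coefficient definition; each vector has
    one entry per genomic sequence (1 = HB present, 0 = absent).
    """
    a = np.asarray(has_a, dtype=bool)
    b = np.asarray(has_b, dtype=bool)
    p = a.mean()            # frequency of the first HB
    q = b.mean()            # frequency of the second HB
    p_ab = (a & b).mean()   # frequency of sequences carrying both HBs
    return p_ab - p * q

# Hypothetical toy data: eight sequences, two HBs
hb_x = [1, 1, 1, 1, 0, 0, 0, 0]
hb_y = [1, 1, 1, 0, 0, 0, 0, 1]
d_xy = linkage_disequilibrium(hb_x, hb_y)
```

A positive D indicates that the two HBs co-occur more often than expected from their individual frequencies.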

**Additional figures. Figure S1.** Respiratory distress (RD) as a function of host age and rosetting. **Figure S2.** HB composition of known rosetting **Figure S3.** Linkage disequilibrium coefficient (D) values for all pairs of HBs in the genomic dataset. **Figure S4.** Community partition of weighted linkage network of HBs. **Figure S5.** HB-HB expression rate correlation matrix. **Figure S6.** Model of respiratory distress. **Figure S7.** Relationship between rosetting and respiratory distress. **Figure S8.** Relationship between impaired consciousness and the expression of various **Figure S9.** The best fit relationship between six variables and rosetting using a window analysis. **Figure S10.** Relationship between rosetting and expression rates of **Figure S11.** PC-classic **Figure S12.** PC-HB relationships. **Figure S13.** Principal components in data space. **Figure S14.** The amount of variation explained by each PC. **Figure S15.** PCA for two subsets of the data. **Figure S16.** Representation of select homology blocks. **Figure S17.** HB-classic

HB expression rate

The HB expression rate for a given isolate was defined as the number of occurrences of a given HB among the expressed sequences of that isolate (with each unique expressed sequence counted as many times as it is found within that isolate), divided by the total number of expressed sequences for that isolate.
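The definition above can be sketched as follows. This is a minimal illustration, assuming that each expressed sequence is represented by the set of HBs it contains and that an HB is counted at most once per sequence; the function name and the toy isolate are hypothetical.

```python
from collections import Counter

def hb_expression_rates(expressed_sequences):
    """Map each HB to its expression rate within one isolate.

    expressed_sequences: one collection of HB identifiers per expressed
    sequence, with repeated sequences listed once per occurrence.
    Assumption: an HB counts once per sequence in which it appears.
    """
    total = len(expressed_sequences)
    if total == 0:
        raise ValueError("isolate has no expressed sequences")
    counts = Counter()
    for hbs in expressed_sequences:
        counts.update(set(hbs))
    return {hb: count / total for hb, count in counts.items()}

# Hypothetical isolate: four expressed sequences, the first found twice
isolate = [{36, 219}, {36, 219}, {204}, {36}]
rates = hb_expression_rates(isolate)
```

In this toy isolate, HB 36 occurs in three of the four expressed sequences, so its expression rate is 0.75.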

Phenotype association networks

For the purposes of creating phenotype association networks, we analyzed the 217 symptomatic isolates within the dataset. For continuous phenotypes, we included in the network any significant correlation or rank correlation between a phenotype and an HB/

Transformation of expression rates and rosetting level

Prior to performing all linear and logistic regression analyses, the expression rates for particular

Principal component analysis

A PCA was carried out on a dataset of the HB expression rate profiles for the 217 symptomatic isolates. The expression rate profile is the set of expression rates for all 29 HBs for a given isolate. A PCA defines differentially expressed HB components—i.e., orthogonal principal components (PCs). Network analyses and phenotype correlation tests were then carried out using these PCs as independent variables. To test the robustness of the PCA results, we repeated the PCA using non-overlapping subsets of isolates.
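For illustration, a PCA of this kind can be sketched with a singular value decomposition. The sketch assumes the expression rates are mean-centered but not rescaled, which the text does not specify, and it uses randomly generated placeholder data in place of the real 217 × 29 matrix of expression rates.

```python
import numpy as np

# Placeholder data: 217 isolates x 29 HB expression rates (random, not real)
rng = np.random.default_rng(0)
rates = rng.random((217, 29))

# Center each HB column; scaling is left out (an assumption of this sketch)
centered = rates - rates.mean(axis=0)

# Rows of vt are the orthogonal principal components in HB space
u, s, vt = np.linalg.svd(centered, full_matrices=False)
components = vt

# Coordinates of each isolate along each PC
scores = centered @ components.T

# Fraction of the total variance explained by each PC
explained = s**2 / np.sum(s**2)
```

The `scores` columns are the quantities that would then serve as independent variables in the regression and network analyses.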

Modeling genotype-phenotype associations

Phenotype correlation tests consisted of multiple linear and logistic regression models, similar to the tests performed in
^{2} and Adjusted R^{2} were all used to compare the quality of alternative models. Where indicated, host age was included as an independent variable even where it did not appear to have a significant effect in order to eliminate the potential for observing spurious correlations resulting from co-correlation with this variable, since many weak correlations between disease phenotype and host age have been reported previously (e.g.,

Variable selection to optimize models of rosetting

To select a set of independent variables that produce the most informative model of rosetting, we started with many possible independent variables in a multiple linear regression model, and then successively removed the least significant contributing variable, excluding host age, until the BIC stopped decreasing. We then verified that the BIC increased with the removal of any of the final independent genetic variables. The BIC, AIC, R^{2} and adjusted R^{2} scores for the final models after removing host age were also evaluated. Most variable selection procedures were also carried out under the scenario where host age is removed as soon as it is the least significant contributing variable, and in all cases examined this had no influence on the variable selection results.
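The backward elimination described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the model is ordinary least squares with an intercept, the least significant variable is taken to be the one with the smallest absolute t-statistic, and the BIC is computed up to an additive constant as n·ln(RSS/n) + k·ln(n), so absolute values will not match those reported in the paper. All names and the simulated data are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Fit OLS with an intercept; return (BIC, |t| for each predictor)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    rss = float(resid @ resid)
    k = A.shape[1]
    sigma2 = rss / (n - k)
    cov = sigma2 * np.linalg.inv(A.T @ A)
    t_abs = np.abs(beta[1:]) / np.sqrt(np.diag(cov)[1:])
    bic = n * np.log(rss / n) + k * np.log(n)  # up to an additive constant
    return bic, t_abs

def backward_select(X, y, names, keep=("age",)):
    """Drop the least significant variable until the BIC stops decreasing."""
    active = list(range(X.shape[1]))
    bic, t_abs = fit_ols(X[:, active], y)
    while True:
        # Positions in `active` that are eligible for removal
        candidates = [i for i, j in enumerate(active) if names[j] not in keep]
        if not candidates:
            break
        worst = min(candidates, key=lambda i: t_abs[i])
        trial = active[:worst] + active[worst + 1:]
        trial_bic, trial_t = fit_ols(X[:, trial], y)
        if trial_bic >= bic:
            break  # removal no longer lowers the BIC
        active, bic, t_abs = trial, trial_bic, trial_t
    return [names[j] for j in active], bic

# Simulated data: x0 and x1 carry signal, the rest (including age) do not
rng = np.random.default_rng(1)
n = 200
names = ["x0", "x1", "x2", "x3", "x4", "age"]
X = rng.normal(size=(n, len(names)))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
selected, final_bic = backward_select(X, y, names)
```

Host age is protected from removal through the `keep` argument, mirroring the procedure in which age is retained throughout the selection.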

Identifying rosetting associated HBs or PCs

Warimwe et al. test whether particular expression rates can significantly reduce the explanatory power of rosetting on RD as a means to identify a group of

Results and discussion

Using HBs to classify var types within a local population

Many of the HBs identified in this dataset were also found in the genome of the chimpanzee malaria parasite

The genomic

**The homology block architecture of DBLα tags. (A)** The architecture of a **(B)** The output from Vardom Server

For the dataset of cDNA

**diversity within a local population is captured by homology blocks. (A)** Frequency of each HB in the dataset of genomic **(B-C)** The pairwise similarity among sequence types, where types are defined by homology block composition: the number of HBs shared between any two sequences divided by the average number of HBs within a sequence for those two sequences. **(B)** Frequency distribution of pairwise HB similarities between sequences in the genomic dataset. The approximately normal distribution contrasts with the bimodal distribution that has been observed for other data, when pairwise similarity is defined by amino acid identity
**(C)** Sequences are hierarchically ordered based on pairwise HB similarity using the average-linkage method as implemented in SciPy. The distinction between sequence tags containing two cysteines (cys2) versus four (cys4) is very clear, reflecting that recombination occurs at a faster rate within, relative to between, the two groups.
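The pairwise HB similarity and the average-linkage ordering described in this caption can be sketched as below. The sketch assumes that each sequence is represented as a set of HB identifiers and that the distance used for clustering is one minus the similarity; the choice of distance is not stated in the caption, and the toy sequences are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage
from scipy.spatial.distance import squareform

def hb_similarity(a, b):
    """Shared HBs divided by the mean number of HBs in the two sequences."""
    a, b = set(a), set(b)
    return len(a & b) / ((len(a) + len(b)) / 2)

# Hypothetical sequences, each given as the set of HBs it contains
seqs = [{1, 2, 3, 4}, {1, 2, 3, 5}, {6, 7, 8}, {6, 7, 9}]

n = len(seqs)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        # Assumption: distance = 1 - similarity
        dist[i, j] = dist[j, i] = 1.0 - hb_similarity(seqs[i], seqs[j])

# Average-linkage clustering on the condensed distance matrix
tree = linkage(squareform(dist), method="average")
order = leaves_list(tree)                            # hierarchical ordering
groups = fcluster(tree, t=2, criterion="maxclust")   # cut into two groups
```

With real data, cutting the tree into two groups would be one way to check whether the cys2 versus cys4 split is recovered from HB composition alone.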

While the diversity of HB-types is almost an order of magnitude less complex than the diversity of aa-types, the former is nevertheless considerable and potentially functionally informative (Figure

**Two HB subnetworks: associated with severe versus mild spectrum disease.** HB networks reveal two discrete HB subsets: one associated with severe spectrum phenotypes (orange) and the other with mild spectrum phenotypes (blue). **(A)** The network of significant positive linkage disequilibrium coefficients (D) among HBs in the genomic dataset, based on a one-tailed significance threshold of p ≤ 0.025, reveals two subnetworks of linked HBs. **(B)** The network of significant associations between HB expression rates and phenotypes (p ≤ 0.05), with nodes colored according to the subnetworks of A. The HBs in the orange subnetwork are generally associated with severe disease spectrum phenotypes, whereas those in the blue subnetwork are generally associated with mild disease spectrum phenotypes. The lack of connectivity between the severe and mild spectrum phenotypes in A is highly significant: even considering only the nodes of degree 3 or less, p < 0.0001 for the observation that each HB in the network is associated with mild or severe spectrum phenotypes, but not both. SMA = severe malarial anemia, Rosett = rosetting, RD = respiratory distress, Severe = severe disease, Mild = mild disease, Older = high host age, Younger = low host age, Par = parasitemia, BGlu = blood glucose (low levels indicate hypoglycemia), BaseE = base excess (low levels indicate metabolic acidosis), AB = antibody response.

Defining groups of associated HBs through linkage or phenotype correlation networks

With genomic samples, groups of HBs can be defined based on analyzing genomic

Using expression data, we can measure the expression rate for each HB in each isolate, and we observe many correlations among HB expression rates (Additional file

Distinguishing two subsets of A-like var tags with different phenotype correlations

Earlier analysis of the data by Warimwe et al. established that, while A-like

Two subsets of A-like

**Two subsets of A-like genes differently associated with severe disease.** Prior analyses by Warimwe et al.

**Further explanation of methods.**

In an attempt to identify this hypothesized class of

Next we addressed whether any HBs can provide additional information about rosetting, beyond what is already captured by classic ^{2} and adjusted R^{2}) to determine the benefit of the particular HB expression rate to the model (Additional file
^{2} and provide an insignificant contribution to predicting rosetting (p >> 0.05), two HBs improve the model and have significant p-values even within these over-parameterized models. HB 204 substantially reduces the BIC (from 50.72 to 48.62) and substantially increases the adjusted R^{2} (from 0.348 to 0.376). HB 54 is the only other HB to reduce the BIC and increase the adjusted R^{2} of the original model; however, it brings the BIC down only slightly (to 50.65) and the adjusted R^{2} up only slightly (to 0.367) (Additional file

**Additional tables. Table S1.** Multiple regression models of rosetting that include an HB expression rate as an independent variable. **Table S2.** Multiple regression models of rosetting that include an HB expression PC as an independent variable. **Table S3.** Statistics for multiple regression models predicting rosetting with and without age.

Variable selection to achieve a model of rosetting

In order to identify what genetic variation best explains the variation observed in rosetting, we performed a variable selection procedure to find the optimal set of independent variables for a multiple regression model of rosetting. Three tests were performed, which together show that HB 219 is a better predictor of rosetting than any of the classic

| | **Independent variables** | **AIC** | **BIC** | **R^{2}** | **Adj. R^{2}** |
| --- | --- | --- | --- | --- | --- |
| **A** | **Cys2**, Grp2, Grp3, **BS1CP6** | 20.14 | 37.40 | 0.358 | 0.338 |
| **B** | HB36, HB204, HB210, **HB219**, **HB486** | 16.48 | 36.60 | 0.385 | 0.361 |
| **C** | **BS1CP6**, HB54, HB171, HB204, **HB219** | 14.02 | 34.14 | 0.400 | 0.373 |
| **D** | **BS1CP6, PC1, PC3, PC4, PC22** | 4.776 | 24.90 | 0.438 | 0.415 |

*The result of removing the least significant genetic variable, one by one, from models of rosetting that start with the expression rates of: (row A) the 7 classic

In a first test, we start with a model that initially includes all seven classic

In a second test we start with all 29 HB expression rates plus host age as independent variables and then we follow the same variable selection procedure. In this case the resulting model is one with HB 36, HB 204 and HB 210 as negative predictors of rosetting, and HB 219 and HB 486 as positive predictors of rosetting (BIC = 36.60) (row B in Table

In a third variable selection test we start with all 29 HB expression rates in addition to the expression rates for all seven classic

Two additional observations lend further credibility to our finding that the HB 219 expression rate is a robust positive predictor of rosetting. First, in all nine cases where rosetting data are available for an isolate that has HB 219 present in its most highly expressed sequence, considerable rosetting is observed (defined as > 0.1). Second, we find that the DBLα domains of known rosetting

Based on a comparison of the BIC scores of the models that result from the above variable selection procedures (Table

Principal components of HB expression rate profiles and variation in rosetting

We perform a PCA on the HB expression rate profile, which we define as the set of expression rates for all 29 HBs. This deconstructs the HB expression rate profiles into orthogonal principal components (PCs) based on how they vary across different isolates. We then repeat the above network and variable selection analyses using PCs in place of individual HB expression rates (Additional file

We find that PC 1 is related to the cys2 versus non-cys2 distinction (Figure

**Principal components of HB expression rate profiles. (A)** PCs expressed as coordinates in the original space of the data, which is the expression rates for all 29 HBs for each of the isolates. The amount of variation among the isolates that is explained by each of the PCs is shown on the right. **(B)** The PCA of HB expression rate profiles reflects the differentially expressed HB components, and the first PC defines the extent to which there is a bias toward the expression of **(C)** PC1 (and Cys2 **(D)** PC1 (and Cys2 **(E)** The network of significant correlations between HB expression rate profile principal components (PCs) and disease phenotypes (p ≤ 0.05). SMA = severe malarial anemia, Rosett = rosetting, RD = respiratory distress, Severe = severe disease, Mild = mild disease, Older = high host age, Younger = low host age, Par = parasitemia, BGlu = blood glucose (low levels indicate hypoglycemia), BaseE = base excess (low levels indicate metabolic acidosis), AB = antibody response.

We address whether the PCs provide additional information about rosetting beyond what can be predicted based on the expression rates of the classic ^{2} (from 0.348 to 0.378) (Additional file

The above findings suggest that, regarding the rosetting pattern, PC 3 provides qualitatively different information from any of the classic

Next we perform a variable selection procedure to address whether an optimized model of rosetting will contain PCs or classic

The PC-containing models have much lower BIC scores and higher adjusted R^{2} values compared to all other models (row D in Table

The principal components improve phenotype prediction, but they are less straightforward to interpret than individual HB expression rates. Nevertheless, our results demonstrate that PC 1 clearly corresponds to the major division found by network analyses, severe and mild spectrum associated

Furthermore, the various correlations between phenotypes and PCs, and between the expression rate of various sequence types and PCs, can be summarized in networks, which can provide additional means to interpret the PCs (Figure

The consistency of HB-phenotype associations in distinct populations

HB analysis of a smaller dataset from Mali that was originally analyzed by Kyriacou et al.

For the Malian dataset

Among the unique set of sequences expressed within the cerebral and hyperparasitemia isolates, the rank correlations (both Spearman and Kendall) of rosetting with each of HB 60, 79, 153, and 219 are all greater in magnitude than the rank correlation of rosetting with cys2. These several HBs are also associated with rosetting in the Kenyan dataset

Conclusions

Even though the HBs were designed using a very small number of

All of the HBs within the optimized rosetting model (HBs 171, 204, 54 and 219; row C in Table

HB 204 expression rate is a significant negative predictor of rosetting regardless of the details of the model. However, its expression is positively correlated with the expression of cysPoLV group 2 tags (correlation coefficient = 0.434, p < 10^{-10}), which are by definition cys2. CysPoLV group 2

Warimwe et al. put forward the hypothesis that there are at least two classes of A-like

HB 219 is also interesting because, while its expression is correlated with cysPoLV group 1 expression (Additional file

Within the Kenyan population that is the focus of this study, HB expression rates (and to an even greater extent, PCs of HB expression rate profiles) improve our ability to differentiate mild versus severe spectrum

In summary, HB typing methods allow for the construction of more specific genotype-phenotype models that in turn suggest that two distinct molecular mechanisms underlie severe malaria. Specifically, we find that

Lastly, HBs have the potential to elucidate complex ecological and evolutionary dynamics that potentially shape antigenic diversity within

Competing interests

The authors declare no competing interests.

Authors’ contributions

MMR conceived of the study, carried out the analysis and wrote the manuscript. KPD, MP and TSR contributed to the study design and critically revised the manuscript. EBB contributed to the data analysis and critically revised the manuscript. All authors read and approved the final manuscript.

Acknowledgements

We thank Donald S. Chen and Yael Artzy-Randrup for helpful input related to this work. MP is an Investigator at Howard Hughes Medical Institute. EBB was supported by a Department of Energy Computational Science Graduate Fellowship (grant DE-FG02-97ER25308).