Open Access Highly Accessed Research article

Structure and evolution of barley powdery mildew effector candidates

Carsten Pedersen1, Emiel Ver Loren van Themaat2, Liam J McGuffin3, James C Abbott4, Timothy A Burgis4, Geraint Barton4, Laurence V Bindschedler5,6, Xunli Lu2, Takaki Maekawa2, Ralf Weßling2, Rainer Cramer5, Hans Thordal-Christensen1, Ralph Panstruga2,7* and Pietro D Spanu4*

Author Affiliations

1 Department of Agriculture & Ecology, Plant and Soil Science, University of Copenhagen, Copenhagen, Denmark

2 Department of Plant Microbe Interactions, Max-Planck Institute for Plant Breeding Research, Cologne, Germany

3 School of Biological Sciences, University of Reading, RG6 6AS, UK

4 Department of Life Sciences, Imperial College London, Sir Alexander Fleming Building, London, SW7 2AZ, UK

5 Department of Chemistry, University of Reading, RG6 6AD, UK

6 Present address: School of Biological Sciences, Royal Holloway University of London, Egham, UK

7 Unit of Plant Molecular Cell Biology, Institute for Biology I, RWTH Aachen University, Worringer Weg 1, Aachen, D-52056, Germany


BMC Genomics 2012, 13:694  doi:10.1186/1471-2164-13-694

Published: 11 December 2012

Additional files

Additional file 1:

Summary of all CSEPs. For all 491 CSEPs, the table lists various protein properties and gene expression data. The table is sorted by MCL paralog family to make the properties of the different families easier to compare. Footnotes:
1) The CSEPs described previously [10] are in light blue cells and the new CSEPs are in light red cells.
2) The gene IDs are as published [10] and as listed in the Blugen database (http://www.blugen.org).
3) Signal peptide predicted with SignalP.
4) BLASTP homologies to genomic sequence data [10].
5) InterProScan gene ontologies (http://www.ebi.ac.uk/Tools/pfa/iprscan/).
6) Only CSEPs with structural models belonging to RNases are included.
7) IntFOLD model scores.
8) Position of the first YxC-motif in the mature protein.
9) Disulphide bonds predicted using Disulfind (http://disulfind.dsi.unifi.it/). The positions are those of the bond-forming cysteine pairs in the mature protein.
10) Ratio of expression in haustorial epidermal strips versus epiphytic material at 5 dpi, determined by RNA sequencing.
11) Columns Q to Z show the presence of the CSEPs in the EST libraries described in Additional file 8.

Format: XLSX Size: 383KB

Open Data

Additional file 2:

Size distribution histogram of MCL families. A: Number of families with a given family size. B: Number of CSEPs in families with a given family size.

Format: PDF Size: 22KB

This file can be viewed with: Adobe Acrobat Reader

Additional file 3:

Analysis of selection on CSEPs. The table shows the full data set from the analyses of positive and purifying selection for all 72 CSEP families. Footnotes:
1) Indicates whether the family has the YxC-motif in the N-terminus of the mature protein. The symbol ½ indicates that only some members of the family have the motif.
2) Presence of a cysteine close to the C-terminus, and its distance to the C-terminus.
3) Conserved cysteines in the mature protein. In some families a few members are truncated and therefore lack the terminal cysteine; the cysteine is nevertheless counted in the table.
4) Protein length: the average protein length was calculated for each family. Families with an average length below 150 amino acids are coloured light green, and families with an average length above 300 amino acids are coloured grey.
5) Gene expression ratio in haustorial versus epiphytic samples, calculated as the average for each family. Colour codes: orange, >100x; yellow, 50-100x; light yellow, 10-50x.
6) Percentage of CSEPs in each family that were found by proteome analysis only in haustorial samples.
7) Codon-based test of positive and purifying selection. The two left columns show the number of pairs with significant positive selection (Z-test at the 5% level) relative to the total number of pairs within each family. The two right columns show P values; values below 0.05 are considered significant at the 5% level (modified Nei-Gojobori method, assumed transition/transversion bias = 1). The test statistic is dN - dS for positive selection and dS - dN for purifying selection, where dN and dS are the numbers of nonsynonymous and synonymous substitutions per site, respectively.
8) Codon-based calculations of positive and purifying selection using the Selecton server, based on a Bayesian inference approach [49]. The left column indicates the number of codons under positive or purifying selection. The middle column shows the significance levels for model M8a versus model M8. The right column shows the average Ka/Ks values calculated for the mature proteins. Pink: purifying selection (Ka/Ks < 0.75); yellow to orange: positive selection, with stronger colours indicating stronger positive selection.
9) Ka/Ks values based on method 7 of Liberles [52], calculated with the service at the Bergen Center for Computational Science (http://services.cbu.uib.no/tools/kaks). The Ka/Ks values are calculated for each branch point of a binary cladogram.

Format: PDF Size: 110KB
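For each pair of paralogs, the codon-based Z-test in footnote 7 comes down to a one-sided normal test on dN - dS (positive selection) or dS - dN (purifying selection). The sketch below assumes that dN, dS and the standard error of their difference have already been estimated for every pair (for example by bootstrapping). It does not show how dN and dS are estimated with the modified Nei-Gojobori method. The function names and the example values are hypothetical.

```python
import math
from typing import Iterable, Tuple


def upper_tail_p(z: float) -> float:
    """One-sided P value: probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def codon_z_test(dn: float, ds: float, se_diff: float,
                 hypothesis: str = "positive") -> Tuple[float, float]:
    """Return (z, p) for one sequence pair.

    hypothesis="positive" tests dN > dS using the statistic dN - dS;
    hypothesis="purifying" tests dS > dN using the statistic dS - dN.
    se_diff is the (assumed pre-computed) standard error of dN - dS.
    """
    if se_diff <= 0:
        raise ValueError("standard error must be positive")
    if hypothesis == "positive":
        stat = dn - ds
    elif hypothesis == "purifying":
        stat = ds - dn
    else:
        raise ValueError("hypothesis must be 'positive' or 'purifying'")
    z = stat / se_diff
    return z, upper_tail_p(z)


def count_significant_pairs(pairs: Iterable[Tuple[float, float, float]],
                            alpha: float = 0.05,
                            hypothesis: str = "positive") -> Tuple[int, int]:
    """Return (pairs significant at level alpha, total pairs) for (dN, dS, SE) tuples."""
    pairs = list(pairs)
    n_sig = sum(1 for dn, ds, se in pairs
                if codon_z_test(dn, ds, se, hypothesis)[1] < alpha)
    return n_sig, len(pairs)


# Hypothetical (dN, dS, SE of dN - dS) values for three paralog pairs in one family.
family_pairs = [(0.30, 0.10, 0.10), (0.12, 0.11, 0.05), (0.05, 0.20, 0.06)]
print(count_significant_pairs(family_pairs))                          # (1, 3)
print(count_significant_pairs(family_pairs, hypothesis="purifying"))  # (1, 3)
```

The two counts match the "significant pairs / total pairs" layout of the left columns in footnote 7.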

Additional file 4:

CSEP bootstrap consensus tree in which the CSEPs belonging to the eight largest MCL families are indicated by colour: yellow - Family 1; red - Family 2; blue - Family 3; green - Family 4; purple - Family 5; light blue - Family 6; grey - Family 7; green-blue - Family 8. Numbers at branches indicate bootstrap support based on 100 replicates. The scale denotes the number of amino acid substitutions per site.

Format: PDF Size: 2.1MB
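Bootstrap support values like those in these consensus trees are obtained in three steps. First, alignment columns are resampled with replacement. Second, a tree is rebuilt from each pseudo-replicate. Third, the support for a clade is reported as the percentage of replicates that contain it. The minimal sketch below covers the resampling and the support calculation and leaves tree building out. The helper names and the toy alignment are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Set


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("aligned sequences must all have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def clade_support(replicate_clades: List[Set[FrozenSet[str]]], clade: FrozenSet[str]) -> float:
    """Percentage of replicate trees (each given as a set of clades) that contain `clade`."""
    if not replicate_clades:
        raise ValueError("no replicates supplied")
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


# Hypothetical aligned protein fragments.
aln = {"CSEP_a": "MKCYLC", "CSEP_b": "MKCYIC", "CSEP_c": "MRCWLC"}
rng = random.Random(1)
# 100 pseudo-replicates, matching the number of replicates used for the trees.
replicates = [bootstrap_replicate(aln, rng) for _ in range(100)]
```

Each replicate would be passed to the same tree-building method as the original alignment, and the clades of the resulting trees collected for `clade_support`.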

Additional file 5:

CSEP bootstrap consensus tree showing CSEPs with Blast2GO hits. Colour code: light blue - ribonucleases; red - coiled coil; yellow, pink and light green - other (uncharacterized) domain types. Numbers at branches indicate bootstrap support based on 100 replicates. The scale denotes the number of amino acid substitutions per site.

Format: PDF Size: 2MB

Additional file 6:

CSEP bootstrap consensus tree showing CSEPs conserved in E. pisi and G. orontii. Highlighted are CSEPs with a recognizable hit (TBLASTN, E-value < 10^-5) in the E. pisi and/or G. orontii genome. Colour code: blue - G. orontii; yellow - E. pisi; green - both G. orontii and E. pisi. Numbers at branches indicate bootstrap support based on 100 replicates. The scale denotes the number of amino acid substitutions per site.

Format: PDF Size: 2MB

Additional file 7:

CSEPs with relationships to ribonucleases. Seventy-one CSEPs related to ribonucleases were identified either by InterProScan analysis of functional domains or by structural annotation based on the structural templates of IntFOLD predictions. CSEPs are sorted by family number.

Format: PDF Size: 42KB

Additional file 8:

The B. graminis EST sources that provide evidence for expression of the CSEPs. Almost 52,000 EST sequences were searched in total. Some of the libraries also contained barley transcripts, so the number of fungal transcripts in them is unknown. The CSEP number in the table is the number of distinct CSEPs represented in each EST project. Because many CSEPs had several hits, the number of CSEP ESTs is much larger. The library with the most CSEP hits is the epidermal EST library, made from epidermal cells containing many haustoria but no other fungal material [6]. It represents 151 different CSEPs in a total of 1299 CSEP ESTs, which is 20% of all fungal transcripts in that library.

Format: PDF Size: 16KB

Additional file 9:

CSEP expression plot. Sorted haustorial versus epiphytic expression ratios of the 349 CSEPs whose ratio is above 2 or below 0.5 and whose expression levels are high enough for a reliable ratio to be calculated. The plot shows that 216 CSEPs are expressed at least 10-fold more highly in haustoria than in epiphytic tissues. The y-axis is log10-scaled.

Format: PDF Size: 58KB
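The selection of ratios for this plot can be sketched as a filter followed by a sort. The sketch makes two assumptions that are not stated in the legend: a ratio of None marks a gene whose expression was too low for a reliable ratio, and only positive ratios are kept so that they can be log-transformed. The function names and example ratios are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def informative_ratios(ratios: Dict[str, Optional[float]],
                       low: float = 0.5, high: float = 2.0) -> List[Tuple[str, float]]:
    """Keep positive ratios above `high` or below `low`, sorted from highest to lowest.

    None marks a gene whose expression was too low for a reliable ratio (assumption).
    """
    kept = [(gene, r) for gene, r in ratios.items()
            if r is not None and r > 0 and (r > high or r < low)]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def count_at_least(ratios: List[Tuple[str, float]], fold: float = 10.0) -> int:
    """Number of genes expressed at least `fold` times more highly in haustoria."""
    return sum(1 for _, r in ratios if r >= fold)


# Hypothetical haustorial/epiphytic expression ratios.
example = {"CSEP_a": 150.0, "CSEP_b": 1.0, "CSEP_c": 0.2, "CSEP_d": None,
           "CSEP_e": 3.0, "CSEP_f": 12.0}
kept = informative_ratios(example)
log_values = [math.log10(r) for _, r in kept]  # y-axis values on a log10 scale
print(len(kept), count_at_least(kept))  # 4 2
```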

Additional file 10:

Level of diversity at the nucleotide level in pairwise comparisons between members of three CSEP families. Diversity was calculated as the percentage of differing nucleotides, separately for the two exons, the intron, and the 500 bp upstream and downstream of the coding region. Where parts of the upstream or downstream regions lack homology, only the homologous portion was used for the calculation.

Format: PDF Size: 37KB
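The percentage of differing nucleotides can be computed region by region from a pairwise alignment. The sketch assumes that each region has already been aligned and treats gap columns as non-homologous positions to be skipped. That is a simplification of "only the homologous region was used". The function name and sequences are hypothetical.

```python
def percent_difference(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of differing nucleotides between two aligned sequences.

    Columns where either sequence has a gap are treated as non-homologous and skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no homologous positions to compare")
    return 100.0 * differing / compared


# Hypothetical aligned regions of two paralogs.
regions = {"exon1": ("ATGCTACGT", "ATGCAACGT"), "upstream": ("ACGT-A", "ACGAGA")}
for name, (a, b) in regions.items():
    print(name, round(percent_difference(a, b), 1))  # exon1 11.1, then upstream 20.0
```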

Additional file 11:

Protein structure and positive selection in CSEP family 21. A: Amino acid alignment of the seven members, obtained with CLC Main Workbench (see Methods). B: Evidence for selection on the paralogous members of family 21, estimated using the Selecton server ([49,50]; http://selecton.tau.ac.il/). Codon sites under positive diversifying (red) or purifying (purple and yellow) selection and conserved cysteines (yellow) are indicated by coloured circles. C: Cladogram with Ka/Ks values for the individual branches, calculated using the online server at http://services.cbu.uib.no/tools/kaks. D: 3D protein models of two family 21 members, with the amino acids under positive diversifying selection highlighted in red.

Format: PDF Size: 288KB

Additional file 12:

Distribution of codons under selection in selected CSEP families. Protein sequences of families 1–35, showing the distribution of amino acids under positive and purifying selection. The residues are coloured according to their Ka/Ks values, estimated using the Selecton server ([49,50]; http://selecton.tau.ac.il/). Codon sites under positive diversifying (red) or purifying (purple and yellow) selection are highlighted. The conserved cysteines are shown in yellow.

Format: PDF Size: 538KB

Additional file 13:

Graphs of the distribution of codons under selection. Thirteen CSEP families with amino acid sites under positive selection (orange) are shown. The most conserved positions are shown in pink, with the conserved cysteines in yellow. The y-axis shows the Ka/Ks value and the x-axis the position in the protein, including the signal peptide, which is mainly under purifying selection. The Ka/Ks values were calculated using the Selecton server ([49]; http://selecton.tau.ac.il/).

Format: PDF Size: 116KB

Additional file 14:

CSEP amino acid alignments of families 1–35. The proteins were aligned using CLC Main Workbench, as described in Methods.

Format: PDF Size: 7.9MB

Additional file 15:

Summary of data obtained for Blumeria and yeast data sets. This compilation extends previously published data (shown in grey) [8] with new sets and measures. The CSEPs, Known_Fungal_Effectors and Haustoria_only sets have the lowest values for mean length, mean proportion of disorder, mean maximum length of disorder, mean model quality and mean number of domains. In addition, these sets have a higher proportion of top hits to ribonuclease and hydrolase structural templates.

Format: PDF Size: 23KB

Additional file 16:

Calculated p-values for unpaired Wilcoxon rank sum tests for the CSEP data set. The table shows the p-values of Wilcoxon rank sum tests comparing the CSEP set with each of the other sets for each data type. Significant p-values (p < 0.05) are highlighted in green and shown in bold. Footnote: The null hypothesis is that the values in each comparison set are equal to or lower than those in the CSEP set. The alternative hypothesis is that the values in the comparison set are greater.

Format: PDF Size: 19KB
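The one-sided comparison in this table can be illustrated with an unpaired Wilcoxon rank sum (Mann-Whitney U) test. The alternative hypothesis is that the comparison set has larger values than the CSEP set. This minimal sketch uses the large-sample normal approximation without tie or continuity correction. Its p-values can therefore differ from those of an exact test or a statistics package, and it is not the authors' code. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # average of 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_sum_test_greater(comparison: Sequence[float],
                          csep: Sequence[float]) -> Tuple[float, float]:
    """One-sided Wilcoxon rank sum test using the normal approximation.

    H0: values in `comparison` are not larger than those in `csep`.
    H1: values in `comparison` tend to be larger.
    Returns (U statistic of the comparison set, one-sided p-value).
    """
    n1, n2 = len(comparison), len(csep)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(comparison) + list(csep))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)  # no tie correction
    z = (u1 - mean_u) / sd_u
    return u1, 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical per-protein values (e.g. mean length) for a comparison set and the CSEP set.
u, p = rank_sum_test_greater([10, 11, 12, 13, 14], [1, 2, 3, 4, 5])
```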

Additional file 17:

Calculated p-values for Fisher's exact tests comparing the CSEP data set with each of the other sets. The categorical data on the proportions of ribonucleases and hydrolases were analysed using Fisher's exact test (p < 0.05 highlighted in green). Footnote: p-values below 0.05 are shown in bold and indicate significant over-representation of the data type in the CSEP set.

Format: PDF Size: 64KB
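Testing for over-representation is a one-sided question. The relevant quantity is therefore the upper tail of the hypergeometric distribution for a 2x2 table (CSEP set versus comparison set, with versus without the feature, such as a ribonuclease template hit). The minimal pure-Python sketch below assumes that the counts have already been tabulated. The function name and example counts are hypothetical.

```python
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for over-representation in the first row.

    Assumed table layout:      feature   no feature
        CSEP set                  a          b
        comparison set            c          d
    Returns P(X >= a), where X is the number of feature-positive proteins in
    the CSEP row under the hypergeometric null with fixed margins.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / total


# Hypothetical counts: 3 of 4 CSEPs versus 1 of 4 comparison proteins carry the feature.
p = fisher_greater(3, 1, 1, 3)
```

`math.comb` requires Python 3.8 or later.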

Additional file 18:

IntFOLD 3D models for selected CSEP families. A: family 12; B: family 22; C: family 5; D: family 21; E: family 23. In each panel, positively selected residues are highlighted in red; the left image is a cartoon view showing secondary structure types and the right image is a surface view showing the globular structure. Images were rendered using PyMOL.

Format: PDF Size: 1MB

Additional file 19:

CSEPs show significant differences in amino acid frequencies and secondary structure (part 1). The CSEP set is compared with the other sets according to: length (as a control), amino acid frequency (A-Y), coiled-coil composition, TM helix composition (as a control), low complexity regions, and frequencies of helical, strand and loop residues. The Haustoria_only set is compared with the other sets according to the same measures. The null hypothesis is that the Haustoria_only set has greater values for the property in each column than the comparison set.

Format: PDF Size: 49KB

Additional file 20:

CSEPs show significant differences in amino acid frequencies and secondary structure (part 2). The CSEP set is compared with the other sets according to: length (as a control), amino acid frequency (A-Y), coiled-coil composition, TM helix composition, low complexity regions, and frequencies of helical, strand and loop residues. The table contains the same information as Additional file 19 but with the reverse null hypothesis (i.e. 1 - p).

Format: PDF Size: 128KB

Additional file 21:

Distribution of the YxC motifs. A: Distribution of the YxC motifs among the 307 CSEPs that have this motif within the first 50 amino acids. B: Distribution of the YxC motifs among the 352 CSEPs that have one or more versions of this motif. In both panels, the cumulative number of the YxC, WxC and FxC versions of the YxC-motif is plotted against the distance of the first amino acid of the motif from the signal peptide cleavage site.

Format: PDF Size: 59KB
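Locating the first YxC-type motif relative to the cleavage site takes only a regular expression. The sketch rests on three assumptions. The motif is matched as Y, W or F, then any residue, then C. The cleavage site is given as the length of the signal peptide (for example from a SignalP prediction). Positions are counted from 1 at the first residue of the mature protein. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Y, W or F, then any residue, then cysteine: the YxC, WxC and FxC versions.
YXC_PATTERN = re.compile(r"[YWF].C")


def first_yxc(protein: str, signal_peptide_length: int) -> Optional[Tuple[str, int]]:
    """Return (motif version, position) of the first YxC-type motif in the mature protein.

    The position is 1-based within the mature protein, i.e. counted from the
    first residue after the cleavage site. Returns None if no motif is found.
    """
    mature = protein[signal_peptide_length:].upper()
    match = YXC_PATTERN.search(mature)
    if match is None:
        return None
    version = match.group(0)[0] + "xC"
    return version, match.start() + 1


def has_motif_within(protein: str, signal_peptide_length: int, window: int = 50) -> bool:
    """True if the first motif starts within the first `window` residues of the mature protein."""
    hit = first_yxc(protein, signal_peptide_length)
    return hit is not None and hit[1] <= window


# Hypothetical protein with a 6-residue signal peptide.
print(first_yxc("MKLAAS" + "GGYTCAAWKC", 6))  # ('YxC', 3)
```

Collecting these positions over all CSEPs and accumulating them by distance gives cumulative curves of the kind plotted here.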

Additional file 22:

Cysteines and prediction of disulphide bonds in CSEPs. A: Histogram of the number of CSEPs versus the position of the last cysteine from the C-terminus of the protein. B: Distribution of CSEPs containing 0 to 16 cysteines. C: Histogram of the disulphide bonds predicted in the CSEPs using Disulfind [12].

Format: PDF Size: 103KB
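Panels A and B depend on two simple per-protein counts. The sketch below takes the mature protein sequence as input. It measures the distance of the last cysteine from the C-terminus as the number of residues after it, so the distance is 0 when the C-terminal residue is itself a cysteine. This convention is an assumption. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def cysteine_profile(mature: str) -> Tuple[int, Optional[int]]:
    """Return (number of cysteines, distance of the last cysteine from the C-terminus).

    The distance is the number of residues after the last cysteine; it is None
    when the protein contains no cysteine.
    """
    seq = mature.upper()
    last = seq.rfind("C")
    distance = None if last == -1 else len(seq) - 1 - last
    return seq.count("C"), distance


def cysteine_count_histogram(proteins: Iterable[str]) -> Counter:
    """Number of proteins with each cysteine count (as in panel B)."""
    return Counter(cysteine_profile(p)[0] for p in proteins)


# Hypothetical mature protein: two cysteines, the last one followed by two residues.
print(cysteine_profile("MCAACKK"))  # (2, 2)
```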

Additional file 23:

Clustering of CSEPs on sequence scaffolds. For each of the studied families, the table shows how many members are clustered and the length of the scaffold region containing them. The scaffold length includes both the sum of the sequence contig lengths and the calculated distances between the contigs. The average distance is the distance that would separate adjacent CSEPs on the scaffold if they were distributed evenly.

Format: PDF Size: 61KB
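The scaffold length and the average distance can be computed as shown below. This minimal sketch relies on two assumptions that the legend does not state. First, the region runs from the first to the last clustered member, so n evenly spaced members define n - 1 intervals. Second, the inter-contig distances are the estimated gap sizes. The function names and numbers are hypothetical.

```python
from typing import Sequence


def region_length(contig_lengths: Sequence[int], gap_lengths: Sequence[int]) -> int:
    """Sum of the contig lengths plus the estimated gaps between consecutive contigs."""
    if len(gap_lengths) != max(len(contig_lengths) - 1, 0):
        raise ValueError("expected one gap between each pair of consecutive contigs")
    return sum(contig_lengths) + sum(gap_lengths)


def even_spacing(length_bp: float, n_members: int) -> float:
    """Distance between adjacent members if n_members were spread evenly over length_bp."""
    if n_members < 2:
        raise ValueError("need at least two clustered members")
    return length_bp / (n_members - 1)


# Hypothetical family: five members on a region made of two contigs and one gap.
length = region_length([10_000, 5_000], [2_000])
print(length, even_spacing(length, 5))  # 17000 4250.0
```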

Additional file 24:

The relationship between CSEP clustering on genome sequence scaffolds and their sequence homology. The figure shows families 2–13, 15, 16, 25, 30 and 33. The scaffolds are drawn as vertical solid bars (colours indicate separate contigs), with a scale bar in the bottom right corner. The phylogenetic tree is based on nucleotide sequences and was calculated using the UPGMA algorithm in CLC Main Workbench. Bootstrap values based on 100 replicates are shown at the nodes; the scale bar in the bottom left corner indicates the number of nucleotide substitutions per site. CSEPs not connected to a scaffold by a dotted line were not found to be clustered.

Format: PDF Size: 264KB
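For readers unfamiliar with UPGMA, here is a generic textbook sketch. It is not CLC Main Workbench's implementation. The algorithm repeatedly merges the closest pair of clusters, places the new node at half their distance, and updates distances to the merged cluster as size-weighted averages. The input is assumed to be a symmetric distance matrix, for example pairwise nucleotide p-distances. The function name and example matrix are hypothetical.

```python
from typing import Dict, List, Tuple


def upgma(names: List[str], dist: List[List[float]]) -> str:
    """Return a Newick tree built by UPGMA from a symmetric distance matrix."""
    n = len(names)
    if n == 0 or len(dist) != n or any(len(row) != n for row in dist):
        raise ValueError("distance matrix must be square and match the names")
    # cluster id -> (Newick text, number of leaves, height of the cluster's root)
    clusters: Dict[int, Tuple[str, int, float]] = {i: (names[i], 1, 0.0) for i in range(n)}
    # distances keyed by (smaller id, larger id)
    d = {(i, j): float(dist[i][j]) for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        ni, si, hi = clusters.pop(i)
        nj, sj, hj = clusters.pop(j)
        height = dij / 2.0  # ultrametric: new node sits halfway between the clusters
        newick = f"({ni}:{height - hi:g},{nj}:{height - hj:g})"
        # Size-weighted average distance from each remaining cluster to the merged one.
        merged = {k: (si * d[(min(i, k), max(i, k))] + sj * d[(min(j, k), max(j, k))]) / (si + sj)
                  for k in clusters}
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        for k, v in merged.items():
            d[(k, next_id)] = v  # next_id is larger than every existing id
        clusters[next_id] = (newick, si + sj, height)
        next_id += 1
    (newick, _, _), = clusters.values()
    return newick + ";"


# Hypothetical distances between four paralogs.
matrix = [[0, 2, 6, 6], [2, 0, 6, 6], [6, 6, 0, 4], [6, 6, 4, 0]]
print(upgma(["A", "B", "C", "D"], matrix))  # ((A:1,B:1):2,(C:2,D:2):1);
```

Because UPGMA assumes a constant rate of change, all leaves end up at the same distance from the root. This is why the method produces a rooted, ultrametric tree.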

Additional file 25:

Clustering of CSEP genes. A: Lengths of the 68 genomic sequence scaffolds longer than 100 kb, expressed as a percentage of their combined length (92 Mb; blue line) and ordered by length. The number of CSEPs on each of these scaffolds is expressed as a percentage of the 463 CSEPs they carry in total (green line). B: Family-wise distribution of 455 CSEPs on the 43 scaffolds harbouring at least two CSEPs. Families with at least three clustered members are colour-coded, so that the coloured histograms show the number of clustered members from each family on each scaffold.

Format: PDF Size: 168KB

Additional file 26:

Clustering of selected CSEP family members. Genome clustering of four CSEP paralogs from family 8 and four from family 30 on their respective sequence scaffolds. Below each dendrogram, a schematic of the genome organization, including repetitive elements, indicates the sequence homologies in pairwise comparisons (the colour coding in the dendrogram matches that in the scaffolds). The element Egh24 is a SINE [15], the Bgt repeat is an uncharacterized repeat (GenBank AJ002007.1) from B. graminis f. sp. tritici, and the EKA paralog is an AvrA10/K1 paralog [32]. Vertical dotted red lines indicate abrupt breaks in sequence homology. The scale bars next to the dendrograms refer to the genomic scaffolds.

Format: PDF Size: 176KB