A phylogenetic survey of myotubularin genes of eukaryotes: distribution, protein structure, evolution, and gene expression

Kerk, David; Moorhead, Greg BG

doi:10.1186/1471-2148-10-196

Research article
Open access
Published: 24 June 2010

BMC Evolutionary Biology volume 10, Article number: 196 (2010)

Abstract

Background

Phosphorylated phosphatidylinositol (PtdIns) lipids, produced and modified by PtdIns kinases and phosphatases, are critical to the regulation of diverse cellular functions. The myotubularin PtdIns-phosphate phosphatases have been well characterized in yeast and especially animals, where multiple isoforms, both catalytically active and inactive, occur. Myotubularin mutations bring about disruption of cellular membrane trafficking, and in humans, disease. Previous studies have suggested that myotubularins are widely distributed amongst eukaryotes, but key evolutionary questions concerning the origin of different myotubularin isoforms remain unanswered, and little is known about the function of these proteins in most organisms.

Results

We have identified 80 myotubularin homologues amidst the completely sequenced genomes of 30 organisms spanning four eukaryotic supergroups. We have mapped domain architecture, and inferred evolutionary histories. We have documented an expansion in the Amoebozoa of a family of inactive myotubularins with a novel domain architecture, which we dub "IMLRK" (inactive myotubularin/LRR/ROCO/kinase). There is an especially large myotubularin gene family in the pathogen Entamoeba histolytica, the majority of them IMLRK proteins. We have analyzed published patterns of gene expression in this organism which indicate that myotubularins may be important to critical life cycle stage transitions and host infection.

Conclusions

This study presents an overall framework of eukaryotic myotubularin gene evolution. Inactive myotubularin homologues with distinct domain architectures appear to have arisen on three separate occasions in different eukaryotic lineages. The large and distinctive set of myotubularin genes found in an important pathogen species suggests that in this organism myotubularins might present important new targets for basic research and perhaps novel therapeutic strategies.

Background

Phosphatidylinositol (PtdIns) phospholipids are quantitatively minor but functionally significant membrane lipid components which have been shown to be involved in regulating diverse aspects of cellular function, such as proliferation, survival, growth, cytoskeletal reorganization, and various membrane trafficking events. The inositol ring can be phosphorylated at the D3, D4 or D5 position to produce a set of seven distinct phosphorylated derivatives, which are preferentially located in various cellular membranes or microdomains, specifying their identity, and mediating cellular functions by recruiting various effector proteins with specialized lipid-binding domains [1]. The homeostasis of these phosphorylated PtdIns lipids is mediated by a number of specific kinases and phosphatases.

Myotubularins are members of the protein tyrosine phosphatase (PTP) superfamily, whose members feature a characteristic HCX(5)R catalytic motif, in which the cysteine is the catalytic residue, the histidine is important for the nucleophilic properties of the cysteine, and the arginine is important in coordinating the substrate phosphate group. Myotubularins have been shown to be specific lipid phosphatases, cleaving the D3 phosphate from PtdIns3P and PtdIns(3,5)P2. There is a large myotubularin family in humans (14 genes), encoding both catalytically active and inactive members. Mutations in either active or inactive members of this family cause human disease, chiefly involving skeletal muscle (X-linked myotubular myopathy [XLMTM]) or peripheral neurons (Charcot-Marie-Tooth [CMT] neuropathies) [2–4]. Previous phylogenetic studies have reported the presence of myotubularin genes in plants, fungi and some protists, with only the protists containing both active and inactive forms [2, 5].

This study presents a systematic survey of myotubularin genes in a large number of completely sequenced eukaryotic genomes, representing a broad array of taxonomic groups. Most genomes contain one to a few myotubularin genes, though they are absent in certain groups. The evidence is consistent with the independent appearance of inactive myotubularin genes, featuring novel domain combinations, in different taxonomic groups. The greatest expansion of the myotubularin gene family yet observed occurs in the pathogenic species Entamoeba histolytica. Functional evidence derived from published gene expression studies indicates that these genes may be important in pathogen transmission and host infection.

Results

Phylogenetic Distribution, Gene Evolution, Domain Architecture

Recent work in eukaryotic systematics has increasingly defined large organismal "supergroups" encompassing many traditional smaller groups [6–8]. We have conducted a broad survey of fully sequenced genomes amongst these large organismal groups for the presence of myotubularin gene homologues. Only the Rhizaria were excluded, as there is as yet no completed genome in that group. Our results are summarized in Figure 1. We searched 30 genomes and identified 80 sequences. We found that myotubularin genes are nearly ubiquitous in eukaryotes, being readily identifiable in all the major eukaryotic groups and in all genomes examined, with the notable exceptions of the obligate intracellular parasites Encephalitozoon cuniculi (Microsporidia) and Plasmodium falciparum (Apicomplexa) and of the red (Cyanidioschyzon merolae) and green (Ostreococcus sp., Chlamydomonas reinhardtii) algae. Most organisms (19 of the 24 species with myotubularins) possess one to three genes. The notable exception to this general pattern occurs in members of the Unikonta (Amoebozoa, Choanoflagellida, Metazoa), where larger myotubularin gene families are found (for more information on organisms, see the Tree of Life project [9]).

We utilized domain-searching strategies detailed in Methods to determine the molecular architecture of myotubularin gene encoded proteins. The results are presented in Figure 2. It is apparent that nearly all myotubularin proteins contain both a myotubularin phosphatase domain and a PH-GRAM domain (Pleckstrin-Homology, Glucosyltransferases, Rab-like GTPase Activators and Myotubularins). In studies of animal myotubularin proteins it has been shown that the PH-GRAM domain binds phosphoinositide lipids, and confers both specific subcellular localization and regulation of the phosphatase domain [10]. The nearly constant presence of the PH-GRAM domain in myotubularins across a broad range of organisms suggests that this domain architecture was established early in eukaryotic evolution. We observed, however, that there were a number of sequences where complete PH-GRAM domains with the characteristic architecture observed in human proteins could not be detected, despite the use of the most sensitive structural analysis methods available (see Figure 2). This indicates that PH-GRAM domain sequences can be very divergent, which we also noted in multiple sequence alignments including the PH-GRAM domain region (see the full myotubularin sequences alignment presented as Additional File 1). This suggests that although the architectural coupling of a PH-GRAM along with a myotubularin phosphatase domain is a standard feature of these proteins, the specific molecular properties and functions of the PH-GRAM domains have the potential to be quite diverse and distinct.

The catalytic loop signature of human myotubularins is HCSDGWDR [2]. Inspection of the myotubularin sequence alignment presented in Figure 3 shows that this signature is invariant in most of the myotubularin sequences, indicating that these share a common local active-site architecture and catalytic mechanism. One of the notable features of human myotubularins is the presence of several catalytically inactive subunits, resulting from mutations of the key catalytic cysteine and arginine residues in the catalytic loop region. It has been previously noted that myotubularin genes with apparently inactive catalytic loop signatures can be observed in Giardia and Dictyostelium, suggesting that inactive subunits arose early in evolution [2, 5]. Our work confirms these findings and sheds further light on the origin of these sequences. Three Excavate myotubularin sequences lack a PH-GRAM domain; they are the only sequences we observed with this characteristic (see Figure 2). Giardia sequence GL50803_112811 lacks both the cysteine and arginine residues from the catalytic loop region (see Figure 3). Leishmania sequence LmjF12.0320 and Trypanosoma sequence Tb927.6.870 each possess both the cysteine and the arginine, but lack the histidine preceding the cysteine. Since this histidine is universally conserved in active PTP phosphatases, and has been shown to be important in the catalytic mechanism by altering the nucleophilic properties of the neighboring cysteine [11], it is likely that these proteins are also catalytically inactive. The lack of a PH-GRAM domain, unique to these Excavate inactive myotubularins, suggests that they comprise a single gene lineage.
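The active-versus-inactive call described above reduces to checking three residues in the aligned catalytic loop. The following Python sketch is a hypothetical helper, not the authors' pipeline: it assumes the loop has already been extracted as an 8-column window aligned to the human signature HCSDGWDR, and it tests only the histidine, cysteine and arginine positions discussed in the text. The example loops are invented for illustration and are not real sequences.

```python
import re
from typing import List

# Human myotubularin catalytic loop HCSDGWDR is an instance of the PTP
# H-C-X(5)-R motif; '.' also accepts alignment gaps in the five X positions.
PTP_LOOP = re.compile(r"HC.{5}R")

# 0-based offsets of the residues treated in the text as essential for activity.
ESSENTIAL = {0: "H", 1: "C", 7: "R"}


def missing_essentials(loop: str) -> List[str]:
    """Return the essential residues absent from an aligned 8-column loop."""
    if len(loop) != 8:
        raise ValueError("expected an 8-column aligned catalytic loop")
    return [res for offset, res in ESSENTIAL.items() if loop[offset] != res]


def predicted_active(loop: str) -> bool:
    """Predict catalytic activity: the loop must match H-C-X(5)-R exactly."""
    return PTP_LOOP.fullmatch(loop) is not None


# Hypothetical loop windows (not real sequences) illustrating three cases.
for loop in ("HCSDGWDR", "NCSDGWDR", "HSSDGWDK"):
    print(loop, predicted_active(loop), missing_essentials(loop))
```

The second hypothetical loop mirrors the Leishmania and Trypanosoma pattern (histidine lost, cysteine and arginine retained), and the third mirrors the Giardia pattern (cysteine and arginine lost).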

Amoebozoan IMLRK (Inactive Myotubularin/LRR/ROCO/Kinase) Genes and Proteins

The amoebozoans Dictyostelium and Entamoeba each have a large number of myotubularin homologues (see Figure 1 and Figure 2). Dictyostelium has nine active myotubularin subunits, and Entamoeba has eight. In addition, there are a number of inactive myotubularin subunits. The Dictyostelium gene pats1 (encoding sequence DDB0191503) was previously reported, incorrectly, to contain an active myotubularin domain [12]. This protein also contains an LRR domain, a recently described ROCO domain [13, 14] (composed of a ROC [Ras of complex proteins] and a COR [C-terminal of ROC] region), and a protein kinase domain. The LRR/ROCO/kinase architecture was also known to be shared by Dictyostelium sequence DDB0191512, which additionally has an N-terminal Rho-GAP domain. Using a sensitive HMM search strategy based on myotubularin sequences, we found that this sequence also contains an inactive myotubularin domain. Further application of this HMM search revealed that Entamoeba contains eleven proteins with divergent, but clearly recognizable, inactive myotubularin domains (EHI_140980, EHI_137960, EHI_185230, EHI_048230, EHI_151670, EHI_107230, EHI_135010, EHI_141820, EHI_078170, EHI_197200, EHI_188050). Our findings confirm and extend previous observations [5]. Further examination of the domain architecture of the newly discovered Entamoeba inactive myotubularin sequences revealed that 9 of them also showed significant similarity to the solved structures of LRR proteins and protein kinases, and weaker but still significant similarity to the solved structure of a bacterial ROCO protein (PDB: 3dpu_A) [15], as detected both by the FFAS03 (Fold and Function Assignment System) sequence:profile technique and by the HHpred (HMM-HMM [Hidden Markov Model] structure prediction) profile:profile technique. This indicated that these nine proteins might also share the inactive myotubularin/LRR/ROCO/kinase architecture previously detected in Dictyostelium sequences. We suggest the acronym "IMLRK" to refer to this somewhat cumbersome domain architecture.
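The architecture categories used in this section can be stated as a simple rule over predicted domains. This Python sketch is illustrative only: the domain labels are hypothetical strings (for example, as summarized by hand from FFAS03 or HHpred hits), the activity call for the myotubularin domain is assumed to be made separately from the catalytic loop, and domain order is ignored because the text defines IMLRK by domain content.

```python
from typing import Iterable, Optional

# Core shared by conventional ROCO proteins and IMLRK proteins.
ROCO_CORE = {"LRR", "ROCO", "kinase"}


def classify_architecture(domains: Iterable[str],
                          myotubularin_active: Optional[bool] = None) -> str:
    """Assign a coarse architecture label from hypothetical domain labels.

    'IMLRK' = inactive myotubularin + LRR + ROCO + kinase; a protein with the
    LRR/ROCO/kinase core but no myotubularin domain is a conventional 'ROCO'.
    """
    found = set(domains)
    has_roco_core = ROCO_CORE <= found
    if "myotubularin" in found:
        if myotubularin_active is None:
            raise ValueError("activity call needed for a myotubularin domain")
        if not myotubularin_active and has_roco_core:
            return "IMLRK"
        return "active myotubularin" if myotubularin_active else "inactive myotubularin"
    return "ROCO" if has_roco_core else "other"


# Hypothetical domain sets: a roco9-like protein with a Rho-GAP, and a plain ROCO.
print(classify_architecture(["RhoGAP", "myotubularin", "LRR", "ROCO", "kinase"], False))
print(classify_architecture(["LRR", "ROCO", "kinase"]))
```

Keeping the activity call as a separate argument reflects the text's two-step reasoning: domain content is detected by profile searches, while activity is judged from the catalytic loop.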

To confirm the identity of the ROCO domains of these Entamoeba sequences, we performed iterative rounds of multiple sequence alignment, HMM construction, database searching and realignment to assemble the data presented in Figure 4. During this process, we identified several previously unreported ROCO proteins (2 from Monosiga and 9 from Trichoplax). The alignment compares our set of newly identified ROCO domain sequences with those from previously characterized Dictyostelium proteins. In their report of the solved structure of a bacterial ROCO protein, Gotthardt et al. [15] identified residues important to the function of both the bacterial protein and its animal ROCO homologues. These include residues in the ROC domain important for GTP binding, and residues in both the ROC and COR domains important for domain interactions and GTPase activity (see Legend to Figure 4). Inspection of the alignment in Figure 4 shows that, on the whole, this critical residue set is poorly conserved in the Entamoeba IMLRK sequences. Despite their overall similarity to the rest of the comparison set, several of the Entamoeba sequences have deletions at these critical positions and would therefore presumably lack GTPase activity. Only one Entamoeba sequence (EHI_048230) has a set of residues that might confer enzymatic activity.

Comparison with sequence models at NCBI CDD (Conserved Domain Database) indicates that the protein kinase domains of the Entamoeba IMLRK proteins resemble both Ser-Thr and Tyr kinases (see Table 1). This is consistent with previous characterization of kinase domains in ROCO proteins as being of the TKL (Tyrosine kinase-like) group [16, 17]. We performed a multiple sequence alignment with these kinase domains, which is presented in Figure 5. We examined the group of ten important functional sequence positions characterized by Kannan et al. [18]. For 7 of the 9 Entamoeba sequences, all of these critical functional residues are conserved. The exceptions are: EHI_107230, where there is an H to T mutation at the position corresponding to PKA (Protein Kinase A) H158; and EHI_135010, where there is a D to N mutation at the position corresponding to PKA D166, and an N to A mutation at the position corresponding to PKA N171. Thus we would predict that nearly all of these sequences are catalytically active [18].
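The kinase-activity prediction amounts to comparing each Entamoeba residue with the PKA residue at the equivalent alignment position. This Python sketch assumes residues have already been mapped to PKA numbering from the Figure 5 alignment, and it encodes only the three PKA positions named in the text (H158, D166, N171), not the full set of ten from Kannan et al. [18]. The input dictionaries reflect the substitutions described above.

```python
from typing import Dict, List

# PKA reference residues at the three positions discussed in the text only.
PKA_REFERENCE: Dict[int, str] = {158: "H", 166: "D", 171: "N"}


def substitutions(query: Dict[int, str]) -> List[str]:
    """List substitutions relative to PKA, written e.g. 'H158T'.

    `query` maps PKA-numbered positions to the aligned residue; a missing
    position is treated as a gap ('-').
    """
    changes = []
    for pos, ref in sorted(PKA_REFERENCE.items()):
        residue = query.get(pos, "-")
        if residue != ref:
            changes.append(f"{ref}{pos}{residue}")
    return changes


# Residues at these positions for the two exceptions described in the text.
print(substitutions({158: "T", 166: "D", 171: "N"}))  # EHI_107230
print(substitutions({158: "H", 166: "N", 171: "A"}))  # EHI_135010
```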

Table 1 ROCO Kinase Domain CDD Hits

Five of the newly discovered Entamoeba sequences have predicted N-terminal Rho-GAP (Rho-GTPase Activating Protein) domains. Of these five domains, however, only one (that of EHI_048230) is strong enough to be detected by a domain search at NCBI CDD with default settings, suggesting that it is probably enzymatically active. The enzymatic activity of the other four domains is questionable, given their evident sequence divergence. These five sequences with an N-terminal Rho-GAP domain resemble the architecture of the Dictyostelium gene roco9 (protein sequence DDB0191512), and it is possible that they represent a distinct gene lineage.

The myotubularin domains of the IMLRK proteins are divergent, as inspection of our sequence alignments shows (Figure 3 and Additional File 1). All of the Entamoeba IMLRK proteins have a deletion of the α14 region of the phosphatase domain (positions 1701–1706 of our reference alignment [Additional File 1]). Sequence EHI_197200 is clearly the most divergent of the group: it is also missing the α8 and α9 regions and the C-terminus of the phosphatase domain (from α14 on). In summary, the IMLRK domain architecture is distinctive, being found in no taxonomic group other than the Amoebozoa, which suggests that the origin of these genes represents a second, independent event in myotubularin gene evolution.
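A deletion such as the missing α14 region can be detected directly from the reference alignment as a run of gap-only columns. This is a hypothetical Python helper, not part of the authors' methods: it assumes gaps are written as '-' or '.' and uses 1-based, inclusive column numbering, as in the positions 1701 to 1706 cited for α14. The aligned rows are synthetic.

```python
def region_deleted(aligned_seq: str, start: int, end: int, gaps: str = "-.") -> bool:
    """True if alignment columns start..end (1-based, inclusive) are all gaps."""
    if not 1 <= start <= end <= len(aligned_seq):
        raise ValueError("column range outside the aligned sequence")
    return all(col in gaps for col in aligned_seq[start - 1:end])


# Synthetic aligned rows (not real data): one with a gap run over columns 1701-1706.
with_deletion = "A" * 1700 + "-" * 6 + "A" * 10
without_deletion = "A" * 1716
print(region_deleted(with_deletion, 1701, 1706))
print(region_deleted(without_deletion, 1701, 1706))
```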

Finally, we attempted to determine, by multiple sequence alignment and phylogenetic tree analysis (data not shown), the possible origin of the two inactive Entamoeba myotubularins without a ROCO domain. EHI_188050 appears to be closely related to an active subunit (EHI_104710), and has therefore probably acquired inactivating mutations recently. The origin of EHI_140980 is more obscure: it does not appear to be closely related to any of the other inactive Entamoeba myotubularins.

Myotubularins in the Choanoflagellate/Metazoan Assemblage

Previous phylogenetic analyses [2, 3] of myotubularin sequences from human, other vertebrates, and a collection of invertebrates defined six similarity clusters: three composed of catalytically active subunits (in human: ["M1 Group": MTM1 {Myotubular myopathy}, MTMR1 {MTM-related}, MTMR2]; ["R3 Group": MTMR3, MTMR4]; ["R6 Group": MTMR6, MTMR7, MTMR8]) and three composed of catalytically inactive subunits (in human: ["R5 Group": MTMR5, MTMR13]; ["R9 Group": MTMR9]; ["R10 Group": MTMR10, MTMR11, MTMR12]). We have extended this analysis by finding previously unreported myotubularin homologues in the metazoans Nematostella and Trichoplax, and the choanoflagellate Monosiga. Our results are presented in Table 2, along with Bayesian and Maximum Likelihood clade supports for each group. Bayesian support is high for all groups, with the mean posterior probability exceeding 0.90 in every case. Maximum Likelihood bootstrap support is weaker and more variable, depending more on details of alignment composition, but the mean nevertheless exceeds 80% for each group. Despite repeated attempts using distinct input alignments, data-transformation techniques (i.e., identifying and removing rapidly evolving sites [6]) and amino acid substitution models, we were unable to obtain consistent tree topologies with high support for deep interior branch points. This suggests a high degree of sequence divergence among the myotubularin subtypes.

Table 2 Placement of Myotubularin Homologue Sequences into Phylogenetic Similarity Clusters

The domain architecture data presented in Figure 2 are for the most part consistent with the placement of the new myotubularin homologues into similarity clusters based on phylogenetic tree inference. All of the new sequences placed into similarity groups have a full PH-GRAM domain and a myotubularin phosphatase domain (predicted to be active or inactive) consistent with their class placement. The myotubularins of the "R5" group characteristically possess a DENN (Differentially Expressed in Neoplastic versus Normal cells) domain N-terminal to the PH-GRAM domain, and a PH (Pleckstrin Homology) domain C-terminal to the phosphatase domain. This is true for the new sequences Tad51481 (Trichoplax) and Nv19357 (Nematostella). However, sequence M001750622 (Monosiga) has an additional domain of unknown function at the extreme N-terminus, and lacks the C-terminal PH domain. This may indicate that the stable "R5" subunit domain architecture had not yet been achieved at this early stage of myotubularin gene evolution. Myotubularins of the "R3" group characteristically have a FYVE domain (Fab1, YOTB, Vac1, and EEA1 [early endosome antigen 1]) C-terminal to the phosphatase domain. This is true for sequence Tad64213 (Trichoplax). However, sequence N001631983 (Nematostella), also classified as an R3 member, lacks this domain. Furthermore, sequence N001626810 (Nematostella), classified as a member of the "R6" group, has a C-terminal FYVE domain. This domain is characteristically absent from members of the R6 group; for example, it is absent from sequence Tad56124 (Trichoplax), also classified in this group. Thus it would appear that the Nematostella sequences in the R3 and R6 groups may have exchanged the FYVE domain. This may represent a novel and interesting genetic event in the evolution of the Nematostella myotubularin genes. Alternatively, it might reflect an error in genomic sequence assembly and annotation.
Finally, the sequences M01745983C and Tad51955 are intriguing. These sequences cluster together consistently as a "new clade" in phylogenetic analyses based on alignments made from the PH-GRAM and phosphatase domains (see Legend to Table 2). In addition, each of them possesses an N-terminal C2 domain (Protein kinase C Conserved region 2; phospholipid binding), which has not been reported previously in Metazoan myotubularins. These data support the existence of a previously undescribed myotubularin architecture, perhaps restricted to Choanoflagellates and early Metazoa.

The phylogenetic analysis above shows that even the genome of Trichoplax, the most deeply diverging Metazoan known [19], contains a representative of each of the six typical myotubularin similarity groups, and this pattern continues throughout the rest of the Metazoa. This indicates that diversification into the three catalytically active and three inactive myotubularin groups had been completed at the very base of the Metazoan clade.

The situation is less clear for the Monosiga genome. Representatives can be clearly identified only for the R5, R6, and R9 groups. Because these include both an active group (R6) and inactive groups (R5, R9), this suggests that the split between catalytically active and inactive myotubularins characteristic of the Metazoan clade had already occurred in the common ancestor of Choanoflagellates and Metazoans. However, it is impossible to propose a precise model for this process, as three similarity groups have no identified members. This might represent a genuine absence with evolutionary significance. Alternatively, the apparent absence of these gene types may be an artefact of genome assembly and annotation, a possibility supported by the discovery of the partial sequence Mbrevi5R3, which was manually constructed from unassembled genomic sequence reads. It therefore seems most prudent to conclude that the precise status of myotubularin genes in Choanoflagellates will have to await the completion of genome sequencing projects for other species in this group.
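The reasoning about Monosiga is essentially a set comparison against the six metazoan similarity groups. This Python sketch encodes only the group assignments stated in the text (all six groups in Trichoplax; R5, R6 and R9 in Monosiga) and deliberately ignores the unplaced "new clade" sequences and the partial sequence Mbrevi5R3.

```python
# Active and inactive similarity groups, as defined by previous analyses [2, 3].
ACTIVE_GROUPS = {"M1", "R3", "R6"}
INACTIVE_GROUPS = {"R5", "R9", "R10"}
ALL_GROUPS = ACTIVE_GROUPS | INACTIVE_GROUPS

# Groups with clearly identified representatives, as stated in the text.
OBSERVED = {
    "Trichoplax": set(ALL_GROUPS),
    "Monosiga": {"R5", "R6", "R9"},
}


def missing_groups(found: set) -> set:
    """Similarity groups with no identified member in a genome."""
    return ALL_GROUPS - found


def has_active_inactive_split(found: set) -> bool:
    """True if at least one active and one inactive group are represented."""
    return bool(found & ACTIVE_GROUPS) and bool(found & INACTIVE_GROUPS)


for species, found in OBSERVED.items():
    print(species, sorted(missing_groups(found)), has_active_inactive_split(found))
```

The Monosiga result illustrates why the text infers an early active/inactive split (both kinds of group are present) while declining to propose a precise model (three groups remain unaccounted for).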

Accessory Protein Domains

Figure 6 presents a summary tabulation of the various domains found in myotubularin homologue sequences across the diverse eukaryotic groups examined in this study. It indicates the presence of active myotubularins with associated PH-GRAM domains in most species examined, across all the major supergroups. It summarizes the occurrence of the inactive myotubularins without PH-GRAM domains in the Excavates, the IMLRK proteins in the Amoebozoans, and the inactive myotubularins with PH-GRAM domains in the Choanoflagellates and Metazoa. This figure also indicates that several types of accessory domains are also sometimes observed in myotubularin homologues.

Nearly all animal myotubularins characterized to date possess coiled-coil domains C-terminal to the myotubularin domain. These have been shown to be important in mediating protein-protein interactions between myotubularin subunits [3], and might conceivably provide interaction sites for other protein partners. We find that the presence of coiled-coil domains is much more sporadic across the entire myotubularin set (see Figure 6 and also Figure 2), with many proteins lacking them. Where they occur, the most common location is C-terminal to the myotubularin domain; however, a number of sequences, particularly in the Amoebozoa, have N-terminal coiled-coil domains. The lack of coiled-coil domains in a number of myotubularins suggests that potential protein-protein interactions would need to be facilitated by some other structural feature. It may be relevant in this context that PH domains, as a broad group, are known to often facilitate protein-protein interactions as well as protein-lipid binding [20]. Some of the structural and sequence diversity we observed in the PH-GRAM domains of the myotubularins in our sequence set may arise because this domain mediates protein-protein interactions. Protein-protein interactions might also be mediated by the observed ANK [Ankyrin], LRR [Leucine-rich repeat], and WD40 domains.

Several domains are found that typically mediate membrane localization (PH, PX [Phox-like], C1 and C2 [Protein kinase C conserved regions 1 and 2], and FYVE [Fab1, YOTB, Vac1, and EEA1 (early endosome antigen 1)]), consistent with the postulated role of myotubularin proteins in vesicle transport. The presence of Rho-GAP domains might indicate a role in direct regulation of the cytoskeleton.

A few sequences show a predicted transmembrane domain, and one has a predicted signal peptide. These are very unusual for myotubularin sequences, and would be consistent with localization in a particular intracellular membrane compartment, and entry into the endomembrane system, respectively.

Finally, several sequences contain a predicted nuclear localization signal (NLS). This was true for Nv109357 (Nematostella) and several other metazoan members of the R5 clade. These data are presented in Additional File 2. Amongst these sequences was the Drosophila homologue of MTMR5/MTMR13. This is consistent with the observation that this protein (originally called "Sbf1" [SET binding factor 1]) co-localizes with the epigenetic regulatory protein Trithorax (Trx) on polytene chromosomes [21]. Together, the presence in several members of the R5 clade of a well-conserved basic sequence loop and a predicted NLS suggests that nuclear localization may be possible for other members of this group. In addition, we observed predicted NLSs in two sequences from the plant Populus trichocarpa. These data are summarized in Additional File 3. Recently the Arabidopsis myotubularin At3g10550 was shown to participate in a partially overlapping drought-response gene regulatory network with the epigenetic regulatory Trithorax homologue protein ATX1 [22]. This has raised the question of whether this protein might be able to enter the nucleus. Our finding of a conserved basic sequence region supports this possibility.

Myotubularin Gene Expression in Entamoeba

The unusually large complement of myotubularin homologues in Entamoeba histolytica, a well-known pathogenic organism, prompted us to explore the literature to examine patterns of myotubularin gene expression in this species. Davis et al. [23] reported differences in gene expression between the infective E. histolytica strain HM-1:IMSS and the non-pathogenic E. histolytica strain Rahman. Sequence EHI_141820 (one of the IMLRK proteins) showed an increase of 4.4× in expression (p = 1.07E-07). Ehrenkaufer et al. [24] identified "cyst-specific" E. histolytica genes which are differentially expressed in recent clinical isolates (which form cysts) as compared to laboratory strains or strains isolated from the mouse colon (which do not form cysts). Genes encoding two active myotubularins showed increases in expression (EHI_070120 [6.3×, p = 2.4E-03], EHI_049780 [5.3×, p = 7.3E-03]). Genes for four inactive myotubularins also showed increases in expression (EHI_140980 [6.3×, p = 5.0E-03], EHI_188050 [2.6×, p = 2.3E-03], EHI_185230 [10.7×, p = 7.4E-07], EHI_078170 [5.2×, p = 1.3E-04]). The latter two sequences are IMLRK proteins.
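The cyst-associated expression values reported above can be tabulated by protein category. This Python sketch simply transcribes the six values from Ehrenkaufer et al. [24] quoted in the text; the category labels ("active", "inactive", "IMLRK") follow the classifications made earlier in this article, and the grouping function is an illustrative helper rather than part of the original analysis.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class CystHit(NamedTuple):
    gene: str
    category: str       # "active", "inactive", or "IMLRK" (inactive + LRR/ROCO/kinase)
    fold_change: float  # fold increase in cyst-forming vs. non-cyst-forming strains
    p_value: float


# Values transcribed from the text above.
CYST_HITS = [
    CystHit("EHI_070120", "active", 6.3, 2.4e-03),
    CystHit("EHI_049780", "active", 5.3, 7.3e-03),
    CystHit("EHI_140980", "inactive", 6.3, 5.0e-03),
    CystHit("EHI_188050", "inactive", 2.6, 2.3e-03),
    CystHit("EHI_185230", "IMLRK", 10.7, 7.4e-07),
    CystHit("EHI_078170", "IMLRK", 5.2, 1.3e-04),
]


def genes_by_category(hits: List[CystHit]) -> Dict[str, List[str]]:
    """Group gene IDs by category, highest fold change first (ties keep input order)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for hit in sorted(hits, key=lambda h: h.fold_change, reverse=True):
        groups[hit.category].append(hit.gene)
    return dict(groups)


print(genes_by_category(CYST_HITS))
```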

Discussion

A variety of experiments in animal and fungal systems, including in vitro enzymatic studies, mutational analysis, complementation assays, and in vivo overexpression, agree in characterizing myotubularins as phosphatases of the D3 position in the inositol headgroup of inositol phospholipids. PI3P and PI(3,5)P2 appear to be primarily localized to the cellular endomembrane system and restricted domains of the plasma membrane, mediating transitions between endosomes and lysosomes, retrograde transport between the endosomal compartment and trans Golgi network, and endocytosis of some materials from the cell surface [1, 3]. Mutations of animal and yeast myotubularins lead to abnormal accumulations of PI3P and PI(3,5)P2, apparently disrupting normal cellular membrane trafficking events, perhaps through abnormal concentrations and/or localizations of PI-phosphate specific membrane-binding effector proteins [2–4]. One would anticipate that such intracellular membrane trafficking processes, and the mechanisms regulating them, would be very ancient, having arisen quite early in eukaryotic evolution. This is consistent with our most frequent observation, a small number of myotubularin genes in organisms across a broad phylogenetic distribution, which suggests the presence of a single such gene in the last common ancestor of all extant eukaryotic groups. The PH-GRAM domain appears to be a very early acquisition, perhaps coincident with the divergence of a generic PTP domain into the characteristic elaborated myotubularin phosphatase domain.

Inactive myotubularin subunits are one of the particularly interesting features of this gene group. Our data are consistent with these having appeared on three separate occasions in eukaryotic evolution, in different taxonomic groups. The distinctive lack of a PH-GRAM domain in the inactive Excavate myotubularins makes it likely that these represent a unique lineage. Similarly, the IMLRK domain architecture of the Amoebozoa inactive myotubularins suggests they too have a unique origin. Finally, it is likely that an active myotubularin lineage then began an independent diversification event somewhere around the base of the Choanoflagellate/Metazoan divergence to produce the six similarity groups characteristic of the Metazoans. This is consistent with our finding of all six myotubularin subgroups being identifiable in the deeply diverging Placozoan Trichoplax, but only three subgroup representatives being clearly identifiable from the Choanoflagellate Monosiga. More completed genome sequences from Choanoflagellates and even more deeply diverging protistan "animal allies" (e.g. Ichthyosporea and Filasterea [25, 26]) will be necessary to precisely define this pivotal period in myotubularin gene history.

Myotubularin function has been most intensively studied in humans, where a number of diseases arising from inherited mutations have been characterized. It has been suggested that a common unifying pathophysiological mechanism in these disorders may be abnormality in the membrane trafficking necessary to alter the characteristic molecular composition and identity of the plasma membrane and specialized derivative membrane structures during cellular differentiation [4]. In this model the disordered membrane trafficking would be secondary to perturbations in the normal levels and perhaps subcellular distribution of PI3P and PI(3,5)P2, the normal substrates of myotubularins. This model suggests that the normal function of myotubularins becomes especially critical in situations where cells are required to turn over and alter, on a large scale, through membrane trafficking, the suites of proteins and perhaps lipids characterizing particular domains on the plasma membrane and components of the endomembrane system.

Myotubularin genes have undergone an expansion in the Amoebozoan species Dictyostelium discoideum. Nine myotubularins are predicted to be enzymatically active, and two inactive. Nothing is known about the function of the myotubularins in this organism. However, it is reasonable to suggest that they are involved in the regulation of the substantial intracellular trafficking events that would accompany membrane reorganization during a complex life cycle. The two inactive Dictyostelium myotubularins also possess the distinctive IMLRK domain architecture. "ROCO" proteins (which usually contain LRR/ROCO/kinase domains, but not myotubularin domains) were initially characterized in Dictyostelium, are biochemically best understood in this organism, but have a widespread phylogenetic distribution in both prokaryotes and eukaryotes [13]. In Dictyostelium, where there are 11 ROCO genes in all, functional evidence is available for four: gene gbpC is involved in chemotaxis; genes QkgA/roco2 and roco5 are involved in growth and development; and gene pats1 (our IMLRK sequence DDB0191503) is involved in cytokinesis. The ROCO proteins have recently received considerable attention because in humans the family member LRRK2 is involved in familial and some cases of sporadic Parkinson's disease [13]. Biochemical approaches, analysis of disease-associated mutations, and solved protein structures have revealed that the protein kinase domain is regulated by the GTPase activity of the ROC domain, through protein-protein dimerization mediated by the COR domain [15, 27]. Thus these proteins have been likened to a "stand-alone" intramolecular signal transduction cascade, mediated by their multiple functional domains. Dictyostelium pats1 (DDB0191503) is essential for cytokinesis, and contains an enzymatically inactive myotubularin domain, whose function has not been experimentally tested. 
A reasonable proposal would be that the myotubularin-like portion of the protein could provide membrane localization via its PH-GRAM domain. It is known that specialized plasma membrane domains enriched in PI(4,5)P2 accumulate at the intercellular bridge during cytokinesis, where they regulate the underlying actin cytoskeleton [28]. The Dictyostelium gene roco9 (DDB0191512) also encodes an IMLRK protein. Nothing is known about the function of this protein, but it contains a Rho-GAP domain, which might indicate a role in regulation of the actin cytoskeleton. Once again, the myotubularin-like region of the protein could supply membrane localization. Another functional possibility for the inactive myotubularin domains of both pats1 and roco9 is that they might bind to one or more of the many active Dictyostelium myotubularins, and mediate regulation of their activities. Several such combinations of active plus regulatory inactive myotubularin subunits are well characterized in animal cells [3].

In Entamoeba, another Amoebozoan, the myotubularin gene set is even larger than in Dictyostelium: there are 8 active and 11 inactive myotubularins (9 of the latter with the IMLRK domain architecture). This is the largest collection of myotubularin genes observed to date in any eukaryotic genome examined. This large repertoire of active plus inactive subunits suggests the possibility of a particularly rich network of regulatory protein-protein associations. It is striking that, in contrast to the intricate multicellular associations of Dictyostelium, the Entamoeba life cycle is morphologically rather simple. Underlying this apparent simplicity, however, there is probably complex turnover and change of plasma membrane protein sets accompanying life cycle transitions and invasive contact with host tissues [29–31]. It might be hypothesized that the large complement of myotubularin genes in this organism is needed for precise spatial and temporal regulation of these membrane trafficking events, over and above the "constitutive" requirements of any eukaryotic cell. Their numbers suggest that the IMLRK proteins might be particularly important. The data suggest that the protein kinase domains of the IMLRK proteins are likely to be active, whereas their ROC domains probably lack GTPase activity. This would represent a departure from the typical paradigm of ROC GTPase-mediated control of the kinase domain. It is possible that the divergent ROCO domains in these proteins effect protein kinase regulation via interaction with novel accessory proteins.

In most human infections with Entamoeba histolytica, the organism remains in the lumen of the intestine, in contact with the epithelium. In a minority of cases it invades the intestinal wall, which may lead to liver abscesses. The life cycle is completed when the organism forms cysts, which are released from the host in feces to infect new hosts. Expression of one myotubularin gene was significantly higher in a pathogenic than in a non-pathogenic strain of E. histolytica [23]. Several myotubularin genes that appear to act specifically in the encystment stage of the life cycle were significantly upregulated [24]. Taken together, these data suggest that myotubularin genes are important both for completion of the life cycle and for invasive disease in this organism.

Conclusions

We have presented a phylogenetic survey of myotubularin genes across a diverse array of eukaryotes, including distribution, domain architecture, and inferred evolutionary history. We have characterized an expansion of genes in the Amoebozoa encoding proteins with the novel combination of "IMLRK" (inactive myotubularin/LRR/ROCO/kinase) domains. This group is particularly prominent in the pathogenic organism Entamoeba histolytica, which contains the largest myotubularin gene family of any eukaryotic genome yet examined. Gene expression data in E. histolytica indicate that myotubularin function may be important to both critical life cycle transitions and host infection. The data indicate that pathogen myotubularin genes may be important targets for basic research, and perhaps for novel strategies of disease control.

Methods

Identification of Putative Myotubularin Homologue Sequences

Sequences of all 14 human myotubularin proteins were obtained from NCBI Entrez [32]. A multiple sequence alignment was constructed and edited as described under Multiple Sequence Alignment below. Eukaryotes with a completely sequenced genome were identified using the Genomes Online Database [33, 34], and organismal protein datafiles were obtained from the sites linked therein. A Hidden Markov Model (HMM) of the human myotubularin multiple sequence alignment was constructed with the HMMER program package and used to search the various eukaryotic protein sequence datafiles (program commands "hmmbuild", "hmmcalibrate" and "hmmsearch", threshold E = 1). Candidate sequences were identified by a combination of a low E value (generally less than E = 0.01) and a long alignment to the HMM. A spreadsheet with the URLs of the websites used to obtain the protein datasets in which candidate myotubularin homologue sequences were found is presented in Additional File 4.
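The candidate-selection rule combines two criteria: a low E value and a long alignment to the model. The Python sketch below shows one way to apply both to hmmsearch hits, assuming the hits have already been parsed into simple records (parsing HMMER output is not shown). The `HmmHit` record, the sequence names, and the 0.5 coverage cut-off are hypothetical; the study itself specifies only the E value guideline and "a long alignment".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HmmHit:
    """One hmmsearch hit; fields are assumed to be parsed from the report already."""
    name: str
    evalue: float
    hmm_from: int    # first model position covered by the alignment
    hmm_to: int      # last model position covered by the alignment
    hmm_length: int  # total length of the profile HMM


def hmm_coverage(hit: HmmHit) -> float:
    """Fraction of the profile HMM spanned by the alignment."""
    return (hit.hmm_to - hit.hmm_from + 1) / hit.hmm_length


def select_candidates(hits: List[HmmHit], max_evalue: float = 0.01,
                      min_coverage: float = 0.5) -> List[str]:
    """Keep hits that have both a low E value and a long alignment to the model.

    max_evalue follows the 'generally less than E = 0.01' guideline;
    min_coverage = 0.5 is a hypothetical stand-in for 'a long alignment'.
    """
    return [h.name for h in hits
            if h.evalue < max_evalue and hmm_coverage(h) >= min_coverage]


# Hypothetical hits, all within the reporting threshold of E = 1.
hits = [
    HmmHit("seqA", 1e-40, 5, 480, 500),   # strong and long: kept
    HmmHit("seqB", 0.5, 10, 450, 500),    # long but weak: rejected
    HmmHit("seqC", 1e-5, 200, 260, 500),  # strong but short: rejected
]
print(select_candidates(hits))  # ['seqA']
```

Requiring both conditions guards against two different failure modes: weak hits spanning the whole model, and strong hits to only a small fragment of it.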

Determination of Myotubularin Similarity Regions within Sequences

Candidate myotubularin sequences obtained from the initial HMM search of protein datafiles were subjected to sequence:profile (FFAS03) [35, 36] and profile:profile (HHPred) [37–39] analysis to identify the boundaries of the characteristic PH-GRAM and myotubularin phosphatase domains, by comparison with the solved structures of human MTMR2 (PDB: 1LW3, 1ZSQ). For most sequences this was a contiguous region, which was then used for multiple sequence alignment. FFAS03 returns standardized ("Z") scores for comparisons between a query and the sequence of a solved template structure; its authors cite a score of 9.5 as statistically significant. Candidate myotubularin sequences routinely exceeded this threshold. HHPred returns a probability score that reflects both the alignment between HMMs built from the query sequence and from solved-structure sequences, and predicted secondary structure; its authors cite a probability of 95% as having a very low false positive rate. Candidate myotubularin sequences also routinely exceeded this threshold.
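To make the two thresholds and the domain-boundary step concrete, the Python sketch below accepts a candidate only when it clears both the FFAS03 and the HHPred threshold (an illustrative choice; the text reports only that candidates routinely exceeded each), and then derives the region to be aligned from mapped domain boundaries. The function names, the coordinate convention and the `max_gap` tolerance are hypothetical.

```python
from typing import Optional, Tuple

Span = Tuple[int, int]  # 1-based inclusive (start, end) on the query sequence

FFAS_Z_MIN = 9.5        # FFAS03 significance threshold cited in the text
HHPRED_PROB_MIN = 95.0  # HHPred probability (%) cited in the text


def is_significant(ffas_score: float, hhpred_prob: float) -> bool:
    """Illustrative rule: require both the FFAS03 and the HHPred threshold."""
    # abs() keeps the check independent of the sign convention of the score.
    return abs(ffas_score) >= FFAS_Z_MIN and hhpred_prob >= HHPRED_PROB_MIN


def alignment_region(ph_gram: Optional[Span], phosphatase: Span,
                     max_gap: int = 50) -> Span:
    """Region passed on to multiple sequence alignment.

    If a PH-GRAM span lies no more than max_gap residues upstream of the
    phosphatase domain, return one contiguous span covering both domains;
    otherwise return the phosphatase domain alone. max_gap is hypothetical.
    """
    if ph_gram is not None:
        gap = phosphatase[0] - ph_gram[1] - 1
        if 0 <= gap <= max_gap:
            return (ph_gram[0], phosphatase[1])
    return phosphatase


print(is_significant(-42.0, 99.8))              # True
print(alignment_region((30, 140), (160, 520)))  # (30, 520)
print(alignment_region(None, (160, 520)))       # (160, 520)
```

Falling back to the phosphatase domain alone mirrors the two kinds of aligned region described under Multiple Sequence Alignment (PH-GRAM plus phosphatase domain, or phosphatase domain only).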

Characterization of Non-Myotubularin Domains within Sequences

Candidate myotubularin homologue sequences obtained by HMM search as described above were examined for functional domains using FFAS03 and HHPred as described above (except now using as a comparison all sequences with solved structures in the PDB), and also NCBI CDD [40, 41], Pfam [42, 43], and InterProScan [44, 45], all with default settings. For the identification of ROCO domain sequences the comparison structure was that of the ROCO domain of Chlorobium tepidum (PDB: 3dpu_A [15]). The identity of the domains was confirmed by successive rounds of multiple sequence alignment (as detailed below), Hidden Markov Model construction (as detailed above), and database searching.
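Once each domain has been confirmed, a protein's domain architecture can be summarized by ordering the domain hits along the sequence. The Python sketch below assumes the hits have already been reduced to labelled start/end coordinates; it builds an N- to C-terminal summary and flags the IMLRK combination. The labels, coordinates and helper names are hypothetical, and the IMLRK test checks only that all four components are present, not their order.

```python
from typing import List, NamedTuple


class DomainHit(NamedTuple):
    label: str  # hypothetical labels such as "MTM_inactive", "LRR", "ROCO", "Kinase"
    start: int  # 1-based start on the protein
    end: int    # 1-based end on the protein


IMLRK_COMPONENTS = {"MTM_inactive", "LRR", "ROCO", "Kinase"}


def architecture(hits: List[DomainHit]) -> str:
    """Domain labels joined in N- to C-terminal order."""
    return "/".join(h.label for h in sorted(hits, key=lambda h: h.start))


def is_imlrk(hits: List[DomainHit]) -> bool:
    """True if all four IMLRK components are present; extra domains are allowed."""
    return IMLRK_COMPONENTS <= {h.label for h in hits}


# Hypothetical coordinates for an IMLRK-like protein that also carries a Rho-GAP domain.
hits = [
    DomainHit("Kinase", 1800, 2100),
    DomainHit("MTM_inactive", 100, 600),
    DomainHit("RhoGAP", 2200, 2400),
    DomainHit("LRR", 700, 1000),
    DomainHit("ROCO", 1100, 1700),
]
print(architecture(hits))  # MTM_inactive/LRR/ROCO/Kinase/RhoGAP
print(is_imlrk(hits))      # True
```

Using a subset test rather than exact equality lets proteins with additional domains still be classified as IMLRK.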

Characterization of Additional Protein Sequence Features

Candidate myotubularin homologue sequences obtained by HMM search as described above were examined for the presence of predicted signal peptides (Phobius [46, 47], SignalP [48, 49]), predicted transmembrane helices (Phobius [46, 47], TMHMM [50, 51]), predicted coiled-coil regions (Marcoil [52, 53], PairCoil2 [54, 55]), and nuclear localization signals (NLStradamus [56, 57]).

Multiple Sequence Alignment

Candidate Myotubularin Sequences

Candidate myotubularin sequences (including both the PH-GRAM domain and the myotubularin phosphatase domain, or the phosphatase domain alone, as defined by the sequence of the solved structure of MTMR2_Hu (PDB: 1LW3)) were aligned using, as necessary, several multiple sequence alignment programs: Muscle [58], T-Coffee [59] or M-Coffee [60, 61]. Alignment quality was evaluated at the T-Coffee web server. In some instances, sub-alignments were constructed, and then either individual sequences or other sub-alignments were added using the Profile alignment mode of T-Coffee or ClustalX [62] (default program settings). Alignments were displayed and edited using the program GeneDoc [63]. Alignment analysis revealed that some database sequences for candidate myotubularin homologues were incomplete because of annotation errors. These were supplemented with additional sequence obtained from the appropriate organismal genome browser and from TBLASTN searches of the organismal genomic DNA. Such sequences are denoted with the suffix "C" in the figure legends. For the reference multiple sequence alignment presented as Additional File 1 (100 sequences), no sequence regions were deleted.

Protein Kinase and ROCO Domains

Protein kinase domain and ROCO domain sequences within some myotubularin homologue candidates, detected as described above, were subjected to multiple sequence alignment with M-Coffee, displayed and edited with GeneDoc, as described above.

Phylogenetic Tree Inference

Multiple sequence alignments were constructed as detailed above. In some instances, rapidly evolving sites (Category 8) were identified with PAML [64] and removed from the alignment (analysis performed using the programs AIR-Identifier and AIR-Remover at the University of Oslo BioPortal, http://www.bioportal.uio.no/).
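The site-removal step amounts to dropping the alignment columns assigned to the fastest rate category. The Python sketch below assumes per-site rate categories (1 to 8) have already been read from PAML's output; it does not reproduce the AIR-Identifier or AIR-Remover tools, and the function name and toy sequences are hypothetical.

```python
from typing import Dict, List


def remove_fast_sites(alignment: Dict[str, str], site_categories: List[int],
                      drop_category: int = 8) -> Dict[str, str]:
    """Remove every column whose rate category equals drop_category."""
    ncol = len(site_categories)
    # Each aligned sequence must have exactly one character per categorized site.
    if any(len(seq) != ncol for seq in alignment.values()):
        raise ValueError("every sequence must have one character per site category")
    keep = [i for i, cat in enumerate(site_categories) if cat != drop_category]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


aln = {"seq1": "MKV-LA", "seq2": "MRVQLS"}
cats = [1, 8, 3, 8, 2, 5]  # columns 2 and 4 fall in the fastest category
print(remove_fast_sites(aln, cats))  # {'seq1': 'MVLA', 'seq2': 'MVLS'}
```

Because whole columns are removed, the filtered sequences stay aligned to one another.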

Bayesian phylogenetic trees were inferred with PhyloBayes 3.2d [65]. Two independent Markov chains were run under various combinations of amino acid substitution model and between-site rate variation model (UL3, Dirichlet; UL3, Uniform; WLSR5, Dirichlet) for approximately 5,000 cycles, with the first 20% (approximately 1,000 cycles) discarded as burn-in. Chain convergence was assessed using the criteria "maxdiff" < 0.10 and "effsize" > 100. Maximum likelihood trees were inferred with PhyML 3.0 [66] and PhyML-mixture [67] using a two-stage process [6]. In the first stage, the best tree was inferred from 20 random starts, using SPR moves, from a parsimony input tree (PhyML) or a BioNJ input tree (PhyML-mixture). Several combinations of amino acid substitution model and between-site rate variation model were used: JTT, WAG or LG, each with 4 Gamma categories, empirical amino acid frequencies and an estimated proportion of invariant sites; and EX3, with a single rate category and model amino acid frequencies. In the second stage, the best tree from the first stage was used as a user input tree, and 100 bootstrap replicates were inferred using SPR moves and the same amino acid substitution and site rate variation parameters as in the first stage.
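The burn-in and convergence criteria can be made concrete. In PhyloBayes, "maxdiff" (reported by bpcomp) is the largest discrepancy in bipartition frequency between chains, and "effsize" (reported by tracecomp) is the effective sample size of sampled parameters. The Python sketch below is a simplified re-implementation under stated assumptions, not PhyloBayes code: sampled trees are assumed to be already reduced to sets of bipartitions, the minimum effective sample size is taken as given rather than computed, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Set

# A bipartition is represented by the taxa on one side of an internal branch.
# Each side is assumed to be canonicalized (e.g. always the side without a
# fixed reference taxon) so that identical splits compare equal.
Bipartition = FrozenSet[str]


def split_frequencies(trees: List[Set[Bipartition]]) -> Dict[Bipartition, float]:
    """Posterior frequency of each bipartition across sampled trees."""
    counts = Counter()
    for splits in trees:
        counts.update(splits)
    return {split: n / len(trees) for split, n in counts.items()}


def max_diff(chain_a: List[Set[Bipartition]], chain_b: List[Set[Bipartition]]) -> float:
    """Largest between-chain difference in bipartition frequency."""
    fa, fb = split_frequencies(chain_a), split_frequencies(chain_b)
    return max((abs(fa.get(s, 0.0) - fb.get(s, 0.0)) for s in fa.keys() | fb.keys()),
               default=0.0)


def discard_burn_in(samples: Sequence, fraction: float = 0.2) -> list:
    """Drop the first `fraction` of samples (20% of 5,000 cycles -> 1,000)."""
    return list(samples[int(len(samples) * fraction):])


def converged(maxdiff: float, min_effsize: float,
              maxdiff_limit: float = 0.10, effsize_limit: float = 100) -> bool:
    """Apply the maxdiff < 0.10 and effsize > 100 criteria."""
    return maxdiff < maxdiff_limit and min_effsize > effsize_limit


ab, ac = frozenset({"A", "B"}), frozenset({"A", "C"})
chain_a = [{ab}, {ab}, {ab}, {ac}]  # split AB in 75% of samples
chain_b = [{ab}, {ac}, {ac}, {ac}]  # split AB in 25% of samples
d = max_diff(chain_a, chain_b)
print(d)                  # 0.5
print(converged(d, 250))  # False: the chains disagree on the topology
```

A split absent from one chain is treated as having frequency 0 there, so topologies sampled by only one chain count fully toward maxdiff.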

Abbreviations

ANK:: Ankyrin domain
CDD:: Conserved Domain Database
CMT:: Charcot-Marie-Tooth
C1:: Protein kinase C conserved region 1 (C1) domain (Cysteine-rich domain)
C2:: Protein kinase C conserved region 2 (C2) domain (calcium-dependent lipid-binding domain)
DENN:: differentially expressed in neoplastic versus normal cells
FFAS:: Fold and Function Assignment System
FYVE:: Fab1, YOTB, Vac1, and EEA1 (early endosome antigen 1)
GAP:: GTPase activating protein
GRAM:: glucosyltransferases, Rab-like GTPase activators and myotubularins
HHPred:: HMM-HMM structure prediction
HMM:: Hidden Markov Model
IMLRK:: Inactive myotubularin/LRR/ROCO/Kinase domain architecture of Amoebozoa
LRR:: Leucine-rich repeat
MTMR:: MTM1-related
NLS:: Nuclear localization signal
PDB:: Protein Data Bank
PH:: Pleckstrin-homology
Pfam:: Protein Families
PKA:: Protein kinase A
PTP:: Protein tyrosine phosphatase
PX:: Phox-like
Rho-GAP:: Rho-GTPase Activating Protein
ROCO:: Ras of complex proteins (ROC) + C-term of ROC (COR)
TKL:: Tyrosine kinase-like
WD40:: structural motif of 40-43 amino acids in the beta subunit of G-proteins
XLMTM:: X-linked myotubular myopathy

References

Rutherford AC, Cullen PJ: Phosphoinositides: Navigation Through the Endosomal Maze. The Biochemist. 2009, 31: 20-25.
Laporte J, Bedez F, Bolino A, Mandel JL: Myotubularins, a large disease-associated family of cooperating catalytically active and inactive phosphoinositides phosphatases. Hum Mol Genet. 2003, 12 (Spec No 2): R285-292. 10.1093/hmg/ddg273.
Robinson FL, Dixon JE: Myotubularin phosphatases: policing 3-phosphoinositides. Trends Cell Biol. 2006, 16: 403-412. 10.1016/j.tcb.2006.06.001.
Nicot AS, Laporte J: Endosomal phosphoinositides and human diseases. Traffic. 2008, 9: 1240-1249. 10.1111/j.1600-0854.2008.00754.x.
Lecompte O, Poch O, Laporte J: PtdIns5P regulation through evolution: roles in membrane trafficking?. Trends in Biochemical Sciences. 2008, 33: 453-460. 10.1016/j.tibs.2008.07.002.
Burki F, Shalchian-Tabrizi K, Pawlowski J: Phylogenomics reveals a new 'megagroup' including most photosynthetic eukaryotes. Biol Lett. 2008, 4: 366-369. 10.1098/rsbl.2008.0224.
Dacks JB, Walker G, Field MC: Implications of the new eukaryotic systematics for parasitologists. Parasitol Int. 2008, 57: 97-104. 10.1016/j.parint.2007.11.004.
Koonin EV: The Incredible Expanding Ancestor of Eukaryotes. Cell. 2010, 140: 606-608. 10.1016/j.cell.2010.02.022.
The Tree of Life. [http://tolweb.org/tree/]
Begley MJ, Dixon JE: The structure and regulation of myotubularin phosphatases. Curr Opin Struct Biol. 2005, 15: 614-620. 10.1016/j.sbi.2005.10.016.
Zhang ZY, Dixon JE: Active site labeling of the Yersinia protein tyrosine phosphatase: the determination of the pKa of the active site cysteine and the function of the conserved histidine 402. Biochemistry. 1993, 32: 9340-9345. 10.1021/bi00087a012.
Abysalh JC, Kuchnicki LL, Larochelle DA: The identification of pats1, a novel gene locus required for cytokinesis in Dictyostelium discoideum. Mol Biol Cell. 2003, 14: 14-25. 10.1091/mbc.E02-06-0335.
Marin I, van Egmond WN, van Haastert PJ: The Roco protein family: a functional perspective. FASEB J. 2008, 22: 3103-3110. 10.1096/fj.08-111310.
Lewis PA: The function of ROCO proteins in health and disease. Biol Cell. 2009, 101: 183-191. 10.1042/BC20080053.
Gotthardt K, Weyand M, Kortholt A, Van Haastert PJ, Wittinghofer A: Structure of the Roc-COR domain tandem of C. tepidum, a prokaryotic homologue of the human LRRK2 Parkinson kinase. EMBO J. 2008, 27: 2352-10.1038/emboj.2008.167.
Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S: The protein kinase complement of the human genome. Science. 2002, 298: 1912-1934. 10.1126/science.1075762.
Marin I: The Parkinson disease gene LRRK2: evolutionary and structural insights. Mol Biol Evol. 2006, 23: 2423-2433. 10.1093/molbev/msl114.
Kannan N, Taylor SS, Zhai Y, Venter JC, Manning G: Structural and functional diversity of the microbial kinome. PLoS Biol. 2007, 5: e17-10.1371/journal.pbio.0050017.
Dellaporta SL, Xu A, Sagasser S, Jakob W, Moreno MA, Buss LW, Schierwater B: Mitochondrial genome of Trichoplax adhaerens supports placozoa as the basal lower metazoan phylum. Proc Natl Acad Sci USA. 2006, 103: 8751-8756. 10.1073/pnas.0602076103.
Lemmon MA: Pleckstrin homology domains: not just for phosphoinositides. Biochem Soc Trans. 2004, 32: 707-711. 10.1042/BST0320707.
Petruk S, Sedkov Y, Smith S, Tillib S, Kraevski V, Nakamura T, Canaani E, Croce CM, Mazo A: Trithorax and dCBP acting in a complex to maintain expression of a homeotic gene. Science. 2001, 294: 1331-1334. 10.1126/science.1065683.
Ding Y, Lapko H, Ndamukong I, Xia Y, Al-Abdallat A, Lalithambika S, Sadder M, Saleh A, Fromm M, Riethoven JJ, et al: The Arabidopsis chromatin modifier ATX1, the myotubularin-like AtMTM and the response to drought. Plant Signal Behav. 2009, 4: 1049-1058. 10.4161/psb.4.11.10103.
Davis PH, Schulze J, Stanley SL: Transcriptomic comparison of two Entamoeba histolytica strains with defined virulence phenotypes identifies new virulence factor candidates and key differences in the expression patterns of cysteine proteases, lectin light chains, and calmodulin. Molecular & Biochemical Parasitology. 2007, 151: 118-128.
Ehrenkaufer GM, Haque R, Hackney JA, Eichinger DJ, Singh U: Identification of developmentally regulated genes in Entamoeba histolytica: insights into mechanisms of stage conversion in a protozoan parasite. Cell Microbiol. 2007, 9: 1426-1444. 10.1111/j.1462-5822.2006.00882.x.
Shalchian-Tabrizi K, Minge MA, Espelund M, Orr R, Ruden T, Jakobsen KS, Cavalier-Smith T: Multigene phylogeny of choanozoa and the origin of animals. PLoS One. 2008, 3: e2098-10.1371/journal.pone.0002098.
Ruiz-Trillo I, Roger AJ, Burger G, Gray MW, Lang BF: A phylogenomic investigation into the origin of metazoa. Mol Biol Evol. 2008, 25: 664-672. 10.1093/molbev/msn006.
Deng J, Lewis PA, Greggio E, Sluch E, Beilina A, Cookson MR: Structure of the ROC domain from the Parkinson's disease-associated leucine-rich repeat kinase 2 reveals a dimeric GTPase. Proc Natl Acad Sci USA. 2008, 105: 1499-1504. 10.1073/pnas.0709098105.
Montagnac G, Echard A, Chavrier P: Endocytic traffic in animal cell cytokinesis. Curr Opin Cell Biol. 2008, 20: 454-461. 10.1016/j.ceb.2008.03.011.
Debnath A, Das P, Sajid M, McKerrow JH: Identification of genomic responses to collagen binding by trophozoites of Entamoeba histolytica. J Infect Dis. 2004, 190: 448-457. 10.1086/422323.
Ackers JP, Mirelman D: Progress in research on Entamoeba histolytica pathogenesis. Curr Opin Microbiol. 2006, 9: 367-373. 10.1016/j.mib.2006.06.014.
Marion S, Guillen N: Genomic and proteomic approaches highlight phagocytosis of living and apoptotic human cells by the parasite Entamoeba histolytica. Int J Parasitol. 2006, 36: 131-139. 10.1016/j.ijpara.2005.10.007.
NCBI-Entrez. [http://www.ncbi.nlm.nih.gov/sites/entrez?db=protein]
Liolios K, Mavromatis K, Tavernarakis N, Kyrpides NC: The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res. 2008, 36: D475-479. 10.1093/nar/gkm884.
Genomes Online Database. [http://www.genomesonline.org/]
Rychlewski L, Jaroszewski L, Li W, Godzik A: Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci. 2000, 9: 232-241. 10.1110/ps.9.2.232.
FFAS03. [http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl?ses=]
Soding J: Protein homology detection by HMM-HMM comparison. Bioinformatics. 2005, 21: 951-960. 10.1093/bioinformatics/bti125.
Soding J, Biegert A, Lupas AN: The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 2005, 33: W244-248. 10.1093/nar/gki408.
HHPred. [http://toolkit.tuebingen.mpg.de/hhpred]
Marchler-Bauer A, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M, et al: CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Res. 2009, 37: D205-210. 10.1093/nar/gkn845.
NCBI-CDD. [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml]
Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A: The Pfam protein families database. Nucleic Acids Res. 2008, 36: D281-288. 10.1093/nar/gkm960.
Pfam. [http://pfam.janelia.org/]
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, et al: The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res. 2001, 29: 37-40. 10.1093/nar/29.1.37.
InterProScan. [http://www.ebi.ac.uk/Tools/InterProScan/]
Kall L, Krogh A, Sonnhammer EL: A combined transmembrane topology and signal peptide prediction method. J Mol Biol. 2004, 338: 1027-1036. 10.1016/j.jmb.2004.03.016.
Phobius. [http://phobius.sbc.su.se/]
Emanuelsson O, Brunak S, von Heijne G, Nielsen H: Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc. 2007, 2: 953-971. 10.1038/nprot.2007.131.
SignalP. [http://www.cbs.dtu.dk/services/SignalP/]
Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001, 305: 567-580. 10.1006/jmbi.2000.4315.
TMHMM. [http://www.cbs.dtu.dk/services/TMHMM/]
Delorenzi M, Speed T: An HMM model for coiled-coil domains and a comparison with PSSM-based predictions. Bioinformatics. 2002, 18: 617-625. 10.1093/bioinformatics/18.4.617.
Marcoil. [http://www.isrec.isb-sib.ch/webmarcoil/webmarcoilC1.html]
McDonnell AV, Jiang T, Keating AE, Berger B: Paircoil2: improved prediction of coiled coils from sequence. Bioinformatics. 2006, 22: 356-358. 10.1093/bioinformatics/bti797.
PairCoil2. [http://groups.csail.mit.edu/cb/paircoil2/paircoil2.html]
Nguyen Ba AN, Pogoutse A, Provart N, Moses AM: NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction. BMC Bioinformatics. 2009, 10: 202-10.1186/1471-2105-10-202.
NLStradamus. [http://www.bar.utoronto.ca/~anguyenba/]
Edgar RC: MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004, 5: 113-10.1186/1471-2105-5-113.
Notredame C, Higgins DG, Heringa J: T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000, 302: 205-217. 10.1006/jmbi.2000.4042.
Wallace IM, O'Sullivan O, Higgins DG, Notredame C: M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res. 2006, 34: 1692-1699. 10.1093/nar/gkl091.
T-Coffee Server. [http://www.igs.cnrs-mrs.fr/Tcoffee/tcoffee_cgi/index.cgi]
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997, 25: 4876-4882. 10.1093/nar/25.24.4876.
Nicholas KB, Nicholas HBJ, Deerfield DWI: GeneDoc: analysis and visualization of genetic variation. EMBNEWNews. 1997, 4: 1-4.
Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24: 1586-1591. 10.1093/molbev/msm088.
Lartillot N, Philippe H: A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol. 2004, 21: 1095-1109. 10.1093/molbev/msh112.
Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003, 52: 696-704. 10.1080/10635150390235520.
Le SQ, Lartillot N, Gascuel O: Phylogenetic mixture models for proteins. Philos Trans R Soc Lond B Biol Sci. 2008, 363: 3965-3976. 10.1098/rstb.2008.0180.
Gazave E, Lapebie P, Richards GS, Brunet F, Ereskovsky AV, Degnan BM, Borchiellini C, Vervoort M, Renard E: Origin and evolution of the Notch signalling pathway: an overview from eukaryotic genomes. BMC Evol Biol. 2009, 9: 249-10.1186/1471-2148-9-249.
NCBI-BLAST. [http://blast.ncbi.nlm.nih.gov/Blast.cgi]
Begley MJ, Taylor GS, Kim SA, Veine DM, Dixon JE, Stuckey JA: Crystal structure of a phosphoinositide phosphatase, MTMR2: insights into myotubular myopathy and Charcot-Marie-Tooth syndrome. Mol Cell. 2003, 12: 1391-1402. 10.1016/S1097-2765(03)00486-6.


Acknowledgements

The authors thank Mhairi Nimick for assistance with figure composition. DK, GBGM and MN are supported by the Natural Sciences and Engineering Research Council of Canada, the Alberta Cancer Board, and the Alberta Ingenuity Carbohydrate Research Group.

Author information

Authors and Affiliations

Department of Biological Sciences, University of Calgary, 2500 University Drive N.W., Calgary, Alberta, T2N 1N4, Canada
David Kerk & Greg BG Moorhead


Corresponding author

Correspondence to Greg BG Moorhead.

Additional information

Authors' contributions

GBGM and DK conceived of the study. DK designed the implementation of the study. DK collected all sequence data, performed domain mapping, multiple sequence alignment, phylogenetic tree analysis, and mined published gene expression data. DK composed the data figures and tables. DK and GBGM wrote and approve the manuscript.

Electronic supplementary material

12862_2009_1412_MOESM1_ESM.PDF

Additional file 1: Full Myotubularin Sequences Alignment. This file presents the full myotubularin sequences alignment, a portion of which was presented in Figure 3. All details of this alignment are the same as described in the Legend to Figure 3, except that in this file a blue bar is used to denote the extent of the N-terminal PH-GRAM domain, and an orange bar denotes the extent of the phosphatase domain catalytic signature motif. (PDF 529 KB)

12862_2009_1412_MOESM2_ESM.PDF

Additional file 2: Predicted Nuclear Localization Signals (NLS) in Animal Myotubularin Homologue Sequences. This file presents data summarizing predicted nuclear localization signals (NLS) in metazoan myotubularin homologue sequences of the R5 clade. (PDF 329 KB)

12862_2009_1412_MOESM3_ESM.PDF

Additional file 3: Predicted Nuclear Localization Signals (NLS) in Plant Myotubularin Homologue Sequences. This file presents data summarizing predicted nuclear localization signals (NLS) in plant myotubularin homologue sequences. (PDF 332 KB)

12862_2009_1412_MOESM4_ESM.XLS

Additional file 4: URLs for Protein Databases. This file contains the URLs for downloading of all organismal protein datasets searched for myotubularin homologues in this study. It also contains the original literature citation for the publication of each completely sequenced organismal genome. (XLS 44 KB)

12862_2009_1412_MOESM5_ESM.TXT

Additional file 5: Myotubularin Protein Sequences. This file contains the FASTA-formatted sequences for all myotubularin homologues identified in this study, reference human myotubularin proteins, database accession numbers, and sequence designations as used in the data figures. (TXT 110 KB)

12862_2009_1412_MOESM6_ESM.DOCX

Additional file 6: Additional Information on ROCO Sequence Alignment. This file presents species designations and database accession numbers for sequences presented in the multiple sequence alignment of Figure 4. (DOCX 11 KB)


Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Kerk, D., Moorhead, G.B. A phylogenetic survey of myotubularin genes of eukaryotes: distribution, protein structure, evolution, and gene expression. BMC Evol Biol 10, 196 (2010). https://doi.org/10.1186/1471-2148-10-196


Received: 14 August 2009
Accepted: 24 June 2010
Published: 24 June 2010
DOI: https://doi.org/10.1186/1471-2148-10-196
