This article is part of the supplement: Selected articles from the Eighth Asia-Pacific Bioinformatics Conference (APBC 2010)
Computational prediction of type III secreted proteins from gram-negative bacteria
1 Department of Computer Science and Engineering, Information Engineering College, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, PR China
2 Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, School of Life Sciences, Fudan University, 220 Handan Road, Shanghai, 200433, PR China
3 Department of Plant Pathology and Microbiology, University of California, Riverside, CA 92521, USA
4 Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
5 Institute for Integrative Genome Biology, University of California, Riverside, CA 92521, USA
6 College of Information Science and Technology, Tsinghua University, Beijing 100084, PR China
BMC Bioinformatics 2010, 11(Suppl 1):S47 doi:10.1186/1471-2105-11-S1-S47
Published: 18 January 2010
The type III secretion system (T3SS) is a specialized protein delivery system in gram-negative bacteria that injects proteins (called effectors) directly into the eukaryotic host cytosol and facilitates bacterial infection. For many plant and animal pathogens, the T3SS is indispensable for disease development. Recently, the T3SS has also been found in rhizobia, where it plays a crucial role in the nodulation process. Although considerable effort has been devoted to understanding type III secretion, the precise mechanism underlying the secretion and translocation process is not yet fully understood. In particular, no defined secretion or translocation signals have been identified in the type III secreted effectors (T3SEs), which makes the identification of these important virulence factors notoriously challenging. The availability of a large number of sequenced genomes for plant- and animal-associated bacteria demands the development of efficient and effective bioinformatics methods for identifying T3SEs.
We have developed a machine learning method based on N-terminal amino acid sequences to predict novel type III effectors in the plant pathogen Pseudomonas syringae and in the microsymbiont rhizobia. The features used by the learning model (or classifier) include amino acid composition, secondary structure and solvent accessibility information. The method achieved a precision of over 90% on P. syringae in a cross-validation study. In combination with a promoter screen for type III-specific promoters, the classifier trained on the P. syringae data was applied to predict novel T3SEs from the genomic sequences of four rhizobial strains. This application yielded 57 candidate type III secreted proteins, 17 of which are confirmed effectors.
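As an illustration of the simplest of the three feature types, the following minimal sketch computes amino acid composition over the N-terminal region of a protein and shows how precision is computed from labeled predictions. It is not the authors' implementation: the function names, the 50-residue window and the handling of non-standard residues are assumptions made for the example, and the secondary structure and solvent accessibility features are omitted because they require an external predictor.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every feature vector
# has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_terminal_composition(sequence, window=50):
    """Return the fraction of each standard amino acid in the first
    `window` residues. The window length is a hypothetical choice for
    illustration; the abstract does not state one."""
    n_term = sequence[:window].upper()
    # Ignore ambiguous or non-standard symbols such as X or *.
    counts = Counter(res for res in n_term if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def precision(true_labels, predicted_labels):
    """Precision = TP / (TP + FP), with 1 marking an effector."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    # Made-up sequence used only to exercise the function.
    features = n_terminal_composition("MSSNPLTRSAQ", window=10)
    print(dict(zip(AMINO_ACIDS, features)))
    print(precision([1, 0, 1, 1], [1, 1, 1, 0]))
```

A vector like this could be concatenated with other per-residue features and passed to any standard classifier. Precision is the natural metric here because each predicted effector is a candidate for experimental validation, so false positives are costly.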
Our experimental results indicate that the machine learning method based on N-terminal amino acid sequences, combined with a promoter screen, can be an effective computational approach for predicting novel type III effectors in gram-negative bacteria. Our method and data are available to the public upon request.