VSGdb: a database for trypanosome variant surface glycoproteins, a large and diverse family of coiled coil proteins
1 Wellcome Centre for Molecular Parasitology, University of Glasgow, Glasgow Biomedical Research Centre, 120 University Place, Glasgow G12 8TA, UK
2 Department of Pathology, Henry Wellcome Building, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
3 Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
BMC Bioinformatics 2007, 8:143 doi:10.1186/1471-2105-8-143. Published: 2 May 2007
Trypanosomes are coated with a variant surface glycoprotein (VSG) that is so densely packed that it physically protects underlying proteins from effectors of the host immune system. Periodically, cells expressing a distinct VSG arise in a population and thereby evade immunity. The main structural features of VSGs are two long α-helices that form a coiled coil, and sets of relatively unstructured loops that are distal to the plasma membrane and contain most or all of the protective epitopes. The primary structure of different VSGs is highly variable, typically displaying only ~20% identity with each other. The genome has nearly 2000 VSG genes, which are located in subtelomeres. Only one VSG gene is expressed at a time, and switching between VSGs primarily involves gene conversion events. The archive of silent VSGs undergoes rapid diversifying evolution, which also involves gene conversion. The VSG family is a paradigm for α-helical coiled coil structures, epitope variation and GPI-anchor signals. At the DNA level, the genes are a paradigm for diversifying evolutionary processes and for the role of subtelomeres and recombination mechanisms in the generation of diversity in multigene families. To make VSG sequences readily available for addressing these general questions, as well as trypanosome-specific questions, we have created VSGdb, a database of all known sequences.
VSGdb contains fully annotated VSG sequences from the genome sequencing project, with which it shares all identifiers and annotation, and other available sequences. The database can be queried in various ways. Sequence retrieval, in FASTA format, can deliver protein or nucleotide sequence filtered by chromosomes or contigs, gene type (functional, pseudogene, etc.), domain and domain sequence family. Retrieved sequences can be stored as a temporary database for BLAST querying, reports from which include hyperlinks to the genome project database (GeneDB) CDS Info and to individual VSGdb pages for each VSG, containing annotation and sequence data. Queries (text search) with specific annotation terms yield a list of relevant VSGs, displayed as identifiers leading again to individual VSG web pages.
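The filtered FASTA retrieval described above can be approximated offline on a downloaded file. The following is a minimal sketch only, not the VSGdb implementation. It assumes a hypothetical header layout of `identifier|chromosome|gene_type`, which is not taken from VSGdb or GeneDB. The function names `parse_fasta` and `filter_records` are likewise illustrative.

```python
from typing import Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header without '>', concatenated sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def filter_records(
    records: List[Record],
    chromosome: Optional[str] = None,
    gene_type: Optional[str] = None,
) -> List[Record]:
    """Keep records whose header matches the requested filters.

    ASSUMPTION: headers look like 'identifier|chromosome|gene_type'.
    This layout is hypothetical; adapt the split to the real headers.
    """
    kept: List[Record] = []
    for header, seq in records:
        fields = header.split("|")
        if len(fields) < 3:
            continue  # header does not follow the assumed layout
        _, chrom, gtype = fields[:3]
        if chromosome is not None and chrom != chromosome:
            continue
        if gene_type is not None and gtype != gene_type:
            continue
        kept.append((header, seq))
    return kept


if __name__ == "__main__":
    # Made-up example records, not real VSG sequences or identifiers.
    example = (
        ">vsg_example_1|chr1|functional\nATGGCC\nTTTAAA\n"
        ">vsg_example_2|chr2|pseudogene\nATGTAG\n"
    )
    records = list(parse_fasta(example))
    for header, seq in filter_records(records, gene_type="functional"):
        print(header, len(seq))
```

Passing `None` for a filter leaves that field unconstrained, mirroring the way the web form lets a user combine or omit filters. The domain and domain sequence family filters mentioned above would need further header fields or a separate annotation table, which this sketch does not attempt to model.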
VSGdb (http://www.vsgdb.org/) is a freely available, web-based platform enabling easy retrieval, via various filters, of sets of VSGs. These sets will enable detailed analysis of a number of general and trypanosome-specific questions regarding protein structure potential, epitope variability, sequence evolution and recombination events.