ProfileGrids as a new visual representation of large multiple sequence alignments: a case study of the RecA protein family
* Corresponding author: Alberto I Roca aroca@uci.edu
Department of Molecular Biology and Biochemistry, 560 Steinhaus Hall, University of California, Irvine, California 92697-3900, USA
BMC Bioinformatics 2008, 9:554 doi:10.1186/1471-2105-9-554
Published: 22 December 2008

Additional files
Additional file 1:
Multiple sequence alignment of bacterial RecA homologs. A subset of the 300 sequences is shown, representing each of the major bacterial phyla. In the alignment, a dash (-) indicates a gap and a period indicates an amino acid identical to the E. coli RecA protein. NCBI Protein database accession numbers are listed at the end unless the data were taken from the TIGR unfinished microbial genomes database. The summary lines above the alignment were calculated from all 300 sequences. The "Bioin" line indicates the bioinformatic structural elements (nanoanatomy) across the entire RecA protein: 12 motifs and the 10 connecting variable regions. "Secon" gives the secondary structural elements from the E. coli RecA crystal structure, where "a" denotes α-helices, "b" β-strands, "l" disordered loops, and "?" disordered termini [62]. In each case, the letter or number naming the element is given in the second position. "Ident" marks the 21 residues identical in all 300 sequences. "Chemi" marks the 39 chemically conservative substitutions, based on the following amino acid classification: a = (DE), b = (HKR), f = (AGILV), m = (NQ), o = (FWY), h = (ST), i = (P), s = (CM). "Funct" lists the 55 functionally conservative substitutions, based on the classification: a = (DE), b = (HKR), f = (AFILMPVW), p = (CGNQSTY). Finally, "Major" shows the 187 residues conserved above a 70% majority threshold (210 sequences), with invariant residues in uppercase. The numbering of the alignment is based on the E. coli RecA protein sequence.
Format: PDF, Size: 39KB
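The summary lines of Additional file 1 amount to a per-column classification of the alignment. The following is a minimal Python sketch of that idea, not the authors' software. It assumes the alignment is a list of equal-length uppercase strings with "-" for gaps. The names `summarize_column` and `summary_lines` are hypothetical, as are several rules: a blank marks "not conserved at this level", any gap breaks Ident/Chemi/Funct conservation, only the strictest level is reported (an invariant column is not repeated under Chemi, and a chemically conserved column is not repeated under Funct), and the 70% cutoff is inclusive.

```python
from collections import Counter

# Chemical classes used for the "Chemi" summary line.
CHEMICAL_CLASSES = {
    "a": "DE", "b": "HKR", "f": "AGILV", "m": "NQ",
    "o": "FWY", "h": "ST", "i": "P", "s": "CM",
}
# Functional classes used for the "Funct" summary line.
FUNCTIONAL_CLASSES = {"a": "DE", "b": "HKR", "f": "AFILMPVW", "p": "CGNQSTY"}

BLANK = " "  # assumption: marks a column not conserved at a given level


def shared_class(residues, classes):
    """Return the class letter if every residue falls in one class, else None."""
    for letter, members in classes.items():
        if all(r in members for r in residues):
            return letter
    return None


def summarize_column(column):
    """Return (Ident, Chemi, Funct, Major) symbols for one alignment column."""
    n = len(column)
    counts = Counter(column)
    top, top_count = counts.most_common(1)[0]
    invariant = len(counts) == 1 and top != "-"
    ident = top if invariant else BLANK
    chemi = funct = BLANK
    # Assumption: any gap breaks conservation; only the strictest level is shown.
    if not invariant and "-" not in counts:
        chemi = shared_class(counts, CHEMICAL_CLASSES) or BLANK
        if chemi == BLANK:
            funct = shared_class(counts, FUNCTIONAL_CLASSES) or BLANK
    # Majority rule: the top residue must occur in at least 70% of sequences.
    # Integer arithmetic, so exactly 210 of 300 qualifies.
    major = BLANK
    if top != "-" and top_count * 10 >= 7 * n:
        major = top.upper() if invariant else top.lower()
    return ident, chemi, funct, major


def summary_lines(alignment):
    """Build the Ident, Chemi, Funct and Major strings for equal-length rows."""
    columns = [summarize_column("".join(col)) for col in zip(*alignment)]
    return ["".join(symbols[i] for symbols in columns) for i in range(4)]


print(summary_lines(["MDGA", "MEGA", "MDAA"]))
# ['M  A', ' af ', '    ', 'M  A']
```

The threshold uses integer arithmetic (`top_count * 10 >= 7 * n`) so that a residue present in exactly 210 of 300 sequences is not lost to floating-point rounding.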
Additional file 2:
Detailed ProfileGrid of the RecA protein family. The frequency values were calculated from the 300 RecA sequences over the full length (352 residues) of the E. coli RecA homolog (top sequence), which determines the position numbering. The "Major" summary line shows the 187 residues conserved above a 70% majority threshold. The 12 RecA family motifs are boxed and labeled (as in Additional file 1), while the connecting variable regions are labeled only. Frequency values are shaded in the ranges 50 to 69% (light gray), 70 to 89% (dark gray), and 90 to 100% (black). Because we anticipate updating the analysis, this is version 1.0 of the RecA ProfileGrid.
Format: PDF, Size: 71KB
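A ProfileGrid such as Additional file 2 tabulates how often each amino acid occurs at each reference position and then shades each value. The following is a minimal Python sketch of that computation under stated assumptions, not the published implementation. It assumes the alignment is a list of equal-length uppercase strings with the reference (here, the E. coli sequence) first. The names `profile_grid` and `shade` are hypothetical. Two further rules are assumptions rather than the authors' documented procedure: columns where the reference has a gap are skipped, and the denominator includes all sequences, gaps included.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def shade(percent):
    """Map a frequency (0-100) onto the caption's shading ranges."""
    if percent >= 90:
        return "black"
    if percent >= 70:
        return "dark gray"
    if percent >= 50:
        return "light gray"
    return None  # below 50%: unshaded


def profile_grid(alignment, reference_index=0):
    """Return {position: {residue: percent}} numbered by the reference row.

    Assumptions: columns where the reference has a gap are skipped, and
    percentages use all sequences (gaps included) as the denominator.
    """
    n = len(alignment)
    reference = alignment[reference_index]
    grid = {}
    position = 0
    for col, ref_char in enumerate(reference):
        if ref_char == "-":
            continue  # insertion relative to the reference: no position number
        position += 1
        counts = Counter(seq[col] for seq in alignment)
        grid[position] = {aa: 100.0 * counts[aa] / n for aa in AMINO_ACIDS}
    return grid


grid = profile_grid(["MA-K", "MAGK", "MSGR", "MAGK"])
for pos, freqs in grid.items():
    top = max(freqs, key=freqs.get)
    print(pos, top, freqs[top], shade(freqs[top]))
# 1 M 100.0 black
# 2 A 75.0 dark gray
# 3 K 75.0 dark gray
```

Fixing the denominator at the total number of sequences makes gap-rich columns show lower frequencies. Dividing by the number of non-gap residues instead would be an equally plausible design choice.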
