Open Access Highly Accessed Software

ProfileGrids as a new visual representation of large multiple sequence alignments: a case study of the RecA protein family

Alberto I Roca*, Albert E Almada and Aaron C Abajian

Author Affiliations

Department of Molecular Biology and Biochemistry, 560 Steinhaus Hall, University of California, Irvine, California 92697-3900, USA

BMC Bioinformatics 2008, 9:554  doi:10.1186/1471-2105-9-554

Published: 22 December 2008

Additional files

Additional file 1:

Multiple sequence alignment of bacterial RecA homologs. A subset of the 300 sequences is shown, representing each of the major bacterial phyla. In the alignment, a dash (-) indicates a gap and a period (.) indicates an amino acid identical to the E. coli RecA protein. NCBI Protein database accession numbers are listed at the end unless the data were taken from the TIGR unfinished microbial genomes database. Summary lines above the alignment were calculated from all 300 sequences. The "Bioin" line indicates the bioinformatic structural elements (nanoanatomy) across the entire RecA protein: 12 motifs and the 10 connecting variable regions. "Secon" are the secondary structural elements from the E. coli RecA crystal structure, where "a" are α helices, "b" are β strands, "l" are disordered loops, and "?" are disordered termini [62]. In each case the letter or number name of the element is given in the second position. "Ident" are the 21 residues identical in all 300 sequences. "Chemi" are the 39 chemically conservative substitutions based on the following amino acid classification: a = (DE), b = (HKR), f = (AGILV), m = (NQ), o = (FWY), h = (ST), i = (P), s = (CM). "Funct" lists the 55 functionally conservative residue substitutions based on the classification: a = (DE), b = (HKR), f = (AFILMPVW), p = (CGNQSTY). Finally, "Major" are the 187 residues conserved above a 70% majority threshold (210 sequences), with invariant residues shown in uppercase. The numbering of the alignment is based upon the E. coli RecA protein sequence.
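As an illustration of how the "Ident", "Chemi", "Funct", and "Major" summary lines described above could be derived for a single alignment column, here is a minimal Python sketch. It is not the authors' code. The two residue classifications are copied from the caption; the function names, the "." placeholder for unlabeled positions, the treatment of gaps, and the reading of the 70% threshold as "at least 70%" (210 of 300 sequences) are assumptions made for this example.

```python
from collections import Counter

# Residue classes copied from the caption of Additional file 1.
CHEMICAL = {"a": "DE", "b": "HKR", "f": "AGILV", "m": "NQ",
            "o": "FWY", "h": "ST", "i": "P", "s": "CM"}
FUNCTIONAL = {"a": "DE", "b": "HKR", "f": "AFILMPVW", "p": "CGNQSTY"}


def invert(classes):
    """Map each residue letter to its class label."""
    return {res: label for label, members in classes.items() for res in members}


CHEM_OF = invert(CHEMICAL)
FUNC_OF = invert(FUNCTIONAL)


def shared_class(column, lookup):
    """Return the class label if every residue falls in one class, else '.'."""
    labels = {lookup.get(res) for res in column}  # gaps map to None
    if len(labels) == 1 and None not in labels:
        return next(iter(labels))
    return "."


def summarize_column(column, majority_percent=70):
    """Summarize one alignment column given as a string of residues.

    Assumptions (hypothetical, not from the paper): gaps ('-') never earn a
    label, and the majority test is 'count >= majority_percent of sequences'.
    """
    n = len(column)
    residue, top = Counter(column).most_common(1)[0]
    invariant = top == n and residue != "-"
    major = "."
    if residue != "-" and top * 100 >= majority_percent * n:
        # Invariant residues are shown in uppercase, as in the caption.
        major = residue.upper() if invariant else residue.lower()
    return {
        "Ident": residue if invariant else ".",
        "Chemi": shared_class(column, CHEM_OF),
        "Funct": shared_class(column, FUNC_OF),
        "Major": major,
    }
```

For example, `summarize_column("DDDDDDDEEE")` labels the column as chemically and functionally conservative (class "a") and reports a lowercase majority residue, because D occurs in 7 of 10 sequences but the column is not invariant. Whether the published "Chemi" and "Funct" counts also include invariant positions is not stated in the caption, so this sketch simply reports the class whenever one applies.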

Format: PDF Size: 39KB

Additional file 2:

Detailed ProfileGrid of the RecA protein family. The frequency values were calculated from the 300 RecA sequences over the full length (352 residues) of the E. coli RecA homolog (top sequence), which determines the position numbering. The "Major" summary line shows the 187 residues conserved above a 70% majority threshold. The 12 RecA family motifs are boxed and labeled (as in Additional file 1), while the connecting variable regions are only labeled. Frequency values are shaded in the ranges of 50 to 69% (light gray), 70 to 89% (dark gray), and 90 to 100% (black). Since we anticipate updating the analysis in the future, this is version 1.0 of the RecA ProfileGrid.
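To make the frequency calculation and shading ranges above concrete, the following Python sketch tabulates per-position residue frequencies from a set of aligned sequences and assigns each value to a shading bin. It is an illustrative reconstruction under stated assumptions, not the ProfileGrid software itself: the function names `profile_grid` and `shade` are hypothetical, frequencies are expressed as percentages of all sequences (gaps included in the denominator), and the bins are treated as thresholds at 50, 70, and 90%.

```python
from collections import Counter


def profile_grid(sequences):
    """Return one dict per alignment position: residue -> percent frequency.

    Assumes all sequences are aligned to the same length (hypothetical input
    format: a list of equal-length strings, with '-' marking gaps).
    """
    total = len(sequences)
    length = len(sequences[0])
    grid = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        grid.append({res: 100 * n / total for res, n in counts.items()})
    return grid


def shade(percent):
    """Map a percent frequency to the shading ranges given in the caption."""
    if percent >= 90:
        return "black"
    if percent >= 70:
        return "dark gray"
    if percent >= 50:
        return "light gray"
    return "none"


if __name__ == "__main__":
    toy = ["ACD", "ACE", "ACE", "AGE"]  # toy alignment, not RecA data
    for pos, column in enumerate(profile_grid(toy), start=1):
        for res, pct in sorted(column.items()):
            print(pos, res, pct, shade(pct))
```

In this toy alignment, position 1 is invariant and would be shaded black, while the residue shared by three of the four sequences at position 2 falls in the dark gray range.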

Format: PDF Size: 71KB
