The mining of toxin-like polypeptides from EST database by single residue distribution analysis
Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences ul. Miklukho-Maklaya, 16/10, 117997, Moscow, Russia
BMC Genomics 2011, 12:88 doi:10.1186/1471-2164-12-88
Published: 31 January 2011
Additional file 1:
Supplementary Excel table of the reduced databank used in the analysis (read-only). Use the Save As command and allow macro execution to access the SRDA and complementary functions in this example.
Format: XLS Size: 1.5MB
This file can be viewed with: Microsoft Excel Viewer
Additional file 2:
Supplementary listing of the VBA module, with a general description of each function and a how-to-use section. Start Microsoft Excel and set the macro security level to medium. Open an existing file or create a new one (allow macro execution in the file). Rename Add_file 2 SRDA_processing.bas.txt to SRDA_processing.bas and import all the functions it contains via the Microsoft Visual Basic editor (File/Import File command).
Type "= function name(" in the required cell and click the "fx" button to the left of the formula input line. Enter the argument(s) required by the function in the window that opens. Then copy the formula to other cells.
-Function ShortDo(seq, excpt) - the main function, which produces the converted sequence, where:
seq - String variable containing the amino acid sequence to be processed by SRDA,
excpt - String variable listing the key residues (any combination of single-letter amino acid codes, with or without the termination symbol ".") as a single word.
-Function Translate(seq, frame) - converts a nucleotide sequence to an amino acid sequence in the specified frame, where:
seq - String variable containing the sequence to translate,
frame - Integer variable defining the translation frame; acceptable values are 1, 2, 3, -1, -2, -3 or 0 (with frame = 0, only the reverse complement of the nucleotide sequence is produced).
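The behaviour described for Translate can be illustrated with a minimal Python sketch. It is not the VBA code from the module: the names translate and reverse_complement are hypothetical, the standard genetic code is assumed, stop codons are written as the termination symbol ".", and unrecognised codons are shown as "X" (also an assumption).

```python
from itertools import product

# Standard genetic code in TCAG order; "." marks a stop codon (assumed symbol).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY..CC.W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, frame):
    """Translate seq in frame 1, 2, 3, -1, -2 or -3.

    With frame = 0 only the reverse complement is returned,
    mirroring the description of the VBA function.
    """
    seq = seq.upper().replace("U", "T")
    if frame == 0:
        return reverse_complement(seq)
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be 1, 2, 3, -1, -2, -3 or 0")
    if frame < 0:
        # Negative frames are read on the opposite strand.
        seq = reverse_complement(seq)
    offset = abs(frame) - 1
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    # Codons with ambiguous bases are reported as "X" (assumption).
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)
```

For example, translate("ATGGCCTAA", 1) gives "MA." and translate("ATGGCCTAA", 0) gives the reverse complement "TTAGGCCAT".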
-Function SignalFrom(seq, limitMet, frame, format) - predicts an acceptable Met residue that starts a signal peptide, where:
seq - String variable containing the amino acid sequence to process,
limitMet - Integer variable defining the range (from the beginning of the sequence) in which the Met residue is searched for,
frame - Integer variable equal to the frame used earlier for translation (frame range 1-6); this variable is needed to calculate the position of the first nucleotide of the possible signal peptide,
format - Integer variable defining the output style:
0 - function returns the position of the first nucleotide,
1 - function returns the position of the first Met in the signal peptide,
2 - function returns the position of the last nucleotide in the predicted signal peptide,
3 - function returns the position of the last amino acid in the predicted signal peptide,
any other digit - function returns the best score calculated for the signal peptide.
-Function TrimSeq(seq, start, finish) - returns part of a sequence, where:
seq - String variable containing a nucleotide or amino acid sequence,
start - Integer variable defining the first nucleotide (amino acid),
finish - Integer variable defining the last nucleotide (amino acid).
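As a rough illustration of TrimSeq, the Python sketch below assumes that start and finish are 1-based positions and that both ends are included; the listing above does not state this explicitly. The name trim_seq is hypothetical and the code is not taken from the VBA module.

```python
def trim_seq(seq, start, finish):
    """Return the part of seq from position start to position finish.

    Positions are assumed to be 1-based and inclusive.
    """
    if start < 1 or finish < start:
        raise ValueError("expected 1 <= start <= finish")
    # Python slices are 0-based and exclude the end index.
    return seq[start - 1:finish]
```

Under these assumptions, trim_seq("MKTLLVAC", 2, 4) returns "KTL".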
-Function MatureChain(seq, start, frame, format) - searches for the termination of a sequence, where:
seq - String variable containing the amino acid sequence,
start - Integer variable defining the position from which the termination symbol is searched for,
frame - Integer variable equal to the frame used earlier for translation (frame range 1-6); this variable is needed to calculate the position of the last nucleotide in the termination codon,
format - Integer variable defining the output style:
0 - function returns the position of the last nucleotide in the gene,
1 - function returns the position of the termination symbol,
any other digit - function returns the polypeptide sequence from start to the detected terminus.
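The three output styles of MatureChain can be sketched in Python as follows. This is an illustration under several assumptions that the listing does not confirm: positions are 1-based, the termination symbol is ".", the returned polypeptide excludes that symbol, frames 4-6 share the offsets of frames 1-3 with nucleotide positions counted along the translated strand, and a missing terminator yields None (or the rest of the sequence). The name mature_chain is hypothetical.

```python
STOP = "."  # termination symbol (assumed)


def mature_chain(seq, start, frame, fmt):
    """Locate the first termination symbol at or after position start.

    fmt = 0: position of the last nucleotide of the termination codon,
    fmt = 1: position of the termination symbol in seq,
    other:   polypeptide from start up to (not including) the terminator.
    """
    if not 1 <= frame <= 6:
        raise ValueError("frame must be in the range 1-6")
    stop_index = seq.find(STOP, start - 1)  # 0-based index, -1 if absent
    if stop_index == -1:
        # No terminator found: an assumed fallback, not documented behaviour.
        return None if fmt in (0, 1) else seq[start - 1:]
    stop_pos = stop_index + 1  # back to 1-based numbering
    if fmt == 1:
        return stop_pos
    if fmt == 0:
        # Residue p in a frame with offset k ends at nucleotide k + 3 * p.
        offset = (frame - 1) % 3
        return offset + 3 * stop_pos
    return seq[start - 1:stop_index]
```

With seq = "MKCLV.AR", start = 2 and frame = 1, the sketch returns 18 for format 0, 6 for format 1 and "KCLV" for any other format.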
-Function Frame6Check(pattern, seq1, seq2, seq3, seq4, seq5, seq6) - prints the number of the frame in which the analyzed sequence(s) match the query, where:
pattern - String variable defining any text to match,
seq1 - seq6 - String variables containing amino acid sequences (or converted sequences) translated in reading frames 1 to 6.
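A minimal Python sketch of the Frame6Check idea is given below. It assumes that "match" means a plain substring search and that the result is a list of frame numbers; the actual matching rule and output format of the VBA function are not specified in the listing. The name frame6_check is hypothetical.

```python
def frame6_check(pattern, seq1, seq2, seq3, seq4, seq5, seq6):
    """Return the numbers (1-6) of the frames whose sequence contains pattern.

    Matching is a plain substring test (assumption).
    """
    frames = (seq1, seq2, seq3, seq4, seq5, seq6)
    return [number for number, seq in enumerate(frames, start=1)
            if pattern in seq]
```

For instance, frame6_check("CKC", "MAAA", "GCKCP", "", "LL", "CKC", "Q") returns [2, 5], and an empty list means that no frame matched.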
Format: TXT Size: 9KB
Additional file 3:
Supplementary Table. Results of A. viridis EST database processing. Accession numbers of EST sequences in GenBank are given. Homology to known structures was estimated by UFO and PSI-BLAST.
Format: DOC Size: 182KB
This file can be viewed with: Microsoft Word Viewer
Additional file 4:
Supplementary Figure. Multiple sequence alignment of toxin-like, cytolysin-like and hypothetical peptides. Domains predicted to be removed during maturation are shown in light brown. Cysteine residues are highlighted in yellow, while the positively charged residues lysine and arginine are shown in blue. (A) short toxin-like polypeptides retrieved with motifs 11 and 13; (B) long toxin-like polypeptides retrieved with motifs 11 and 13; (C) cytolysin-like polypeptides retrieved with motif K and the hemolytic toxin Equinatoxin-2 (P61914); (D) hypothetical polypeptides identified with motif K.
Format: JPEG Size: 4.3MB