Periodicity of SNP distribution around transcription start sites
-
* Corresponding author: Kenshi Hayashi khayashi@gen.kyushu-u.ac.jp
Division of Genome Analysis, Research Center for Genetic Information, Medical Institute of Bioregulation, Kyushu University, Maidashi 3-1-1, Higashi-ku, Fukuoka 812-8582, Japan
BMC Genomics 2006, 7:66 doi:10.1186/1471-2164-7-66
Published: 3 April 2006
Additional files
Additional File 1:
Periodicity of mono- and di-nucleotide sequences around TSSs. The spectra of nucleotide frequency for three TSS categories: all TSSs (A), CGI-TSSs (B) and nonCGI-TSSs (C). The side views are shown on the left of the diagram panels. The magenta and red lines are the means and 99% confidence intervals of the power values, determined from the distributions of the values in simulations using randomly chosen genomic positions as described in the text. The dynamic color range goes from blue to red, corresponding to Z-scores of 0 and 200, respectively. a.u., arbitrary units.
Format: PDF Size: 817KB
Additional File 2:
Autocorrelation function maps for AA and TT dinucleotides around the TSSs. Autocorrelation function maps around CGI-TSSs (A and B) and nonCGI-TSSs (C and D). The autocorrelation function was calculated in sliding windows (146 nucleotides) with a step of 5 nucleotides from -3,000 to +3,000 nucleotides relative to the TSSs, and a moving average over 3 distances was then applied to each function to remove the period-3 signal due to the coding region. Positions refer to the center of each window. The statistical significance of the function in each window was evaluated using the mean and standard deviation calculated from those of all sliding windows. The dynamic color range goes from blue to red, corresponding to 0 and 10 in the negative log of probability p, respectively. The contour interval is 1.0. The zones of possible nucleosome occupancy, as judged from the 10-nucleotide periodicity, are shown in pink.
Format: PDF Size: 611KB
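The sliding-window autocorrelation described for Additional File 2 can be illustrated with a minimal Python sketch. This is not the authors' Matlab script: the function names, the normalization by the window variance, the `max_lag` value, and the reading of "moving average over 3 distances" as a centered average over three consecutive lags are all assumptions made for illustration. The input `signal` is assumed to be a per-position dinucleotide occurrence profile.

```python
def autocorrelation(signal, max_lag):
    """Normalized autocorrelation of one window for lags 1..max_lag."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal)
    if var == 0:
        return [0.0] * max_lag
    return [
        sum((signal[i] - mean) * (signal[i + lag] - mean) for i in range(n - lag)) / var
        for lag in range(1, max_lag + 1)
    ]


def moving_average(values, span=3):
    """Centered moving average; at the edges only the available neighbours are used."""
    half = span // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def sliding_autocorrelation(signal, window=146, step=5, max_lag=30):
    """Return (window centre index, smoothed autocorrelation) for each window."""
    results = []
    for start in range(0, len(signal) - window + 1, step):
        segment = signal[start:start + window]
        smoothed = moving_average(autocorrelation(segment, max_lag), span=3)
        results.append((start + window // 2, smoothed))
    return results


# Synthetic profile (assumption): one occurrence every 10 positions.
profile = [1 if i % 10 == 0 else 0 for i in range(600)]
maps = sliding_autocorrelation(profile)
```

In this synthetic case the smoothed function at lag 10 stays well above the value at lag 5, which is the kind of 10-nucleotide signal the maps are meant to highlight. The significance evaluation from the caption is not reproduced here.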
Additional File 3:
Spectrum analysis of the SNP density in the short-wavelength range. The spectra of SNP density distribution for three TSS categories: all TSSs (A), CGI-TSSs (B) and nonCGI-TSSs (C). Short sliding windows (128 nucleotides) with a step of 5 nucleotides from -3,000 to +3,000 relative to TSSs were used. The side views are shown on the left of the diagram panels. The magenta and red lines are the means and 99% confidence intervals of the power values, determined from the distributions of the values in simulations using randomly chosen genomic positions as described in the text. The dynamic color range goes from blue to red, corresponding to Z-scores of 0 and 25, respectively. a.u., arbitrary units.
Format: PDF Size: 482KB
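The windowed spectrum analysis described for Additional File 3 can be sketched as follows. This is an illustrative Python example, not the authors' Matlab script: the plain discrete Fourier transform, the mean subtraction, the power normalization, and all function names are assumptions. The Z-score comparison against randomly chosen genomic positions is not covered.

```python
import cmath
import math


def power_spectrum(segment):
    """Power of the mean-subtracted DFT at frequencies k/N for k = 1..N//2."""
    n = len(segment)
    mean = sum(segment) / n
    centered = [x - mean for x in segment]
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centered)
        )
        powers.append(abs(coeff) ** 2 / n)
    return powers


def sliding_spectra(density, window=128, step=5):
    """Return (window centre index, power spectrum) for each sliding window."""
    spectra = []
    for start in range(0, len(density) - window + 1, step):
        segment = density[start:start + window]
        spectra.append((start + window // 2, power_spectrum(segment)))
    return spectra


# Synthetic density (assumption): a pure oscillation with a 16-nucleotide period.
density = [math.cos(2 * math.pi * t / 16) for t in range(256)]
spectra = sliding_spectra(density)
centre, powers = spectra[0]
peak_k = powers.index(max(powers)) + 1   # frequency index of the strongest component
peak_period = 128 / peak_k               # corresponding period in nucleotides
```

With a 128-nucleotide window, a component with frequency index k corresponds to a period of 128/k nucleotides, so short windows of this kind resolve short periods only.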
Additional File 4:
List of NM IDs in each category. List of the accession numbers of mRNA Reference Sequences (NM IDs) in each TSS category.
Format: ZIP Size: 59KB
Additional File 5:
Sequence around TSSs (Part 1). FASTA-formatted file containing the sequence from position -3,000 to +3,000 relative to the TSS.
Format: GZ Size: 8.3MB
Additional File 6:
Sequence around TSSs (Part 2). FASTA-formatted file containing the sequence from position -3,000 to +3,000 relative to the TSS.
Format: GZ Size: 8.3MB
Additional File 7:
Accession numbers of the Reference Sequences and SNP information. List of the accession numbers of the 10,171 unique human mRNA Reference Sequences (NM IDs) used to analyze the distribution of the SNP density in this study. It includes the associated chromosome number, contig ID, position in contig, and SNP information from position -3,000 to +3,000 relative to the TSSs (1 and 0 indicate the presence and absence of a validated SNP at each position, respectively).
Format: GZ Size: 552KB
Additional File 8:
Matlab script and input files. Each input file for the Matlab script is a 2-column, 6001-line table in ASCII format, in which the first and second columns represent the position relative to the TSSs and the number of SNPs at each position, respectively. For nucleotide divergence, an additional third column represents the number of aligned sequences between human and chimpanzee at each position relative to the TSSs.
Format: ZIP Size: 143KB
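For readers who want to inspect the input tables outside Matlab, a minimal Python sketch of a reader is given here. The whitespace-separated layout is an assumption (the description only states ASCII format), and the function name `read_position_table` is hypothetical.

```python
def read_position_table(lines):
    """Parse rows of: position, SNP count, and an optional aligned-sequence count.

    Assumes whitespace-separated columns; blank lines are skipped.
    """
    positions, snp_counts, aligned_counts = [], [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        positions.append(int(fields[0]))
        snp_counts.append(float(fields[1]))
        if len(fields) > 2:
            aligned_counts.append(float(fields[2]))
    return positions, snp_counts, aligned_counts


# Synthetic stand-in for one input file: positions -3000..+3000, one SNP at the TSS.
example_lines = [f"{pos} {1 if pos == 0 else 0}" for pos in range(-3000, 3001)]
positions, snp_counts, aligned_counts = read_position_table(example_lines)
```

To read a real file, the same function can be given an open file handle, for example `read_position_table(open(path))`, where `path` points at one of the unpacked input files.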
