Open Access Research article

Protein genes in repetitive sequence—antifreeze glycoproteins in Atlantic cod genome

Xuan Zhuang1, Chun Yang2,3, Svein-Erik Fevolden4 and C-H Christina Cheng1*

Author Affiliations

1 Department of Animal Biology, University of Illinois, Urbana-Champaign, Illinois, 61801, USA

2 Department of Molecular and Integrative Physiology, University of Illinois, Urbana-Champaign, Illinois, 61801, USA

3 Currently at EMD Millipore, San Diego, CA, USA

4 Department of Arctic and Marine Biology, University of Tromsø, N-9037, Tromsø, Norway

BMC Genomics 2012, 13:293  doi:10.1186/1471-2164-13-293

Published: 2 July 2012

Additional files

Additional file 1:

Nucleotide sequence alignment of the Gm1-1 AFGP gene [GenBank:AF529262] from the Øresund, Denmark Atlantic cod and six of the seven AFGP genes we identified from the Atlantic cod genome data [3]. ATLCOD1A_AFGP7 was not included in the alignment due to long insertions in the putative intron region at the 5′ end of the sequence (given in lower case). ‘N’s represent gaps in the Atlantic cod sequence assembly [3]. Dashed lines indicate gaps introduced by the alignment. Asterisks indicate nucleotide identity in the column, disregarding “N”. The single-letter amino acid translation of Gm1-1 is given in red, below the first nucleotide of each codon in the nucleotide alignment. ATLCOD1A_AFGP1 has the largest number of amino acid substitutions (first line below the Gm1-1 amino acid sequence), while all other ATLCOD1A AFGP sequences have few substitutions (second line below the Gm1-1 amino acid sequence). Substitutions given in green would disrupt the regular (Ala/Pro-Ala-Thr) tripeptide units, and those given in blue would not. ATLCOD1A_AFGP1 has a reading frame shift 5′ to the AFGP coding region, which would render it a pseudogene unless the frame shift reflects a sequencing or assembly error. ATLCOD1A_AFGP5 and Gm1-1 are very likely counterparts in the respective individuals, as their aligned sequences are 99.8% identical. The grey shaded sequences in the ATLCOD1A AFGP genes were identified and masked by Star et al. using RepeatMasker with the RepBase Update (teleost) TE library and a custom library created de novo with RepeatModeler to identify novel repeats in the Atlantic cod genome (Supplementary Note 16 and Supplementary Table 6 of [3]). The repeat masking eliminated almost all partial AFGP coding sequences that remained after the initial removal of highly repetitive sequences from the Roche 454 reads prior to sequence assembly.
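The alignment conventions described in this legend can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names and toy sequences are hypothetical, and two points are assumptions not stated in the legend, namely that a column containing an alignment gap ("-") receives no asterisk, and that percent identity is computed only over columns where both sequences have a called base (neither "N" nor "-").

```python
def identity_row(seqs):
    """Return a marker line with '*' under columns whose called bases all agree.

    'N' (assembly gap) is disregarded, as in the legend. Treating a column that
    contains an alignment gap '-' as non-identical is an assumption.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*seqs):
        # Upper-case so lower-case (intron) bases compare like upper-case ones.
        bases = {c.upper() for c in column if c.upper() != "N"}
        identical = len(bases) == 1 and "-" not in bases
        marks.append("*" if identical else " ")
    return "".join(marks)


def percent_identity(a, b):
    """Percent identity over columns where both sequences have a called base.

    Skipping columns with 'N' or '-' in either sequence is an assumption; the
    legend does not say how the 99.8% figure was computed.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "N-" or y in "N-":
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned sequences (hypothetical, not taken from Additional file 1).
toy = ["ACGTNA-C", "ACGTTAGC", "ACCTTAGC"]
print(identity_row(toy))
print(round(percent_identity(toy[1], toy[2]), 1))
```

In the toy alignment, the column holding an "N" still receives an asterisk because the remaining bases agree, whereas the column holding a "-" does not.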

Format: DOC Size: 75KB
