Table 2

Example entries of the processed definition file

>NP_775259,NM_173167,Q8IWX7
60	I00	SAP	gTC A	|NM_173167|	dbSNP:16970659
60	I00	SAP	v I	|Q8IWX7|	V I (dbSNP:16970659): FTId = VAR_027506
199	V00	SAP	GcA T	|NM_173167|	dbSNP:35749208
377	R00	SAP	AaG G	|NM_173167|	dbSNP:41389545
496	H00	SAP	d H	|Q8IWX7|	D H (breast cancer somatic mutation). FTId = VAR_035870
778	Q00	SAP	CgG A	|NM_173167|	dbSNP:34242925
852	N00	SAP	AtC A	|NM_173167|	dbSNP:11654824
852	N00	SAP	i N	|Q8IWX7|	I N (dbSNP:11654824). FTId = VAR_027507

>NP_076410,NM_023921,Q9NYW0
92	N08,N09,N10,N11,N12	PTM	N-linked (GlcNAc...)	|Q9NYW0|	N-linked (GlcNAc...) (Potential)
156	M00	SAP	AcG T	|NM_023921|	dbSNP:597468
156	M00	SAP	m T	|Q9NYW0|	M T (dbSNP:597468) FTId = VAR_030009
156	M00	SAP	t M	|NP_076410|	Alignment with Q9NYW0
158	N08,N09,N10,N11,N12	PTM	N-linked (GlcNAc...)	|Q9NYW0|	N-linked (GlcNAc...) (Potential)

Two sequence clusters are shown in this table to demonstrate the structure of our processed information file. The text line after the ">" symbol contains the accession numbers associated with the members of the cluster. Each of the other rows contains six entries separated by tabs. The first column indicates the residue position. The second column indicates the modified residue(s) that can occur at the position specified in the first column. The third column, labeled either SAP or PTM, indicates the modification type. The fourth column explains the origin of the modification: a lower case letter indicates the residue content in the source sequence, and an upper case letter indicates the modified residue in the variant sequence. The notation v I indicates that amino acid V in the source sequence can change into I, i.e., a SAP. The notation gTC A is shorthand for a codon change from gtc to atc, i.e., a SNP that likewise changes the coded amino acid from V to I. The fifth column contains the accession number of the source of the modification, which may be a protein sequence or an mRNA. The sixth column contains additional information for the fourth column, such as disease information or a database entry index. For example, in the first row of the first cluster, dbSNP:16970659 indicates that this SNP comes from NCBI's dbSNP with entry index 16970659. In the fifth row, the sixth column gives the disease origin. The additional Feature Identifier (FTId), VAR_xxxxxx, identifies the variant sequence documented by SwissProt. In the fourth row of the second cluster, the sixth column reads "Alignment with Q9NYW0", indicating that this SAP comes from a mismatch in the alignment between protein sequences during the clustering procedure. In the first and last rows of the second cluster, the second column contains N08, N09, ..., N12, all of which are possible post-translational modifications associated with glycosylation [22], as indicated in the sixth column.
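
The layout described in this legend could be read with a short script along the following lines. This is only a minimal sketch, assuming each cluster begins with a ">" line of comma-separated accession numbers and every other non-empty line holds exactly six tab-separated entries. The names Annotation, Cluster and parse_definition are hypothetical and are not taken from the authors' software.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    position: int    # column 1: residue position
    residues: list   # column 2: modified residue code(s), e.g. ["I00"]
    kind: str        # column 3: "SAP" or "PTM"
    origin: str      # column 4: origin of the modification, e.g. "gTC A"
    source: str      # column 5: source accession, surrounding "|" removed
    note: str        # column 6: additional information


@dataclass
class Cluster:
    accessions: list
    annotations: list = field(default_factory=list)


def parse_definition(text):
    """Parse clusters from text laid out as described in the legend (assumed format)."""
    clusters = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between clusters
        if line.startswith(">"):
            # Header line: accession numbers of the cluster members
            accessions = [a.strip() for a in line[1:].split(",") if a.strip()]
            clusters.append(Cluster(accessions))
            continue
        if not clusters:
            raise ValueError("annotation row found before any '>' header")
        columns = [c.strip() for c in line.split("\t")]
        if len(columns) != 6:
            raise ValueError(f"expected 6 tab-separated entries, got {len(columns)}")
        position, residues, kind, origin, source, note = columns
        clusters[-1].annotations.append(
            Annotation(
                position=int(position),
                residues=[r.strip() for r in residues.split(",")],
                kind=kind,
                origin=origin,
                source=source.strip("|"),
                note=note,
            )
        )
    return clusters


# Two rows copied from the first cluster and one from the second
sample = (
    ">NP_775259,NM_173167,Q8IWX7\n"
    "60\tI00\tSAP\tgTC A\t|NM_173167|\tdbSNP:16970659\n"
    "60\tI00\tSAP\tv I\t|Q8IWX7|\tV I (dbSNP:16970659): FTId = VAR_027506\n"
    "\n"
    ">NP_076410,NM_023921,Q9NYW0\n"
    "92\tN08,N09,N10,N11,N12\tPTM\tN-linked (GlcNAc...)\t|Q9NYW0|\t"
    "N-linked (GlcNAc...) (Potential)\n"
)
clusters = parse_definition(sample)
```

Keeping the residue codes as a list lets a single PTM row, such as position 92 in the second cluster, carry several alternative modifications without a special case.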

Alves et al. BMC Genomics 2008 9:505   doi:10.1186/1471-2164-9-505
