<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-9-273</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Database</dochead>
      <bibl>
         <title>
            <p>AMYPdb: A database dedicated to amyloid precursor proteins</p>
         </title>
         <aug>
            <au id="A1" ce="yes">
               <snm>Pawlicki</snm>
               <fnm>Sandrine</fnm>
               <insr iid="I1"/>
               <email>sandrine.pawlicki@laposte.net</email>
            </au>
            <au id="A2">
               <snm>Le B&#233;chec</snm>
               <fnm>Antony</fnm>
               <insr iid="I1"/>
               <email>antony.lebechec@univ-rennes1.fr</email>
            </au>
            <au id="A3" ca="yes" ce="yes">
               <snm>Delamarche</snm>
               <fnm>Christian</fnm>
               <insr iid="I1"/>
               <email>christian.delamarche@univ-rennes1.fr</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Universit&#233; de Rennes I and CNRS UMR 6026, Equipe Structure et Dynamique des Macromol&#233;cules, Campus de Beaulieu, Nb 13, 35042 RENNES Cedex, France</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>273</fpage>
         <url>http://www.biomedcentral.com/1471-2105/9/273</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18544157</pubid>
               <pubid idtype="doi">10.1186/1471-2105-9-273</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>25</day>
               <month>6</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>10</day>
               <month>6</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>10</day>
               <month>6</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Pawlicki et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Misfolding and aggregation of proteins into ordered fibrillar structures is associated with a number of severe pathologies, including Alzheimer's disease, prion diseases, and type II diabetes. The rapid accumulation of knowledge about the sequences and structures of these proteins allows the use of <it>in silico </it>methods to investigate the molecular mechanisms of their abnormal conformational changes and assembly. However, such an approach requires the collection of accurate data, which are inconveniently dispersed among several generalist databases.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We therefore created a free online knowledge database (AMYPdb) dedicated to amyloid precursor proteins and performed a large-scale sequence analysis of the included data. Currently, AMYPdb integrates data on 31 families, including 1,705 proteins from nearly 600 organisms. It displays links to more than 2,300 bibliographic references and 1,200 3D-structures. A Wiki system is available to insert data into the database, providing a sharing and collaboration environment. We generated and analyzed 3,621 amino acid sequence patterns, reporting highly specific patterns for each amyloid family, along with patterns likely to be involved in protein misfolding and aggregation.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>AMYPdb is a comprehensive online database aiming at the centralization of bioinformatic data regarding all amyloid proteins and their precursors. Our sequence pattern discovery and analysis approach unveiled protein regions of significant interest. AMYPdb is freely accessible <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Amyloid deposits are abnormal <it>in vivo </it>extracellular aggregates of insoluble proteinaceous fibers exhibiting a cross-beta structure. The proteins or fragments found in these aggregates derive from diverse full-length precursors belonging to families without any obvious functional or structural resemblance. In addition to these quite typical extracellular deposits, other proteins can also form intracellular inclusions. Under the effect of diverse modifications, including interaction with chaperones, mutations, supraphysiological concentrations, post-translational modifications, and so on, amyloid proteins fail to fold properly, thus accumulating irreversibly over long periods, with toxic effect <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>.</p>
         <p>Protein misfolding is associated with a wide range of human diseases called amyloidoses. These may affect multiple tissues, in the case of systemic amyloidoses, or can be limited to a particular organ. Those pathologies may have major health and social impacts, as in the case of Alzheimer's disease <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, or might be somewhat benign, such as the amyloidosis that can occur among diabetics at the site of their insulin injections <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>.</p>
         <p>Prions are a special case among amyloid proteins because of their unusual properties. They originate from the conversion of a normal host protein into a fibrillar structure that then acts as an infectious particle <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. To date only one prion, PrP, has been discovered in vertebrates. It is involved in major neurodegenerative diseases including Creutzfeldt-Jakob disease, Gerstmann-Str&#228;ussler-Scheinker syndrome, and Kuru in humans, scrapie in sheep, and spongiform encephalopathy in cattle. Prion proteins are also described in eukaryotic microorganisms (yeasts and fungi). However, in these latter organisms, the prion isoform is not always toxic and can control normal cellular processes <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. The prion concept has been recently extended to include mammalian prion-like proteins, such as Tia-1. This is an RNA-binding protein implicated in the assembly of the cytoplasmic aggregates known as stress granules <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>.</p>
         <p>Schematically, the conversion of a normal soluble protein into insoluble amyloid fibers begins with a conformational change, resulting in an intermediate form, an amyloidogenic isoform. This new conformation favors self-association in small oligomers that act as nucleation units. The growth of the nucleation units leads to the formation of long protofilaments, which are wrapped to form mature fibers <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Biophysical techniques have shown that protofilaments may have various morphologies, but that they share common properties at the molecular level. The amyloid proteins/peptides form either parallel or anti-parallel arrangements of beta-strands. Since these beta-strands are perpendicular to the fiber axis, this has been described as a cross-beta structure <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Despite the difficulties of using experimental approaches to determine the precise 3D-structure of amyloid proteins in their fibrillar state, several models have recently been proposed <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. These advances benefit from the increasing use of computer simulations in biology.</p>
         <p>Some authors have demonstrated that amyloid-like structures can be obtained <it>in vitro </it>with almost any protein, suggesting that the ability to form fibers is a common property of polypeptide chains <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. However, the number of proteins aggregating <it>in vivo </it>is low compared to the more than 3 million sequences stored in the Universal Protein Resource (UniProtKB), and includes only a few specific members of 31 families. The propensity of a protein to aggregate into amyloid fibrils varies greatly with the amino-acid sequence and with cellular environment. To take just two examples: the globular protein lysozyme is only associated with amyloid deposition in the kidney when it presents site-specific mutations (I56T, F57I, W64R, D67H) <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, while phosphorylation of Huntingtin may modulate its cleavage and toxicity <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>.</p>
         <p>During the past few years, bioinformatic approaches have been dedicated to the discovery of sequence segments that are sensitive to self-aggregation or that promote protein destabilization <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>. All methods presented in these papers are based on similar ideas. Each one tries to calculate various aggregation indexes and profiles by exploiting the information found in kinetic data, peptide/protein sequences, conformation space, and/or 3D-structures. However, a common problem encountered in these <it>in silico </it>experiments is the difficulty of finding and extracting accurate data from the existing literature and various molecular databases.</p>
         <p>For studies focusing on sequence features, three databases are usually particularly useful: the UniProtKB, which provides sequences with functional annotations, comments, and cross-references <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>; the PROSITE database, which consists of a large collection of sequence signatures <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>; and the bibliographical database MEDLINE <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. However, extraction of information from such general databases can be complex and time-consuming due to the large amount of data available and because of the diversity of the gene or protein families. To compensate for this, Siepen <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> developed a specialized database, fibril_one, dedicated to the analysis of mutations associated with fibrillogenesis. Unfortunately, the usefulness of this resource is limited, as fibril_one contains few data and has never been updated.</p>
         <p>To facilitate <it>in silico </it>comparison of proteins involved in the formation of beta-sheet-rich fibrils <it>in vivo</it>, we have created a new multi-user database, the AMYloid Protein database (AMYPdb). The main goal in developing this relational database was to provide regularly updated access to protein sequences and to the patterns describing each family. The 3,621 amino acid sequence patterns stored in the database can be screened to facilitate the assignment of new sequences to a particular family and the formulation of hypotheses about their function(s). Patterns conserved in several families may also help in extracting rules about the mechanisms of fibril formation.</p>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <sec>
            <st>
               <p>Working with AMYPdb</p>
            </st>
            <p>AMYPdb is freely accessible <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Users can browse web pages to obtain descriptions of each family, visualize protein sequences enriched with links to both UniProtKB and the Protein DataBank (PDB), study multiple sequence alignments using the Jalview editor <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, or access bibliographic references. An identity card for each protein is available from the "protein menu". Links to Wikipedia provide further information on some families. Sequences can be selected and exported in FASTA format for further analysis.</p>
            <p>The amino acid sequence patterns are accessible by browsing the pages from the "pattern" menu, or by using the search interface. The search page contains several menus, allowing the user to focus on particular data. For instance, users can query AMYPdb with UniProtKB or PROSITE identifiers or patterns to determine whether a particular protein or pattern is stored in AMYPdb. They can also submit a personal signature to find any matching amyloid proteins, or conversely, they can submit a sequence to find matching AMYPdb patterns. It is also possible to select patterns using thresholds on quality scores. This method is useful for discovering patterns shared between families.</p>
            <p>Beyond functioning as a pure repository of knowledge, AMYPdb also provides private workspaces to anyone interested in further analysis. This allows users to manage their own working sets of proteins and patterns, which can be easily manipulated and organized according to their research interests.</p>
            <p>Below, we illustrate several ways in which AMYPdb can be useful in pattern research on amyloid proteins.</p>
         </sec>
         <sec>
            <st>
               <p>Protein family signatures</p>
            </st>
            <p>AMYPdb patterns can be used to highlight residues thought to be important to the structure, function and evolution of protein families. One of the objectives of our project was to propose a list of specific amino acid sequence patterns for each family stored in the database. On the web interface of AMYPdb, the 3,621 amino acid sequence patterns are classified by their CF value. About 27% of the patterns have a CF &#8805; 0.9 (956 of 3,621) and can be considered as very good descriptors for the corresponding amyloid families (Table <tblr tid="T1">1</tblr>, column n&#176;5). The best results are shown in Table <tblr tid="T2">2</tblr>. It is interesting to note that all of the AMYPdb patterns noted in that table are of better quality than those in the PROSITE database.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Amyloid families in AMYPdb</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>Family</p>
                     </c>
                     <c ca="left">
                        <p>Function</p>
                     </c>
                     <c ca="left">
                        <p>Pathology</p>
                     </c>
                     <c ca="left">
                        <p>S</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>P</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Alpha Fibrinogen</p>
                     </c>
                     <c ca="left">
                        <p>Involved in the coagulation cascade</p>
                     </c>
                     <c ca="left">
                        <p>Autosomal dominant hereditary hepatic or renal amyloidosis</p>
                     </c>
                     <c ca="left">
                        <p>59</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Alpha Synuclein</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c ca="left">
                        <p>Synucleinopathies, such as Alzheimer's and Parkinson's diseases</p>
                     </c>
                     <c ca="left">
                        <p>21</p>
                     </c>
                     <c ca="left">
                        <p>85</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Amyloid Beta Precursor</p>
                     </c>
                     <c ca="left">
                        <p>Protease inhibitor</p>
                     </c>
                     <c ca="left">
                        <p>Alzheimer's disease and aged Down's syndrome</p>
                     </c>
                     <c ca="left">
                        <p>125</p>
                     </c>
                     <c ca="left">
                        <p>108</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Apolipoprotein A-1</p>
                     </c>
                     <c ca="left">
                        <p>Lipid metabolism</p>
                     </c>
                     <c ca="left">
                        <p>Autosomal dominant systemic amyloidosis</p>
                     </c>
                     <c ca="left">
                        <p>54</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Atrial Natriuretic Factor</p>
                     </c>
                     <c ca="left">
                        <p>Blood pressure and sodium balance</p>
                     </c>
                     <c ca="left">
                        <p>Isolated Atrial Amyloid</p>
                     </c>
                     <c ca="left">
                        <p>34</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Beta-2 Microglobulin</p>
                     </c>
                     <c ca="left">
                        <p>Class 1 human leukocyte antigen</p>
                     </c>
                     <c ca="left">
                        <p>Aggregation in the musculoskeletal system</p>
                     </c>
                     <c ca="left">
                        <p>141</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Bri2</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c ca="left">
                        <p>Familial British Dementia (FBD)</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>*</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C-Protein</p>
                     </c>
                     <c ca="left">
                        <p>Major component of lung surfactant</p>
                     </c>
                     <c ca="left">
                        <p>Pulmonary alveolar proteinosis</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>7</p>
                     </c>
                     <c ca="left">
                        <p>1+</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Calcitonin</p>
                     </c>
                     <c ca="left">
                        <p>Polypeptidic hormone</p>
                     </c>
                     <c ca="left">
                        <p>Amyloid deposits in case of medullary thyroid cancer</p>
                     </c>
                     <c ca="left">
                        <p>37</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cystatin C</p>
                     </c>
                     <c ca="left">
                        <p>Cysteine protease inhibitor</p>
                     </c>
                     <c ca="left">
                        <p>Alzheimer's disease and cerebral amyloid angiopathy</p>
                     </c>
                     <c ca="left">
                        <p>67</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Gelsolin</p>
                     </c>
                     <c ca="left">
                        <p>Modulation of actin filament length</p>
                     </c>
                     <c ca="left">
                        <p>Gelsolin familial amyloidosis</p>
                     </c>
                     <c ca="left">
                        <p>39</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Huntingtin</p>
                     </c>
                     <c ca="left">
                        <p>Fast axonal trafficking</p>
                     </c>
                     <c ca="left">
                        <p>Huntington's disease</p>
                     </c>
                     <c ca="left">
                        <p>19</p>
                     </c>
                     <c ca="left">
                        <p>109</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Immunoglobulins</p>
                     </c>
                     <c ca="left">
                        <p>Immune response</p>
                     </c>
                     <c ca="left">
                        <p>Light-chain amyloidosis</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>*</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Insulin</p>
                     </c>
                     <c ca="left">
                        <p>Metabolism of carbohydrates and fat</p>
                     </c>
                     <c ca="left">
                        <p>Localized amyloidosis at injection sites of type 1 diabetic patients</p>
                     </c>
                     <c ca="left">
                        <p>160</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Islet Amyloid Polypeptide</p>
                     </c>
                     <c ca="left">
                        <p>Glycaemia regulation</p>
                     </c>
                     <c ca="left">
                        <p>Aggregates in pancreatic islets of type 2 diabetes and insulinomas</p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>82</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Lactadherin</p>
                     </c>
                     <c ca="left">
                        <p>Anticoagulant?</p>
                     </c>
                     <c ca="left">
                        <p>Aortic medial amyloidoses</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>*</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Lactoferrin</p>
                     </c>
                     <c ca="left">
                        <p>Transferrin</p>
                     </c>
                     <c ca="left">
                        <p>Amyloid deposits in the cornea, seminal vesicles and brain</p>
                     </c>
                     <c ca="left">
                        <p>34</p>
                     </c>
                     <c ca="left">
                        <p>94</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Lysozyme</p>
                     </c>
                     <c ca="left">
                        <p>Bacteriolytic enzyme</p>
                     </c>
                     <c ca="left">
                        <p>Non-neuropathic systemic amyloidosis</p>
                     </c>
                     <c ca="left">
                        <p>122</p>
                     </c>
                     <c ca="left">
                        <p>30</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Microcin E492</p>
                     </c>
                     <c ca="left">
                        <p>Bacterial bacteriocin</p>
                     </c>
                     <c ca="left">
                        <p>Regulation of the protein's activity</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>*</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Parkin</p>
                     </c>
                     <c ca="left">
                        <p>Proteasomal degradation?</p>
                     </c>
                     <c ca="left">
                        <p>Parkinson's disease</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>*</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Prolactin</p>
                     </c>
                     <c ca="left">
                        <p>Hormone secreted by the pituitary gland</p>
                     </c>
                     <c ca="left">
                        <p>Amyloid deposits in pituitary glands of aging individuals</p>
                     </c>
                     <c ca="left">
                        <p>112</p>
                     </c>
                     <c ca="left">
                        <p>54</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Serpin</p>
                     </c>
                     <c ca="left">
                        <p>Serine protease inhibitors</p>
                     </c>
                     <c ca="left">
                        <p>Serpinopathy</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>*</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Serum amyloid A</p>
                     </c>
                     <c ca="left">
                        <p>Cell adhesion, migration, and proliferation</p>
                     </c>
                     <c ca="left">
                        <p>Inflammation-associated reactive amyloidosis</p>
                     </c>
                     <c ca="left">
                        <p>74</p>
                     </c>
                     <c ca="left">
                        <p>63</p>
                     </c>
                     <c ca="left">
                        <p>1+</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Tau</p>
                     </c>
                     <c ca="left">
                        <p>Microtubule assembly and stability</p>
                     </c>
                     <c ca="left">
                        <p>Alzheimer's disease and dementias</p>
                     </c>
                     <c ca="left">
                        <p>37</p>
                     </c>
                     <c ca="left">
                        <p>13</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Transthyretin</p>
                     </c>
                     <c ca="left">
                        <p>Thyroxine transport</p>
                     </c>
                     <c ca="left">
                        <p>Familial amyloid polyneuropathies</p>
                     </c>
                     <c ca="left">
                        <p>54</p>
                     </c>
                     <c ca="left">
                        <p>97</p>
                     </c>
                     <c ca="left">
                        <p>1+</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Het-S</p>
                     </c>
                     <c ca="left">
                        <p>Heterokaryon incompatibility</p>
                     </c>
                     <c ca="left">
                        <p>Prionization involved in the protein's normal function</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>*</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>New 1</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c ca="left">
                        <p>No stable prion has been shown <it>in vivo</it></p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>*</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Prion Protein (PrP)</p>
                     </c>
                     <c ca="left">
                        <p>Signal transmission, copper regulation?</p>
                     </c>
                     <c ca="left">
                        <p>Transmissible spongiform encephalopathies and dementias</p>
                     </c>
                     <c ca="left">
                        <p>353</p>
                     </c>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>2+</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Rnq 1</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>*</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Sup35</p>
                     </c>
                     <c ca="left">
                        <p>Translation termination factor</p>
                     </c>
                     <c ca="left">
                        <p>Prionization might be advantageous in stress conditions</p>
                     </c>
                     <c ca="left">
                        <p>82</p>
                     </c>
                     <c ca="left">
                        <p>48</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ure2</p>
                     </c>
                     <c ca="left">
                        <p>Nitrogen metabolism</p>
                     </c>
                     <c ca="left">
                        <p>Loss of Ure2 function</p>
                     </c>
                     <c ca="left">
                        <p>15</p>
                     </c>
                     <c ca="left">
                        <p>99</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The last six rows of the table list the prion families. S, sequences in AMYPdb (full length and fragments); A, AMYPdb patterns with CF &#8805; 0.9; *, not submitted to the pattern discovery method; P, PROSITE super family signatures; P+, PROSITE family signatures.</p>
               </tblfn>
            </tbl>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Best patterns describing amyloid protein families</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>Family</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>CF</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Best pattern</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Alpha Fibrinogen</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>0.91</p>
                     </c>
                     <c ca="left">
                        <p>C-x(7,8)-C-x(3)-[DGHMNPS]-W-[DGHMNPS]-x-K-C-P</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P</p>
                     </c>
                     <c ca="left">
                        <p>0.13</p>
                     </c>
                     <c ca="left">
                        <p>W-W-[LIVMFYW]-x(2)-C-x(2)-[GSA]-x(2)-N-G</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Alpha Synuclein</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>S-[KR]-T-K-E-G-V-V-H-G</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Amyloid Beta Precursor</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>S-x(0,3)-N-[KV]-[GP]-A-[IV]-[AI]-[DEG]-[EL]-[IM]-[QV]-[DG]-[EG]-V-[DV]-[EI]-[AL]</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P</p>
                     </c>
                     <c ca="left">
                        <p>0.75</p>
                     </c>
                     <c ca="left">
                        <p>G-Y-E-N-P-T-Y-[KR]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Apolipoprotein A-1</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>0.97</p>
                     </c>
                     <c ca="left">
                        <p>V-[HKR]-x-K-x-[DEGHKNPQRSTY]-[ENPQTV]-x-L-[DE]-[DEHNPQSWY]-[FILMVY]-[DEGHKNPQRSTY]-x-[EHIKLMQV]-x(2)-[ENPQTV]-[DEHKNPQY]-[ACHILMV]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Atrial Natriuretic Factor</p>
                     </c>
                     <c ca="left">
                        <p>A*</p>
                     </c>
                     <c ca="left">
                        <p>0.98</p>
                     </c>
                     <c ca="left">
                        <p>C-F-G-x-[KR]-[ILM]-D-R-I-G-[ANST]-x-S-[GS]-[LM]-G-C-[GNS]-[GNPRS]</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P</p>
                     </c>
                     <c ca="left">
                        <p>0.53</p>
                     </c>
                     <c ca="left">
                        <p>C-F-G-x(3)-[DEA]-[RH]-I-x(3)-S-x(2)-G-C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Beta-2 Microglobulin</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>0.71</p>
                     </c>
                     <c ca="left">
                        <p>P-x(2,3)-Q-[ETV]-[DGY]-[PST]-[ER]-x-[PW]-x-[DENQS]-x-[DGNT]-[DEKRT]-x-[NT]-x-[AILV]</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P</p>
                     </c>
                     <c ca="left">
                        <p>0.23</p>
                     </c>
                     <c ca="left">
                        <p>[FY]-{L}-C-x-[VA]-{LC}-H</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C-Protein</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>R-L-L-[IV]-[AIV]-[VY]-[KV]-[PV]-[AIV]-[PV]</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.95</p>
                     </c>
                     <c ca="left">
                        <p>I-P-C-C-P-V</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Calcitonin</p>
                     </c>
                     <c ca="left">
                        <p>A*</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>L-S-T-C-[MV]-L-[GS]-x-[LY]-[STW]</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P</p>
                     </c>
                     <c ca="left">
                        <p>0.53</p>
                     </c>
                     <c ca="left">
                        <p>C-[SAGDN]-[STN]-x(0,1)-[SA]-T-C-[VMA]-x(3)-[LYF]-x(3)-[LYF]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cystatin C</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>0.68</p>
                     </c>
                     <c ca="left">
                        <p>V-x(2,6)-Q-x(1,2)-V-[AS]-G-x(2)-[HY]-[FIKRY]-[FLMV]-x-[IMV]</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P</p>
                     </c>
                     <c ca="left">
                        <p>0.29</p>
                     </c>
                     <c ca="left">
                        <p>[GSTEQKRV]-Q-[LIVT]-[VAF]-[SAGQ]-G-{DG}-[LIVMNK]-{TK}-x-[LIVMFY]-x-[LIVMFYA]-[DENQKRHSIV]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Gelsolin</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>D-[DS]-[IV]-M-[ILMV]-L-D-[AST]-[GW]-[DN]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Huntingtin</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>0.94</p>
                     </c>
                     <c ca="left">
                        <p>L-Y-[GK]-E-I-K-[KR]-N-[AG]-[AN]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Insulin</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>0.95</p>
                     </c>
                     <c ca="left">
                        <p>L-C-G-x-[DGHMNPS]-L-x(0,1)-V-x(5,6)-C-x(3)-G</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P</p>
                     </c>
                     <c ca="left">
                        <p>0.49</p>
                     </c>
                     <c ca="left">
                        <p>C-C-{P}-{P}-x-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Islet Amyloid Polypeptide</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>S-[HNRS]-N-x(1,2)-G-[APT]-[AIV]-[FL]-x-[PS]-[PT]-[DKNS]</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P</p>
                     </c>
                     <c ca="left">
                        <p>0.33</p>
                     </c>
                     <c ca="left">
                        <p>C-[SAGDN]-[STN]-x(0,1)-[SA]-T-C-[VMA]-x(3)-[LYF]-x(3)-[LYF]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Lactadherin</p>
                     </c>
                     <c ca="left">
                        <p>P</p>
                     </c>
                     <c ca="left">
                        <p>0.15</p>
                     </c>
                     <c ca="left">
                        <p>[GASP]-W-x(7,15)-[FYW]-[LIV]-x-[LIVFA]-[GSTDEN]-x(6)-[LIVF]-x(2)-[IV]-x-[LIVT]-[QKMT]-G</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Lactoferrin</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>P-V-[AT]-E-A-[EKQR]-[NS]-C-[HY]-L-A-x-A-P-[NS]-H-A-V-V-S</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P</p>
                     </c>
                     <c ca="left">
                        <p>0.38</p>
                     </c>
                     <c ca="left">
                        <p>[DENQ]-[YF]-x-[LY]-L-C-x-[DN]-x(5,8)-[LIV]-x(4,5)-C-x(2)-A-x(4)-[HQR]-x-[LIVMFYW]-[LIVM]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Lysozyme</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>G-[ILV]-[FL]-[EQ]-[IL]-N-[DNS]-x(2)-W</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P</p>
                     </c>
                     <c ca="left">
                        <p>0.95</p>
                     </c>
                     <c ca="left">
                        <p>C-x(3)-C-x(2)-[LMF]-x(3)-[DEN]-[LI]-x(5)-C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Prolactin</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>R-D-S-x-K-[IV]-[DK]-[NST]-[FY]-L</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P</p>
                     </c>
                     <c ca="left">
                        <p>0.44</p>
                     </c>
                     <c ca="left">
                        <p>C-x-[STN]-x(2)-[LIVMFYS]-x-[LIVMSTA]-P-x(5)-[TALIV]-x(7)-[LIVMFY]-x(6)-[LIVMFY]-x(2)-[STACV]-W</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Serum amyloid A</p>
                     </c>
                     <c ca="left">
                        <p>A*</p>
                     </c>
                     <c ca="left">
                        <p>0.96</p>
                     </c>
                     <c ca="left">
                        <p>N-x(1,4)-D-x(3)-[HRY]-[AG]-[PR]-G-[GNS]-x-[DEW]-A-[AQ]-[EKQR]</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.95</p>
                     </c>
                     <c ca="left">
                        <p>A-R-G-N-Y-[ED]-A-x-[QKR]-R-G-x-G-G-x-W-A</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Tau</p>
                     </c>
                     <c ca="left">
                        <p>A*</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>G-S-x(0,1)-D-N-[IMV]-[KNRT]-H-x-P-G-G-G-[EKNS]-[KV]-[KQ]-I-x-[DHTY]</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P</p>
                     </c>
                     <c ca="left">
                        <p>0.40</p>
                     </c>
                     <c ca="left">
                        <p>G-S-x(2)-N-x(2)-H-x-[PA]-[AG]-G(2)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Transthyretin</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>C-P-L-[MT]-V-K-[IV]-L-D</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.56</p>
                     </c>
                     <c ca="left">
                        <p>[KH]-[IV]-L-[DN]-x(3)-G-x-P-[AG]-x(2)-[LIVM]-x-[IV]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Prion Protein (PrP)</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>0.99</p>
                     </c>
                     <c ca="left">
                        <p>A-x(0,1)-A-x(0,1)-G-x(0,1)-A-[AIV]-[AGV]-[GKY]-x-[AILMV]-x-[DGR]-x(2)-[LMR]-[GPS]-[HRS]</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.93</p>
                     </c>
                     <c ca="left">
                        <p>E-x-[ED]-x-K-[LIVM](2)-x-[KR]-[LIVM](2)-x-[QE]-M-C-x(2)-Q-Y</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Sup35</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>0.97</p>
                     </c>
                     <c ca="left">
                        <p>G-x(0,1)-A-x(1,2)-A-[ADEGKNPQRST]-x-[ADEGKNPQRST]-x-L-V-I-S-[ADEGKNPQRST]-[ADEGKNPQRST]-[ADEGKNPQRST]-G-E-[CFHILMVWY]-E</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>P</p>
                     </c>
                     <c ca="left">
                        <p>0.13</p>
                     </c>
                     <c ca="left">
                        <p>D-[KRSTGANQFYW]-x(3)-E-[KRAQ]-x-[RKQD]-[GC]-[IVMK]-[ST]-[IV]-x(2)-[GSTACKRNQ]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ure2</p>
                     </c>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>0.97</p>
                     </c>
                     <c ca="left">
                        <p>E-F-P-E-V-Y-K-W-T-K</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>A, best AMYPdb patterns; P, PROSITE super family signatures; P+, PROSITE family signatures; CF, correlation factor; *, indication that the AMYPdb pattern overlaps the PROSITE pattern.</p>
               </tblfn>
            </tbl>
            <p>Describing a protein family with several patterns, rather than with only one or two as in PROSITE, has several advantages. First, the occurrence of more than one pattern increases confidence that a protein belongs to a specific family. Second, the distribution of patterns along a sequence can be used to distinguish conserved from variable regions, because highly specific patterns describe only conserved regions. The Tau and prolactin families illustrate this. Human Tau protein is characterized by 13 patterns with CF &#8805; 0.9, all located in the C-terminal region and covering barely 14% of the sequence. This suggests that the C-terminal part of Tau is the protein's main functional domain (it is indeed the microtubule-interacting region). By contrast, 54 patterns with CF &#8805; 0.9 are characteristic of human prolactin; they are distributed along the whole sequence and cover 32% of it. This suggests the presence of numerous important regions, which likely correlate with the many known biological effects of prolactin.</p>
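            <p>The coverage figures quoted above can be reproduced in principle from the match positions of the patterns on a sequence. The following is a minimal sketch, assuming each match is given as a 1-based, inclusive (start, end) pair; the function names and the example intervals are hypothetical and are not taken from AMYPdb.</p>

```python
def covered_positions(intervals):
    """Return the set of 1-based residue positions covered by any match.

    Each interval is a (start, end) pair, both ends inclusive.
    """
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def coverage(intervals, sequence_length):
    """Fraction of the sequence covered by at least one pattern match."""
    return len(covered_positions(intervals)) / sequence_length


# Hypothetical matches on a 100-residue sequence (not AMYPdb data).
# Overlapping matches are counted only once.
example = [(61, 70), (66, 74), (90, 95)]
print(f"{coverage(example, 100):.0%}")
```

            <p>Collecting positions in a set means that residues shared by overlapping patterns contribute only once, which is what a statement such as "covering 14% of the sequence" implies.</p>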
         </sec>
         <sec>
            <st>
               <p>Amino acid sequence pattern exploration</p>
            </st>
            <sec>
               <st>
                  <p>Signatures of biological interest</p>
               </st>
               <p>Although patterns in AMYPdb were created from precursor proteins, users can easily access signatures matching aggregation features and other biological annotations. For each pattern, the AMYPdb interface displays its position in the sequences, along with the corresponding UniProtKB features. There are 836 highly specific patterns (CF &#8805; 0.9) covering annotated regions in proteins. Among these, 251 patterns match variants associated with keywords such as aggregation, amyloidosis, Alzheimer and Parkinson (FT variant lines in UniProtKB). We have successfully used AMYPdb for knowledge-rich data mining in three amyloid families: transthyretin, Tau and prion.</p>
               <p>&#8226; In AMYPdb, 97 patterns (CF &#8805; 0.9), distributed over the entire sequence of human transthyretin (hTTR), cover 31 of the 37 single-site amyloidogenic variants described in UniProtKB. In the pattern G-<ul>E</ul>-[I<ul>L</ul>V]-H-[EGN]-<ul>L</ul>-x(0,1)-<ul>T</ul>-x(3,4)-<ul>F</ul>-x(2)-G-[<ul>I</ul>LV]-[I<ul>Y</ul>]-[<ul>K</ul>R]-[IL<ul>V</ul>]-E, the 9 underlined amino acids correspond to sites of pathogenic missense mutations in hTTR (positions 73&#8211;92). In particular, the variant I88L is associated with amyloid cardiomyopathy. Interestingly, the multiple sequence alignment available in AMYPdb reveals that leucine occupies this position in the wild-type sequence of seven organisms, including cattle and sheep. A comparative study of these proteins with hTTR could help to explain the effect of the isoleucine-to-leucine substitution in human disease.</p>
               <p>&#8226; The human Tau protein sequence deduced from the gene is composed of 757 amino acids. In the human brain, however, it exists as 6 alternatively spliced isoforms of 352 to 441 amino acids (Tau-A to Tau-F), each containing 3 or 4 repeat domains (R repeats). Using the AMYPdb search interface, we searched each isoform and found 8 patterns (<it>CF </it>&#8805; 0.9) matching 1 to 6 isoforms. One of the patterns, P-G-G-G-[KNS]-V-Q-I-[FIV]-[DHNY], is observed in all of the isoforms and matches "hot spot" regions for nucleation, &#946;-sheet aggregation and fibril formation both <it>in vitro </it>and <it>in silico </it><abbrgrp><abbr bid="B19">19</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. The pattern is located at the junction between R repeats. It matches 3 regions in human Tau (PGGGKVQIVY, PGGGKVQIIN and PGGGSVQIVY), which correspond to 2 kinds of junctions: R1&#8211;R3 in Tau-A, Tau-B and Tau-C (3 R repeats); and R1&#8211;R2 and R2&#8211;R3 in Tau-D, Tau-E and Tau-F (4 R repeats). Moreover, the pattern includes variants described as being involved in Tau pathogenicity: N596K, delV597, P618L, P618S and S622N (numbering according to UniProtKB P10636).</p>
               <p>&#8226; In a recent study, Hamodrakas et al. <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> predicted amyloidogenic determinants in several proteins by combining three methods. In the case of the human prion protein (P04156), the authors highlighted segments 175&#8211;183 (FVHDCVNIT), 209&#8211;215 (VVEQMCI) and 242&#8211;251 (LLISFLIFLI). Using AMYPdb, we searched for amino acid sequence patterns and UniProtKB features matching these segments. Some results are summarized in Table <tblr tid="T3">3</tblr>. The high density of mutation and modification sites overlapping the first two amyloidogenic segments is intriguing. Indeed, these segments contain both cysteines involved in the unique disulfide bond between helices 2 and 3, and a glycosylation site involved in prion strain propagation <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Although other regions have been shown to be important for prion propagation, susceptibility and other activities, our observations reinforce the idea that <it>in silico </it>investigations are more efficient when they combine several methods, such as sequence pattern discovery, aggregation prediction and bibliographical knowledge.</p>
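               <p>The pattern searches described in the three examples above rely on PROSITE-style syntax: residues separated by hyphens, x for any residue, square brackets for allowed residues, braces for forbidden residues, and a repetition count in parentheses. The following is a minimal sketch of how such a pattern could be translated into a regular expression and located in a sequence. It assumes Python and its standard re module, handles only the elements that appear in Table 2 (terminal anchors and other PROSITE extensions are not supported), and uses hypothetical function names; it is not the search engine used by AMYPdb.</p>

```python
import re

# One pattern element: x, a single residue, [allowed] or {forbidden},
# optionally followed by a repetition such as (2) or (0,3).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pieces = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                           # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"       # forbidden residues
        if low is not None:
            bounds = low if high is None else low + "," + high
            core += "{" + bounds + "}"           # repetition count or range
        pieces.append(core)
    return "".join(pieces)


def find_matches(pattern, sequence):
    """Return (1-based start, matched segment) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


# The Tau pattern and the three matching regions quoted in the text.
TAU_PATTERN = "P-G-G-G-[KNS]-V-Q-I-[FIV]-[DHNY]"
for region in ("PGGGKVQIVY", "PGGGKVQIIN", "PGGGSVQIVY"):
    print(region, find_matches(TAU_PATTERN, region))
```

               <p>Applied to a full-length isoform sequence, find_matches would report the start position of each junction region; only non-overlapping matches are returned, which is sufficient for a pattern of fixed length such as this one.</p>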
               <tbl id="T3">
                  <title>
                     <p>Table 3</p>
                  </title>
                  <caption>
                     <p>PrP regions including amyloidogenic determinants according to [21]</p>
                  </caption>
                  <tblbdy cols="3">
                     <r>
                        <c ca="left">
                           <p>Segments of PrP sequence</p>
                        </c>
                        <c ca="center">
                           <p>Mutations</p>
                        </c>
                        <c ca="center">
                           <p>Modification sites</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>SNQNNFVH<ul>D</ul>C<ul>V</ul>NI<ul>T</ul>IKQHTF</p>
                        </c>
                        <c ca="center">
                           <p>D178N, V180I, T183A</p>
                        </c>
                        <c ca="center">
                           <p>N-Glycosylation 181&#8211;184; Kinase C 183&#8211;185; Disulfide 179&#8211;214</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>KMME<ul>R</ul>V<ul>VEQ</ul>MCIT<ul>Q</ul>YER</p>
                        </c>
                        <c ca="center">
                           <p>R208H, V210I, E211Q, Q212P, Q217R</p>
                        </c>
                        <c ca="center">
                           <p>Kinase II 216&#8211;219; Disulfide 179&#8211;214</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>S<ul>P</ul>PVILLISFLIFLIVG</p>
                        </c>
                        <c ca="center">
                           <p>P238S</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>Column 1: Human prion protein (P04156), segments 170&#8211;188, 204&#8211;220, and 237&#8211;253. Underlined amino acids are those listed in column 2. Columns 2 and 3: Pathogenic point mutations and potential protein modifications according to UniProtKB.</p>
                  </tblfn>
               </tbl>
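               <p>As an illustration of how a PROSITE-style pattern such as the Tau pattern P-G-G-G-[KNS]-V-Q-I-[FIV]-[DHNY] can be checked against candidate segments, the following Python sketch translates the pattern syntax into a regular expression. It is not the AMYPdb or WAPAM implementation: the function name prosite_to_regex is hypothetical, and only the syntax elements that appear in this article (fixed residues, bracketed alternatives, x, and x(n) or x(m,n) gaps) are handled.</p>
               <p><![CDATA[
```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper for illustration; handles only fixed residues,
    [..] alternatives, x wildcards, and x(n) / x(m,n) gaps.
    """
    parts = []
    for element in pattern.split("-"):
        # Separate the residue part from an optional count such as "(0,2)".
        m = re.fullmatch(r"([A-Zx]|\[[A-Z]+\])(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, count = m.groups()
        token = "." if residue == "x" else residue
        parts.append(token + ("{" + count + "}" if count else ""))
    return "".join(parts)

# Pattern and segments quoted in the text above.
TAU_PATTERN = "P-G-G-G-[KNS]-V-Q-I-[FIV]-[DHNY]"
TAU_SEGMENTS = ["PGGGKVQIVY", "PGGGKVQIIN", "PGGGSVQIVY"]

tau_regex = re.compile(prosite_to_regex(TAU_PATTERN))
for segment in TAU_SEGMENTS:
    # Each segment should match the pattern over its full length.
    print(segment, bool(tau_regex.fullmatch(segment)))
```
]]></p>
               <p>Compiling the pattern once and reusing the compiled expression keeps the check cheap when many segments or isoforms are tested.</p>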
            </sec>
            <sec>
               <st>
                  <p>Sequence patterns conserved in several families</p>
               </st>
               <p>By using fitness score values in the search interface of AMYPdb, it is possible to discover unexpected relations between families of interest. An intriguing observation of cross-conserved patterns between the huntingtin and prolactin families illustrates this. Although these families have no known structural or functional resemblance, 4 amino acid sequence patterns in AMYPdb match both families with Sen > 0.5 and Spe > 0.99. These patterns are listed below. No non-amyloid sequence among the more than 2 million stored in UniProtKB (release 6.1) contains all 4 signatures simultaneously.</p>
               <p>(1) L-[DEGHKNPQRSTY]-C-x(0,2)-R-[ACDNPST]-[AGSTW]-x-[FIKLMRY]</p>
               <p>(2) F-x(2)-[LV]-[ILM]-x-[CQS]-x(2)-R</p>
               <p>(3) L-x(0,1)-T-x(2,3)-D-[KS]-[DEHY]</p>
               <p>(4) L-[DEGHKNPQRSTY]-x-L-[DEHKNQR]-C-[DEGHKNPQRSTY]-x(2)-[DEHKNPQY]</p>
               <p>The patterns are distributed all along the sequences. In human huntingtin (3,144 residues), patterns N&#176;1 to N&#176;4 match at positions 212, 1501, 2092 and 2789, respectively. In human prolactin (227 residues), the positions are 91, 108, 200 and 214 for patterns N&#176;3, N&#176;2, N&#176;1 and N&#176;4, respectively. Some of these patterns match known structural or functional features in human huntingtin and human prolactin. Pattern N&#176;1 is located in the first "HEAT repeat" of huntingtin, which belongs to the N-terminal part of the fragment found in amyloid aggregates <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Patterns N&#176;1 and N&#176;4 contain cysteines known to be involved in disulfide bridges in prolactin. They are located in the 4th &#945;-helix of prolactin, already established to be part of the site of interaction with one of the prolactin receptors <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. The segments of human huntingtin and prolactin corresponding to pattern N&#176;2 are located in the middle of each protein and are predicted by the TANGO algorithm to be &#946;-aggregating segments <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. To our knowledge, this is the first time that patterns have been described as shared between huntingtin and prolactin.</p>
               <p>Although it seems unlikely that these results were due to chance, we searched for another pattern to confirm our observations. We used PRATT with a new set of 99 sequences, corresponding to all full-length huntingtin and prolactin proteins. We discovered a new highly specific pattern (N&#176;5), R-[DV]-S-x-K-x(2)-[ANSTV]-x(3)-[FILV]-[AGL]-x-[ACS], conserved in 100% of the data set (Sen = 1). In a recent version of UniProtKB (release 10.5, containing more than 4.7 million sequences), this pattern retrieves new prolactin and huntingtin sequences and only about 50 false-positive sequences. Pattern N&#176;5 is located at positions 710 and 205 in human huntingtin and prolactin, respectively, and includes a potential serine phosphorylation site.</p>
               <p>This process can be applied to other families. However, it is clear that the quality of a prediction depends on the quality and number of patterns found in common. Experimental work should be undertaken to confirm our observations and to further understand the functional/structural significance of the conserved motifs shared between the huntingtin and prolactin families. These could reveal interaction sites with common cofactors such as chaperones, or common motifs involved in aggregation processes.</p>
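               <p>The search for sequences that carry all four shared signatures can be sketched in Python as follows. This is an illustrative sketch only, not the WAPAM scan used in this work: the regular expressions are hand translations of patterns (1) to (4), the function names are hypothetical, and the toy sequences are invented to exercise the code and are not huntingtin or prolactin fragments.</p>
               <p><![CDATA[
```python
import re

# Hand translations of patterns (1) to (4) into regular expressions.
SIGNATURES = {
    1: re.compile(r"L[DEGHKNPQRSTY]C.{0,2}R[ACDNPST][AGSTW].[FIKLMRY]"),
    2: re.compile(r"F.{2}[LV][ILM].[CQS].{2}R"),
    3: re.compile(r"L.{0,1}T.{2,3}D[KS][DEHY]"),
    4: re.compile(r"L[DEGHKNPQRSTY].L[DEHKNQR]C[DEGHKNPQRSTY].{2}[DEHKNPQY]"),
}

def first_match_positions(sequence):
    """Map each pattern number to the 1-based start of its first match (or None)."""
    positions = {}
    for number, regex in SIGNATURES.items():
        hit = regex.search(sequence)
        positions[number] = hit.start() + 1 if hit else None
    return positions

def has_all_signatures(sequence):
    """True only if every one of the four signatures occurs in the sequence."""
    return all(p is not None for p in first_match_positions(sequence).values())

# Invented toy sequences: the first embeds one instance of each pattern,
# the second lacks an instance of pattern (4).
TOY_WITH_ALL = "MM" + "LDCRAAAF" + "GG" + "FAALIACAAR" + "GG" + "LTAADKD" + "GG" + "LDALDCDAAD"
TOY_MISSING_4 = "MM" + "LDCRAAAF" + "GG" + "FAALIACAAR" + "GG" + "LTAADKD"

for name, seq in [("with all", TOY_WITH_ALL), ("missing 4", TOY_MISSING_4)]:
    print(name, first_match_positions(seq), has_all_signatures(seq))
```
]]></p>
               <p>Requiring all signatures at once is what makes the combined filter far more selective than any single pattern, which is consistent with the absence of non-amyloid sequences carrying the four patterns together.</p>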
            </sec>
            <sec>
               <st>
                  <p>Pattern repeats</p>
               </st>
               <p>Amino acid repeats play an important structural role in proteins and are often associated with diseases. This is the case with huntingtin, which shows a polyglutamine tract in its N-terminal part. However, repeats are not limited to single amino acids, but can include domain repetitions <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. For example, repeats are thought to be involved in PrP prionization in mammals, since birds, reptiles, fish and amphibians do not show the same domain architecture <abbrgrp><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr></abbrgrp>. In AMYPdb, 41 distinct patterns cover the N-terminal domain of PrPs. We observed that the amino acid sequences of various patterns and their number of occurrences are closely linked to the phylogenetic differences described above, as for the pattern G-[GHKQRY]-[GNPSTY]-x-G-[GHQY]-G-x(0,3)-G-[QSWY]-[GHNPQ]-[GHPQRS]-[GNPQSTY]-x-[AGHNST]. In species from fish to birds (whose PrP has not been demonstrated to be pathogenic), only 1 occurrence of this pattern was found in the N-terminal region, whereas it is repeated 2 to 5 times in mammalian PrPs. The repetition might therefore act to facilitate the structural conversion of PrPc into PrPsc.</p>
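               <p>Counting how often a pattern recurs in a sequence can be sketched as below. The regular expression is a hand translation of the repeat pattern quoted above; counting non-overlapping matches from left to right is an assumption of this sketch (the counting rule used in AMYPdb is not specified here), and the sequences are invented test strings, not PrP sequences.</p>
               <p><![CDATA[
```python
import re

# Hand translation of:
# G-[GHKQRY]-[GNPSTY]-x-G-[GHQY]-G-x(0,3)-G-[QSWY]-[GHNPQ]-[GHPQRS]-[GNPQSTY]-x-[AGHNST]
REPEAT_REGEX = re.compile(
    r"G[GHKQRY][GNPSTY].G[GHQY]G.{0,3}G[QSWY][GHNPQ][GHPQRS][GNPQSTY].[AGHNST]"
)

def count_occurrences(sequence):
    """Count non-overlapping matches of the repeat pattern, left to right."""
    return sum(1 for _ in REPEAT_REGEX.finditer(sequence))

# Invented 15-residue unit that satisfies the pattern, used to build toy inputs.
UNIT = "GQPHGGGWGQPHGGG"
ONE_COPY = "MA" + UNIT + "KK"
THREE_COPIES = "MA" + "AA".join([UNIT] * 3)

print(count_occurrences(ONE_COPY), count_occurrences(THREE_COPIES))
```
]]></p>
               <p>Because consecutive repeats can overlap, an overlapping count (for example with a lookahead expression) would give larger numbers; the choice should follow the biological definition of a repeat unit.</p>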
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In this paper, we present a knowledge database dedicated to amyloid precursor proteins and their amino acid sequence signatures <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Our work sheds light on the signatures that best describe each amyloid family. Moreover, we have extracted several patterns of interest to demonstrate how users can easily take advantage of the database for their own research. Note that because there are only sparse data on sequences that can form fibrils <it>in vivo</it>, especially in non-human organisms, we cannot yet automatically predict aggregation regions. In the future, we will continue to enrich the database with new families and new functionalities, so that AMYPdb remains a reference tool for researchers interested in bioinformatic approaches to protein misfolding and aggregation. A wiki system available in the "identity card" of each protein allows experts to add high-quality data.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Implementation and structure of the database</p>
            </st>
            <p>AMYPdb is a MySQL relational database. The web-based interface was created using PHP and JavaScript, and relies on a modified version of the e107 content management system. The data are stored in 23 tables and occupy nearly 4 GB of disk space (Figure <figr fid="F1">1</figr>). The central table contains general information relevant to protein sequences. It is linked to additional tables containing two kinds of data: information collected from public libraries, and the results of protein sequence analysis. This data organization is suitable for complex queries, such as the extraction of amino acid signatures matching the regions involved in aggregation.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Database structure diagram</p>
               </caption>
               <text>
                  <p>
                     <b>Database structure diagram.</b>
                  </p>
               </text>
               <graphic file="1471-2105-9-273-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Raw data</p>
            </st>
            <p>The data flow is described in Figure <figr fid="F2">2</figr>. Raw data were extracted from three main public databases: protein files from UniProtKB (release 3.2); references from MEDLINE; and known patterns from PROSITE. We used the Sequence Retrieval System (SRS) program implemented at the European Bioinformatics Institute <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. SRS has the advantage of providing a single interface for interrogating multiple databases. In addition, results can be saved in eXtensible Markup Language (XML) format, which facilitates the exchange and manipulation of large amounts of data between different programs. UniProtKB and MEDLINE were searched using keywords describing the amyloid protein families. The keywords were mainly protein and gene names commonly used in the amyloid research field. Several phylogenetically distant sequences of each family were then submitted to ScanProsite <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. This program scans protein sequences for the occurrence of signatures stored in the PROSITE database. The accession numbers of the PROSITE patterns were then queried in SRS. All retrieved XML files were imported into AMYPdb. After this first selection step, we obtained a catalog of 31 amyloid families (Table <tblr tid="T1">1</tblr>), containing 1,284 amyloid proteins, 1,692 references and 38 PROSITE patterns. Data stored in AMYPdb are those of precursor proteins, amyloidogenic peptides and partial sequences. In this first version of AMYPdb, no immunoglobulin light or heavy chains have been stored, because of the high sequence diversity of those proteins.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>AMYPdb data flow</p>
               </caption>
               <text>
                  <p>
                     <b>AMYPdb data flow.</b>
                  </p>
               </text>
               <graphic file="1471-2105-9-273-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Amino acid sequence pattern discovery</p>
            </st>
            <sec>
               <st>
                  <p>Pattern discovery</p>
               </st>
               <p>We employed the commonly used software PRATT, developed to extract conserved patterns from a set of unaligned protein sequences <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. The "advanced PRATT" version, accessible at OUEST-Genopole<sup>&#174; </sup><abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, allows users to specify amino acid clusters, thus orienting the discovery of interesting patterns. We selected most of the default parameters of the program, and limited the maximum pattern length to 20 amino acids. This choice is in agreement with data from the recent literature, which show that short protein stretches may be involved in the self-assembly process of amyloid proteins <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>.</p>
               <p>Various analyses were carried out with "advanced PRATT" using the default clusters, based on the physico-chemical properties of amino acids. Moreover, we defined two other sets of clusters, corresponding to criteria related to amyloid aggregation. The first set was based on the propensity of amino acids to form beta sheets (CIFTWYV), alpha helices (AREQLK) or other secondary structures (NDGHMPS) <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. The second set was based on whether amino acids are found at protein-protein interfaces (CHILMFWYV) or not (ARNDEQGKPST) <abbrgrp><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr></abbrgrp>. A refinement parameter was systematically tested (on and off). When the parameter was switched on, ambiguous pattern positions were generalized using the groups of similar amino acids. Among the 31 known amyloid families, 9 could not be submitted to pattern discovery because of their small number of known sequences. On the other hand, a few families had enough known sequences to design several data sets. Finally, we applied the pattern discovery method to 42 sets of sequences using 6 parameter settings, and selected patterns matching 100% of the sequences of each set. Thus, as described in Figure <figr fid="F2">2</figr>, and including the 38 PROSITE patterns, AMYPdb contains 3,621 patterns related to amyloid protein families.</p>
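               <p>The two aggregation-related cluster sets can be written down as simple mappings, which also makes it easy to verify that each set assigns every one of the 20 standard amino acids to exactly one cluster. The Python sketch below is illustrative only: the dictionary keys and function names are hypothetical labels, and the input format actually expected by "advanced PRATT" is not reproduced here.</p>
               <p><![CDATA[
```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Cluster memberships as given in the text; the key names are hypothetical.
CLUSTER_SETS = {
    "secondary_structure": {
        "beta_sheet": "CIFTWYV",
        "alpha_helix": "AREQLK",
        "other": "NDGHMPS",
    },
    "interface": {
        "interface": "CHILMFWYV",
        "non_interface": "ARNDEQGKPST",
    },
}

def is_partition(clusters):
    """True if the clusters are disjoint and together cover all 20 residues."""
    letters = "".join(clusters.values())
    return len(letters) == len(set(letters)) and set(letters) == AMINO_ACIDS

def cluster_of(residue, clusters):
    """Return the name of the cluster that contains the given residue."""
    for name, members in clusters.items():
        if residue in members:
            return name
    raise KeyError(residue)

for set_name, clusters in CLUSTER_SETS.items():
    print(set_name, is_partition(clusters))
```
]]></p>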
            </sec>
            <sec>
               <st>
                  <p>Pattern matching</p>
               </st>
               <p>In addition to pattern comparison, we also scanned the UniProtKB database (release 6.1) for sequences matching any of the 3,621 patterns. To do this, we used WAPAM <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, specifically developed to parse a list of amino acid patterns and to search for those patterns in sequence databases. Compared to other pattern-matching tools, WAPAM has several advantages. It imposes no limit on pattern length, flexibility, or indetermination. It also uses Rdisk technology, a specialized architecture that can greatly accelerate a search. Using WAPAM with Rdisk, the scan of UniProtKB for the 3,621 patterns took less than 15 hours, instead of the estimated 2,000 hours it would have taken without it. The scan returned 267,490 UniProtKB sequences matching AMYPdb patterns, although this number is an underestimate because of WAPAM's retrieval limitation. The UniProtKB files of these sequences were stored as a non-amyloid group in AMYPdb and were used for the classification procedure described below.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Database updating procedure</p>
            </st>
            <p>Since the content of UniProtKB was evolving during the various development phases of our project (631,592 entries added from release 3.2 to 6.1), we updated AMYPdb by semi-automatically sorting the 267,490 sequences extracted with WAPAM. From this group, we picked out 421 protein sequences matching highly specific AMYPdb patterns, and we assigned these sequences to the corresponding amyloid families. This updating procedure increased the AMYPdb sequence group to 1,705 members (1,063 full-length sequences and 642 fragments, Figure <figr fid="F2">2</figr>), leaving 267,069 sequences in the non-amyloid protein group.</p>
         </sec>
         <sec>
            <st>
               <p>Pattern quality</p>
            </st>
            <p>To measure pattern performance, we used three fitness scores commonly used in classification problems: sensitivity (Sen); specificity (Spe); and correlation (CF) <abbrgrp><abbr bid="B49">49</abbr><abbr bid="B50">50</abbr></abbrgrp>. The scores of the 3,621 patterns were calculated for each family, using only full-length sequences. For each pattern, true positives (TP) and false negatives (FN) are sequences of a family that match or do not match the pattern, respectively. False positives (FP) are either amyloid sequences that match the pattern but do not belong to the family considered, or non-amyloid sequences matching the pattern. True negatives (TN) are non-amyloid sequences not matching the pattern. Therefore, when a pattern is specific for one amyloid family, it has high Sen, Spe and CF scores for that family and low scores for other amyloid families. When a pattern is conserved in several amyloid families, the Spe of the pattern remains high for each family, but Sen and CF scores can decrease dramatically. Because of calculation limitations, non-amyloid sequences were obtained from 1 of 3 data sets: the 2,032,835 proteins of UniProtKB were used for patterns matching fewer than 5,000 non-amyloid proteins; the 267,069 non-amyloid sequences resulting from the WAPAM search were used for patterns matching between 5,000 and 10,000 non-amyloid proteins; and a random pool of 50,000 proteins was used for patterns matching more than 10,000 non-amyloid proteins.</p>
            <p>For each pattern, sensitivity corresponds to its ability to describe an amyloid family, while specificity corresponds to its ability to discard proteins not belonging to the amyloid family. The Pearson-Matthews correlation coefficient measures the global prediction accuracy with which a pattern describes a protein family. A CF of +1 indicates that the pattern perfectly differentiates the amyloid family from other amyloid families and non-amyloid proteins.</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-273-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable columnalign="left">
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>S</m:mi>
                                       <m:mi>e</m:mi>
                                       <m:mi>n</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mi>T</m:mi>
                                       <m:mi>P</m:mi>
                                       <m:mo>/</m:mo>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>T</m:mi>
                                       <m:mi>P</m:mi>
                                       <m:mo>+</m:mo>
                                       <m:mi>F</m:mi>
                                       <m:mi>N</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>S</m:mi>
                                       <m:mi>p</m:mi>
                                       <m:mi>e</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mi>T</m:mi>
                                       <m:mi>N</m:mi>
                                       <m:mo>/</m:mo>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>F</m:mi>
                                       <m:mi>P</m:mi>
                                       <m:mo>+</m:mo>
                                       <m:mi>T</m:mi>
                                       <m:mi>N</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>C</m:mi>
                                       <m:mi>F</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>T</m:mi>
                                             <m:mi>P</m:mi>
                                             <m:mo>&#215;</m:mo>
                                             <m:mi>T</m:mi>
                                             <m:mi>N</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>F</m:mi>
                                             <m:mi>P</m:mi>
                                             <m:mo>&#215;</m:mo>
                                             <m:mi>F</m:mi>
                                             <m:mi>N</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:msqrt>
                                                <m:mrow>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:mi>T</m:mi>
                                                   <m:mi>P</m:mi>
                                                   <m:mo>+</m:mo>
                                                   <m:mi>F</m:mi>
                                                   <m:mi>P</m:mi>
                                                   <m:mo stretchy="false">)</m:mo>
                                                   <m:mo>&#215;</m:mo>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:mi>F</m:mi>
                                                   <m:mi>P</m:mi>
                                                   <m:mo>+</m:mo>
                                                   <m:mi>T</m:mi>
                                                   <m:mi>N</m:mi>
                                                   <m:mo stretchy="false">)</m:mo>
                                                   <m:mo>&#215;</m:mo>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:mi>T</m:mi>
                                                   <m:mi>N</m:mi>
                                                   <m:mo>+</m:mo>
                                                   <m:mi>F</m:mi>
                                                   <m:mi>N</m:mi>
                                                   <m:mo stretchy="false">)</m:mo>
                                                   <m:mo>&#215;</m:mo>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:mi>F</m:mi>
                                                   <m:mi>N</m:mi>
                                                   <m:mo>+</m:mo>
                                                   <m:mi>T</m:mi>
                                                   <m:mi>P</m:mi>
                                                   <m:mo stretchy="false">)</m:mo>
                                                </m:mrow>
                                             </m:msqrt>
                                          </m:mrow>
                                       </m:mfrac>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqbaeaabmqaaaqaaiabdofatjabdwgaLjabd6gaUjabg2da9iabdsfaujabdcfaqjabc+caViabcIcaOiabdsfaujabdcfaqjabgUcaRiabdAeagjabd6eaojabcMcaPaqaaiabdofatjabdchaWjabdwgaLjabg2da9iabdsfaujabd6eaojabc+caViabcIcaOiabdAeagjabdcfaqjabgUcaRiabdsfaujabd6eaojabcMcaPaqaaiabdoeadjabdAeagjabg2da9KqbaoaalaaabaGaeiikaGIaeiikaGIaemivaqLaemiuaaLaey41aqRaemivaqLaemOta4KaeiykaKIaeyOeI0IaeiikaGIaemOrayKaemiuaaLaey41aqRaemOrayKaemOta4KaeiykaKIaeiykaKcabaWaaOaaaeaacqGGOaakcqWGubavcqWGqbaucqGHRaWkcqWGgbGrcqWGqbaucqGGPaqkcqGHxdaTcqGGOaakcqWGgbGrcqWGqbaucqGHRaWkcqWGubavcqWGobGtcqGGPaqkcqGHxdaTcqGGOaakcqWGubavcqWGobGtcqGHRaWkcqWGgbGrcqWGobGtcqGGPaqkcqGHxdaTcqGGOaakcqWGgbGrcqWGobGtcqGHRaWkcqWGubavcqWGqbaucqGGPaqkaeqaaaaaaaaaaa@84F1@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
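            <p>The three scores follow directly from the four counts. The Python sketch below implements the formulas above; the function names and the example counts are hypothetical and serve only as a worked example, and returning 0 when the CF denominator is zero is an assumed convention rather than a rule stated in this article.</p>
            <p><![CDATA[
```python
import math

def sensitivity(tp, fn):
    """Sen = TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Spe = TN / (FP + TN)."""
    return tn / (fp + tn)

def correlation(tp, tn, fp, fn):
    """CF = (TP*TN - FP*FN) / sqrt((TP+FP)(FP+TN)(TN+FN)(FN+TP))."""
    denominator = math.sqrt((tp + fp) * (fp + tn) * (tn + fn) * (fn + tp))
    if denominator == 0:
        return 0.0  # assumed convention when a marginal count is zero
    return (tp * tn - fp * fn) / denominator

# Hypothetical counts for one pattern and one family.
tp, fn, fp, tn = 90, 10, 10, 990
print(sensitivity(tp, fn), specificity(tn, fp), correlation(tp, tn, fp, fn))
```
]]></p>
            <p>With these counts, Sen = 90/100 = 0.9, Spe = 990/1000 = 0.99, and CF = (90 &#215; 990 &#8722; 10 &#215; 10) / 100000 = 0.89.</p>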
         </sec>
         <sec>
            <st>
               <p>Amyloid classification</p>
            </st>
            <p>The accuracy of hypotheses deduced from bioinformatic methods strongly depends on data set quality. In the present study, the sequence sets used in the pattern discovery method were those of protein families. In order to facilitate sequence extraction, especially for the discovery of patterns linked to misfolding and aggregation, all the proteins were sorted into one of the following five quality categories:</p>
            <p>&#8226; Amyloid <it>in vivo</it>: the precursor protein, or a specific sub-segment, forms fibrils in humans or animals, or is a yeast prion. Proteins of this class are unambiguously described in the literature and are identified by specific keywords in UniProtKB ("Amyloid" or "Prion").</p>
            <p>&#8226; Amyloid <it>in vitro</it>: the polypeptide forms fibrils under experimental conditions.</p>
            <p>&#8226; Amyloid <it>in silico</it>: the polypeptide is predicted to form fibrils by computational techniques, including protein threading and molecular dynamics simulations.</p>
            <p>&#8226; Putative amyloid protein: the protein is a member of an amyloid family, but the amyloid properties of that specific member were not assessed.</p>
            <p>&#8226; Unclassified protein: the protein family does not fulfill the definition of amyloid <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, but sparse data show that at least one member of the family shares some amyloid properties.</p>
            <p>At present, classes 2 and 3 are empty. Experts are welcome to improve the relevance of the biological information by changing the status of the proteins using the wiki system.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>SP designed and built the MySQL database and the web interface. Both SP and CD carried out data analysis and drafted the manuscript. ALB participated in the final version of the database, including interface improvements and assignment of the proteins to the amyloid classification. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We are grateful for support from the R&#233;gion Bretagne to S. Pawlicki. We thank the CRITT Sant&#233; Bretagne for financial support. We also thank G. Georges, M. Giraud, L. Guillot, G. Ranchy and A-S. Valin, from the Ouest-Genopole<sup>&#174;</sup>, for providing free bioinformatics services. We thank Juliana Berland for her careful reading of the manuscript, and Erwan Rio for aid in programming. We also acknowledge the anonymous referees for their useful suggestions.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>AMYPdb</p>
            </title>
            <url>http://amypdb.univ-rennes1.fr</url>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Folding proteins in fatal ways</p>
            </title>
            <aug>
               <au>
                  <snm>Selkoe</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2003</pubdate>
            <volume>426</volume>
            <issue>6968</issue>
            <fpage>900</fpage>
            <lpage>904</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature02264</pubid>
                  <pubid idtype="pmpid" link="fulltext">14685251</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Review: history of the amyloid fibril</p>
            </title>
            <aug>
               <au>
                  <snm>Sipe</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>AS</fnm>
               </au>
            </aug>
            <source>J Struct Biol</source>
            <pubdate>2000</pubdate>
            <volume>130</volume>
            <issue>2-3</issue>
            <fpage>88</fpage>
            <lpage>98</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jsbi.2000.4221</pubid>
                  <pubid idtype="pmpid" link="fulltext">10940217</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Amyloid: toward terminology clarification. Report from the Nomenclature Committee of the International Society of Amyloidosis</p>
            </title>
            <aug>
               <au>
                  <snm>Westermark</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Benson</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Buxbaum</snm>
                  <fnm>JN</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Frangione</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ikeda</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Masters</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Merlini</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Saraiva</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Sipe</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>Amyloid</source>
            <pubdate>2005</pubdate>
            <volume>12</volume>
            <issue>1</issue>
            <fpage>1</fpage>
            <lpage>4</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16076605</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Alzheimer's disease: beta-Amyloid protein and tau</p>
            </title>
            <aug>
               <au>
                  <snm>Morishima-Kawashima</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ihara</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J Neurosci Res</source>
            <pubdate>2002</pubdate>
            <volume>70</volume>
            <issue>3</issue>
            <fpage>392</fpage>
            <lpage>401</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/jnr.10355</pubid>
                  <pubid idtype="pmpid" link="fulltext">12391602</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Insulin as an amyloid-fibril protein at sites of repeated insulin injections in a diabetic patient</p>
            </title>
            <aug>
               <au>
                  <snm>Dische</snm>
                  <fnm>FE</fnm>
               </au>
               <au>
                  <snm>Wernstedt</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Westermark</snm>
                  <fnm>GT</fnm>
               </au>
               <au>
                  <snm>Westermark</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Pepys</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Rennie</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Gilbey</snm>
                  <fnm>SG</fnm>
               </au>
               <au>
                  <snm>Watkins</snm>
                  <fnm>PJ</fnm>
               </au>
            </aug>
            <source>Diabetologia</source>
            <pubdate>1988</pubdate>
            <volume>31</volume>
            <issue>3</issue>
            <fpage>158</fpage>
            <lpage>161</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00276849</pubid>
                  <pubid idtype="pmpid">3286343</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Prions</p>
            </title>
            <aug>
               <au>
                  <snm>Prusiner</snm>
                  <fnm>SB</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <issue>23</issue>
            <fpage>13363</fpage>
            <lpage>13383</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">33918</pubid>
                  <pubid idtype="pmpid" link="fulltext">9811807</pubid>
                  <pubid idtype="doi">10.1073/pnas.95.23.13363</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Amyloid formation modulates the biological activity of a bacterial protein</p>
            </title>
            <aug>
               <au>
                  <snm>Bieler</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Estrada</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lagos</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Baeza</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Castilla</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Soto</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2005</pubdate>
            <volume>280</volume>
            <issue>29</issue>
            <fpage>26880</fpage>
            <lpage>26885</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M502031200</pubid>
                  <pubid idtype="pmpid" link="fulltext">15917245</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Non-mendelian inheritance of the HET-s prion or HET-s prion domains determines the het-S spore killing system in Podospora anserina</p>
            </title>
            <aug>
               <au>
                  <snm>Dalstra</snm>
                  <fnm>HJ</fnm>
               </au>
               <au>
                  <snm>van der Zee</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Swart</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hoekstra</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>Saupe</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Debets</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>Fungal Genet Biol</source>
            <pubdate>2005</pubdate>
            <volume>42</volume>
            <issue>10</issue>
            <fpage>836</fpage>
            <lpage>847</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.fgb.2005.05.004</pubid>
                  <pubid idtype="pmpid" link="fulltext">16043372</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Stress granule assembly is mediated by prion-like aggregation of TIA-1</p>
            </title>
            <aug>
               <au>
                  <snm>Gilks</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kedersha</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Ayodele</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Stoecklin</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Dember</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Anderson</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Mol Biol Cell</source>
            <pubdate>2004</pubdate>
            <volume>15</volume>
            <issue>12</issue>
            <fpage>5383</fpage>
            <lpage>5398</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">532018</pubid>
                  <pubid idtype="pmpid" link="fulltext">15371533</pubid>
                  <pubid idtype="doi">10.1091/mbc.E04-08-0715</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Probing the origins, diagnosis and treatment of amyloid diseases using antibodies</p>
            </title>
            <aug>
               <au>
                  <snm>Dumoulin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dobson</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>Biochimie</source>
            <pubdate>2004</pubdate>
            <volume>86</volume>
            <issue>9-10</issue>
            <fpage>589</fpage>
            <lpage>600</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.biochi.2004.09.012</pubid>
                  <pubid idtype="pmpid" link="fulltext">15556268</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Common core structure of amyloid fibrils by synchrotron X-ray diffraction</p>
            </title>
            <aug>
               <au>
                  <snm>Sunde</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Serpell</snm>
                  <fnm>LC</fnm>
               </au>
               <au>
                  <snm>Bartlam</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>PE</fnm>
               </au>
               <au>
                  <snm>Pepys</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>CC</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>273</volume>
            <issue>3</issue>
            <fpage>729</fpage>
            <lpage>739</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.1348</pubid>
                  <pubid idtype="pmpid" link="fulltext">9356260</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Recent atomic models of amyloid fibril structure</p>
            </title>
            <aug>
               <au>
                  <snm>Nelson</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <issue>2</issue>
            <fpage>260</fpage>
            <lpage>265</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.sbi.2006.03.007</pubid>
                  <pubid idtype="pmpid" link="fulltext">16563741</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Prediction of "aggregation-prone" and "aggregation-susceptible" regions in proteins associated with neurodegenerative diseases</p>
            </title>
            <aug>
               <au>
                  <snm>Pawar</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>DuBay</snm>
                  <fnm>KF</fnm>
               </au>
               <au>
                  <snm>Zurdo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chiti</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Vendruscolo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dobson</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2005</pubdate>
            <volume>350</volume>
            <issue>2</issue>
            <fpage>379</fpage>
            <lpage>392</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2005.04.016</pubid>
                  <pubid idtype="pmpid" link="fulltext">15925383</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Lysozyme: a paradigmatic molecule for the investigation of protein structure, function and misfolding</p>
            </title>
            <aug>
               <au>
                  <snm>Merlini</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Bellotti</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Clin Chim Acta</source>
            <pubdate>2005</pubdate>
            <volume>357</volume>
            <issue>2</issue>
            <fpage>168</fpage>
            <lpage>172</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cccn.2005.03.022</pubid>
                  <pubid idtype="pmpid" link="fulltext">15913589</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Huntingtin phosphorylation sites mapped by mass spectrometry. Modulation of cleavage and toxicity</p>
            </title>
            <aug>
               <au>
                  <snm>Schilling</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Gafni</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Torcassi</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Cong</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Row</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>LaFevre-Bernt</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Cusack</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Ratovitski</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hirschhorn</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ross</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>BW</fnm>
               </au>
               <au>
                  <snm>Ellerby</snm>
                  <fnm>LM</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2006</pubdate>
            <volume>281</volume>
            <issue>33</issue>
            <fpage>23686</fpage>
            <lpage>23697</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M513507200</pubid>
                  <pubid idtype="pmpid" link="fulltext">16782707</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Proteins associated with diseases show enhanced sequence correlation between charged residues</p>
            </title>
            <aug>
               <au>
                  <snm>Dima</snm>
                  <fnm>RI</fnm>
               </au>
               <au>
                  <snm>Thirumalai</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>15</issue>
            <fpage>2345</fpage>
            <lpage>2354</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth245</pubid>
                  <pubid idtype="pmpid" link="fulltext">15073020</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Prediction of the absolute aggregation rates of amyloidogenic polypeptide chains</p>
            </title>
            <aug>
               <au>
                  <snm>DuBay</snm>
                  <fnm>KF</fnm>
               </au>
               <au>
                  <snm>Pawar</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Chiti</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Zurdo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dobson</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Vendruscolo</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>341</volume>
            <issue>5</issue>
            <fpage>1317</fpage>
            <lpage>1326</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2004.06.043</pubid>
                  <pubid idtype="pmpid" link="fulltext">15302561</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Fernandez-Escamilla</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Rousseau</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Schymkowitz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Serrano</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2004</pubdate>
            <volume>22</volume>
            <issue>10</issue>
            <fpage>1302</fpage>
            <lpage>1306</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1012</pubid>
                  <pubid idtype="pmpid" link="fulltext">15361882</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Prediction of amyloidogenic and disordered regions in protein chains</p>
            </title>
            <aug>
               <au>
                  <snm>Galzitskaya</snm>
                  <fnm>OV</fnm>
               </au>
               <au>
                  <snm>Garbuzynskiy</snm>
                  <fnm>SO</fnm>
               </au>
               <au>
                  <snm>Lobanov</snm>
                  <fnm>MY</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <issue>12</issue>
            <fpage>e177</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1761655</pubid>
                  <pubid idtype="pmpid" link="fulltext">17196033</pubid>
                  <pubid idtype="doi">10.1371/journal.pcbi.0020177</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Consensus prediction of amyloidogenic determinants in amyloid fibril-forming proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Hamodrakas</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Liappa</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Iconomidou</snm>
                  <fnm>VA</fnm>
               </au>
            </aug>
            <source>Int J Biol Macromol</source>
            <pubdate>2007</pubdate>
            <volume>41</volume>
            <issue>3</issue>
            <fpage>295</fpage>
            <lpage>300</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ijbiomac.2007.03.008</pubid>
                  <pubid idtype="pmpid" link="fulltext">17477968</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Sequence determinants of amyloid fibril formation</p>
            </title>
            <aug>
               <au>
                  <snm>Lopez de la Paz</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Serrano</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <issue>1</issue>
            <fpage>87</fpage>
            <lpage>92</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">314143</pubid>
                  <pubid idtype="pmpid" link="fulltext">14691246</pubid>
                  <pubid idtype="doi">10.1073/pnas.2634884100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Protein aggregation and amyloidosis: confusion of the kinds?</p>
            </title>
            <aug>
               <au>
                  <snm>Rousseau</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Schymkowitz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Serrano</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <issue>1</issue>
            <fpage>118</fpage>
            <lpage>126</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.sbi.2006.01.011</pubid>
                  <pubid idtype="pmpid" link="fulltext">16434184</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Prediction of "hot spots" of aggregation in disease-linked polypeptides</p>
            </title>
            <aug>
               <au>
                  <snm>Sanchez de Groot</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Pallares</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Aviles</snm>
                  <fnm>FX</fnm>
               </au>
               <au>
                  <snm>Vendrell</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ventura</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>BMC Struct Biol</source>
            <pubdate>2005</pubdate>
            <volume>5</volume>
            <fpage>18</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1262731</pubid>
                  <pubid idtype="pmpid" link="fulltext">16197548</pubid>
                  <pubid idtype="doi">10.1186/1472-6807-5-18</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Prediction of aggregation rate and aggregation-prone segments in polypeptide sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Tartaglia</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Cavalli</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pellarin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Caflisch</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2005</pubdate>
            <volume>14</volume>
            <issue>10</issue>
            <fpage>2723</fpage>
            <lpage>2734</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2253302</pubid>
                  <pubid idtype="pmpid" link="fulltext">16195556</pubid>
                  <pubid idtype="doi">10.1110/ps.051471205</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>The 3D profile method for identifying fibril-forming segments of proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Sievers</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Karanicolas</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ivanova</snm>
                  <fnm>MI</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2006</pubdate>
            <volume>103</volume>
            <issue>11</issue>
            <fpage>4074</fpage>
            <lpage>4078</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1449648</pubid>
                  <pubid idtype="pmpid" link="fulltext">16537487</pubid>
                  <pubid idtype="doi">10.1073/pnas.0511295103</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Detecting hidden sequence propensity for amyloid fibril formation</p>
            </title>
            <aug>
               <au>
                  <snm>Yoon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Welsh</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2004</pubdate>
            <volume>13</volume>
            <issue>8</issue>
            <fpage>2149</fpage>
            <lpage>2160</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2279810</pubid>
                  <pubid idtype="pmpid" link="fulltext">15273309</pubid>
                  <pubid idtype="doi">10.1110/ps.04790604</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>The Universal Protein Resource (UniProt): an expanding universe of protein information</p>
            </title>
            <aug>
               <au>
                  <snm>Wu</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Natale</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Boeckmann</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ferro</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gasteiger</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Magrane</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Mazumder</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>O'Donovan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Redaschi</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Suzek</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <issue>Database issue</issue>
            <fpage>D187</fpage>
            <lpage>D191</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347523</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381842</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj161</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>The PROSITE database</p>
            </title>
            <aug>
               <au>
                  <snm>Hulo</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bulliard</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Cerutti</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>De Castro</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Langendijk-Genevaux</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Pagni</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sigrist</snm>
                  <fnm>CJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <issue>Database issue</issue>
            <fpage>D227</fpage>
            <lpage>D230</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347426</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381852</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj063</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>MEDLINE</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/sites/entrez/</url>
         </bibl>
         <bibl id="B31">
            <title>
               <p>The fibril_one on-line database: mutations, experimental conditions, and trends associated with amyloid fibril formation</p>
            </title>
            <aug>
               <au>
                  <snm>Siepen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Westhead</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2002</pubdate>
            <volume>11</volume>
            <issue>7</issue>
            <fpage>1862</fpage>
            <lpage>1866</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2373654</pubid>
                  <pubid idtype="pmpid" link="fulltext">12070339</pubid>
                  <pubid idtype="doi">10.1110/ps.0204302</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>The Jalview Java alignment editor</p>
            </title>
            <aug>
               <au>
                  <snm>Clamp</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Cuff</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Searle</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Barton</snm>
                  <fnm>GJ</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>3</issue>
            <fpage>426</fpage>
            <lpage>427</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg430</pubid>
                  <pubid idtype="pmpid" link="fulltext">14960472</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Characterization of two VQIXXK motifs for tau fibrillization in vitro</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>VM</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>2006</pubdate>
            <volume>45</volume>
            <issue>51</issue>
            <fpage>15692</fpage>
            <lpage>15701</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi061422+</pubid>
                  <pubid idtype="pmpid" link="fulltext">17176091</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Prediction of nucleating sequences from amyloidogenic propensities of tau-related peptides</p>
            </title>
            <aug>
               <au>
                  <snm>Rojas Quijano</snm>
                  <fnm>FA</fnm>
               </au>
               <au>
                  <snm>Morrow</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Wise</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Brancia</snm>
                  <fnm>FL</fnm>
               </au>
               <au>
                  <snm>Goux</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>2006</pubdate>
            <volume>45</volume>
            <issue>14</issue>
            <fpage>4638</fpage>
            <lpage>4652</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi052226q</pubid>
                  <pubid idtype="pmpid" link="fulltext">16584199</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Molecular mechanisms of prion pathogenesis</p>
            </title>
            <aug>
               <au>
                  <snm>Aguzzi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sigurdson</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Heikenwaelder</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Annu Rev Pathol</source>
            <pubdate>2008</pubdate>
            <volume>3</volume>
            <fpage>11</fpage>
            <lpage>40</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.pathmechdis.3.121806.154326</pubid>
                  <pubid idtype="pmpid" link="fulltext">18233951</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Amino-terminal fragments of mutant huntingtin show selective accumulation in striatal neurons and synaptic toxicity</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Johnston</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Shelbourne</snm>
                  <fnm>PF</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>XJ</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2000</pubdate>
            <volume>25</volume>
            <issue>4</issue>
            <fpage>385</fpage>
            <lpage>389</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/78054</pubid>
                  <pubid idtype="pmpid" link="fulltext">10932179</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Solution structure of human prolactin</p>
            </title>
            <aug>
               <au>
                  <snm>Teilum</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hoch</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Goffin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Kinet</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Martial</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Kragelund</snm>
                  <fnm>BB</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2005</pubdate>
            <volume>351</volume>
            <issue>4</issue>
            <fpage>810</fpage>
            <lpage>823</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2005.06.042</pubid>
                  <pubid idtype="pmpid" link="fulltext">16045928</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Protein repeats: structures, functions, and evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Andrade</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Perez-Iratxeta</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
            </aug>
            <source>J Struct Biol</source>
            <pubdate>2001</pubdate>
            <volume>134</volume>
            <issue>2-3</issue>
            <fpage>117</fpage>
            <lpage>131</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jsbi.2001.4392</pubid>
                  <pubid idtype="pmpid" link="fulltext">11551174</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Molecular features of the copper binding sites in the octarepeat domain of the prion protein</p>
            </title>
            <aug>
               <au>
                  <snm>Burns</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Aronoff-Spencer</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dunham</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Lario</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Avdievich</snm>
                  <fnm>NI</fnm>
               </au>
               <au>
                  <snm>Antholine</snm>
                  <fnm>WE</fnm>
               </au>
               <au>
                  <snm>Olmstead</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Vrielink</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gerfen</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Peisach</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Scott</snm>
                  <fnm>WG</fnm>
               </au>
               <au>
                  <snm>Millhauser</snm>
                  <fnm>GL</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>2002</pubdate>
            <volume>41</volume>
            <issue>12</issue>
            <fpage>3991</fpage>
            <lpage>4001</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi011922x</pubid>
                  <pubid idtype="pmpid" link="fulltext">11900542</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Prion protein devoid of the octapeptide repeat region restores susceptibility to scrapie in PrP knockout mice</p>
            </title>
            <aug>
               <au>
                  <snm>Flechsig</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Shmerling</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hegyi</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Raeber</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Fischer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Cozzio</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>von Mering</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Aguzzi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Weissmann</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Neuron</source>
            <pubdate>2000</pubdate>
            <volume>27</volume>
            <issue>2</issue>
            <fpage>399</fpage>
            <lpage>408</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0896-6273(00)00046-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">10985358</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>The EBI SRS server--recent developments</p>
            </title>
            <aug>
               <au>
                  <snm>Zdobnov</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Etzold</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>2</issue>
            <fpage>368</fpage>
            <lpage>373</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/18.2.368</pubid>
                  <pubid idtype="pmpid" link="fulltext">11847095</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>ExPASy Proteomics tools</p>
            </title>
            <url>http://www.expasy.org/tools/</url>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Finding flexible patterns in unaligned protein sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Jonassen</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>1995</pubdate>
            <volume>4</volume>
            <issue>8</issue>
            <fpage>1587</fpage>
            <lpage>1595</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2143188</pubid>
                  <pubid idtype="pmpid" link="fulltext">8520485</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Plate-forme bio-informatique GENOUEST</p>
            </title>
            <url>http://www.genouest.org</url>
         </bibl>
         <bibl id="B45">
            <title>
               <p>The amyloid stretch hypothesis: recruiting proteins toward the dark side</p>
            </title>
            <aug>
               <au>
                  <snm>Esteras-Chopo</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Serrano</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lopez de la Paz</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <issue>46</issue>
            <fpage>16672</fpage>
            <lpage>16677</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1283810</pubid>
                  <pubid idtype="pmpid" link="fulltext">16263932</pubid>
                  <pubid idtype="doi">10.1073/pnas.0505905102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Prediction of amyloid fibril-forming proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Kallberg</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Gustafsson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Persson</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Thyberg</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Johansson</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2001</pubdate>
            <volume>276</volume>
            <issue>16</issue>
            <fpage>12945</fpage>
            <lpage>12950</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M010402200</pubid>
                  <pubid idtype="pmpid" link="fulltext">11134035</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Dissecting subunit interfaces in homodimeric proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Bahadur</snm>
                  <fnm>RP</fnm>
               </au>
               <au>
                  <snm>Chakrabarti</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rodier</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Janin</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2003</pubdate>
            <volume>53</volume>
            <issue>3</issue>
            <fpage>708</fpage>
            <lpage>719</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.10461</pubid>
                  <pubid idtype="pmpid" link="fulltext">14579361</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Dissecting protein-protein recognition sites</p>
            </title>
            <aug>
               <au>
                  <snm>Chakrabarti</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Janin</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2002</pubdate>
            <volume>47</volume>
            <issue>3</issue>
            <fpage>334</fpage>
            <lpage>343</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.10085</pubid>
                  <pubid idtype="pmpid" link="fulltext">11948787</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Approaches to the automatic discovery of patterns in biosequences</p>
            </title>
            <aug>
               <au>
                  <snm>Brazma</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Jonassen</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Eidhammer</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>1998</pubdate>
            <volume>5</volume>
            <issue>2</issue>
            <fpage>279</fpage>
            <lpage>305</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9672833</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>A structural study for the optimisation of functional motifs encoded in protein sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Via</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Helmer-Citterich</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>50</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">420233</pubid>
                  <pubid idtype="pmpid" link="fulltext">15119965</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-5-50</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
