<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-7-227</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Software</dochead>
      <bibl>
         <title>
            <p>MIDAS: software for analysis and visualisation of interallelic disequilibrium between multiallelic markers</p>
         </title>
         <aug>
            <au id="A1" ca="yes" ce="yes">
               <snm>Gaunt</snm>
               <mi>R</mi>
               <fnm>Tom</fnm>
               <insr iid="I1"/>
               <email>Tom.Gaunt@soton.ac.uk</email>
            </au>
            <au id="A2" ce="yes">
               <snm>Rodriguez</snm>
               <fnm>Santiago</fnm>
               <insr iid="I1"/>
               <email>S.Rodriguez@soton.ac.uk</email>
            </au>
            <au id="A3">
               <snm>Zapata</snm>
               <fnm>Carlos</fnm>
               <insr iid="I2"/>
               <email>bfcazaba@usc.es</email>
            </au>
            <au id="A4">
               <snm>Day</snm>
               <mi>NM</mi>
               <fnm>Ian</fnm>
               <insr iid="I1"/>
               <email>I.N.M.Day@soton.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Human Genetics Division, University of Southampton, School of Medicine, Duthie Building (MP 808), Southampton General Hospital, Tremona Road, Southampton SO16 6YD, UK</p>
            </ins>
            <ins id="I2">
               <p>Departamento de Gen&#233;tica, Universidad de Santiago, Santiago de Compostela, Spain</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>1</issue>
         <fpage>227</fpage>
         <url>http://www.biomedcentral.com/1471-2105/7/227</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16643648</pubid>
               <pubid idtype="doi">10.1186/1471-2105-7-227</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>19</day>
               <month>12</month>
               <year>2005</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>27</day>
               <month>4</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>27</day>
               <month>4</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Gaunt et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Various software tools are available for the display of pairwise linkage disequilibrium across multiple single nucleotide polymorphisms. The HapMap project also presents these graphics on its website. However, these approaches make limited use of data from multiallelic markers and convey limited information in graphical form.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We have developed a software package (MIDAS &#8211; Multiallelic Interallelic Disequilibrium Analysis Software) for the estimation and graphical display of interallelic linkage disequilibrium. Linkage disequilibrium is analysed for each allelic combination (of one allele from each of two loci), between all pairwise combinations of any type of multiallelic loci in a contig (or any set) of many loci (including single nucleotide polymorphisms, microsatellites, minisatellites and haplotypes). Data are presented graphically in a novel and informative way, and can also be exported in tabular form for other analyses. This approach facilitates visualisation of patterns of linkage disequilibrium across genomic regions, analysis of the relationships between different alleles of multiallelic markers and inferences about patterns of evolution and selection.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>MIDAS is a linkage disequilibrium analysis program with a comprehensive graphical user interface providing novel views of patterns of linkage disequilibrium between all types of multiallelic and biallelic markers.</p>
            </sec>
            <sec>
               <st>
                  <p>Availability</p>
               </st>
               <p>Available from <url>http://www.genes.org.uk/software/midas</url> and <url>http://www.sgel.humgen.soton.ac.uk/midas</url></p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Gametic disequilibrium (widely known as linkage disequilibrium or LD) is a genetic phenomenon which occurs when alleles at different loci are non-randomly associated in a given population. This correlation between polymorphisms is caused and/or influenced by their shared history of mutation and recombination, and by many other factors including genetic drift, population growth, admixture or migration, population structure, the ages of the polymorphisms, the physical distance separating them and the effects of selective pressure <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. The characterization of LD is an important issue in both evolutionary and medical genetics, since it is informative in association mapping of trait or disease loci, and an indicator of the interaction between genes, the relative influence of different evolutionary forces in the generation/disruption of genetic variability, and the genetic history of populations <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
         <p>The theory of estimation of LD has been substantially developed in recent years. Relevant advances have been made in the knowledge of the properties of LD coefficients and LD statistical tests, which are used respectively to measure the magnitude and to estimate the significance of LD. LD is said to exist when the frequency of a haplotype observed in a population sample is significantly greater or smaller than the frequency expected from the product of the allele frequencies, with the magnitude of LD reflecting the size of that difference. A variety of coefficients and statistical tests are available for the estimation of LD (the most widely used being the coefficients D', &#961;, r, r<sup>2</sup>, d and d<sup>2</sup>, and the chi-square and Fisher exact tests), and many programs exist for that purpose (including Haploview, 2LD, Arlequin, GDA, DNAsp, ALLASS, DISEQ and DMAP, reviewed in <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> and <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>). Some programs, such as GOLD <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, GOLDsurfer <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> and Haploview <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, also include graphical displays enabling quick overviews of large regions. However, most packages are intended for use with single-nucleotide polymorphism (SNP) data in a pairwise fashion. This focus on biallelic markers makes both LD estimation and graphical representation straightforward compared with multiallelic markers such as microsatellites.</p>
         <p>The analysis of LD between a pair of multiallelic loci represents a conceptual difference in relation to the analysis of LD between a pair of biallelic loci. In both instances, LD can be analysed at two different levels. One is the overall LD between the pair of loci, and the other is the interallelic LD between each of the alleles at the first locus and each of the alleles at the second one.</p>
         <p>The magnitude and the significance of both overall and interallelic LD are the same for pairwise analyses involving two biallelic loci. This does not apply, however, for LD between multiallelic loci. Given a pair of multiallelic loci with <it>k </it>and <it>l </it>alleles respectively, there are <it>k </it>&#215; <it>l </it>possible interallelic associations. In theory, pairwise combinations of alleles at different loci can differ in parameters such as magnitude, significance and patterns of LD. This has been confirmed empirically in the characterization of interallelic LD between pairs of dinucleotide repeat loci spanning human chromosome 11p15 <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, and in the analysis of LD between the <it>TH01 </it>microsatellite and <it>IGF2 </it>SNP haplotypes in the context of the identification of microsatellite loci tagging haplotypes relevant to association mapping of complex disease traits <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. The analysis of interallelic associations is therefore necessary for a complete description of LD between multiallelic loci <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
         <p>Despite the existence of alternative estimation theory <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>, LD between multiallelic loci is often estimated by pooling alleles into two groups in order to reduce the system to a two-allele two-locus model. This approach does not allow the analysis of all possible interallelic associations; instead, it reduces the LD between multiallelic loci to a single estimate of overall LD. It has been shown that the overall measure obtained by pooling alleles of multiallelic loci tends to underestimate LD, may complicate discrimination among the evolutionary forces generating LD in populations, and may decrease the success of association mapping of trait or disease loci (<abbrgrp><abbr bid="B12">12</abbr></abbrgrp> and references therein). In addition to the number of alleles, the magnitude of LD and the power to detect it depend on other factors, including the sample size, the statistical tests and coefficients used, the allele frequencies and the sign of the association <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. This latter issue has been shown to be of special importance. A sign-based LD estimation method recently developed for multiallelic systems <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> has been shown to considerably increase both the statistical power and the accuracy of estimation of the intensity of LD <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B13">13</abbr></abbrgrp>. Separately, presenting a graphical overview of interallelic disequilibrium between alleles of multiallelic markers is considerably more challenging than for biallelic markers (where colour intensity can indicate the magnitude of linkage disequilibrium between a pair of markers) and has not been previously attempted.</p>
         <p>In this work, we have developed an integrated LD analysis package (MIDAS: Multiallelic Interallelic Disequilibrium Analysis Software) that computes interallelic LD from genotypic data, incorporating the latest advances in the theory of estimation of LD, and graphically represents the intensity and significance of pairwise non-random associations between any combination of microsatellites, SNPs, haplotypes or other multiallelic markers.</p>
      </sec>
      <sec>
         <st>
            <p>Implementation</p>
         </st>
         <p>MIDAS was written in the Python programming language v2.4 <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, using the Tkinter module for generating a graphical user interface (GUI). The Tkinter "Canvas" widget was used for plotting of graphical data, whilst other Tkinter widgets were used for creation of menus, buttons and other aspects of the interface. All modules used were part of the standard Python distribution <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> and include: Tkinter, tkFileDialog, math, copy, cPickle, os and webbrowser. The program reads and writes standard tab-delimited text files, and has an additional option to save a binary analysis file (using the cPickle module) which stores all variables and allows the user to reload a previous analysis.</p>
         <p>Figure <figr fid="F1">1</figr> shows a flow-chart representing the program structure. Data (raw genotypes, with marker IDs and positions) are imported, and then the user selects analysis (this is a separate step to enable incorporation of different analyses in future versions). Analysis begins with an assessment of Hardy-Weinberg equilibrium (HWE) as previously described <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. Markers out of HWE are flagged for highlighting in the final outputs. The next step is estimation of LD. Finally, the results of the analysis are plotted (figure <figr fid="F2">2</figr>).</p>
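         <p>As a rough illustration of the HWE screening step, the sketch below applies a plain chi-square goodness-of-fit test to the genotypes of a single marker. This is an assumption made for illustration only: the procedures cited above may differ (for example, they may be exact tests), and the function name hwe_chi_square is hypothetical and not part of MIDAS.</p>

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotypes):
    """Chi-square goodness-of-fit to Hardy-Weinberg proportions for one marker.

    genotypes: list of (allele, allele) tuples, missing genotypes already removed.
    Returns (chi-square statistic, degrees of freedom).
    Illustrative sketch only; not the test implemented in MIDAS.
    """
    n = len(genotypes)
    # Observed genotype counts, ignoring the order of the two alleles.
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    # Allele frequencies estimated by gene counting.
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2.0 * n) for a, c in allele_counts.items()}

    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Expected counts: n*p^2 for homozygotes, 2*n*p*q for heterozygotes.
        expected = n * freqs[a] ** 2 if a == b else 2 * n * freqs[a] * freqs[b]
        chi2 += (observed[(a, b)] - expected) ** 2 / expected

    k = len(freqs)
    return chi2, k * (k - 1) // 2
```

         <p>A marker whose statistic exceeds the chosen critical value for the returned degrees of freedom would be flagged. For markers with many rare alleles the expected counts become small and the chi-square approximation is unreliable, which is one reason an exact test may be preferred in practice.</p>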
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Flow-chart of MIDAS from the user's perspective</p>
            </caption>
            <text>
               <p>Flow-chart of MIDAS from the user's perspective. Rectangles indicate user inputs, ovals indicate program functions.</p>
            </text>
            <graphic file="1471-2105-7-227-1"/>
         </fig>
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Screenshots of MIDAS</p>
            </caption>
            <text>
               <p>Screenshots of MIDAS. (a) A region of chromosome 11 showing 30 markers. Green lines indicate relative position of markers. Yellow intensity indicates distance between pairwise markers. Placing the mouse over a feature provides details. (b) A pairwise plot for two microsatellites (zoomed in). Significant results are boxed in red (D' &#8805; 0) or blue (D'&lt;0). Placing the mouse cursor over an allele pair provides details and statistics and also plots that pair at the bottom right of the screen. Magnitude of |D'| is also plotted (middle right). (c) A SNP/microsatellite pair (zoomed in). This is identical to the microsatellite/microsatellite plot, but with only two alleles in one dimension. (d) A SNP/SNP pair (zoomed in). The plot is oriented to place the most frequent alleles for both SNPs in the top left. Statistics can be observed by placing the mouse over an allele pair. For SNPs a magnified plot is not shown, but the |D'| graph is still used (middle right).</p>
            </text>
            <graphic file="1471-2105-7-227-2"/>
         </fig>
         <p>The program has been designed for simple installation and use by any computer user, and requires only the prior installation of the standard Python distribution <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> to function on a Microsoft<sup>&#174; </sup>Windows<sup>&#174; </sup>2000/XP computer. Operation is mouse and menu-driven with optional hotkeys for scroll and zoom. Input files can be prepared in most spreadsheet programs and exported as tab-text. Results output is tab-text format and can be imported into most spreadsheet programs.</p>
         <p>All parts of the program were scripted <it>de novo</it>, but the algorithm for LD calculation was based on previous programs developed by two of the authors (CZ and SR).</p>
         <sec>
            <st>
               <p>Estimation of LD</p>
            </st>
            <p>Given two multiallelic loci, <it>A </it>and <it>B</it>, we estimated the LD for each pair of alleles defining a two-locus haplotype. The accurate computation of all possible interallelic associations requires that each of the two-locus haplotypes defining an interallelic combination represents only the observed and expected counts for the pair of alleles under consideration. This is not attained when alleles (and therefore haplotype counts) are pooled arbitrarily. MIDAS computes interallelic disequilibrium between multiallelic loci following an approach previously described <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> which avoids both losing and pooling interallelic information. Both this approach and its underlying theory have been applied and discussed in detail previously (<abbrgrp><abbr bid="B2">2</abbr></abbrgrp> and references therein). In brief, if locus <it>A </it>has <it>k </it>alleles <it>A</it><sub><it>i </it></sub>(<it>i </it>= 1,......, <it>k</it>) and locus <it>B </it>has <it>l </it>alleles <it>B</it><sub><it>j </it></sub>(<it>j </it>= 1,......, <it>l</it>), then the complete array of possible two-locus haplotypes was partitioned into <it>k </it>&#215; <it>l </it>separate 2 &#215; 2 contingency tables by collapsing the data into <it>A</it><sub><it>i </it></sub>vs. not-<it>A</it><sub><it>i </it></sub>(<m:math name="1471-2105-7-227-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>A</m:mi><m:mover accent="true"><m:mi>i</m:mi><m:mo>&#175;</m:mo></m:mover></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGbbqqdaWgaaWcbaGafmyAaKMbaebaaeqaaaaa@2F56@</m:annotation></m:semantics></m:math>) at the <it>A </it>locus, and <it>B</it><sub><it>j </it></sub>vs. not-<it>B</it><sub><it>j </it></sub>(<m:math name="1471-2105-7-227-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>B</m:mi><m:mover accent="true"><m:mi>j</m:mi><m:mo>&#175;</m:mo></m:mover></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGcbGqdaWgaaWcbaGafmOAaOMbaebaaeqaaaaa@2F5A@</m:annotation></m:semantics></m:math>) at the <it>B </it>locus. Estimates of two-locus haplotype frequencies were obtained from genotype data by the Hill method, an expectation-maximisation (EM) algorithm <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. The magnitude of disequilibrium between pairs of alleles at different loci was measured by <m:math name="1471-2105-7-227-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:msup><m:mi>D</m:mi><m:mo>&#8242;</m:mo></m:msup><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGebargaqbamaaBaaaleaacqWGPbqAcqWGQbGAaeqaaaaa@30AD@</m:annotation></m:semantics></m:math> = <it>D</it><sub><it>ij</it></sub>/<it>D</it><sub><it>max</it></sub>, where <it>D</it><sub><it>ij </it></sub>= <it>X</it><sub><it>ij </it></sub>- <it>p</it><sub><it>i</it></sub><it>q</it><sub><it>j</it></sub>, <it>p</it><sub><it>i </it></sub>and <it>q</it><sub><it>j </it></sub>are the frequencies of alleles <it>i </it>and <it>j</it>, respectively, <it>X</it><sub><it>ij </it></sub>is the observed frequency of the haplotype <it>A</it><sub><it>i</it></sub><it>B</it><sub><it>j </it></sub>and <it>D</it><sub><it>max </it></sub>= min [<it>p</it><sub><it>i</it></sub>(1 - <it>q</it><sub><it>j</it></sub>),(1 - <it>p</it><sub><it>i</it></sub>)<it>q</it><sub><it>j</it></sub>] when <it>D</it><sub><it>ij </it></sub>> 0 or <it>D</it><sub><it>max </it></sub>= min [<it>p</it><sub><it>i</it></sub><it>q</it><sub><it>j</it></sub>,(1 - <it>p</it><sub><it>i</it></sub>)(1 - <it>q</it><sub><it>j</it></sub>)] when <it>D</it><sub><it>ij </it></sub>&lt; 0 <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. 
The null hypothesis of random association between pairs of alleles at the two loci (<it>D</it><sub><it>ij </it></sub>= 0) was tested by <m:math name="1471-2105-7-227-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>X</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mn>2</m:mn></m:msubsup><m:mo>=</m:mo><m:mi>n</m:mi><m:msubsup><m:mi>D</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mn>2</m:mn></m:msubsup><m:mo>/</m:mo><m:msub><m:mi>p</m:mi><m:mi>i</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:mn>1</m:mn><m:mo>&#8722;</m:mo><m:msub><m:mi>p</m:mi><m:mi>i</m:mi></m:msub><m:mo stretchy="false">)</m:mo><m:msub><m:mi>q</m:mi><m:mi>j</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:mn>1</m:mn><m:mo>&#8722;</m:mo><m:msub><m:mi>q</m:mi><m:mi>j</m:mi></m:msub><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGybawdaqhaaWcbaGaemyAaKMaemOAaOgabaGaeGOmaidaaOGaeyypa0JaemOBa4Maemiraq0aa0baaSqaaiabdMgaPjabdQgaQbqaaiabikdaYaaakiabc+caViabdchaWnaaBaaaleaacqWGPbqAaeqaaOGaeiikaGIaeGymaeJaeyOeI0IaemiCaa3aaSbaaSqaaiabdMgaPbqabaGccqGGPaqkcqWGXbqCdaWgaaWcbaGaemOAaOgabeaakiabcIcaOiabigdaXiabgkHiTiabdghaXnaaBaaaleaacqWGQbGAaeqaaOGaeiykaKcaaa@4D17@</m:annotation></m:semantics></m:math>, which approximates a &#967;<sup>2 </sup>distribution with one degree of freedom, where <it>n </it>is the number of individuals sampled <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. Yates's correction was also computed.</p>
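            <p>The calculation described above can be sketched in a few lines of Python. The sketch assumes that a table of estimated two-locus haplotype frequencies is already available (rows for the alleles of locus A, columns for the alleles of locus B) and that both alleles of interest have frequencies strictly between 0 and 1. The function names are hypothetical, Yates's correction is omitted, and this is not the MIDAS source code.</p>

```python
def collapse_2x2(hap_freq, i, j):
    """Collapse a k x l haplotype frequency table to A_i/not-A_i versus B_j/not-B_j.

    Returns (X_ij, p_i, q_j): the A_i B_j haplotype frequency and the two
    marginal allele frequencies, which fully determine the 2 x 2 table.
    """
    x_ij = hap_freq[i][j]
    p_i = sum(hap_freq[i])                   # frequency of allele A_i
    q_j = sum(row[j] for row in hap_freq)    # frequency of allele B_j
    return x_ij, p_i, q_j


def interallelic_ld(x_ij, p_i, q_j, n):
    """Return (D_ij, D'_ij, chi-square) for one allele pair.

    n is the number of individuals sampled, as in the text.
    """
    d = x_ij - p_i * q_j
    if d > 0:
        d_max = min(p_i * (1 - q_j), (1 - p_i) * q_j)
    else:
        d_max = min(p_i * q_j, (1 - p_i) * (1 - q_j))
    # D' keeps the sign of D because D_max is always positive here.
    d_prime = d / d_max if d_max > 0 else 0.0
    chi2 = n * d * d / (p_i * (1 - p_i) * q_j * (1 - q_j))
    return d, d_prime, chi2
```

            <p>Calling collapse_2x2 for every (i, j) and passing the result to interallelic_ld yields the <it>k </it>&#215; <it>l </it>interallelic estimates without pooling alleles arbitrarily, since each call considers only the pair of alleles of interest against all others.</p>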
            <p>Estimation of the magnitude and significance of pairwise LD involving biallelic loci was performed in the same way, with <it>p</it><sub><it>i </it></sub>and <it>q</it><sub><it>j </it></sub>taken as the frequencies of the commonest allele at each biallelic locus. This establishes a homogeneous criterion for the construction of 2 &#215; 2 contingency tables (i.e., haplotype <it>A</it><sub><it>i</it></sub><it>B</it><sub><it>j </it></sub>is the one constituted by the two more frequent alleles). This criterion was followed uniformly for the estimation of the observed haplotype frequency and for computation of pairwise LD magnitude and significance in all SNP/SNP analyses. It is consistent with the sign-based LD estimation method recently developed for multiallelic systems <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B13">13</abbr></abbrgrp> and establishes a consistent convention with some biological basis. With the most frequent allele of each SNP placed in the top left of the 2 &#215; 2 table, the display shows a 'main diagonal' excess (D' positive) if the minor alleles coincide on some haplotype, whereas it shows a 'minor diagonal' pattern (D' negative) if the minor allele at locus A predominantly occurs with the major allele at locus B (and vice versa). When |D'| = 1, the haplotype patterns depicted (either three or two of the possible four) give information relevant to their possible history that is not fully evident from D', r<sup>2 </sup>or any other coefficient (see figure <figr fid="F5">5</figr>).</p>
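            <p>The "observed" haplotype frequencies used in these SNP/SNP analyses are EM estimates, because the phase of double heterozygotes cannot be read directly from genotype data. The sketch below shows a standard EM iteration for two biallelic loci under random mating. It is offered as an illustration of the general technique and is not necessarily identical in detail to the Hill method as implemented in MIDAS; the function name and the layout of the genotype count table are assumptions.</p>

```python
def em_haplotype_freqs(counts, iterations=100, tol=1e-10):
    """Estimate two-locus haplotype frequencies for two biallelic loci by EM.

    counts[a][b] is the number of individuals carrying a copies (0, 1 or 2)
    of allele A at the first locus and b copies of allele B at the second.
    Returns a dict with frequencies for haplotypes "AB", "Ab", "aB" and "ab".
    """
    total = 2.0 * sum(sum(row) for row in counts)   # number of chromosomes
    # Haplotypes that can be counted directly from unambiguous genotypes.
    base = {
        "AB": 2 * counts[2][2] + counts[2][1] + counts[1][2],
        "Ab": 2 * counts[2][0] + counts[2][1] + counts[1][0],
        "aB": 2 * counts[0][2] + counts[1][2] + counts[0][1],
        "ab": 2 * counts[0][0] + counts[1][0] + counts[0][1],
    }
    n_dh = counts[1][1]                  # double heterozygotes (phase unknown)
    freq = dict.fromkeys(base, 0.25)     # neutral starting point
    for _ in range(iterations):
        # E-step: probability that a double heterozygote is AB/ab, not Ab/aB.
        cis = freq["AB"] * freq["ab"]
        trans = freq["Ab"] * freq["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from the expected haplotype counts.
        new = {
            "AB": (base["AB"] + n_dh * w) / total,
            "ab": (base["ab"] + n_dh * w) / total,
            "Ab": (base["Ab"] + n_dh * (1 - w)) / total,
            "aB": (base["aB"] + n_dh * (1 - w)) / total,
        }
        change = max(abs(new[h] - freq[h]) for h in freq)
        freq = new
        if tol > change:
            break
    return freq
```

            <p>If allele A and allele B are taken to be the commonest allele at each locus, the returned "AB" frequency plays the role of <it>X</it><sub><it>ij </it></sub>in the top-left cell of the 2 &#215; 2 table.</p>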
            <p>For pairwise analyses involving two multiallelic loci, haplotype <it>A</it><sub><it>i</it></sub><it>B</it><sub><it>j </it></sub>was considered to comprise the two alleles of interest. This is also consistent with the sign-based LD estimation method <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B13">13</abbr></abbrgrp> in most situations, except in rare circumstances when haplotype <it>A</it><sub><it>i</it></sub><it>B</it><sub><it>j </it></sub>is constituted by one allele with frequency higher than 0.5 and another allele with frequency lower than 0.5. For pairwise analyses involving one multiallelic locus and one biallelic locus, both LD estimation and representation were performed twice for each microsatellite allele of interest: <it>A</it><sub><it>i</it></sub><it>B</it><sub><it>j </it></sub>was considered to comprise the microsatellite allele of interest and the commonest allele at the biallelic locus in one analysis, and the microsatellite allele of interest and the rarest allele at the biallelic locus in the other.</p>
            <p>For users who wish to analyse multiallelic markers by dichotomising each marker into the most common allele versus all other alleles combined, the output file includes that analysis for each combination of markers, in the rows that have a "Y" for "MostFreq1" (first marker) and a "Y" for "MostFreq2" (second marker). For users who wish to collapse multiallelic markers to biallelic markers in other ways, the software will accept input files in which multiallelic markers are recoded as if they were SNPs. However, it should be noted that no dichotomisation represents the actual overall LD between two multiallelic loci; each represents only one of the possible interallelic associations.</p>
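            <p>Recoding a multiallelic marker as if it were a SNP can be done before the data are passed to MIDAS. The following sketch recodes one marker column as most common allele versus all other alleles, using the underscore-delimited genotype notation and the "?_?" null code of the input format. The function name and the choice of "1" and "2" as the new allele labels are illustrative assumptions.</p>

```python
from collections import Counter


def dichotomise(genotypes, missing="?_?"):
    """Recode genotypes such as '094_102' as most common allele versus the rest.

    The most common allele becomes '1' and every other allele becomes '2'.
    Returns (most_common_allele, recoded_genotypes).
    """
    alleles = Counter()
    for g in genotypes:
        if g != missing:
            alleles.update(g.split("_"))
    most_common = alleles.most_common(1)[0][0]

    recoded = []
    for g in genotypes:
        if g == missing:
            recoded.append(missing)   # null genotypes are passed through unchanged
        else:
            recoded.append("_".join("1" if a == most_common else "2"
                                    for a in g.split("_")))
    return most_common, recoded


# dichotomise(["094_098", "094_094", "?_?", "098_102"])
# returns ("094", ["1_2", "1_1", "?_?", "2_2"])
```

            <p>Other groupings (for example by allele size) only require replacing the test against the most common allele with a different rule.</p>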
            <p>The MIDAS output file provides D', r<sup>2</sup>, expected and estimated haplotype frequencies, allele frequencies, &#967;<sup>2 </sup>and distance between markers.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <p>An example dataset is shown in figure <figr fid="F2">2</figr>, comprising a set of microsatellites and SNPs from the 11p chromosome region <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> and Zapata <it>et al </it>(in prep) (subset of 50 samples). Figure <figr fid="F2">2a</figr> shows the typical unzoomed view, with pairs of markers in a grid of 1 to n columns and 1 to n rows. The plot begins at top left with 1 versus 2 and continues to n-1 versus n at bottom right. The distance between markers is represented by the intensity of background colour (closer = darker yellow). The image can be zoomed and scrolled, and positioning the mouse over a plotted result gives the statistics, a magnified plot and a plot of |D'| (as shown in the right-hand panel of figure <figr fid="F2">2b&#8211;d</figr>).</p>
         <p>For pairwise SNP analyses the LD is represented as in figure <figr fid="F2">2d</figr>. The vertical and horizontal lines split the black square into four rectangles, the areas of which represent the expected haplotype frequencies for each allele combination (the upper left is the expected <it>A</it><sub><it>i</it></sub><it>B</it><sub><it>j </it></sub>frequency, i.e. the frequency of <it>A</it><sub><it>i </it></sub>&#215; the frequency of <it>B</it><sub><it>j</it></sub>) (figure <figr fid="F3">3</figr>). Each quadrant then has a coloured rectangle to represent the "observed" (i.e. estimated using EM) haplotype frequency. The dimensions of the rectangle are in proportion to the two allele frequencies it represents, and its colour intensity represents the significance (by &#967;<sup>2</sup>) of LD or the magnitude of D' (a user option on the View menu; the figures in this paper use the significance option). Blue rectangles represent a less frequent haplotype than expected (D' &lt; 0), and red a more frequent haplotype than expected (D' > 0). Alleles are re-ordered to ensure that the most common alleles for each marker are represented by the top-left quadrant.</p>
         <fig id="F3">
            <title>
               <p>Figure 3</p>
            </title>
            <caption>
               <p>Representation of haplotype frequencies and LD in MIDAS</p>
            </caption>
            <text>
               <p>Representation of haplotype frequencies and LD in MIDAS. For SNPs the expected (under no LD) haplotype frequencies for each allele combination are plotted with an unfilled, black rectangle divided into four quadrants by two lines. The estimated haplotype frequencies are represented by solid red or blue rectangles. Where a coloured rectangle exceeds the size of the black rectangle it is coloured red, indicating an excess of that haplotype (D' &#8805; 0). The opposite situation is indicated by a blue rectangle (D'&lt;0). For multi-allelic markers the principle is the same, but a separate plot is shown for each combination of alleles, i.e. locus 1 allele i/allele not-i, locus 2 allele j/allele not-j.</p>
            </text>
            <graphic file="1471-2105-7-227-3"/>
         </fig>
         <p>For multiallelic versus biallelic or multiallelic markers the plot is slightly different (figure <figr fid="F2">2c</figr>). For each marker combination there are multiple pairs of vertical and horizontal lines (matching the upper left quadrant of the biallelic display). Each pair represents one allele combination, with the black rectangle indicating the expected haplotype frequency and the coloured rectangle the "observed" (i.e. estimated using EM) haplotype frequency (figure <figr fid="F3">3</figr>). The colour scheme is the same as for biallelic markers.</p>
         <p>The markers are arrayed with marker 1 versus marker 2 at top left and marker n-1 versus marker n at bottom right, forming a right-angled triangle of plots (figure <figr fid="F2">2a</figr>). To the bottom left of the display is a line parallel to the long side of this triangle representing a map of the genomic region in which the markers are situated. Each marker is linked by a green line from its relative position on this map to the row and column in which it is plotted (figure <figr fid="F2">2a</figr>). Placing the mouse over this line (or the circle at its end) gives the marker name and position.</p>
         <p>A typical session involves preparation of an input file of genotypes (figure <figr fid="F4">4</figr>) using any mainstream spreadsheet program and exporting it in tab-text format. MIDAS is then run by double-clicking the script file. The window shows basic instructions &#8211; briefly: (1) "Open genotype file" from "Open" on the "File" menu, then (2) select "Analysis" &#8211; "LD and haplotypes". Zoom can be operated by mouse-click, key-stroke ("i" and "o") or menu, while scrolling can be operated by cursor key or scroll-bar. At minimum zoom ("View" &#8211; "fit to screen", e.g. figure <figr fid="F2">2a</figr>) the user can rapidly spot statistically significant results and patterns. Placing the mouse cursor over a feature displays its statistics and detail. The high levels of zoom shown in figures <figr fid="F2">2b,2c</figr> and <figr fid="F2">2d</figr> enable the user to analyse the LD in more detail, and read the statistics by using mouse-over. Export functions include a tab-text file of all results (which can be opened by any mainstream spreadsheet program) and a binary file format that stores all analysis variables and can be used to store the whole analysis for future use. The latter format speeds up viewing of previous analyses with regions containing many markers. Finally, a PostScript export option is available to save the graphical view, although standard screen captures (using Alt-PrintScreen in Microsoft<sup>&#174; </sup>Windows<sup>&#174;</sup>) are adequate for most uses.</p>
         <fig id="F4">
            <title>
               <p>Figure 4</p>
            </title>
            <caption>
               <p>The format of a MIDAS input file</p>
            </caption>
            <text>
               <p>The format of a MIDAS input file. Data are raw genotypes in a tab-delimited text file. Row 1 contains marker names and row 2 contains positions. Markers should be sorted in position order for clarity. The two alleles of a genotype should be delimited by an underscore ("_"), and can be any valid letter or number. Where numbers are used, ensure that the same number of digits is used for all alleles (e.g. 094, 098, 102) to preserve size order in the alphanumeric sort. There must be no more than one blank line at the end of the data, and all null values must be coded as "?_?".</p>
            </text>
            <graphic file="1471-2105-7-227-4"/>
         </fig>
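         <p>The input format in figure <figr fid="F4">4</figr> can be read with a few lines of Python. The sketch below is illustrative only and is not taken from the MIDAS source: the function name parse_midas_input and the example data are hypothetical, and it assumes that every column holds a marker (no sample identifier column) and that positions are numeric.</p>
         <p><![CDATA[
```python
import csv
import io

NULL_GENOTYPE = "?_?"  # null coding described in figure 4


def parse_midas_input(handle):
    """Parse a tab-delimited genotype file.

    Row 1 holds marker names, row 2 holds positions, and every later row
    holds one individual's genotypes written as allele_allele.
    Assumption: every column is a marker (no sample identifier column).
    Returns (names, positions, genotypes); each genotype is a tuple of two
    allele strings, or None for the null genotype.
    """
    # Skip blank lines, such as a single blank line at the end of the data.
    rows = [row for row in csv.reader(handle, delimiter="\t")
            if any(cell.strip() for cell in row)]
    names = [cell.strip() for cell in rows[0]]
    positions = [float(cell) for cell in rows[1]]
    genotypes = []
    for row in rows[2:]:
        individual = []
        for cell in row:
            cell = cell.strip()
            if cell == NULL_GENOTYPE:
                individual.append(None)
            else:
                first, second = cell.split("_")
                individual.append((first, second))
        genotypes.append(individual)
    return names, positions, genotypes


# Hypothetical two-marker example: one SNP and one microsatellite.
example = "SNP1\tMS1\n1000\t2500\nA_G\t094_102\n?_?\t098_098\n\n"
names, positions, genotypes = parse_midas_input(io.StringIO(example))

# Zero-padded allele sizes keep their size order in an alphanumeric sort.
padded_order = sorted(["102", "094", "098"])
unpadded_order = sorted(["102", "94", "98"])
```
]]></p>
         <p>In this example the padded alleles sort in size order, whereas the unpadded alleles sort with 102 first, which is why the same number of digits is needed for all alleles.</p>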
         <p>Menu options include file operations, analysis, options to customise the view and a help option, which provides simple information and instructions for usage.</p>
         <sec>
            <st>
               <p>Application of MIDAS</p>
            </st>
            <sec>
               <st>
                  <p>Evolutionary relationships between SNPs</p>
               </st>
               <p>The SNP/SNP plots (which can be seen at low levels of zoom) provide a quick way of inferring evolutionary relationships between markers. Figure <figr fid="F5">5</figr> shows how three different types of plot provide this information.</p>
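               <p>The three cases in figure <figr fid="F5">5</figr> can also be expressed numerically. The following is a minimal sketch rather than code from MIDAS: the function name, the labels and the tolerance eps are invented for illustration, and it assumes that the four haplotype frequencies have already been estimated. A and B denote the most frequent alleles at SNP 1 and SNP 2, and D' is normalised in the usual way by the largest value of D that the allele frequencies allow.</p>
               <p><![CDATA[
```python
def classify_snp_pair(f_AB, f_Ab, f_aB, f_ab, eps=1e-9):
    """Classify a SNP pair from its four haplotype frequencies.

    A and B are the most frequent alleles at SNP 1 and SNP 2; a and b are
    the rare alleles. Returns (d_prime, r2, label).
    """
    p_A = f_AB + f_Ab       # frequency of the common allele at SNP 1
    p_B = f_AB + f_aB       # frequency of the common allele at SNP 2
    d = f_AB - p_A * p_B    # estimated minus expected frequency of A-B

    # Normalise D by the largest magnitude it can take given the allele
    # frequencies; the bound depends on the sign of D.
    if d < 0:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    else:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    d_prime = d / d_max if d_max > eps else 0.0

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > eps else 0.0

    observed = sum(f > eps for f in (f_AB, f_Ab, f_aB, f_ab))
    if observed == 2 and r2 > 1 - 1e-6:
        label = "iii: two haplotypes, perfect LD"
    elif d < -eps:
        label = "i: A-B rarer than expected"
    elif d > eps:
        label = "ii: A-B more common than expected"
    else:
        label = "no LD detected"
    return d_prime, r2, label


# Hypothetical frequencies, one example per case in figure 5.
case_i = classify_snp_pair(0.6, 0.2, 0.2, 0.0)    # no a-b haplotype
case_ii = classify_snp_pair(0.7, 0.0, 0.2, 0.1)   # no A-b haplotype
case_iii = classify_snp_pair(0.7, 0.0, 0.0, 0.3)  # only A-B and a-b
```
]]></p>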
            </sec>
            <sec>
               <st>
                  <p>LD between a multiallelic microsatellite and several SNPs</p>
               </st>
               <p>We have previously described the association between allele groups of a highly polymorphic microsatellite in the Growth Hormone/chorionic somatomammotrophin (<it>GH</it>/<it>CSH</it>) gene region on chromosome 17 (<it>CSH1</it>.01) and phenotypes of the metabolic syndrome <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. For these analyses we dichotomised the microsatellite on the basis of size and distribution <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Figure <figr fid="F6">6</figr> shows the interallelic LD between SNPs in the <it>GH</it>/<it>CSH </it>gene region and the <it>CSH1</it>.01 microsatellite. These analyses confirm the validity of our dichotomisation, indicating two major clades of alleles within the microsatellite. However, in most cases analysing all alleles of a multiallelic marker, rather than dichotomising them, provides the maximum information and requires no biological assumptions.</p>
               <fig id="F5">
                  <title>
                     <p>Figure 5</p>
                  </title>
                  <caption>
                     <p>Use of MIDAS SNP/SNP plots to infer evolutionary history</p>
                  </caption>
                  <text>
                     <p>Use of MIDAS SNP/SNP plots to infer evolutionary history. The haplotype on which a SNP first arose is indicated by the estimated frequency of the haplotype carrying the most frequent alleles at both loci. (i) If this frequency is less than expected, it implies that SNP 2 arose on the haplotype carrying the common allele at SNP 1 (i.e. D'&lt;0). (ii) If it is more than expected, then SNP 2 arose on the haplotype carrying the rare allele at SNP 1 (i.e. D' &#8805; 0). (iii) If only two haplotypes are observed then perfect LD exists (r<sup>2 </sup>= 1). This may arise through bottlenecks, selection or simultaneous occurrence.</p>
                  </text>
                  <graphic file="1471-2105-7-227-5"/>
               </fig>
               <fig id="F6">
                  <title>
                     <p>Figure 6</p>
                  </title>
                  <caption>
                     <p>LD between a complex microsatellite and SNPs</p>
                  </caption>
                  <text>
                     <p>LD between a complex microsatellite and SNPs. (a) Previous work [22] indicated SNP alleles in LD with two size ranges of the <it>CSH1</it>.01 microsatellite. The lower size range has dinucleotide spacing, the upper has tetranucleotide spacing. This suggested two major lineages. (b) Plotting interallelic LD between a SNP (GH1V004) and the <it>CSH1</it>.01 microsatellite demonstrates clear LD with the two lineages. The common SNP alleles associate with the lower size range and the rare SNP alleles associate with the upper size range. Results are boxed in red where the haplotype frequency is significantly higher than expected (D' &#8805; 0) and blue where it is significantly lower (D'&lt;0). (c) SNP haplotypes (four SNPs, including GH1V004) confirm these findings and demonstrate the ability of MIDAS to handle haplotype data as a multi-allelic marker.</p>
                  </text>
                  <graphic file="1471-2105-7-227-6"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>LD between input haplotypes and other markers</p>
               </st>
               <p>Figure <figr fid="F6">6</figr> also demonstrates the potential to input haplotype data and analyse it as a multi-allelic marker. In this case a 4-SNP haplotype is analysed for LD with a multi-allelic microsatellite (figure <figr fid="F6">6c</figr>). This approach can provide an overview of how biallelic or multiallelic markers interact with haplotypes, and also how haplotypes in two different regions interact with each other.</p>
            </sec>
            <sec>
               <st>
                  <p>LD between two multiallelic tandem repeat loci</p>
               </st>
               <p>Our work in the <it>IGF2</it>-<it>INS</it>-<it>TH </it>region of chromosome 11 has identified associations of SNPs, haplotypes and other markers with obesity <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. Two multi-allelic markers within this region (included in our haplotype analyses) are the insulin gene VNTR (<it>INS </it>VNTR) and the tyrosine hydroxylase tetranucleotide microsatellite (<it>TH</it>01). Figure <figr fid="F7">7</figr> shows the interallelic linkage disequilibrium between these two markers; the patterns indicate which alleles are associated and also suggest that the VNTR mutates more rapidly than the microsatellite (one <it>TH</it>01 allele associates with multiple <it>INS </it>VNTR alleles).</p>
               <fig id="F7">
                  <title>
                     <p>Figure 7</p>
                  </title>
                  <caption>
                     <p>LD between the <it>INS </it>VNTR and the <it>TH</it>01 microsatellite</p>
                  </caption>
                  <text>
                     <p>LD between the <it>INS </it>VNTR and the <it>TH</it>01 microsatellite. Each <it>TH</it>01 allele associates with a size range of VNTR alleles (256 and 263 associate with the class III alleles). This implies a greater rate of mutation in the VNTR, because a wider range of allele sizes in the VNTR dimension is significantly associated with <it>TH</it>01 alleles than <it>vice versa</it>. Close-ups of individual allele plots are shown to indicate the magnitude of effect &#8211; the black rectangle indicates the expected haplotype frequency under no LD, and the coloured rectangle indicates the estimated haplotype frequency.</p>
                  </text>
                  <graphic file="1471-2105-7-227-7"/>
               </fig>
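               <p>The quantities drawn for each allele pair in figure <figr fid="F7">7</figr> (expected frequency under no LD, estimated frequency, and the interallelic D') can be tabulated as in the sketch below. This is an illustration under stated assumptions, not the MIDAS implementation: the function name and the allele labels x1, x2, y1, y2 and y3 are invented, the haplotype frequencies are made up and assumed to be already estimated, and no significance test is performed.</p>
               <p><![CDATA[
```python
from collections import defaultdict


def interallelic_ld(hap_freqs):
    """Tabulate LD for every allele pair of two multiallelic markers.

    hap_freqs maps (allele_1, allele_2) to an estimated haplotype
    frequency. Returns a dict mapping each allele pair to
    (expected, estimated, d_prime).
    """
    p = defaultdict(float)  # allele frequencies at marker 1
    q = defaultdict(float)  # allele frequencies at marker 2
    for (a, b), freq in hap_freqs.items():
        p[a] += freq
        q[b] += freq

    table = {}
    for a in sorted(p):
        for b in sorted(q):
            estimated = hap_freqs.get((a, b), 0.0)
            expected = p[a] * q[b]  # frequency expected under no LD
            d = estimated - expected
            # The normalising bound depends on the sign of D.
            if d < 0:
                d_max = min(p[a] * q[b], (1 - p[a]) * (1 - q[b]))
            else:
                d_max = min(p[a] * (1 - q[b]), (1 - p[a]) * q[b])
            d_prime = d / d_max if d_max > 0 else 0.0
            table[(a, b)] = (expected, estimated, d_prime)
    return table


# Made-up haplotype frequencies for a 2-allele and a 3-allele marker.
example = {
    ("x1", "y1"): 0.4,
    ("x1", "y2"): 0.1,
    ("x2", "y2"): 0.3,
    ("x2", "y3"): 0.2,
}
ld_table = interallelic_ld(example)
```
]]></p>
               <p>In this made-up example, allele y2 is shared between both alleles of the first marker and so shows only partial LD with each, whereas y1 and y3 are each confined to a single allele of the first marker.</p>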
            </sec>
            <sec>
               <st>
                  <p>Regions of "perfect" LD</p>
               </st>
               <p>Recent work using data from HapMap <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp> and Celera has indicated the presence of extended regions of perfect LD (where only two major haplotypes exist) <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Figure <figr fid="F8">8a</figr> shows the characteristic MIDAS pattern for this type of region, with only two haplotypes existing for each pair of SNPs (data from HapMap <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>). Figures <figr fid="F8">8b</figr> and <figr fid="F8">8c</figr> demonstrate an alternative approach which we have developed (SNPFrequencyViewer) to scan rapidly for these regions, in which extended regions of isofrequent SNPs correlate with regions of perfect LD (data from HapMap <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>). These can then be confirmed and examined in more detail using MIDAS. Admixture of two populations highly differentiated in these genomic regions is one possible explanation, selection another.</p>
               <fig id="F8">
                  <title>
                     <p>Figure 8</p>
                  </title>
                  <caption>
                     <p>Visualisation of regions of perfect LD</p>
                  </caption>
                  <text>
                     <p>Visualisation of regions of perfect LD. Marker pairs can have either two or three haplotypes present when D' = 1. Most programs do not distinguish between these graphically, despite the potential biological importance. (a) The <it>BRCA1 </it>region on chromosome 17. MIDAS shows only two haplotypes for many SNP pairs (perfect LD, r<sup>2 </sup>= 1) using HapMap data [29,30]. (b) Allele frequencies from HapMap data [29,30] show that many SNPs in regions with only two haplotypes share the same minor allele frequency (MAF) (e.g. <it>BRCA1 </it>region on chromosome 17), in contrast to (c) nearby regions, which have a mixture of MAFs. Viewing MAF may therefore be a quick way to find regions of perfect LD, which can then be checked with MIDAS.</p>
                  </text>
                  <graphic file="1471-2105-7-227-8"/>
               </fig>
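               <p>The idea of scanning for isofrequent SNPs can be sketched as follows. This is not the SNPFrequencyViewer algorithm, which is not described here: the function name, the tolerance and the minimum run length are assumptions chosen for illustration, and the MAF values are invented. A shared MAF is only suggestive of perfect LD, so any run reported this way would still need to be checked with MIDAS.</p>
               <p><![CDATA[
```python
def isofrequent_runs(mafs, tolerance=0.01, min_length=3):
    """Find runs of consecutive SNPs that share a minor allele frequency.

    mafs is a list of MAFs in map order. A run is extended while each SNP
    is within tolerance of the first SNP of the run. Returns a list of
    (start, end) index pairs, with end exclusive.
    """
    runs = []
    start = 0
    for i in range(1, len(mafs) + 1):
        # Close the current run at the end of the data or when the MAF
        # moves away from the MAF of the SNP that started the run.
        if i == len(mafs) or abs(mafs[i] - mafs[start]) > tolerance:
            if i - start >= min_length:
                runs.append((start, i))
            start = i
    return runs


# Invented MAFs: SNPs 1 to 4 share a frequency, the rest are mixed.
example_mafs = [0.10, 0.32, 0.32, 0.32, 0.32, 0.45, 0.05, 0.21, 0.21]
example_runs = isofrequent_runs(example_mafs)
```
]]></p>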
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>MIDAS is a new program that presents the novel approach of analysing and graphically representing the interallelic linkage disequilibrium (LD) between multiple pairs of bi- and multiallelic markers. The graphical representation of LD incorporates information on expected haplotype frequency (under no LD), estimated haplotype frequency and D' or significance. Distance information and statistics are also presented in the interface. This enables rapid visual interpretation and inference of evolutionary and functional relationships between SNPs and microsatellites across large genomic regions. Applications to data-sets we have analysed previously demonstrate the effectiveness of viewing patterns in the data graphically rather than numerically.</p>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p>&#8226; Project name: MIDAS: Multiallelic Interallelic Disequilibrium Analysis Software.</p>
         <p>&#8226; Project home page: <url>http://www.genes.org.uk/software/midas</url></p>
         <p>&#8226; Operating system(s): Microsoft<sup>&#174; </sup>Windows<sup>&#174; </sup>2000/XP</p>
         <p>&#8226; Programming language: Python 2.4/Tkinter</p>
         <p>&#8226; Other requirements: Python 2.4 or later <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> must be installed before MIDAS</p>
         <p>&#8226; License: MIDAS licence supplied with program</p>
         <p>&#8226; Any restrictions to use by non-academics: royalty-free use allowed within terms of licence</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>MIDAS was written in Python 2.4 with a Tkinter graphical interface by TRG, with design suggestions and testing by all authors. Algorithms for estimating haplotype frequencies by the Hill method and for calculating LD were adapted from a BASIC program developed by CZ (who also suggested some statistical improvements). The manuscript was drafted by TRG and SR with input from all authors.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>TRG is funded by a BHF (British Heart Foundation) Intermediate Fellowship (FS/05/065/19497), SR by a HOPE (Wessex Medical Trust) fellowship and work in our laboratory by the Medical Research Council (UK) (Programme Grant G9800748).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Patterns of linkage disequilibrium in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Ardlie</snm>
                  <fnm>KG</fnm>
               </au>
               <au>
                  <snm>Kruglyak</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Seielstad</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>299</fpage>
            <lpage>309</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg777</pubid>
                  <pubid idtype="pmpid" link="fulltext">11967554</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Spectrum of nonrandom associations between microsatellite loci on human chromosome 11p15</p>
            </title>
            <aug>
               <au>
                  <snm>Zapata</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rodr&#237;guez</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Visedo</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sacrist&#225;n</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2001</pubdate>
            <volume>158</volume>
            <fpage>1235</fpage>
            <lpage>1251</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11454771</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Linkage disequilibrium and the search for complex disease genes</p>
            </title>
            <aug>
               <au>
                  <snm>Jorde</snm>
                  <fnm>LB</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>1435</fpage>
            <lpage>1444</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.144500</pubid>
                  <pubid idtype="pmpid" link="fulltext">11042143</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Linkage disequilibrium for different scales and applications</p>
            </title>
            <aug>
               <au>
                  <snm>Mueller</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Brief Bioinform</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>355</fpage>
            <lpage>364</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bib/5.4.355</pubid>
                  <pubid idtype="pmpid" link="fulltext">15606972</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>GOLD &#8211; graphical overview of linkage disequilibrium</p>
            </title>
            <aug>
               <au>
                  <snm>Abecasis</snm>
                  <fnm>GR</fnm>
               </au>
               <au>
                  <snm>Cookson</snm>
                  <fnm>WO</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>182</fpage>
            <lpage>183</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.2.182</pubid>
                  <pubid idtype="pmpid" link="fulltext">10842743</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>GOLDsurfer: three dimensional display of linkage disequilibrium</p>
            </title>
            <aug>
               <au>
                  <snm>Pettersson</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Jonsson</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Cardon</snm>
                  <fnm>LR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>3241</fpage>
            <lpage>3243</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth341</pubid>
                  <pubid idtype="pmpid" link="fulltext">15201180</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Haploview: analysis and visualization of LD and haplotype maps</p>
            </title>
            <aug>
               <au>
                  <snm>Barrett</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Fry</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Maller</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Daly</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>263</fpage>
            <lpage>265</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth457</pubid>
                  <pubid idtype="pmpid" link="fulltext">15297300</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Haplotypic analyses of the IGF2-INS-TH gene cluster in relation to cardiovascular risk traits</p>
            </title>
            <aug>
               <au>
                  <snm>Rodr&#237;guez</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gaunt</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>O'Dell</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>XH</fnm>
               </au>
               <au>
                  <snm>Gu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hawe</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Humphries</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>IN</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>2004</pubdate>
            <volume>13</volume>
            <fpage>715</fpage>
            <lpage>725</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/ddh070</pubid>
                  <pubid idtype="pmpid" link="fulltext">14749349</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Replication of IGF2-INS-TH*5 haplotype effect on obesity in older men and study of related phenotypes</p>
            </title>
            <aug>
               <au>
                  <snm>Rodr&#237;guez</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gaunt</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Dennison</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>XH</fnm>
               </au>
               <au>
                  <snm>Syddall</snm>
                  <fnm>HE</fnm>
               </au>
               <au>
                  <snm>Phillips</snm>
                  <fnm>DI</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>IN</fnm>
               </au>
            </aug>
            <source>Eur J Hum Genet</source>
            <pubdate>2006</pubdate>
            <volume>14</volume>
            <fpage>109</fpage>
            <lpage>116</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16251897</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Testing hypothesis about linkage disequilibrium with multiple alleles [abstract]</p>
            </title>
            <aug>
               <au>
                  <snm>Weir</snm>
                  <fnm>BS</fnm>
               </au>
               <au>
                  <snm>Cockerham</snm>
                  <fnm>CC</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1978</pubdate>
            <volume>88</volume>
            <fpage>633</fpage>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Statistical methods for assessing linkage disequilibrium at the HLA-A, B, C loci</p>
            </title>
            <aug>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Piazza</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Ann Hum Genet</source>
            <pubdate>1981</pubdate>
            <volume>45</volume>
            <fpage>79</fpage>
            <lpage>94</lpage>
            <xrefbib>
               <pubid idtype="pmpid">6947712</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Sampling variance and distribution of the D' measure of overall gametic disequilibrium between multiallelic loci</p>
            </title>
            <aug>
               <au>
                  <snm>Zapata</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Carollo</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rodriguez</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Ann Hum Genet</source>
            <pubdate>2001</pubdate>
            <volume>65</volume>
            <fpage>395</fpage>
            <lpage>406</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1469-1809.2001.6540395.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">11592929</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Distribution of nonrandom associations between pairs of protein loci along the third chromosome of Drosophila melanogaster</p>
            </title>
            <aug>
               <au>
                  <snm>Zapata</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Nunez</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Velasco</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2002</pubdate>
            <volume>161</volume>
            <fpage>1539</fpage>
            <lpage>1550</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12196399</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>The Python Programming Language</p>
            </title>
            <url>http://www.python.org</url>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Estimates of inbreeding in a natural population: a comparison of sampling properties</p>
            </title>
            <aug>
               <au>
                  <snm>Curie-Cohen</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1982</pubdate>
            <volume>100</volume>
            <fpage>339</fpage>
            <lpage>358</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7106561</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Deviations from Hardy-Weinberg proportions: sampling variances and use in estimation of inbreeding coefficients</p>
            </title>
            <aug>
               <au>
                  <snm>Robertson</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>WG</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1984</pubdate>
            <volume>107</volume>
            <fpage>703</fpage>
            <lpage>718</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">6745643</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Estimation of linkage disequilibrium in randomly mating populations</p>
            </title>
            <aug>
               <au>
                  <snm>Hill</snm>
                  <fnm>WG</fnm>
               </au>
            </aug>
            <source>Heredity</source>
            <pubdate>1974</pubdate>
            <volume>33</volume>
            <fpage>229</fpage>
            <lpage>239</lpage>
            <xrefbib>
               <pubid idtype="pmpid">4531429</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>The interaction of selection and linkage. I. General considerations; heterotic models</p>
            </title>
            <aug>
               <au>
                  <snm>Lewontin</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1964</pubdate>
            <volume>49</volume>
            <fpage>49</fpage>
            <lpage>67</lpage>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Gametic disequilibrium measures: proceed with caution</p>
            </title>
            <aug>
               <au>
                  <snm>Hedrick</snm>
                  <fnm>PW</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1987</pubdate>
            <volume>117</volume>
            <fpage>331</fpage>
            <lpage>341</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">3666445</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Inferences about linkage disequilibrium</p>
            </title>
            <aug>
               <au>
                  <snm>Weir</snm>
                  <fnm>BS</fnm>
               </au>
            </aug>
            <source>Biometrics</source>
            <pubdate>1979</pubdate>
            <volume>35</volume>
            <fpage>235</fpage>
            <lpage>254</lpage>
            <xrefbib>
               <pubid idtype="pmpid">497335</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Late life metabolic syndrome, early growth, and common polymorphism in the growth hormone and placental lactogen gene cluster</p>
            </title>
            <aug>
               <au>
                  <snm>Day</snm>
                  <fnm>IN</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>XH</fnm>
               </au>
               <au>
                  <snm>Gaunt</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>King</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Voropanov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ye</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rodriguez</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Syddall</snm>
                  <fnm>HE</fnm>
               </au>
               <au>
                  <snm>Sayer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Dennison</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Tabassum</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Phillips</snm>
                  <fnm>DI</fnm>
               </au>
            </aug>
            <source>J Clin Endocrinol Metab</source>
            <pubdate>2004</pubdate>
            <volume>89</volume>
            <fpage>5569</fpage>
            <lpage>5576</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1210/jc.2004-0152</pubid>
                  <pubid idtype="pmpid" link="fulltext">15531513</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>ApaI polymorphism in insulin-like growth factor II (IGF2) gene and weight in middle-aged males</p>
            </title>
            <aug>
               <au>
                  <snm>O'Dell</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Hindmarsh</snm>
                  <fnm>PC</fnm>
               </au>
               <au>
                  <snm>Pringle</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Ford</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Humphries</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>IN</fnm>
               </au>
            </aug>
            <source>Int J Obes Relat Metab Disord</source>
            <pubdate>1997</pubdate>
            <volume>21</volume>
            <fpage>822</fpage>
            <lpage>825</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.ijo.0800483</pubid>
                  <pubid idtype="pmpid">9376897</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Associations of IGF2 ApaI RFLP and INS VNTR class I allele size with obesity</p>
            </title>
            <aug>
               <au>
                  <snm>O'Dell</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Bujac</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>IN</fnm>
               </au>
            </aug>
            <source>Eur J Hum Genet</source>
            <pubdate>1999</pubdate>
            <volume>7</volume>
            <fpage>821</fpage>
            <lpage>827</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.ejhg.5200381</pubid>
                  <pubid idtype="pmpid" link="fulltext">10573016</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Evidence of multiple causal sites affecting weight in the IGF2-INS-TH region of human chromosome 11</p>
            </title>
            <aug>
               <au>
                  <snm>Gu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>O'Dell</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>XH</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>IN</fnm>
               </au>
            </aug>
            <source>Hum Genet</source>
            <pubdate>2002</pubdate>
            <volume>110</volume>
            <fpage>173</fpage>
            <lpage>181</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00439-001-0663-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">11935324</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Positive associations between single nucleotide polymorphisms in the IGF2 gene region and body mass index in adult males</p>
            </title>
            <aug>
               <au>
                  <snm>Gaunt</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>IN</fnm>
               </au>
               <au>
                  <snm>O'Dell</snm>
                  <fnm>SD</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>2001</pubdate>
            <volume>10</volume>
            <fpage>1491</fpage>
            <lpage>1501</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/10.14.1491</pubid>
                  <pubid idtype="pmpid" link="fulltext">11448941</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Replication of IGF2-INS-TH(*)5 haplotype effect on obesity in older men and study of related phenotypes</p>
            </title>
            <aug>
               <au>
                  <snm>Rodriguez</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gaunt</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Dennison</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>XH</fnm>
               </au>
               <au>
                  <snm>Syddall</snm>
                  <fnm>HE</fnm>
               </au>
               <au>
                  <snm>Phillips</snm>
                  <fnm>DI</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>IN</fnm>
               </au>
            </aug>
            <source>Eur J Hum Genet</source>
            <pubdate>2006</pubdate>
            <volume>14</volume>
            <fpage>109</fpage>
            <lpage>116</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16251897</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>The International HapMap Project</p>
            </title>
            <source>Nature</source>
            <pubdate>2003</pubdate>
            <volume>426</volume>
            <fpage>789</fpage>
            <lpage>796</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature02168</pubid>
                  <pubid idtype="pmpid" link="fulltext">14685227</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>A haplotype map of the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Altshuler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brooks</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Chakravarti</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>FS</fnm>
               </au>
               <au>
                  <snm>Daly</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Donnelly</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>437</volume>
            <fpage>1299</fpage>
            <lpage>1320</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature04226</pubid>
                  <pubid idtype="pmpid" link="fulltext">16255080</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Genetically indistinguishable SNPs and their influence on inferring the location of disease-associated variants</p>
            </title>
            <aug>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Evans</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Ke</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Hunt</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Paolucci</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ragoussis</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Deloukas</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bentley</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Cardon</snm>
                  <fnm>LR</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>1503</fpage>
            <lpage>1510</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1310638</pubid>
                  <pubid idtype="pmpid" link="fulltext">16251460</pubid>
                  <pubid idtype="doi">10.1101/gr.4217605</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Human genome-wide screen of haplotype-like blocks of reduced diversity</p>
            </title>
            <aug>
               <au>
                  <snm>Costas</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Salas</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Phillips</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Carracedo</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2005</pubdate>
            <volume>349</volume>
            <fpage>219</fpage>
            <lpage>225</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gene.2004.12.042</pubid>
                  <pubid idtype="pmpid" link="fulltext">15780967</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>SNPFrequencyViewer</p>
            </title>
            <url>http://www.genes.org.uk/software/snpfrequencyviewer</url>
         </bibl>
      </refgrp>
   </bm>
</art>
