LD2SNPing: linkage disequilibrium plotter and RFLP enzyme mining for tag SNPs

1 Department of Biomedical Science and Environmental Biology, Kaohsiung Medical University, Kaohsiung, Taiwan
2 Graduate Institute of Natural Products, College of Pharmacy, Kaohsiung Medical University, Kaohsiung, Taiwan
3 Center of Excellence for Environmental Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
4 Department of Chemical Engineering, I-Shou University, Kaohsiung, Taiwan
5 Institute of Molecular and Cellular Biology, National Tsing Hua University, Hsinchu, Taiwan
6 Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan
BMC Genetics 2009, 10:26 doi:10.1186/1471-2156-10-26
The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2156/10/26
© 2009 Chang et al; licensee BioMed Central Ltd.

Abstract

Background
Linkage disequilibrium (LD) mapping is commonly used to evaluate markers for genome-wide association studies. Most LD software focuses strictly on LD analysis and visualization and lacks supporting services for genotyping.

Results
We developed a freeware program called LD2SNPing, which provides a complete package of mining tools for genotyping and LD analysis. The software provides SNP ID-centric online retrieval of SNP information from dbSNP/NCBI and gene-centric online retrieval of tag SNPs from HapMap. Restriction fragment length polymorphism (RFLP) enzyme information for SNP genotyping is available for all SNP IDs and tag SNPs. Single and multiple SNP inputs are accepted for LD analysis by online retrieval from HapMap and NCBI. An LD statistics section provides D, D', r², δQ, ρ, and the P value of the Hardy-Weinberg Equilibrium test for each SNP marker, as well as Chi-square and likelihood-ratio tests for the pair-wise association of two SNPs in the LD calculation. Finally, 2D and 3D plots, as well as plain-text output of the results, can be selected.

Conclusion
LD2SNPing thus provides a novel visualization environment for multiple SNP input, which facilitates SNP association studies. The software, user manual, and tutorial are freely available at http://bio.kuas.edu.tw/LD2SNPing.

Background

Single nucleotide polymorphisms (SNPs) are very important markers for disease [1] and cancer [2] association studies. The number of identified SNPs is currently estimated to be about 3.1 million [3]. Identifying associations by statistical analysis of SNP data is challenging because of the large number of SNPs involved. Linkage disequilibrium (LD) is one of the most commonly used measures for choosing informative SNPs that represent the original SNP distribution in a genome for genome-wide association studies. LD mapping is commonly used to evaluate markers across large data sets.
Given the vast amount of data in association studies, presenting LD results graphically rather than as text considerably facilitates their interpretation [4]. Many LD visualization programs have been developed, e.g. LDA [5], Haploview [6], and JLIN [7]. Although these tools have made valuable contributions to LD visualization and analysis, they lack services and tools that help users generate the genotype data needed for LD analysis. Without the data set itself, users are unable to perform LD analysis. Other programs provide information for genotyping, e.g. the SNPlex genotyping system [8], SNP cutter [9], SNP-RFLPing [10], and V-MitoSNP [11], but they do not include an LD function. It is thus still difficult for researchers to narrow down the number of SNPs to genotype. A common way of identifying tag SNPs of the genes of interest is to check the HapMap website http://www.hapmap.org [12]. The currently available tools, however, are independent programs and are not well integrated. We have therefore integrated an SNP genotyping service and an LD visualization/analysis tool in a single program that provides one platform for tag SNP selection, SNP genotyping, and LD analysis. This platform, LD2SNPing, furthermore provides a novel function that accepts multiple SNP inputs and plots their LD directly. The user can input SNPs of interest and calculate the LD measurements for SNP selection before the genotyping process. This stand-alone Java-based visualization tool greatly facilitates preparation of the genotype data and increases the performance of LD analyses.

Implementation

LD2SNPing is a Java-based program that runs under the Java Runtime Environment (JRE) and Java 3D. The LD statistics program calculates D, D', r², δQ, and ρ values, as well as the P value of the Hardy-Weinberg Equilibrium test (HWE-P) for each SNP marker.
The P values of the Chi-square test and the likelihood-ratio test for the pair-wise association of two SNPs are also provided in the LD calculation. LD2SNPing processes genotype data and estimates the pair-wise haplotype frequencies of the sample using an expectation-maximization (EM) algorithm [13]. With the exception of the exact test of HWE, which follows [14], the equations used in these calculations are those described for LDA [5] and are listed in the appendix of the user manual. For the LD plot, LD2SNPing includes SNPs with a minor allele frequency (MAF) greater than 0.01. The MAF and HWE-P values of all these SNPs are provided in the text window.

The SNP genotype information and the tag SNPs are retrieved online from dbSNP BUILD 129 of NCBI [15] (http://www.ncbi.nlm.nih.gov/SNP/) [16] and from HapMap (http://www.hapmap.org; HapMap Data Rel 23a/phaseII Mar08, on the NCBI B36 assembly, dbSNP b126) [12], respectively. Online retrieval of SNP genotype information from NCBI using SNP ID and gene input is similar to the function described for SNP-Flankplus [17] and SNP ID-info [18]. The default MAF cut-off for tag SNPs from HapMap is 0.2. Four populations, CEU, CHB, JPT, and YRI (Caucasian, Han-Chinese, Japanese and Sub-Saharan African, respectively), are selectable during tag SNP retrieval from HapMap. The retrieved data are the most up-to-date data available.

The RFLP database structure is based on REBASE (http://www.rebase.org) [19] version 610. The RFLP mining function for the selected SNP is provided by SNP-RFLPing [10], which is integrated in LD2SNPing. A demonstration and user manual of the LD2SNPing software are available as a free download from http://bio.kuas.edu.tw/LD2SNPing.
Many animations explaining how to use the LD2SNPing software are provided on the homepage and embedded in the user manual (see Additional file 1) as tutorials.

Additional file 1. User manual for LD2SNPing. Format: PDF. Size: 12.8 MB.

Results

Data import formats: File input
LD2SNPing accepts four different input file formats, namely two Excel formats (.xls and .csv), Word (.doc) and NotePad (.txt). The first and second rows of each file are reserved for the user-defined SNP names and the distances between SNPs (optional), respectively. Individual genotypes are accepted in the following formats: NN, N_N, and N/N (N is one of the four possible nucleotides). A genotype that is missing from the input file is skipped automatically and does not interfere with LD2SNPing processing. Some example files for testing are available in the example file folder of the LD2SNPing software package.

Data import formats: rsID input
LD2SNPing accepts rsID# input for online retrieval of individual SNP information from dbSNP of NCBI (Figure 1A).
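The genotype notations accepted for file input (NN, N_N, and N/N) and the skipping of missing genotypes can be illustrated with a short sketch. This is not LD2SNPing's Java code; the function names `parse_genotype` and `minor_allele_frequency` are hypothetical, and the sketch assumes biallelic SNPs and treats any cell that does not match one of the three notations as missing.

```python
import re
from collections import Counter

# One nucleotide, an optional "_" or "/" separator, one nucleotide.
GENOTYPE = re.compile(r"([ACGT])[_/]?([ACGT])")


def parse_genotype(cell):
    """Return the two alleles of a genotype cell, or None if it is missing."""
    match = GENOTYPE.fullmatch(cell.strip().upper())
    if match is None:
        return None  # empty or unrecognized cells are treated as missing
    return match.group(1), match.group(2)


def minor_allele_frequency(cells):
    """Minor allele frequency of one SNP column, skipping missing genotypes.

    Assumes a biallelic SNP; returns 0.0 if fewer than two alleles are seen.
    """
    counts = Counter()
    for cell in cells:
        alleles = parse_genotype(cell)
        if alleles is not None:
            counts.update(alleles)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0
    return min(counts.values()) / total


# One SNP column with mixed notations and one missing genotype.
column = ["AA", "AG", "A/G", "", "A_A"]
maf = minor_allele_frequency(column)  # 2 G alleles out of 8 -> 0.25
```

A value computed this way could then be compared with a MAF threshold such as the 0.01 used for the LD plot.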
Data import formats: Gene input
LD2SNPing accepts gene name (HUGO, Human Genome Organization) input to provide tag SNPs through online retrieval from HapMap (Figure 1B).

LD-free function: Retrieval of individual SNP information from NCBI
In Figure 1A, the information for an SNP (rs17884306) is provided for all populations in dbSNP (P1, CAUC1, AFR1, HISP1, and PAC1). The ssID#s (ss32469505 and ss48297306) for the corresponding rsID# (rs17884306) can be selected from the pull-down window.

LD-free function: Gene input for finding rsID data of tag SNP
In Figure 1B, LD2SNPing provides the tag SNP information through HapMap by gene input. The example shown is BRCA2. The tag SNP candidates provided by LD2SNPing match those of HapMap completely (shown in the user manual). HapMap-CEU, HCB, JPT and YRI are available for selection.

LD-free function: RFLP enzyme mining tool
Before performing LD analysis, it is necessary to collect SNP genotype data for the genes of interest, for example from SNP ID input (Figure 1A) and tag SNPs (Figure 1B). LD2SNPing executes RFLP restriction enzyme mining when the RFLP box indicated by arrow 6 of Figure 1A and arrow 5 of Figure 1B is clicked. RFLP results are shown in the format pictured in Figure 1C, in which restriction enzyme information for the SNP of interest (here, rs9534275) is shown. Information about alleles, enzyme name, recognition sequence and commercial availability is provided.

LD function: Input formats for 2D analysis
LD2SNPing accepts file input and sample file input for LD analysis and visualization (numbers 1 and 2 of Figure 2A, respectively). Moreover, LD2SNPing retrieves data online for multiple SNP inputs for LD measurement, prediction and visualization (numbers 1 to 8 of Figures 2A and 2B). For convenience, the LD for any SNPs located on the same chromosome can be analyzed directly. Figure 2B shows a single SNP rsID# (rs2078486), which has six different ssID#s from different data sources.
For example, ss20037931 has HapMap-CEU, HCB, JPT and YRI as data sources. Different data sources give different genotype frequencies for the same SNP rsID# because the underlying data sets differ. The data were retrieved online from dbSNP of NCBI and confirmed to match (shown in the user manual). Both file input and multiple SNP input lead to results similar to those shown in Figure 3, although the color pattern is different (described later).
LD function: 2D-LD graph
The distance between SNPs supplied in the input file can be optionally displayed or hidden (number 1 of Figure 3A). This distance is shown next to the diagonal line as a numerical value. By clicking on the "select scope" (number 2 of Figure 3A) and "repaint" (number 8 of Figure 3A) buttons, a user can limit the SNPs shown to those of interest. This view can be reversed by clicking on the "restore scope" button (number 3 of Figure 3A). The parameters for LD measurement are selected for the two axes named "left and right LD measure" (numbers 4 and 5 of Figure 3A, respectively). Different color schemes can be selected for each of the statistics (numbers 6 and 7 of Figure 3A). Moreover, LD2SNPing provides a window with the minor allele frequency (MAF) and HWE-P values for each analyzed SNP when LD analysis is performed (not shown). A more detailed description is given in the user manual.

LD function: Data analysis of LD information
LD2SNPing displays the LD measurements for any SNP pair on demand with a single click. For example, a text window (Figure 4A) opens when the arrow located in the box of SNP5 vs. SNP2 (Figure 3A) is clicked. In Figure 4A, the allele/haplotype frequencies, Chi-square P value, likelihood-ratio P value and all LD statistics (D, D', r², δQ, and ρ) of the paired SNPs are provided. These values match those of the LDA software [5] (not shown).
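To make the pair-wise output more concrete, the following sketch estimates haplotype frequencies for two biallelic SNPs with a simple EM iteration and then derives D, |D'|, r² and a Chi-square P value. It uses the standard textbook definitions, which are an assumption here; the exact equations used by LD2SNPing are those listed in its user manual, and δQ, ρ and the likelihood-ratio test are omitted. The function names and the 3 x 3 genotype-count layout are hypothetical.

```python
import math


def em_haplotype_freqs(g, iterations=100):
    """Estimate haplotype frequencies for two biallelic SNPs with EM.

    g[a][b] is the number of individuals carrying `a` copies of allele A at
    the first SNP and `b` copies of allele B at the second (a, b in 0..2).
    """
    n_chrom = 2 * sum(sum(row) for row in g)
    if n_chrom == 0:
        raise ValueError("no genotyped individuals")
    # Haplotype counts that follow unambiguously from the genotypes.
    base = {
        "AB": 2 * g[2][2] + g[2][1] + g[1][2],
        "Ab": 2 * g[2][0] + g[2][1] + g[1][0],
        "aB": 2 * g[0][2] + g[0][1] + g[1][2],
        "ab": 2 * g[0][0] + g[0][1] + g[1][0],
    }
    double_het = g[1][1]  # phase unknown: either AB/ab or Ab/aB
    freqs = {h: 0.25 for h in base}
    for _ in range(iterations):
        # E-step: expected share of double heterozygotes that are AB/ab.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        share = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: update the haplotype frequencies from expected counts.
        counts = dict(base)
        counts["AB"] += double_het * share
        counts["ab"] += double_het * share
        counts["Ab"] += double_het * (1 - share)
        counts["aB"] += double_het * (1 - share)
        freqs = {h: c / n_chrom for h, c in counts.items()}
    return freqs, n_chrom


def ld_statistics(freqs, n_chrom):
    """D, |D'|, r^2 and the 1-df Chi-square P value for one SNP pair."""
    p_a = freqs["AB"] + freqs["Ab"]
    p_b = freqs["AB"] + freqs["aB"]
    d = freqs["AB"] - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    chi2 = r2 * n_chrom
    p_value = math.erfc(math.sqrt(chi2 / 2))  # Chi-square survival, 1 df
    return {"D": d, "D_prime": d_prime, "r2": r2, "chi2": chi2, "p": p_value}


# Five aa/bb and five AA/BB individuals: the two SNPs are fully correlated.
freqs, n = em_haplotype_freqs([[5, 0, 0], [0, 0, 0], [0, 0, 5]])
stats = ld_statistics(freqs, n)  # D = 0.25, |D'| = 1, r2 = 1
```

Only the double heterozygotes have ambiguous phase, which is why the E-step touches a single genotype class; all other genotype classes contribute fixed haplotype counts.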
In addition, LD2SNPing provides graphic analyses, such as grids and pie3D graphs, to supplement the 2D-LD visualization and analysis (numbers 10 and 11 of Figure 3A). The results are shown in the user manual.

LD function: 3D-LD graph
The 3D visualization of LD is started by clicking on icon 13 in Figure 3A. It is the same as the 2D-LD plot except for the color patterns and the color ranges. In LD-3D, the distance and LD measurement values are indicated by the height along the diagonal line (Figure 3B). Users can return to the 2D-LD view or close the analysis by clicking on icons 12 and 9 of Figure 3A, respectively.

Data export
All the analyzed results can be saved as tab-delimited text files (.txt) and graphic files (.jpg) for convenience. The LD parameters are exported to a single file. Figure 4B shows a sample result for the "LD measure data" D'. All the D' values are listed pairwise for the SNPs, a common publishing format. Other LD parameters are not shown here, but are available in the user manual.

Discussion

Comparison of some LD software
Many programs for LD visualization are freely available. LDA [5], Haploview [6], and JLIN [7] were written in Java to implement LD analyses. A comparison of the different LD software is shown in Table 1. LDA and JLIN provide many LD measurements, but LDA offers only limited options for visualization of the results. Some LD parameters, e.g. the δQ and ρ values, are not provided by Haploview.

Table 1. Comparison of some LD software platforms

Generally, SNP genotyping has to be performed to generate the SNP genotypes needed for LD analysis. The other available LD software platforms, however, provide only LD measurements and no supporting functions for the steps before LD analysis, such as tag SNP mining by gene input, retrieval of SNP information, or RFLP enzyme mining for genotyping. These supporting functions are provided in LD2SNPing (Table 1).
Moreover, LD2SNPing allows input of multiple SNPs for LD analysis (Figure 2). The genotype information of the input SNPs is retrieved online from NCBI and HapMap. Users therefore obtain an overview of the LD analysis for the input SNPs without first genotyping the SNPs or supplying a genotype file. In contrast, Haploview presents many SNPs, from which users must manually select the SNPs of interest. If the SNPs of interest are distributed widely over the chromosome, the SNP panel contains a large number of SNPs. Haploview thus provides LD analysis for multiple SNPs only indirectly.

Tag SNP selection
Tag SNP candidates obtained from HapMap at different times may not be consistent because of changes made in its built-in greedy algorithm. Some tag SNPs may or may not be found again in subsequent tests. For example, tag SNP selection by inputting the gene BRCA2 to HapMap with MAF = 0.2 yields two tag SNP sets: 1) rs9534342, rs9943888, rs11571662, rs206120, rs206342, rs542551, rs9567552, rs206079, rs9562605, and rs14448 and 2) rs9534275, rs9943888, rs11571579, rs206146, rs206077, rs573014, rs9567552, rs9534174, rs144848, and rs9562605.

Restriction enzyme mining for RFLP
LD2SNPing supports SNP ID searches with online retrieval from dbSNP at NCBI for RFLP analysis. However, RFLP analysis for an SNP ID input may yield no restriction enzyme information because of the nature of the SNP itself. For example, the sequence information for rs9943888 and rs11571579 is retrieved successfully in LD2SNPing, but only rs11571579 has suitable restriction enzymes to mine (not shown). This is a property of the SNP itself, not an error of the RFLP analysis system. For PCR-RFLP wet-lab experiments, users need primer design software such as "Prim-SNPing" [20] for SNP-RFLP primer design and "SNP-Flankplus" [17] for retrieval of the SNP flanking sequence used in primer design.
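Why some SNPs have no usable RFLP enzyme can be seen from a minimal sketch of the underlying check: an enzyme is informative only if its recognition sequence overlaps the SNP position and is present with one allele but absent with the other. This is an illustration under stated assumptions, not the SNP-RFLPing implementation; the function name `rflp_enzymes`, the flanking sequences, and the four-enzyme table (a tiny illustrative subset, not the REBASE data set) are hypothetical, and only palindromic sites without ambiguity codes are handled.

```python
# Illustrative subset of well-known palindromic recognition sequences.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "TaqI": "TCGA",
}


def rflp_enzymes(left, alleles, right):
    """Return enzymes whose site exists for some alleles but not for all.

    left / right are the flanking sequences of the SNP and alleles is a
    tuple such as ("T", "C"). The result maps enzyme name to the alleles
    that are cut.
    """
    hits = {}
    for name, site in ENZYMES.items():
        k = len(site) - 1
        cut = []
        for allele in alleles:
            # Window of k bases on each side: any match must span the SNP.
            window = (left[-k:] + allele + right[:k]).upper()
            if site in window:
                cut.append(allele)
        if 0 < len(cut) < len(alleles):
            hits[name] = cut
    return hits


# Hypothetical flanks: the T allele completes an EcoRI site (GAATTC).
informative = rflp_enzymes("ACGTGAA", ("T", "C"), "TCGGA")
# Hypothetical flanks with no site for either allele: nothing to mine.
uninformative = rflp_enzymes("ACACACA", ("A", "G"), "CACACAC")
```

An empty result corresponds to the case described above, in which the sequence is retrieved successfully but no suitable restriction enzyme exists for the SNP.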
Conclusion

LD2SNPing has the following characteristics: 1) it provides a search function for online retrieval of SNP information from dbSNP of NCBI; 2) it provides gene-centric tag SNP selection through online retrieval from HapMap; 3) all the SNP IDs and tag SNPs are processed to mine RFLP restriction enzymes for SNP genotyping; 4) it provides LD measurements for D, D', r², δQ, and ρ, along with the P value of the Hardy-Weinberg Equilibrium test for each SNP marker and the P values of the Chi-square and likelihood-ratio tests for the pair-wise association of two SNPs in the LD calculation; 5) it accepts multiple SNP inputs to perform LD analysis by online retrieval from HapMap and NCBI; 6) it presents both 2D and 3D visualization with LD-related measurements shown on the graphs; 7) it provides both graphic and plain-text outputs for LD analysis. In conclusion, LD2SNPing is a novel, integrated visualization program designed to provide the user with the tools necessary for genotyping and LD analysis. It provides a simple and user-friendly interface with integrated functions for retrieval of SNP information, LD statistical calculation, analysis and visualization.

Availability and requirements

Project name: LD2SNPing: Linkage disequilibrium plotter and RFLP enzyme mining for tag SNPs
Project home page: http://bio.kuas.edu.tw/LD2SNPing/ with software and user manual for download.
Operating system(s): Platform-independent
Programming language: Java
Other requirements: Java 1.5.0 or higher
License: Free for non-commercial use
Any restrictions to use by non-academics: Please contact corresponding author.

Abbreviations

SNP: single nucleotide polymorphism; LD: linkage disequilibrium; RFLP: restriction fragment length polymorphism; HWE: Hardy-Weinberg Equilibrium; EM: expectation-maximization algorithm; HUGO: Human Genome Organization; MAF: minor allele frequency.

Authors' contributions

HWC and LYC wrote the manuscript.
LYC provided the genomics information and LD-related statistics. YJC designed and developed the Java implementation of the underlying algorithms and GUI. YHC improved the RFLP performance and online retrieval for SNP information. HWC instructed HCH and HCC regarding software testing, improvement, and animation preparation. CHY coordinated and oversaw this study. All authors read and approved the final manuscript.

Acknowledgements

This work was partly supported by the National Science Council in Taiwan under grants 97-2311-B-037-003-MY3, 96-2221-E-214-050-MY3, NSC96-2311-B037-002, 96-2622-E-151-019-CC3, NSC96-2622-E214-004-CC3, KMU-EM-97-1.1b, and KMU-EM-98-1.4.
Figure 1.
Figure 2.
Figure 3.
Figure 4.