<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/rss.css" type="text/css"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
    xmlns:cc="http://web.resource.org/cc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:extra="http://www.w3.org/1999/xhtml"
    xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <channel rdf:about="http://www.biomedcentral.com/feeds/latestarticles/journal?journal=bmcbioinformatics&amp;quantity=&amp;format=rss&amp;version=">
        <title>BMC Bioinformatics - Latest Articles</title>
        <link>http://www.biomedcentral.com/bmcbioinformatics/</link>
        <description>The latest research articles published by BMC Bioinformatics</description>
        <dc:date>2009-11-06T00:00:00Z</dc:date>
        <items>
            <rdf:Seq>
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/370" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/369" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/368" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/367" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/366" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/365" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/364" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/363" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/362" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/361" />
                            </rdf:Seq>
        </items>
        <extra:info rdf:parseType="Literal">
            <html:div style="font:14px Verdana, Geneva, Arial, Helvetica, sans-serif" xmlns:html="http://www.w3.org/1999/xhtml">
                <html:span style="font-weight:bold">
                    This is an RSS newsfeed from BioMed Central
                </html:span>
                <html:br />
                <html:span style="font-size: 12px;">
                    It is intended to be used with an RSS reader. For more information about RSS newsfeeds from BioMed Central, visit
                    <html:br />
                    <html:a href="http://www.biomedcentral.com/info/about/rss/" style="color:#3333CC; font-size:12px;">
                        http://www.biomedcentral.com/info/about/rss/
                    </html:a>
                    <html:br />
                </html:span>
            </html:div>
        </extra:info>
        <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </channel>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/370">
        <title>Elucidation of functional consequences of signalling pathway interactions</title>
        <description>Background:
A great deal of data has accumulated on signalling pathways. These large datasets are thought to contain much implicit information on molecular structure, interactions and activity, which together provide a picture of the intricate molecular networks believed to underlie biological functions. While tremendous advances have been made in trying to understand these systems, how information is transmitted within them is still poorly understood. This ever-growing amount of data demands powerful computational techniques that can play a pivotal role in converting mined data into knowledge and in elucidating the topological and functional properties of protein-protein interactions.
Results:
A computational framework is presented that allows for the description of embedded networks and the identification of common components thought to assist in the transmission of information within the systems studied. By employing the graph theories of network biology (such as degree distribution, clustering coefficient, vertex betweenness and shortest path measures), topological features of protein-protein interactions were ascertained for published datasets of the p53, nuclear factor kappa B and G1/S phase of the cell cycle systems. Highly ranked nodes were identified, some of which acted as connecting proteins most likely responsible for propagating transduction signals across the networks. The functional consequences of these nodes in the context of their network environment were also determined. These findings highlight the usefulness of the framework in identifying possible combinations or links within these systems as targets for therapeutic responses, and put forward the idea of using retrieved knowledge on the shared components to construct better organised and structured models of signalling networks.
Conclusion:
It is hoped that, from the signal transduction networks reconstructed by data mining, well-developed models of the published data can be built that will ultimately guide the prediction of new targets for further analysis based on the pathway&apos;s environment. Source code is available upon request.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/370</link>
                <dc:creator>Adaoha Ihekwaba</dc:creator>
                <dc:creator>Phuong Nguyen</dc:creator>
                <dc:creator>Corrado Priami</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:370</dc:source>
        <dc:date>2009-11-06T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-370</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>370</prism:startingPage>
        <prism:publicationDate>2009-11-06T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/369">
        <title>RNA folding on the 3D triangular lattice</title>
        <description>Background:
Difficult problems in structural bioinformatics are often studied in simple exact models to gain insights and to derive general principles. Protein folding, for example, has long been studied in the lattice model. Recently, researchers have also begun to apply the lattice model to the study of RNA folding.
Results:
We present a novel method for predicting RNA secondary structures with pseudoknots: first simulate the folding dynamics of the RNA sequence on the 3D triangular lattice, then extract and select a set of disjoint base pairs from the best lattice conformation found by the folding simulation. Experiments on sequences from PseudoBase show that our prediction method outperforms the HotKnot algorithm of Ren, Rastegari, Condon and Hoos, a leading method for RNA pseudoknot prediction. Our method for RNA secondary structure prediction can be adapted into an efficient reconstruction method that, given an RNA sequence and an associated secondary structure, finds a conformation of the sequence on the 3D triangular lattice that realizes the base pairs in the secondary structure. We implemented a suite of computer programs for the simulation and visualization of RNA folding on the 3D triangular lattice. These programs come with detailed documentation and are accessible from the companion website of this paper at http://www.cs.usu.edu/~mjiang/rna/DeltaIS/.
Conclusion:
Folding simulation on the 3D triangular lattice is an effective method for RNA secondary structure prediction and lattice conformation reconstruction. The visualization software for the lattice conformations of RNA structures is a valuable tool for the study of RNA folding and a great pedagogic device.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/369</link>
                <dc:creator>Joel Gillespie</dc:creator>
                <dc:creator>Martin Mayne</dc:creator>
                <dc:creator>Minghui Jiang</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:369</dc:source>
        <dc:date>2009-11-05T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-369</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>369</prism:startingPage>
        <prism:publicationDate>2009-11-05T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/368">
        <title>A biosegmentation benchmark for evaluation of bioimage analysis methods</title>
        <description>Background:
We present a biosegmentation benchmark that includes infrastructure, datasets with associated ground truth, and validation methods for biological image analysis. The primary motivation for creating this resource is that it is very difficult, if not impossible, for an end-user to choose the right method for a particular bioimaging problem from the wide range of segmentation methods available in the literature. No single algorithm is likely to be equally effective on a diverse set of images, and each method has its own strengths and limitations. We hope that our benchmark resource will be of considerable help both to bioimaging researchers looking for novel image processing methods and to image processing researchers exploring the application of their methods to biology.
Results:
Our benchmark consists of different classes of images and ground truth data, ranging in scale from subcellular through cellular to tissue level, each of which poses its own set of challenges to image analysis. The associated ground truth data can be used to evaluate the effectiveness of different methods, to improve methods and to compare results. Standard evaluation methods and some analysis tools are integrated into a database framework that is available online at http://bioimage.ucsb.edu/biosegmentation/.
Conclusions:
This online benchmark will facilitate integration and comparison of image analysis methods for bioimages. While the primary focus is on biological images, we believe that the dataset and infrastructure will be of interest to researchers and developers working with biological image analysis, image segmentation and object tracking in general.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/368</link>
                <dc:creator>Elisa Drelie Gelasca</dc:creator>
                <dc:creator>Boguslaw Obara</dc:creator>
                <dc:creator>Dmitry Fedorov</dc:creator>
                <dc:creator>Kristian Kvilekval</dc:creator>
                <dc:creator>B. Manjunath</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:368</dc:source>
        <dc:date>2009-11-01T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-368</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>368</prism:startingPage>
        <prism:publicationDate>2009-11-01T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/367">
        <title>GLIDERS - A web-based search engine for genome-wide linkage disequilibrium between HapMap SNPs</title>
        <description>Background:
A number of tools exist for examining linkage disequilibrium (LD) patterns between nearby alleles, but none are available for quickly and easily investigating LD at longer ranges (&gt;500kb). We have developed a web-based query tool (GLIDERS: Genome-wide LInkage DisEquilibrium Repository and Search engine) that enables the retrieval of pairwise associations with r2 ≥ 0.3 across the human genome for any SNP genotyped within HapMap phase 2 and 3, regardless of the distance between the markers.
Description:
GLIDERS is an easy-to-use web tool that only requires users to enter the rs numbers of the SNPs for which they want to retrieve genome-wide LD (both nearby and long-range). The intuitive web interface supports both manual entry of SNP IDs and upload of files of SNP IDs. Users can limit the resulting inter-SNP associations with easy-to-use menu options. These include MAF limit (5-45%), distance limits between SNPs (minimum and maximum), r2 (0.3 to 1), HapMap population sample (CEU, YRI and JPT+CHB combined) and HapMap build/release. All resulting genome-wide inter-SNP associations are displayed on a single output page, which links to a downloadable tab-delimited text file.
Conclusions:
GLIDERS is a quick and easy way to retrieve genome-wide inter-SNP associations and to explore LD patterns for any number of SNPs of interest. GLIDERS can be useful in identifying SNPs with long-range LD. This can highlight mis-mapping or other potential association signal localisation problems.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/367</link>
                <dc:creator>Robert Lawrence</dc:creator>
                <dc:creator>Aaron Day-Williams</dc:creator>
                <dc:creator>Richard Mott</dc:creator>
                <dc:creator>John Broxholme</dc:creator>
                <dc:creator>Lon Cardon</dc:creator>
                <dc:creator>Eleftheria Zeggini</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:367</dc:source>
        <dc:date>2009-10-31T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-367</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>367</prism:startingPage>
        <prism:publicationDate>2009-10-31T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/366">
        <title>(PS)2-v2: template-based protein structure prediction server</title>
        <description>Background:
Template selection and target-template alignment are critical steps for template-based modeling (TBM) methods. Identifying templates in the twilight zone of 15-25% sequence similarity between targets and templates is still difficult for template-based protein structure prediction. This study presents the (PS)2-v2 server, which builds on our original server with numerous enhancements and modifications to improve reliability and applicability.
Results:
To detect homologous proteins with remote similarity, the (PS)2-v2 server utilizes the S2A2 matrix, a 60x60 substitution matrix based on the secondary structure propensities of the 20 amino acids, and the position-specific sequence profile (PSSM) generated by PSI-BLAST. In addition, our server uses multiple templates and multiple models to build and assess models. Our method was evaluated on the Lindahl benchmark for fold recognition and the ProSup benchmark for sequence alignment. Evaluation results indicated that our method outperforms sequence-profile approaches and performs comparably to structure-based methods on these benchmarks. Finally, we tested our method using the 154 TBM targets of the CASP8 (Critical Assessment of Techniques for Protein Structure Prediction) dataset. Experimental results show that (PS)2-v2 is ranked 6th among 72 servers and is faster than the five top-ranked servers, which utilize ab initio methods.
Conclusions:
Experimental results demonstrate that (PS)2-v2 with the S2A2 matrix, by blending amino acid and structural propensities, is useful for template selection and target-template alignment. The multiple-template and multiple-model strategies significantly improve the accuracy of target-template alignments in the twilight zone. We believe that this server is useful in structure prediction and modeling, especially in detecting homologous templates with sequence similarity in the twilight zone.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/366</link>
                <dc:creator>Chih-Chieh Chen</dc:creator>
                <dc:creator>Jenn-Kang Hwang</dc:creator>
                <dc:creator>Jinn-Moon Yang</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:366</dc:source>
        <dc:date>2009-10-31T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-366</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>366</prism:startingPage>
        <prism:publicationDate>2009-10-31T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/365">
        <title>Prediction of hot spot residues at protein-protein interfaces by combining machine learning and energy-based methods</title>
        <description>Background:
Alanine scanning mutagenesis is a powerful experimental methodology for investigating the structural and energetic characteristics of protein complexes. Individual amino acids are systematically mutated to alanine and the resulting changes in free energy of binding (ΔΔG) are measured. Several experiments have shown that protein-protein interactions are critically dependent on just a few residues (&quot;hot spots&quot;) at the interface. Hot spots make a dominant contribution to the free energy of binding, and if mutated they can disrupt the interaction. As mutagenesis studies require significant experimental effort, there is a need for accurate and reliable computational methods. Such methods would also add to our understanding of the determinants of affinity and specificity in protein-protein recognition.
Results:
We present a novel computational strategy to identify hot spot residues, given the structure of a complex. We consider the basic energetic terms that contribute to hot spot interactions, i.e. van der Waals potentials, solvation energy, hydrogen bonds and Coulomb electrostatics. We treat them as input features and use machine learning algorithms such as Support Vector Machines and Gaussian Processes to optimally combine and integrate them, based on a set of training examples of alanine mutations. We show that our approach is effective in predicting hot spots and compares favourably to other available methods. In particular, we find the best performance using Transductive Support Vector Machines, a semi-supervised learning scheme. When hot spots are defined as those residues for which ΔΔG &gt; 2 kcal/mol, our method achieves a precision of 56% and a recall of 65%.
Conclusions:
We have developed a hybrid scheme in which energy terms are used as input features of machine learning models. This strategy combines the strengths of machine learning and energy-based methods. Although so far these two types of approaches have mainly been applied separately to biomolecular problems, the results of our investigation indicate that there are substantial benefits to be gained by their integration.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/365</link>
                <dc:creator>Stefano Lise</dc:creator>
                <dc:creator>Cedric Archambeau</dc:creator>
                <dc:creator>Massimiliano Pontil</dc:creator>
                <dc:creator>David Jones</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:365</dc:source>
        <dc:date>2009-10-30T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-365</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>365</prism:startingPage>
        <prism:publicationDate>2009-10-30T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/364">
        <title>multiplierz: an extensible API based desktop environment for proteomics data analysis</title>
        <description>Background:
Efficient analysis of results from mass spectrometry-based proteomics experiments requires access to disparate data types, including native mass spectrometry files, output from algorithms that assign peptide sequence to MS/MS spectra, and annotation for proteins and pathways from various database sources.  Moreover, proteomics technologies and experimental methods are not yet standardized; hence a high degree of flexibility is necessary for efficient support of high- and low-throughput data analytic tasks.  Development of a desktop environment that is sufficiently robust for deployment in data analytic pipelines, and simultaneously supports customization for programmers and non-programmers alike, has proven to be a significant challenge.
Results:
We describe multiplierz, a flexible and open-source desktop environment for comprehensive proteomics data analysis. We use this framework to expose a prototype version of our recently proposed common API (mzAPI), designed for direct access to proprietary mass spectrometry files. In addition to routine data analytic tasks, multiplierz supports the generation of information-rich, portable spreadsheet-based reports. Moreover, multiplierz is designed around a &quot;zero infrastructure&quot; philosophy, meaning that it can be deployed by end users with little or no system administration support. Finally, access to multiplierz functionality is provided via high-level Python scripts, resulting in a fully extensible data analytic environment for rapid development of custom algorithms and deployment of high-throughput data pipelines.
Conclusions:
Collectively, mzAPI and multiplierz facilitate a wide range of data analysis tasks, spanning technology development to biological annotation, for mass spectrometry-based proteomics research.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/364</link>
                <dc:creator>Jignesh Parikh</dc:creator>
                <dc:creator>Manor Askenazi</dc:creator>
                <dc:creator>Scott Ficarro</dc:creator>
                <dc:creator>Tanya Cashorali</dc:creator>
                <dc:creator>James Webber</dc:creator>
                <dc:creator>Nathaniel Blank</dc:creator>
                <dc:creator>Yi Zhang</dc:creator>
                <dc:creator>Jarrod Marto</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:364</dc:source>
        <dc:date>2009-10-29T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-364</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>364</prism:startingPage>
        <prism:publicationDate>2009-10-29T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/363">
        <title>A novel R-package graphic user interface for the analysis of metabonomic profiles</title>
        <description>Background:
Analysis of the plethora of metabolites found in the NMR spectra of biological fluids or tissues requires data complexity to be simplified. We present a graphical user interface (GUI) for NMR-based metabonomic analysis. The &quot;Metabonomic Package&quot; has been developed for metabonomics research as open-source software and uses the R statistical libraries.
Results:
The package offers the following options:
Raw 1-dimensional spectra processing: phase, baseline correction and normalization.
Importing processed spectra.
Including/excluding spectral ranges, optional binning and bucketing, detection and alignment of peaks.
Sorting of metabolites based on their ability to discriminate, metabolite selection, and outlier identification.
Multivariate unsupervised analysis: principal components analysis (PCA).
Multivariate supervised analysis: partial least squares (PLS), linear discriminant analysis (LDA), k-nearest neighbor classification.
Neural networks.
Visualization and overlapping of spectra.
Plot values of the chemical shift position for different samples.
Furthermore, the &quot;Metabonomic&quot; GUI includes a console to enable other kinds of analyses and to take advantage of all R statistical tools.
Conclusions:
We made complex multivariate analysis user-friendly for both experienced and novice users, which could help to expand the use of NMR-based metabonomics.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/363</link>
                <dc:creator>Jose Izquierdo-Garcia</dc:creator>
                <dc:creator>Ignacio Rodriguez</dc:creator>
                <dc:creator>Angelos Kyriazis</dc:creator>
                <dc:creator>Palmira Villa</dc:creator>
                <dc:creator>Pilar Barreiro</dc:creator>
                <dc:creator>Manuel Desco</dc:creator>
                <dc:creator>Jesus Ruiz-Cabello</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:363</dc:source>
        <dc:date>2009-10-29T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-363</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>363</prism:startingPage>
        <prism:publicationDate>2009-10-29T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/362">
        <title>BARCRAWL and BARTAB: Software tools for the design and implementation of barcoded primers for highly multiplexed DNA sequencing</title>
        <description>Background:
Advances in automated DNA sequencing technology have greatly increased the scale of genomic and metagenomic studies. An increasingly popular means of increasing project throughput is to multiplex samples during the sequencing phase. This can be achieved by covalently linking short, unique &quot;barcode&quot; DNA segments to genomic DNA samples, for instance by incorporating barcode sequences in PCR primers. Although several strategies have been described to ensure that barcode sequences are unique and robust to sequencing errors, these have not been integrated into the overall primer design process, thus potentially introducing bias into PCR amplification and/or sequencing steps.
Results:
Barcrawl is a software program that facilitates the design of barcoded primers for multiplexed high-throughput sequencing. The program bartab can be used to deconvolute DNA sequence datasets produced by the use of multiple barcoded primers. This paper describes the functions implemented by barcrawl and bartab and presents a proof-of-concept case study of both programs in which barcoded rRNA primers were designed and validated by high-throughput sequencing.
Conclusions:
Barcrawl and bartab can benefit researchers who are engaged in metagenomic projects that employ multiplexed specimen processing. The source code is released under the GNU General Public License and can be accessed at http://www.phyloware.com.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/362</link>
                <dc:creator>Daniel Frank</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:362</dc:source>
        <dc:date>2009-10-29T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-362</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>362</prism:startingPage>
        <prism:publicationDate>2009-10-29T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/361">
        <title>Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy</title>
        <description>Background:
Tyrosine sulfation is one of the most important posttranslational modifications. Because of its relevance to the development of various diseases, it has become a target for drug design. To facilitate efficient drug design, accurate prediction of sulfotyrosine sites is desirable. An earlier work on predicting sulfotyrosine sites, published seven years ago, has been very successful, with a claimed prediction accuracy of 98%. However, it has particularly low sensitivity when predicting sulfotyrosine sites in some newly sequenced proteins.
Results:
A new approach has been developed for predicting sulfotyrosine sites using the random forest algorithm, chosen after a careful evaluation of seven machine learning algorithms. Peptides are formed by consecutive residues symmetrically flanking tyrosine sites and are then encoded using an amino acid hydrophobicity scale. Compared with the previous predictor on the same blind data, this new approach has increased the sensitivity by 22%, the specificity by 3%, and the total prediction accuracy by 10%. Both negative and positive predictive powers have also increased by 9%. In addition, the random forest model offers a useful means of ranking the residues flanking tyrosine sites, which provides more information for further investigating the tyrosine sulfation mechanism. A web tool has been implemented at http://ecsb.ex.ac.uk/sulfotyrosine for public use.
Conclusion:
The random forest algorithm is able to deliver a better model for predicting sulfotyrosine sites than the Hidden Markov Model, the support vector machine, artificial neural networks and other methods. This success shows that the random forest algorithm, together with amino acid hydrophobicity scale encoding, can be a good candidate for peptide classification.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/361</link>
                <dc:creator>Zheng Rong Yang</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:361</dc:source>
        <dc:date>2009-10-29T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-361</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>361</prism:startingPage>
        <prism:publicationDate>2009-10-29T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <cc:License rdf:about="http://creativecommons.org/licenses/by/2.0/">
        <cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction" />
        <cc:permits rdf:resource="http://creativecommons.org/ns#Distribution" />
        <cc:permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks" />
    </cc:License>
</rdf:RDF>
