<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/rss.css" type="text/css"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
    xmlns:cc="http://web.resource.org/cc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:extra="http://www.w3.org/1999/xhtml"
    xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <channel rdf:about="http://www.biomedcentral.com/feeds/latestarticles/journal?journal=bmcbioinformatics&amp;quantity=&amp;format=rss&amp;version=">
        <title>BMC Bioinformatics - Latest Articles</title>
        <link>http://www.biomedcentral.com/bmcbioinformatics/</link>
        <description>The latest research articles published by BMC Bioinformatics</description>
        <dc:date>2013-05-16T00:00:00Z</dc:date>
        <items>
            <rdf:Seq>
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/14/162" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/14/161" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/14/160" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/14/159" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/14/158" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/14/157" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/14/156" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/14/155" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/14/154" />
                                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/14/153" />
                            </rdf:Seq>
        </items>
                 <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </channel>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/14/162">
        <title>Calcium (Ca2+) waves data calibration and analysis using image processing techniques</title>
        <description>Background:
Calcium (Ca2+) propagates within tissues serving as an important information carrier. In particular, cilia beat frequency in oviduct cells is partially regulated by Ca2+ changes. Thus, measuring the calcium density and characterizing the traveling wave plays a key role in understanding biological phenomena. However, current methods to measure propagation velocities and other wave characteristics involve several manual or time-consuming procedures. This limits the amount of information that can be extracted, and the statistical quality of the analysis.
Results:
Our work provides a framework based on image processing procedures that enables a fast, automatic and robust characterization of data from two-filter fluorescence Ca2+ experiments. We calculate the mean velocity of the wave-front, and use theoretical models to extract meaningful parameters like wave amplitude, decay rate and time of excitation.
Conclusions:
Measurements done by different operators showed a high degree of reproducibility. This framework is also extended to single-filter fluorescence experiments, allowing higher sampling rates and thus increased accuracy in velocity measurements.</description>
        <link>http://www.biomedcentral.com/1471-2105/14/162</link>
                <dc:creator>Carlos Milovic</dc:creator>
                <dc:creator>Carolina Oses</dc:creator>
                <dc:creator>Manuel Villalón</dc:creator>
                <dc:creator>Sergio Uribe</dc:creator>
                <dc:creator>Carlos Lizama</dc:creator>
                <dc:creator>Claudia Prieto</dc:creator>
                <dc:creator>Marcelo Andia</dc:creator>
                <dc:creator>Pablo Irarrazaval</dc:creator>
                <dc:creator>Cristian Tejos</dc:creator>
                <dc:source>BMC Bioinformatics 2013, 14:162</dc:source>
        <dc:date>2013-05-16T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-14-162</dc:identifier>
                                <prism:require>/content/figures/1471-2105-14-162-toc.gif</prism:require>
                <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>14</prism:volume>
        <prism:startingPage>162</prism:startingPage>
        <prism:publicationDate>2013-05-16T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/14/161">
        <title>Efficient algorithms for biological stems search</title>
        <description>Background:
Motifs are significant patterns in DNA, RNA, and protein sequences, which play an important role in biological processes and functions, such as the identification of open reading frames, RNA transcription, protein binding, etc. Several versions of the motif search problem have been studied in the literature. One such version is called the Planted Motif Search (PMS) or (l, d)-motif Search. PMS is known to be NP-complete. The time complexities of most of the planted motif search algorithms depend exponentially on the alphabet size. Recently a new version of the motif search problem has been introduced by Kuksa and Pavlovic. We call this version the Motif Stems Search (MSS) problem. A motif stem is an l-mer (for some relevant value of l) with some wildcard characters and hence corresponds to a set of l-mers (without wildcards), some of which are (l, d)-motifs. Kuksa and Pavlovic have presented an efficient algorithm to find motif stems for inputs from large alphabets. Ideally, the number of stems output should be as small as possible since the stems form a superset of the motifs.
Results:
In this paper we propose an efficient algorithm for MSS and evaluate it on both synthetic and real data. This evaluation reveals that our algorithm is much faster than Kuksa and Pavlovic&apos;s algorithm.
Conclusions:
Our MSS algorithm outperforms the algorithm of Kuksa and Pavlovic in terms of the run time as well as the number of stems output. Specifically, the stems output by our algorithm form a proper (and much smaller) subset of the stems output by Kuksa and Pavlovic&apos;s algorithm.</description>
        <link>http://www.biomedcentral.com/1471-2105/14/161</link>
                <dc:creator>Tian Mi</dc:creator>
                <dc:creator>Sanguthevar Rajasekaran</dc:creator>
                <dc:source>BMC Bioinformatics 2013, 14:161</dc:source>
        <dc:date>2013-05-16T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-14-161</dc:identifier>
                                <prism:require>/content/figures/1471-2105-14-161-toc.gif</prism:require>
                <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>14</prism:volume>
        <prism:startingPage>161</prism:startingPage>
        <prism:publicationDate>2013-05-16T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/14/160">
        <title>Disk-based k-mer counting on a PC</title>
        <description>Background:
The k-mer counting problem, which is to build the histogram of occurrences of every k-symbol-long substring in a given text, is important for many bioinformatics applications. These include developing de Bruijn graph genome assemblers, fast multiple sequence alignment and repeat detection.
Results:
We propose a simple, yet efficient, parallel disk-based algorithm for counting k-mers. Experiments show that it usually offers the fastest solution to the considered problem, while demanding a relatively small amount of memory. In particular, it is capable of counting the statistics for short-read human genome data, given as a gzipped FASTQ input file, in less than 40 minutes on a PC with 16GB of RAM and 6 CPU cores, and for long-read human genome data in less than 70 minutes. On a more powerful machine, using 32GB of RAM and 32 CPU cores, the tasks are accomplished in less than half the time. No other algorithm for most tested settings of this problem and mammalian-size data can accomplish this task in comparable time. Our solution is also among the memory-frugal ones; most competitive algorithms cannot efficiently work on a PC with 16GB of memory for such massive data.
Conclusions:
By making use of cheap disk space and exploiting CPU and I/O parallelism we propose a very competitive k-mer counting procedure, called KMC. Our results suggest that judicious resource management may make it possible to solve at least some bioinformatics problems with massive data on a commodity personal computer.
Keywords:
k-mer counting, de Bruijn graph genome assemblers, multiple sequence alignment, repeat detection.
Availability:
KMC is freely available at http://sun.aei.polsl.pl/kmc.</description>
        <link>http://www.biomedcentral.com/1471-2105/14/160</link>
                <dc:creator>Sebastian Deorowicz</dc:creator>
                <dc:creator>Agnieszka Debudaj-Grabysz</dc:creator>
                <dc:creator>Szymon Grabowski</dc:creator>
                <dc:source>BMC Bioinformatics 2013, 14:160</dc:source>
        <dc:date>2013-05-16T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-14-160</dc:identifier>
                                <prism:require>/content/figures/1471-2105-14-160-toc.gif</prism:require>
                <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>14</prism:volume>
        <prism:startingPage>160</prism:startingPage>
        <prism:publicationDate>2013-05-16T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/14/159">
        <title>MIMO: an efficient tool for molecular interaction maps overlap</title>
        <description>Background:
Molecular pathways represent an ensemble of interactions occurring among molecules within the cell and between cells. The identification of similarities between molecular pathways across organisms and functions has a critical role in understanding complex biological processes. To infer such novel information, the comparison of molecular pathways must account for imperfect matches (flexibility) and efficiently handle complex network topologies. To date, these characteristics are only partially available in tools designed to compare molecular interaction maps.
Results:
Our approach MIMO (Molecular Interaction Maps Overlap) addresses the first problem by allowing the introduction of gaps and mismatches between query and template pathways and permits, when necessary, supervised queries incorporating a priori biological information. It then addresses the second issue by relying directly on the rich graph topology described in the Systems Biology Markup Language (SBML) standard, and uses multidigraphs to efficiently handle multiple queries on biological graph databases. The algorithm has been used successfully here to highlight the contact point between various human pathways in the Reactome database.
Conclusions:
MIMO offers a flexible and efficient graph-matching tool for comparing complex biological pathways.</description>
        <link>http://www.biomedcentral.com/1471-2105/14/159</link>
                <dc:creator>Pietro Di Lena</dc:creator>
                <dc:creator>Gang Wu</dc:creator>
                <dc:creator>Pier Luigi Martelli</dc:creator>
                <dc:creator>Rita Casadio</dc:creator>
                <dc:creator>Christine Nardini</dc:creator>
                <dc:source>BMC Bioinformatics 2013, 14:159</dc:source>
        <dc:date>2013-05-15T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-14-159</dc:identifier>
                                <prism:require>/content/figures/1471-2105-14-159-toc.gif</prism:require>
                <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>14</prism:volume>
        <prism:startingPage>159</prism:startingPage>
        <prism:publicationDate>2013-05-15T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/14/158">
        <title>Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient</title>
        <description>Background:
Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great &quot;Tree of Life&quot; (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch lengths and annotations, resulting in a custom phylogeny suited to the user&apos;s needs. Such a system could become a sustainable community resource if implemented as a distributed system of loosely coupled parts that interact through clearly defined interfaces.
Results:
With the aim of building such a &quot;phylotastic&quot; system, the NESCent Hackathons, Interoperability, Phylogenies (HIP) working group recruited 2 dozen scientist-programmers to a weeklong programming hackathon in June 2012. During the hackathon (and a three-month follow-up period), 5 teams produced designs, implementations, documentation, presentations, and tests including: (1) a generalized scheme for integrating components; (2) proof-of-concept pruners and controllers; (3) a meta-API for taxonomic name resolution services; (4) a system for storing, finding, and retrieving phylogenies using semantic web technologies for data exchange, storage, and querying; (5) an innovative new service, DateLife.org, which synthesizes pre-computed, time-calibrated phylogenies to assign ages to nodes; and (6) demonstration projects. These outcomes are accessible via a public code repository (GitHub.com), a website (www.phylotastic.org), and a server image.
Conclusions:
Approximately 9 person-months of effort (centered on a software development hackathon) resulted in the design and implementation of proof-of-concept software for 4 core phylotastic components, 3 controllers, and 3 end-user demonstration tools. While these products have substantial limitations, they suggest considerable potential for a distributed system that makes phylogenetic knowledge readily accessible in computable form. Widespread use of phylotastic systems will create an electronic marketplace for sharing phylogenetic knowledge that will spur innovation in other areas of the ToL enterprise, such as annotation of sources and methods and third-party methods of quality assessment.</description>
        <link>http://www.biomedcentral.com/1471-2105/14/158</link>
                <dc:creator>Arlin Stoltzfus</dc:creator>
                <dc:creator>Hilmar Lapp</dc:creator>
                <dc:creator>Naim Matasci</dc:creator>
                <dc:creator>Helena Deus</dc:creator>
                <dc:creator>Brian Sidlauskas</dc:creator>
                <dc:creator>Christian Zmasek</dc:creator>
                <dc:creator>Gaurav Vaidya</dc:creator>
                <dc:creator>Enrico Pontelli</dc:creator>
                <dc:creator>Karen Cranston</dc:creator>
                <dc:creator>Rutger Vos</dc:creator>
                <dc:creator>Campbell Webb</dc:creator>
                <dc:creator>Luke Harmon</dc:creator>
                <dc:creator>Megan Pirrung</dc:creator>
                <dc:creator>Brian O'Meara</dc:creator>
                <dc:creator>Matthew Pennell</dc:creator>
                <dc:creator>Siavash Mirarab</dc:creator>
                <dc:creator>Michael Rosenberg</dc:creator>
                <dc:creator>James Balhoff</dc:creator>
                <dc:creator>Holly Bik</dc:creator>
                <dc:creator>Tracy Heath</dc:creator>
                <dc:creator>Peter Midford</dc:creator>
                <dc:creator>Joseph Brown</dc:creator>
                <dc:creator>Emily McTavish</dc:creator>
                <dc:creator>Jeet Sukumaran</dc:creator>
                <dc:creator>Mark Westneat</dc:creator>
                <dc:creator>Michael Alfaro</dc:creator>
                <dc:creator>Aaron Steele</dc:creator>
                <dc:creator>Greg Jordan</dc:creator>
                <dc:source>BMC Bioinformatics 2013, 14:158</dc:source>
        <dc:date>2013-05-13T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-14-158</dc:identifier>
                                <prism:require>/content/figures/1471-2105-14-158-toc.gif</prism:require>
                <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>14</prism:volume>
        <prism:startingPage>158</prism:startingPage>
        <prism:publicationDate>2013-05-13T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/14/157">
        <title>Copy number variation genotyping using family information</title>
        <description>Background:
In recent years there has been a growing interest in the role of copy number variations (CNV) in genetic diseases. Though there has been rapid development of technologies and statistical methods devoted to the detection of CNVs from array data, the inherent data quality issues associated with most hybridization techniques remain a challenging problem in CNV association studies.
Results:
To help address these data quality issues in the context of family-based association studies, we introduce a statistical framework for intensity-based array data that takes into account the family information for copy-number assignment. The method is an adaptation of traditional methods for modeling SNP genotype data that assume a Gaussian mixture model, whereby CNV calling is performed for all family members simultaneously, leveraging within-family data to reduce CNV calls that are incompatible with Mendelian inheritance while still allowing de novo CNVs. Applying this method to simulation studies and a genome-wide association study in asthma, we find that our approach significantly improves CNV call accuracy, and reduces the Mendelian inconsistency rates and false positive genotype calls. The results were validated using qPCR experiments.
Conclusions:
We have demonstrated that the use of family information can improve the quality of CNV calling and may yield more powerful association tests of CNVs.</description>
        <link>http://www.biomedcentral.com/1471-2105/14/157</link>
                <dc:creator>Jen-hwa Chu</dc:creator>
                <dc:creator>Angela Rogers</dc:creator>
                <dc:creator>Iuliana Ionita-Laza</dc:creator>
                <dc:creator>Katayoon Darvishi</dc:creator>
                <dc:creator>Ryan Mills</dc:creator>
                <dc:creator>Charles Lee</dc:creator>
                <dc:creator>Benjamin Raby</dc:creator>
                <dc:source>BMC Bioinformatics 2013, 14:157</dc:source>
        <dc:date>2013-05-09T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-14-157</dc:identifier>
                                <prism:require>/content/figures/1471-2105-14-157-toc.gif</prism:require>
                <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>14</prism:volume>
        <prism:startingPage>157</prism:startingPage>
        <prism:publicationDate>2013-05-09T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/14/156">
        <title>Real-time interactive data mining for chemical imaging information: application to automated histopathology</title>
        <description>Background:
Vibrational spectroscopic imaging is now used in several fields to acquire molecular information from microscopically heterogeneous systems. Recent advances have led to promising applications in tissue analysis for cancer research, where chemical information can be used to identify cell types and disease. However, recorded spectra are affected by the morphology of the tissue sample, making identification of chemical structures difficult.
Results:
Extracting features that can be used to classify tissue is a cumbersome manual process, which limits the wider applicability of this technology. In this paper, we describe a method for interactive data mining of spectral features using GPU-based manipulation of the spectral distribution.
Conclusions:
This allows researchers to quickly identify chemical features corresponding to cell type. These features are then applied to tissue samples in order to visualize the chemical composition of the tissue without the use of chemical stains.</description>
        <link>http://www.biomedcentral.com/1471-2105/14/156</link>
                <dc:creator>David Mayerich</dc:creator>
                <dc:creator>Michael Walsh</dc:creator>
                <dc:creator>Matthew Schulmerich</dc:creator>
                <dc:creator>Rohit Bhargava</dc:creator>
                <dc:source>BMC Bioinformatics 2013, 14:156</dc:source>
        <dc:date>2013-05-08T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-14-156</dc:identifier>
                                <prism:require>/content/figures/1471-2105-14-156-toc.gif</prism:require>
                <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>14</prism:volume>
        <prism:startingPage>156</prism:startingPage>
        <prism:publicationDate>2013-05-08T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/14/155">
        <title>Greedy feature selection for glycan chromatography data with the generalized Dirichlet distribution</title>
        <description>Background:
Glycoproteins are involved in a diverse range of biochemical and biological processes. Changes in protein glycosylation are believed to occur in many diseases, particularly during cancer initiation and progression. The identification of biomarkers for human disease states is becoming increasingly important, as early detection is key to improving survival and recovery rates. To this end, the serum glycome has been proposed as a potential source of biomarkers for different types of cancers. High-throughput hydrophilic interaction liquid chromatography (HILIC) technology for glycan analysis allows for the detailed quantification of the glycan content in human serum. However, the experimental data from this analysis are compositional by nature. Compositional data are subject to a constant-sum constraint, which restricts the sample space to a simplex. Statistical analysis of glycan chromatography datasets should account for their unusual mathematical properties. As the volume of glycan HILIC data being produced increases, there is a considerable need for a framework to support appropriate statistical analysis. Proposed here is a methodology for feature selection in compositional data. The principal objective is to provide a template for the analysis of glycan chromatography data that may be used to identify potential glycan biomarkers.
Results:
A greedy search algorithm, based on the generalized Dirichlet distribution, is carried out over the feature space to search for the set of &quot;grouping variables&quot; that best discriminate between known group structures in the data, modelling the compositional variables using beta distributions. The algorithm is applied to two glycan chromatography datasets. Statistical classification methods are used to test the ability of the selected features to differentiate between known groups in the data. Two well-known methods are used for comparison: correlation-based feature selection (CFS) and recursive partitioning (rpart). CFS is a feature selection method, while recursive partitioning is a learning tree algorithm that has been used for feature selection in the past.
Conclusions:
The proposed feature selection method performs well for both glycan chromatography datasets. It is computationally slower, but results in a lower misclassification rate and a higher sensitivity rate than both correlation-based feature selection and the classification tree method.</description>
        <link>http://www.biomedcentral.com/1471-2105/14/155</link>
                <dc:creator>Marie Galligan</dc:creator>
                <dc:creator>Radka Saldova</dc:creator>
                <dc:creator>Matthew Campbell</dc:creator>
                <dc:creator>Pauline Rudd</dc:creator>
                <dc:creator>Thomas Murphy</dc:creator>
                <dc:source>BMC Bioinformatics 2013, 14:155</dc:source>
        <dc:date>2013-05-07T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-14-155</dc:identifier>
                                <prism:require>/content/figures/1471-2105-14-155-toc.gif</prism:require>
                <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>14</prism:volume>
        <prism:startingPage>155</prism:startingPage>
        <prism:publicationDate>2013-05-07T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/14/154">
        <title>Reconstituting protein interaction networks using parameter-dependent domain-domain interactions</title>
        <description>Background:
We can describe protein-protein interactions (PPIs) as sets of distinct domain-domain interactions (DDIs) that mediate the physical interactions between proteins. Experimental data confirm that DDIs are more consistent than their corresponding PPIs, lending support to the notion that analyses of DDIs may improve our understanding of PPIs and lead to further insights into cellular function, disease, and evolution. However, currently available experimental DDI data cover only a small fraction of all existing PPIs and, in the absence of structural data, determining which particular DDI mediates any given PPI is a challenge.
Results:
We present two contributions to the field of domain interaction analysis. First, we introduce a novel computational strategy to merge domain annotation data from multiple databases. We show that when we merged yeast domain annotations from six annotation databases we increased the average number of domains per protein from 1.05 to 2.44, bringing it closer to the estimated average value of 3. Second, we introduce a novel computational method, parameter-dependent DDI selection (PADDS), which, given a set of PPIs, extracts a small set of domain pairs that can reconstruct the original set of protein interactions, while attempting to minimize false positives. Based on a set of PPIs from multiple organisms, our method extracted 27% more experimentally detected DDIs than existing computational approaches.
Conclusions:
We have provided a method to merge domain annotation data from multiple sources, ensuring large and consistent domain annotation for any given organism. Moreover, we provided a method to extract a small set of DDIs from the underlying set of PPIs and we showed that, in contrast to existing approaches, our method was not biased towards DDIs with low or high occurrence counts. Finally, we used these two methods to highlight the influence of the underlying annotation density on the characteristics of extracted DDIs. Although increased annotations greatly expanded the possible DDIs, the lack of knowledge of the true biological false positive interactions still prevents an unambiguous assignment of domain interactions responsible for all protein network interactions. Executable files and examples are given at http://www.bhsai.org/downloads/padds/</description>
        <link>http://www.biomedcentral.com/1471-2105/14/154</link>
                <dc:creator>Vesna Memišević</dc:creator>
                <dc:creator>Anders Wallqvist</dc:creator>
                <dc:creator>Jaques Reifman</dc:creator>
                <dc:source>BMC Bioinformatics 2013, 14:154</dc:source>
        <dc:date>2013-05-07T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-14-154</dc:identifier>
                                <prism:require>/content/figures/1471-2105-14-154-toc.gif</prism:require>
                <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>14</prism:volume>
        <prism:startingPage>154</prism:startingPage>
        <prism:publicationDate>2013-05-07T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/14/153">
        <title>Iterative rank-order normalization of gene expression microarray data</title>
        <description>Background:
Many gene expression normalization algorithms exist for Affymetrix GeneChip microarrays. The most popular of these is RMA, primarily due to the precision and low noise produced during the process. A significant strength of this and similar approaches is the use of the entire set of arrays during both normalization and model-based estimation of signal. However, this leads to differing estimates of expression based on the starting set of arrays, and estimates can change when a single, additional chip is added to the set. Additionally, outlier chips can impact the signals of other arrays, and can themselves be skewed by the majority of the population.
Results:
We developed an approach, termed IRON, which uses the best-performing techniques from each of several popular processing methods while retaining the ability to incrementally renormalize data without altering previously normalized expression. This combination of approaches results in a method that performs comparably to existing approaches on artificial benchmark datasets (i.e. spike-in) and demonstrates promising improvements in segregating true signals within biologically complex experiments.
Conclusions:
By combining approaches from existing normalization techniques, the IRON method offers several advantages. First, IRON normalization occurs pair-wise, thereby avoiding the need for all chips to be normalized together, which can be important for large data analyses. Secondly, the technique does not require similarity in signal distribution across chips for normalization, which can be important for maintaining biologically relevant differences in a heterogeneous background. Lastly, IRON introduces fewer post-processing artifacts, particularly in data whose behavior violates common assumptions. Thus, the IRON method provides a practical solution to common needs of expression analysis. A software implementation of IRON is available at http://gene.moffitt.org/libaffy/.</description>
        <link>http://www.biomedcentral.com/1471-2105/14/153</link>
                <dc:creator>Eric Welsh</dc:creator>
                <dc:creator>Steven Eschrich</dc:creator>
                <dc:creator>Anders Berglund</dc:creator>
                <dc:creator>David Fenstermacher</dc:creator>
                <dc:source>BMC Bioinformatics 2013, 14:153</dc:source>
        <dc:date>2013-05-07T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-14-153</dc:identifier>
                                <prism:require>/content/figures/1471-2105-14-153-toc.gif</prism:require>
                <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>14</prism:volume>
        <prism:startingPage>153</prism:startingPage>
        <prism:publicationDate>2013-05-07T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <cc:License rdf:about="http://creativecommons.org/licenses/by/2.0/">
        <cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction" />
        <cc:permits rdf:resource="http://creativecommons.org/ns#Distribution" />
        <cc:permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks" />
    </cc:License>
</rdf:RDF>
