<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/rss.css" type="text/css"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
    xmlns:cc="http://web.resource.org/cc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:extra="http://www.w3.org/1999/xhtml"
    xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <channel rdf:about="http://www.biomedcentral.com/feeds/latestarticles/journal?journal=bmcbioinformatics&amp;quantity=&amp;format=rss&amp;version=">
        <title>BMC Bioinformatics - Latest Articles</title>
        <link>http://www.biomedcentral.com/bmcbioinformatics/</link>
        <description>The latest research articles published by BMC Bioinformatics</description>
        <dc:date>2009-07-09T00:00:00Z</dc:date>
        <items>
            <rdf:Seq>
                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/212" />
                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/211" />
                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/210" />
                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/209" />
                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/208" />
                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/207" />
                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/206" />
                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/205" />
                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/204" />
                <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/10/203" />
            </rdf:Seq>
        </items>
        <extra:info rdf:parseType="Literal">
            <html:div style="font:14px Verdana, Geneva, Arial, Helvetica, sans-serif" xmlns:html="http://www.w3.org/1999/xhtml">
                <html:span style="font-weight:bold">
                    This is an RSS newsfeed from BioMed Central
                </html:span>
                <html:br />
                <html:span style="font-size: 12px;">
                    It is intended to be used with an RSS reader. For more information about RSS newsfeeds from BioMed Central, visit
                    <html:br />
                    <html:a href="http://www.biomedcentral.com/info/about/rss/" style="color:#3333CC; font-size:12px;">
                        http://www.biomedcentral.com/info/about/rss/
                    </html:a>
                    <html:br />
                </html:span>
            </html:div>
        </extra:info>
        <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </channel>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/212">
        <title>Error statistics of hidden Markov model and hidden Boltzmann model results</title>
        <description>Background:
Hidden Markov models and hidden Boltzmann models are employed in computational biology and a variety of other scientific fields for a variety of analyses of sequential data.  Whether the associated algorithms are used to compute an actual probability or, more generally, an odds ratio or some other score, a frequent requirement is that the error statistics of a given score be known.  What is the chance that random data would achieve that score or better?  What is the chance that a real signal would achieve a given score threshold?
Results:
Here we present a novel general approach to estimating these false positive and true positive rates that is significantly more efficient than are existing general approaches.  We validate the technique via an implementation within the HMMER 3.0 package, which scans DNA or protein sequence databases for patterns of interest, using a profile-HMM.
Conclusions:
The new approach is faster than general naive sampling approaches, and more general than other current approaches.  It provides an efficient mechanism by which to estimate error statistics for hidden Markov model and hidden Boltzmann model results.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/212</link>
                <dc:creator>Lee Newberg</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:212</dc:source>
        <dc:date>2009-07-09T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-212</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>212</prism:startingPage>
        <prism:publicationDate>2009-07-09T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/211">
        <title>puma: a Bioconductor package for Propagating Uncertainty in Microarray Analysis</title>
        <description>Background:
Most analyses of microarray data are based on point estimates of expression levels and ignore the uncertainty of such estimates. By determining uncertainties from Affymetrix GeneChip data and propagating these uncertainties to downstream analyses it has been shown that we can improve results of differential expression detection, principal component analysis and clustering. Previously, implementations of these uncertainty propagation methods have only been available as separate packages, written in different languages. Previous implementations have also suffered from being very costly to compute, and in the case of differential expression detection, have been limited in the experimental designs to which they can be applied.
Results:
puma is a Bioconductor package incorporating a suite of analysis methods for use on Affymetrix GeneChip data. puma extends the differential expression detection methods of previous work from the 2-class case to the multi-factorial case. puma can be used to automatically create design and contrast matrices for typical experimental designs, which can be used both within the package itself and in other Bioconductor packages. The implementation of differential expression detection methods has been parallelised, leading to significant decreases in processing time on a range of computer architectures. puma incorporates the first R implementation of an uncertainty propagation version of principal component analysis, and an implementation of a clustering method based on uncertainty propagation. All of these techniques are brought together in a single, easy-to-use package with clear, task-based documentation.
Conclusions:
For the first time, the puma package makes a suite of uncertainty propagation methods available to a general audience. These methods can be used to improve results from more traditional analyses of microarray data. puma also offers improvements in terms of scope and speed of execution over previously available methods. puma is recommended for anyone working with the Affymetrix GeneChip platform for gene expression analysis and can also be applied more generally.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/211</link>
                <dc:creator>Richard Pearson</dc:creator>
                <dc:creator>Xuejun Liu</dc:creator>
                <dc:creator>Guido Sanguinetti</dc:creator>
                <dc:creator>Marta Milo</dc:creator>
                <dc:creator>Neil Lawrence</dc:creator>
                <dc:creator>Magnus Rattray</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:211</dc:source>
        <dc:date>2009-07-09T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-211</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>211</prism:startingPage>
        <prism:publicationDate>2009-07-09T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/210">
        <title>Iterative refinement of structure-based sequence alignments by Seed Extension</title>
        <description>Background:
Accurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still mis-align substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment.
Results:
At its core, RSE uses SE (Seed Extension), an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis, and NCBI&apos;s CDD root node set provided the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs.
Conclusion:
RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. It can be used as a standalone procedure following a regular structure-based sequence alignment or to replace the traditional iterative refinement procedures based on a residue-level dynamic programming algorithm in many structure alignment programs.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/210</link>
                <dc:creator>Changhoon Kim</dc:creator>
                <dc:creator>Chin-Hsien Tai</dc:creator>
                <dc:creator>Byungkook Lee</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:210</dc:source>
        <dc:date>2009-07-09T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-210</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>210</prism:startingPage>
        <prism:publicationDate>2009-07-09T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/209">
        <title>A new multitest correction (SGoF) that increases its statistical power when increasing the number of tests</title>
        <description>Background:
The detection of true significant cases under multiple testing is becoming a fundamental issue when analyzing high-dimensional biological data. Unfortunately, known multitest adjustments reduce their statistical power as the number of tests increases. We propose a new multitest adjustment, based on a sequential goodness of fit metatest (SGoF), which increases its statistical power with the number of tests. The method is compared with Bonferroni and FDR-based alternatives by simulating a multitest context via two different kinds of tests: 1) one-sample t-test, and 2) homogeneity G-test.
Results:
It is shown that SGoF behaves especially well with small sample sizes when 1) the alternative hypothesis is weakly to moderately deviated from the null model, 2) there are widespread effects through the family of tests, and 3) the number of tests is large.
Conclusions:
Therefore, SGoF should become an important tool for multitest adjustment when working with high-dimensional biological data.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/209</link>
                <dc:creator>Antonio Carvajal-Rodriguez</dc:creator>
                <dc:creator>Jacobo de Una-Alvarez</dc:creator>
                <dc:creator>Emilio Rolan-Alvarez</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:209</dc:source>
        <dc:date>2009-07-08T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-209</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>209</prism:startingPage>
        <prism:publicationDate>2009-07-08T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/208">
        <title>OHMM: a Hidden Markov Model accurately predicting the occupancy of a transcription factor with a self-overlapping binding motif</title>
        <description>Background:
DNA sequence binding motifs for several important transcription factors happen to be self-overlapping. Many of the current regulatory site identification methods do not explicitly take into account the overlapping sites. Moreover, most methods use arbitrary thresholds and fail to provide a biophysical interpretation of statistical quantities. In addition, commonly used approaches do not include the location of a site with respect to the transcription start site (TSS) in an integrated probabilistic framework while identifying sites. Ignoring these features can lead to inaccurate predictions as well as incorrect design and interpretation of experimental results.
Results:
We have developed a tool based on a Hidden Markov Model (HMM) that identifies the binding locations of transcription factors with a preference for self-overlapping DNA motifs by combining the effects of their alternative binding modes. Interpreting HMM parameters as biophysical quantities, this method uses the occupancy probability of a transcription factor on a DNA sequence as the discriminant function, earning the algorithm the name OHMM: Occupancy via Hidden Markov Model. OHMM learns the classification threshold by training emission probabilities using unaligned sequences containing known sites and estimating transition probabilities to reflect site density in all promoters in a genome. While identifying sites, it adjusts parameters to model site density changing with the distance from the transcription start site. Moreover, it provides guidance for designing padding sequences in gel shift experiments. In the context of binding sites for the transcription factor NF-kappaB, we find that the occupancy probability predicted by OHMM correlates well with the binding affinity in gel shift experiments. High evolutionary conservation scores and enrichment in experimentally verified regulated genes suggest that NF-kappaB binding sites predicted by our method are likely to be functional.
Conclusions:
Our method deals specifically with identifying locations with multiple overlapping binding sites by computing the local occupancy of the transcription factor. Moreover, considering OHMM as a biophysical model allows us to learn the classification threshold in a principled manner. Another feature of OHMM is that we allow transition probabilities to change with location relative to the TSS. OHMM could be used to predict physical occupancy, and provides guidance for proper design of gel-shift experiments. Based upon our predictions, new insights into NF-kappaB function and regulation and possible new biological roles of NF-kappaB were uncovered.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/208</link>
                <dc:creator>Amar Drawid</dc:creator>
                <dc:creator>Nupur Gupta</dc:creator>
                <dc:creator>Vijayalakshmi Nagaraj</dc:creator>
                <dc:creator>Celine Gelinas</dc:creator>
                <dc:creator>Anirvan Sengupta</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:208</dc:source>
        <dc:date>2009-07-07T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-208</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>208</prism:startingPage>
        <prism:publicationDate>2009-07-07T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/207">
        <title>Ensemble approach to predict specificity determinants: benchmarking and validation</title>
        <description>Background:
It is extremely important and challenging to identify the sites that are responsible for functional specification or diversification in protein families. In this study, a rigorous comparative benchmarking protocol was employed to provide a reliable evaluation of methods which predict the specificity determining sites. Subsequently, the three best-performing methods were applied to identify new potential specificity determining sites through an ensemble approach and common agreement of their prediction results.
Results:
It was shown that the analysis of structural characteristics of predicted specificity determining sites might provide the means to validate their prediction accuracy. For example, we found that for smaller distances it holds true that the more reliable the prediction method is, the closer predicted specificity determining sites are to each other and to the ligand.
Conclusions:
We observed certain similarities of structural features between predicted and actual subsites which might point to their functional relevance. We speculate that the majority of the identified potential specificity determining sites might be indirectly involved in specific interactions and could be ideal targets for mutagenesis experiments.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/207</link>
                <dc:creator>Saikat Chakrabarti</dc:creator>
                <dc:creator>Anna Panchenko</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:207</dc:source>
        <dc:date>2009-07-02T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-207</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>207</prism:startingPage>
        <prism:publicationDate>2009-07-02T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/206">
        <title>Representative transcript sets for evaluating a translational initiation sites predictor</title>
        <description>Background:
Translational initiation site (TIS) prediction is a very important and actively studied topic in bioinformatics.  In order to complete a comparative analysis, it is desirable to have several benchmark data sets which can be used to test the effectiveness of different algorithms.  An ideal benchmark data set should be reliable, representative and readily available. Preferably, proteins encoded by members of the data set should also be representative of the protein population actually expressed in cellular specimens.
Results:
In this paper, we report a general algorithm for constructing a reliable sequence collection that only includes mRNA sequences whose corresponding protein products present an average profile of the general protein population of a given organism, with respect to three major structural parameters.  Four representative transcript collections, each derived from a model organism, have been obtained following the algorithm we propose. Evaluation of these data sets shows that they are reasonable representations of the spectrum of proteins obtained from cellular proteomic studies. Six state-of-the-art predictors have been used to test the usefulness of the construction algorithm that we proposed. A comparative study that reports the predictors&apos; performance on our data set as well as three other existing benchmark collections has demonstrated the actual merits of our data sets as benchmark testing collections.
Conclusions:
The proposed data set construction algorithm has demonstrated its property of being a general and widely applicable scheme. Our comparison with published proteomic studies has shown that the expression of our data set of transcripts generates a polypeptide population that is representative of that obtained from evaluation of biological specimens. Our data set thus represents &quot;real world&quot; transcripts that will allow more accurate evaluation of algorithms dedicated to identification of TISs, as well as other translational regulatory motifs within mRNA sequences. The algorithm proposed by us aims at compiling a redundancy-free data set by removing redundant copies of homologous proteins. The existence of such data sets may be useful for conducting statistical analyses of protein sequence-structure relations. At the current stage, our approach&apos;s focus is to obtain an &quot;average&quot; protein data set for any particular organism without posing much selection bias. However, with the three major protein structural parameters deeply integrated into the scheme, it would be a trivial task to extend the current method for obtaining a more selective protein data set, which may facilitate the study of some particular protein structure.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/206</link>
                <dc:creator>Jia Zeng</dc:creator>
                <dc:creator>Reda Alhajj</dc:creator>
                <dc:creator>Douglas Demetrick</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:206</dc:source>
        <dc:date>2009-07-02T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-206</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>206</prism:startingPage>
        <prism:publicationDate>2009-07-02T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/205">
        <title>Clique-based data mining for related genes in a biomedical database</title>
        <description>Background:
Progress in the life sciences cannot be made without integrating biomedical knowledge on numerous genes in order to help formulate hypotheses on the genetic mechanisms behind various biological phenomena, including diseases. There is thus a strong need for a way to automatically and comprehensively search from biomedical databases for related genes, such as genes in the same families and genes encoding components of the same pathways. Here we address the extraction of related genes by searching for densely-connected subgraphs, which are modeled as cliques, in a biomedical relational graph.
Results:
We constructed a graph whose nodes were gene or disease pages, and edges were the hyperlink connections between those pages in the Online Mendelian Inheritance in Man (OMIM) database. We obtained over 20,000 sets of related genes (called &apos;gene modules&apos;) by enumerating cliques computationally. The modules included genes in the same family, genes for proteins that form a complex, and genes for components of the same signaling pathway. The results of experiments using &apos;metabolic syndrome&apos;-related gene modules show that the gene modules can be used to get a coherent holistic picture helpful for interpreting relations among genes.
Conclusions:
We presented a data mining approach extracting related genes by enumerating cliques. The extracted gene sets provide a holistic picture useful for comprehending complex disease mechanisms.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/205</link>
                <dc:creator>Tsutomu Matsunaga</dc:creator>
                <dc:creator>Chikara Yonemori</dc:creator>
                <dc:creator>Etsuji Tomita</dc:creator>
                <dc:creator>Masaaki Muramatsu</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:205</dc:source>
        <dc:date>2009-07-01T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-205</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>205</prism:startingPage>
        <prism:publicationDate>2009-07-01T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/204">
        <title>Comparison of sequence-dependent tiling array normalization approaches</title>
        <description>Background:
The detection of enriched DNA or RNA fragments by tiling microarrays has become increasingly popular. These microarrays contain a large number of small probes covering genomic loci. However, to achieve high coverage, the probe sequences cannot be selected for their hybridization properties. The affinity of the probes towards their targets varies in a sequence-dependent manner. In order to remove this bias, a number of approaches have been developed and shown to increase the detection of enriched DNA or RNA fragments. However, these approaches also employ a peak detection algorithm that is different from the one used previously. Thus, it seems possible that the enhancement of detection is due to the peak detection algorithm rather than the sequence-dependent normalization.
Results:
We compared three different sequence-dependent probe level normalization procedures to a naive sequence-independent normalization technique. In order to achieve maximal comparability, we used the normalized intensity values as input to a single peak detection algorithm. A so-called &quot;spike-in&quot; data set served as benchmark for the performance. We will show that the sequence-dependent normalization procedures do not perform better than the naive approach, suggesting that the benefit of using these normalization approaches is limited. Furthermore, we will show that the naive approach does well, because it effectively removes the sequence-dependent component of the measured intensities with the help of the control hybridization experiment.
Conclusions:
Sequence-dependent normalization of microarray data hardly improves the detection of enriched DNA or RNA fragments. The &quot;success&quot; of the sequence-independent naive approach is only possible due to the control experiment and requires proper scaling of the measured intensities.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/204</link>
                <dc:creator>Ho-Ryun Chung</dc:creator>
                <dc:creator>Martin Vingron</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:204</dc:source>
        <dc:date>2009-06-30T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-204</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>204</prism:startingPage>
        <prism:publicationDate>2009-06-30T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.biomedcentral.com/1471-2105/10/203">
        <title>Integrated analysis of DNA copy number and gene expression microarray data using gene sets</title>
        <description>Background:
Genes that play an important role in tumorigenesis are expected to show association between DNA copy number and RNA expression. Optimal power to find such associations can only be achieved if analysing copy number and gene expression jointly. Furthermore, some copy number changes extend over larger chromosomal regions affecting the expression levels of multiple resident genes.
Results:
We propose to analyse copy number and expression array data using gene sets, rather than individual genes. The proposed model is robust and sensitive. We re-analysed two publicly available datasets as illustration. These two independent breast cancer datasets yielded similar patterns of association between gene dosage and gene expression levels, in spite of different platforms having been used. Our comparisons show a clear advantage to using sets of genes&apos; expressions to detect associations with long-spanning, low-amplitude copy number aberrations. In addition, our model allows for using additional explanatory variables and does not require mapping between copy number and expression probes.
Conclusions:
We developed a general and flexible tool for integration of multiple microarray data sets, and showed how the identification of genes whose expression is affected by copy number aberrations provides a powerful approach to prioritize putative targets for functional validation.</description>
        <link>http://www.biomedcentral.com/1471-2105/10/203</link>
                <dc:creator>Renee Menezes</dc:creator>
                <dc:creator>Marten Boetzer</dc:creator>
                <dc:creator>Melle Sieswerda</dc:creator>
                <dc:creator>Gert-Jan Ommen</dc:creator>
                <dc:creator>Judith Boer</dc:creator>
                <dc:source>BMC Bioinformatics 2009, 10:203</dc:source>
        <dc:date>2009-06-29T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1471-2105-10-203</dc:identifier>
        <prism:publicationName>BMC Bioinformatics</prism:publicationName>
        <prism:issn>1471-2105</prism:issn>
        <prism:volume>10</prism:volume>
        <prism:startingPage>203</prism:startingPage>
        <prism:publicationDate>2009-06-29T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <cc:License rdf:about="http://creativecommons.org/licenses/by/2.0/">
        <cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction" />
        <cc:permits rdf:resource="http://creativecommons.org/ns#Distribution" />
        <cc:permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks" />
    </cc:License>
</rdf:RDF>
