<?xml version = '1.0' encoding = 'UTF-8'?>
<?xml-stylesheet href="/rss/styledrssBMC.css" type="text/css"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:extra="http://www.biomedcentral.com/xml/schemas/extra/" xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/" xmlns:cc="http://web.resource.org/cc/">
	<channel rdf:about="http://www.biomedcentral.com/rss">
		<extra:info rdf:parseType="Literal">
			<html:div xmlns:html="http://www.w3.org/1999/xhtml" style="font:14px Verdana, Geneva, Arial, Helvetica, sans-serif">
				<html:span style="font-weight:bold">This is an RSS newsfeed from BioMed Central</html:span>
				<html:br/>
				<html:span style="font-size: 12px;">It is intended to be used with an RSS reader. For more information about RSS newsfeeds from BioMed Central, visit <html:br/><html:a href="http://www.biomedcentral.com/info/about/rss/" style="color:#3333CC; font-size:12px;">http://www.biomedcentral.com/info/about/rss/</html:a><html:br/>
				</html:span>
			</html:div>
		</extra:info>
		<title>BMC Bioinformatics - Latest articles</title>
		<link>http://www.biomedcentral.com/bmcbioinformatics/</link>
		<description>The latest articles from BMC Bioinformatics (ISSN 1471-2105) published by BioMed Central</description>
        <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/"/>
        <items>
            <rdf:Seq>
            
				    <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/9/235"/>			    
            
				    <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/9/234"/>			    
            
				    <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/9/233"/>			    
            
				    <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/9/232"/>			    
            
				    <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/9/231"/>			    
            
				    <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/9/230"/>			    
            
				    <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/9/229"/>			    
            
				    <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/9/228"/>			    
            
				    <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/9/227"/>			    
            
				    <rdf:li rdf:resource="http://www.biomedcentral.com/1471-2105/9/226"/>			    
            
            </rdf:Seq>
        </items>
    </channel>  
    
		<item rdf:about="http://www.biomedcentral.com/1471-2105/9/235">
            
            <title>Discovery and assembly of repeat family pseudomolecules from sparse genomic sequence data using the Assisted Automated Assembler of Repeat Families (AAARF) algorithm</title>
			<description>Background:
Higher eukaryotic genomes are typically large, complex and filled with both genes and multiple classes of repetitive DNA. The repetitive DNAs, primarily transposable elements, are a rapidly evolving genome component that can provide the raw material for novel selected functions and also indicate the mechanisms and history of genome evolution in any ancestral lineage. Despite their abundance, universality and significance, studies of genomic repeat content have been largely limited to analyses of the repeats in fully sequenced genomes. 
Results:
In order to facilitate a broader range of repeat analyses, the Assisted Automated Assembler of Repeat Families algorithm has been developed. This program, written in PERL and with numerous adjustable parameters, identifies sequence overlaps in small shotgun sequence datasets and walks them out to create long pseudomolecules representing the most abundant repeats in any genome. Testing of this program in maize indicated that it found and assembled all of the major repeats in one or more pseudomolecules, including coverage of the major Long Terminal Repeat retrotransposon families. Both Sanger sequence and 454 datasets were appropriate.  
Conclusions:
These results now indicate that hundreds of higher eukaryotic genomes can be efficiently characterized for the nature, abundance and evolution of their major repetitive DNA components.</description>
			<link>http://www.biomedcentral.com/1471-2105/9/235</link>
			
			 	<dc:creator>Jeremy D DeBarry, Renyi Liu and Jeffrey L Bennetzen</dc:creator>
			
			<dc:source>BMC Bioinformatics 2008, 9:235</dc:source>
			<dc:date>2008-05-13</dc:date>
			<dc:identifier>doi:10.1186/1471-2105-9-235</dc:identifier>
			
			
							
					<prism:publicationName>BMC Bioinformatics</prism:publicationName>
					
			
							
					<prism:issn>1471-2105</prism:issn>
					
			
							
					<prism:volume>9</prism:volume>
					
			
							
					<prism:startingPage>235</prism:startingPage>
					
			
							
					<prism:publicationDate>2008-05-13</prism:publicationDate>
					

            <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/"/>
        </item>
	
		<item rdf:about="http://www.biomedcentral.com/1471-2105/9/234">
            
            <title>Comprehensive inventory of protein complexes in the Protein Data Bank from consistent classification of interfaces</title>
			<description>Background:
Protein-protein interactions are ubiquitous and essential for all cellular processes.  High-resolution X-ray crystallographic structures of protein complexes can reveal the details of their function and provide a basis for many computational and experimental approaches.  Differentiation between biological and non-biological contacts and reconstruction of the intact complex is a challenging computational problem.  A successful solution can provide additional insights into the fundamental principles of biological recognition and reduce errors in many algorithms and databases utilizing interaction information extracted from the Protein Data Bank (PDB).
Results:
We have developed a method for identifying protein complexes in PDB X-ray structures by a four-step procedure: (1) comprehensively collecting all protein-protein interfaces; (2) clustering similar protein-protein interfaces together; (3) estimating the probability that each cluster is relevant based on a diverse set of properties; and (4) combining these scores for each PDB entry in order to predict the complex structure. The resulting clusters of biologically relevant interfaces provide a reliable catalog of evolutionarily conserved protein-protein interactions. These interfaces, as well as the predicted protein complexes, are available from the Protein Interface Server (PInS) website at http://pins.ornl.gov/.
Conclusions:
Our method demonstrates an almost two-fold reduction of the annotation error rate as evaluated on a large benchmark set of complexes validated from the literature. We also estimate relative contributions of each interface property to the accurate discrimination of biologically relevant interfaces and discuss possible directions for further improving the prediction method.</description>
			<link>http://www.biomedcentral.com/1471-2105/9/234</link>
			
			 	<dc:creator>Andrew J Bordner and Andrey A Gorin</dc:creator>
			
			<dc:source>BMC Bioinformatics 2008, 9:234</dc:source>
			<dc:date>2008-05-12</dc:date>
			<dc:identifier>doi:10.1186/1471-2105-9-234</dc:identifier>
			
			
							
					<prism:publicationName>BMC Bioinformatics</prism:publicationName>
					
			
							
					<prism:issn>1471-2105</prism:issn>
					
			
							
					<prism:volume>9</prism:volume>
					
			
							
					<prism:startingPage>234</prism:startingPage>
					
			
							
					<prism:publicationDate>2008-05-12</prism:publicationDate>
					

            <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/"/>
        </item>
	
		<item rdf:about="http://www.biomedcentral.com/1471-2105/9/233">
            
            <title>Triad pattern algorithm for predicting strong promoter candidates in bacterial genomes</title>
			<description>Background:
Bacterial promoters, which increase the efficiency of gene expression, differ from other promoters by several characteristics. This difference, not yet widely exploited in bioinformatics, looks promising for the development of relevant computational tools to search for strong promoters in bacterial genomes. 
Results:
We describe a new triad pattern algorithm that predicts strong promoter candidates in annotated bacterial genomes by matching specific patterns for the group I sigma 70 factors of Escherichia coli RNA polymerase. It detects promoter-specific motifs by consecutively matching three patterns, consisting of an UP-element, required for interaction with the alpha subunit, and then optimally-separated patterns of -35 and -10 boxes, required for interaction with the sigma 70 subunit of RNA polymerase. Analysis of 43 bacterial genomes revealed that the frequency of candidate sequences depends on the A+T content of the DNA under examination. The accuracy of in silico prediction was experimentally validated for the genome of a hyperthermophilic bacterium, Thermotoga maritima, by applying a cell-free expression assay using the predicted strong promoters. In this organism, the strong promoters govern genes for translation, energy metabolism, transport, cell movement, and other as-yet unidentified functions. 
Conclusions:
The triad pattern algorithm developed for predicting strong bacterial promoters is well suited for analyzing bacterial genomes with an A+T content of less than 62%. This computational tool opens new prospects for investigating global gene expression, and individual strong promoters in bacteria of medical and/or economic significance.  </description>
			<link>http://www.biomedcentral.com/1471-2105/9/233</link>
			
			 	<dc:creator>Michael Dekhtyar, Amelie Morin and Vehary Sakanyan</dc:creator>
			
			<dc:source>BMC Bioinformatics 2008, 9:233</dc:source>
			<dc:date>2008-05-09</dc:date>
			<dc:identifier>doi:10.1186/1471-2105-9-233</dc:identifier>
			
			
							
					<prism:publicationName>BMC Bioinformatics</prism:publicationName>
					
			
							
					<prism:issn>1471-2105</prism:issn>
					
			
							
					<prism:volume>9</prism:volume>
					
			
							
					<prism:startingPage>233</prism:startingPage>
					
			
							
					<prism:publicationDate>2008-05-09</prism:publicationDate>
					

            <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/"/>
        </item>
	
		<item rdf:about="http://www.biomedcentral.com/1471-2105/9/232">
            
            <title>Bioinformatic analyses of mammalian 5'-UTR sequence properties of mRNAs predicts alternative translation initiation sites</title>
			<description>Background:
Utilization of alternative initiation sites for protein translation directed by non-AUG codons in mammalian mRNAs is observed with increasing frequency.  Alternative initiation sites are utilized for the synthesis of important regulatory proteins that control distinct biological functions.  It is, therefore, of high significance to define the parameters that allow accurate bioinformatic prediction of alternative translation initiation sites (aTIS).  This study has investigated 5'-UTR regions of mRNAs to define consensus sequence properties and structural features that allow identification of alternative initiation sites for protein translation.  
Results:
Bioinformatic evaluation of 5'-UTR sequences of mammalian mRNAs was conducted for classification and identification of alternative translation initiation sites for a group of mRNA sequences that have been experimentally demonstrated to utilize alternative non-AUG initiation sites for protein translation. These are represented by the codons CUG, GUG, UUG, AUA, and ACG for aTIS. The first phase of this bioinformatic analysis implements a classification tree that evaluates 5'-UTRs for unique consensus sequence features near the initiation codon, characteristics of 5'-UTR nucleotide sequences, and secondary structural features in a decision tree that categorizes mRNAs into those with potential aTIS, and those without. The second phase addresses identification of the aTIS codon and its location. Critical parameters of 5'-UTRs were assessed by an Artificial Neural Network (ANN) for identification of the aTIS codon and its location. ANNs have previously been used for the purpose of AUG start site prediction and are applicable in complex. ANN analyses demonstrated that multiple properties were required for predicting aTIS codons; these properties included unique consensus nucleotide sequences at positions -7 and -6 combined with positions -3 and +4, 5'-UTR length, ORF length, predicted secondary structures, free energy features, upstream AUGs, and G/C ratio. Importantly, combined results of the classification tree and the ANN analyses provided highly accurate bioinformatic predictions of alternative translation initiation sites.
Conclusions:
This study has defined the unique properties of 5'-UTR sequences of mRNAs for successful bioinformatic prediction of alternative initiation sites utilized in protein translation.  The ability to define aTIS through the described bioinformatic analyses can be of high importance for genomic analyses to provide full predictions of translated mammalian and human gene products required for cellular functions in health and disease.</description>
			<link>http://www.biomedcentral.com/1471-2105/9/232</link>
			
			 	<dc:creator>Jill L Wegrzyn, Thomas M Drudge, Faramarz Valafar and Vivian Hook</dc:creator>
			
			<dc:source>BMC Bioinformatics 2008, 9:232</dc:source>
			<dc:date>2008-05-08</dc:date>
			<dc:identifier>doi:10.1186/1471-2105-9-232</dc:identifier>
			
			
							
					<prism:publicationName>BMC Bioinformatics</prism:publicationName>
					
			
							
					<prism:issn>1471-2105</prism:issn>
					
			
							
					<prism:volume>9</prism:volume>
					
			
							
					<prism:startingPage>232</prism:startingPage>
					
			
							
					<prism:publicationDate>2008-05-08</prism:publicationDate>
					

            <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/"/>
        </item>
	
		<item rdf:about="http://www.biomedcentral.com/1471-2105/9/231">
            
            <title>Automating dChip: toward reproducible sharing of microarray data analysis</title>
			<description>Background:
During the past decade, many software packages have been developed for analysis and visualization of various types of microarrays. We have developed and maintained the widely used dChip as a microarray analysis software package accessible to both biologists and data analysts. However, challenges arise when dChip users want to analyze a large number of arrays automatically and share data analysis procedures and parameters. Improvement is also needed when the dChip user support team tries to identify the causes of analysis errors or bugs reported by users.
Results:
We report here the implementation and application of the dChip automation module. Through this module, dChip automation files can be created to include menu steps, parameters, and data viewpoints to run automatically. A data-packaging function allows convenient transfer from one user to another of the dChip software, microarray data, and analysis procedures, so that the second user can reproduce the entire analysis session of the first user. An analysis report file can also be generated during an automated run, including analysis logs, user comments, and viewpoint screenshots.
Conclusions:
The dChip automation module is a step toward reproducible research, and it can prompt a more convenient and reproducible mechanism for sharing microarray software, data, and analysis procedures and results. Automation data packages can also be used as publication supplements. Similar automation mechanisms could be valuable to the research community if implemented in other genomics and bioinformatics software packages.  </description>
			<link>http://www.biomedcentral.com/1471-2105/9/231</link>
			
			 	<dc:creator>Cheng Li</dc:creator>
			
			<dc:source>BMC Bioinformatics 2008, 9:231</dc:source>
			<dc:date>2008-05-08</dc:date>
			<dc:identifier>doi:10.1186/1471-2105-9-231</dc:identifier>
			
			
							
					<prism:publicationName>BMC Bioinformatics</prism:publicationName>
					
			
							
					<prism:issn>1471-2105</prism:issn>
					
			
							
					<prism:volume>9</prism:volume>
					
			
							
					<prism:startingPage>231</prism:startingPage>
					
			
							
					<prism:publicationDate>2008-05-08</prism:publicationDate>
					

            <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/"/>
        </item>
	
		<item rdf:about="http://www.biomedcentral.com/1471-2105/9/230">
            
            <title>CPSP-tools - exact and complete algorithms for high-throughput 3D lattice protein studies</title>
			<description>Background:
The principles of protein folding and evolution pose problems of very high inherent complexity. Often these problems are tackled using simplified protein models, e.g. lattice proteins. The CPSP-tools package provides programs to solve exactly and completely the problems typical of studies using 3D lattice protein models. Among the tasks addressed are the prediction of (all) globally optimal and/or suboptimal structures as well as sequence design and neutral network exploration.
Results:
In contrast to stochastic approaches, which are not capable of answering many fundamental questions, our methods are based on fast, non-heuristic techniques. The resulting tools are designed for high-throughput studies of 3D-lattice proteins utilising the Hydrophobic-Polar (HP) model. The source bundle is freely available at http://www.bioinf.uni-freiburg.de/sw/cpsp/
Conclusions:
The CPSP-tools package is the first set of exact and complete methods for extensive, high-throughput studies of non-restricted 3D-lattice protein models. In particular, our package deals with cubic and face centered cubic (FCC) lattices.</description>
			<link>http://www.biomedcentral.com/1471-2105/9/230</link>
			
			 	<dc:creator>Martin Mann, Sebastian Will and Rolf Backofen</dc:creator>
			
			<dc:source>BMC Bioinformatics 2008, 9:230</dc:source>
			<dc:date>2008-05-07</dc:date>
			<dc:identifier>doi:10.1186/1471-2105-9-230</dc:identifier>
			
			
							
					<prism:publicationName>BMC Bioinformatics</prism:publicationName>
					
			
							
					<prism:issn>1471-2105</prism:issn>
					
			
							
					<prism:volume>9</prism:volume>
					
			
							
					<prism:startingPage>230</prism:startingPage>
					
			
							
					<prism:publicationDate>2008-05-07</prism:publicationDate>
					

            <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/"/>
        </item>
	
		<item rdf:about="http://www.biomedcentral.com/1471-2105/9/229">
            
            <title>A tree-based conservation scoring method for short linear motifs in multiple alignments of protein sequences</title>
			<description>Background:
The structure of many eukaryotic cell regulatory proteins is highly modular.
They are assembled from globular domains, segments of natively disordered polypeptides and short linear motifs.
The latter are involved in protein interactions and formation of regulatory complexes.
The function of such proteins, which may be difficult to define, is the aggregate of the subfunctions of the modules.
It is therefore desirable to efficiently predict linear motifs with some degree of accuracy, yet sequence database searches return results that are not significant.
Results:
We have developed a method for scoring the conservation of linear motif instances.
It requires only primary sequence-derived information (e.g. multiple alignment and sequence tree) and takes into account the degenerate nature of linear motif patterns.
In our benchmark, the method accurately scores 86% of the known positive instances, while distinguishing them from random matches in 78% of the cases.
The conservation score is implemented as a real-time application designed to be integrated into other tools.
It is currently accessible via a Web Service or through a graphical interface.
Conclusions:
The conservation score improves the prediction of linear motifs, by discarding those matches that are unlikely to be functional because they have not been conserved during the evolution of the protein sequences.
It is especially useful for instances in non-structured regions of the proteins, where a domain masking filtering strategy is not applicable.</description>
			<link>http://www.biomedcentral.com/1471-2105/9/229</link>
			
			 	<dc:creator>Claudia Chica, Alberto Labarga, Cathryn M Gould, Rodrigo Lopez and Toby J Gibson</dc:creator>
			
			<dc:source>BMC Bioinformatics 2008, 9:229</dc:source>
			<dc:date>2008-05-06</dc:date>
			<dc:identifier>doi:10.1186/1471-2105-9-229</dc:identifier>
			
			
							
					<prism:publicationName>BMC Bioinformatics</prism:publicationName>
					
			
							
					<prism:issn>1471-2105</prism:issn>
					
			
							
					<prism:volume>9</prism:volume>
					
			
							
					<prism:startingPage>229</prism:startingPage>
					
			
							
					<prism:publicationDate>2008-05-06</prism:publicationDate>
					

            <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/"/>
        </item>
	
		<item rdf:about="http://www.biomedcentral.com/1471-2105/9/228">
            
            <title>Inferring the role of transcription factors in regulatory networks</title>
			<description>Background:
Expression profiles obtained from multiple perturbation experiments are increasingly used to reconstruct transcriptional regulatory networks, from well studied, simple organisms up to higher eukaryotes. Admittedly, a key ingredient in developing a reconstruction method is its ability to integrate heterogeneous sources of information, as well as to comply with practical observability issues: measurements can be scarce or noisy. In this work, we show how to combine a network of genetic regulations with a set of expression profiles, in order to infer the functional effect of the regulations, as inducer or repressor. Our approach is based on a consistency rule between a network and the signs of variation given by expression arrays.
Results:
We evaluate our approach in several settings of increasing complexity. First, we generate artificial expression data on a transcriptional network of E. coli extracted from the literature (1529 nodes and 3802 edges), and we estimate that 30% of the regulations can be annotated with about 30 profiles. We additionally prove that at most 40.8% of the network can be inferred using our approach. Second, we use this network in order to validate the predictions obtained with a compendium of real expression profiles. We describe a filtering algorithm that generates particularly reliable predictions. Finally, we apply our inference approach to the S. cerevisiae transcriptional network (2419 nodes and 4344 interactions), by combining ChIP-chip data and 15 expression profiles. We are able to detect and isolate inconsistencies between the expression profiles and a significant portion of the model (15% of all the interactions). In addition, we report predictions for 14.5% of all interactions.
Conclusions:
Our approach requires neither accurate expression levels nor time series. Nevertheless, we show on both real and artificial data that a relatively small number of perturbation experiments is enough to determine a significant portion of regulatory effects. This is a key practical asset compared to statistical methods for network reconstruction. We demonstrate that our approach is able to provide accurate predictions, even when the network is incomplete and the data is noisy.</description>
			<link>http://www.biomedcentral.com/1471-2105/9/228</link>
			
			 	<dc:creator>Philippe Veber, Carito Guziolowski, Michel Le Borgne, Ovidiu Radulescu and Anne Siegel</dc:creator>
			
			<dc:source>BMC Bioinformatics 2008, 9:228</dc:source>
			<dc:date>2008-05-06</dc:date>
			<dc:identifier>doi:10.1186/1471-2105-9-228</dc:identifier>
			
			
							
					<prism:publicationName>BMC Bioinformatics</prism:publicationName>
					
			
							
					<prism:issn>1471-2105</prism:issn>
					
			
							
					<prism:volume>9</prism:volume>
					
			
							
					<prism:startingPage>228</prism:startingPage>
					
			
							
					<prism:publicationDate>2008-05-06</prism:publicationDate>
					

            <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/"/>
        </item>
	
		<item rdf:about="http://www.biomedcentral.com/1471-2105/9/227">
            
            <title>The pairwise disconnectivity index as a new metric for the topological analysis of regulatory networks</title>
			<description>Background:
Currently, there is a gap between purely theoretical studies of the topology of large bioregulatory networks and the practical traditions and interests of experimentalists. While the theoretical approaches emphasize the global characterization of regulatory systems, the practical approaches focus on the role of distinct molecules and genes in regulation. To bridge the gap between these opposite approaches, one needs to combine 'general' with 'particular' properties and translate abstract topological features of large systems into testable functional characteristics of individual components. Here, we propose a new topological parameter - the pairwise disconnectivity index of a network's element - that is capable of such bridging.
Results:
The pairwise disconnectivity index quantifies how crucial an individual element is for sustaining the communication ability between connected pairs of vertices in a network that is displayed as a directed graph. Such an element might be a vertex (i.e., molecules, genes), an edge (i.e., reactions, interactions), as well as a group of vertices and/or edges. The index can be viewed as a measure of topological redundancy of regulatory paths which connect different parts of a given network and as a measure of sensitivity (robustness) of this network to the presence (absence) of each individual element. Accordingly, we introduce the notion of a path-degree of a vertex in terms of its corresponding incoming, outgoing and mediated paths, respectively. The pairwise disconnectivity index has been applied to the analysis of several regulatory networks from various organisms. The importance of an individual vertex or edge for the coherence of the network is determined by the particular position of the given element in the whole network. 
Conclusions:
Our approach makes it possible to evaluate the effect of removing each element (i.e., vertex, edge, or their combinations) from a network. The greatest potential value of this approach is its ability to systematically analyze the role of every element, as well as groups of elements, in a regulatory network.</description>
			<link>http://www.biomedcentral.com/1471-2105/9/227</link>
			
			 	<dc:creator>Anatolij P. Potapov, Bjorn Goemann and Edgar Wingender</dc:creator>
			
			<dc:source>BMC Bioinformatics 2008, 9:227</dc:source>
			<dc:date>2008-05-02</dc:date>
			<dc:identifier>doi:10.1186/1471-2105-9-227</dc:identifier>
			
			
							
					<prism:publicationName>BMC Bioinformatics</prism:publicationName>
					
			
							
					<prism:issn>1471-2105</prism:issn>
					
			
							
					<prism:volume>9</prism:volume>
					
			
							
					<prism:startingPage>227</prism:startingPage>
					
			
							
					<prism:publicationDate>2008-05-02</prism:publicationDate>
					

            <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/"/>
        </item>
	
		<item rdf:about="http://www.biomedcentral.com/1471-2105/9/226">
            
            <title>SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences</title>
			<description>Background:
Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences belongs to the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction.
Results:
SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. 
Conclusions:
SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve the performance of modern fold classification methods.</description>
			<link>http://www.biomedcentral.com/1471-2105/9/226</link>
			
			 	<dc:creator>Lukasz Kurgan, Krzysztof Cios and Ke Chen</dc:creator>
			
			<dc:source>BMC Bioinformatics 2008, 9:226</dc:source>
			<dc:date>2008-05-01</dc:date>
			<dc:identifier>doi:10.1186/1471-2105-9-226</dc:identifier>
			
			
							
					<prism:publicationName>BMC Bioinformatics</prism:publicationName>
					
			
							
					<prism:issn>1471-2105</prism:issn>
					
			
							
					<prism:volume>9</prism:volume>
					
			
							
					<prism:startingPage>226</prism:startingPage>
					
			
							
					<prism:publicationDate>2008-05-01</prism:publicationDate>
					

            <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/"/>
        </item>
		
    <cc:License rdf:about="http://creativecommons.org/licenses/by/2.0/">
         <cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction"/>
         <cc:permits rdf:resource="http://creativecommons.org/ns#Distribution"/>
         <cc:permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks"/>
	</cc:License>
</rdf:RDF>
