This article is part of the supplement: Symposium of Computations in Bioinformatics and Bioscience (SCBB06)

Open Access Research

SEQOPTICS: a protein sequence clustering system

Yonghui Chen1*, Kevin D Reilly1, Alan P Sprague1 and Zhijie Guan2

Author Affiliations

1 Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, AL 35294-1170, USA

2 San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093-0505, USA

BMC Bioinformatics 2006, 7(Suppl 4):S10  doi:10.1186/1471-2105-7-S4-S10

Published: 12 December 2006



Protein sequence clustering has been widely used as part of the analysis of protein structure and function. In most cases, single-linkage or graph-based clustering algorithms have been applied. OPTICS (Ordering Points To Identify the Clustering Structure) is an attractive approach because it emphasizes visualization of results and supports interactive work, e.g., choosing parameters. However, as far as we know, OPTICS has not previously been used for protein sequence clustering.
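To make the OPTICS idea concrete, here is a minimal sketch of the standard OPTICS algorithm working on a precomputed distance matrix, which is the natural input when distances come from sequence comparison. The function name `optics` and the parameters `eps` (neighborhood radius) and `min_pts` (minimum neighborhood size) follow common OPTICS terminology but are assumptions for illustration, not the SEQOPTICS interface. The output is the cluster ordering plus each point's reachability distance; plotting reachability against the ordering gives the reachability plot OPTICS uses for visualization, in which clusters appear as valleys.

```python
import heapq
import math


def optics(dist, eps, min_pts):
    """Return (ordering, reachability) for a symmetric distance matrix `dist`.

    Illustrative sketch of standard OPTICS, not the SEQOPTICS implementation.
    The point itself counts toward `min_pts`.
    """
    n = len(dist)
    processed = [False] * n
    reach = [math.inf] * n  # undefined reachability is represented as infinity
    ordering = []

    def neighbors(p):
        # All points within eps of p, including p itself.
        return [q for q in range(n) if dist[p][q] <= eps]

    def core_distance(p, nbrs):
        # Distance to the min_pts-th closest neighbor; None if p is not a core point.
        if len(nbrs) < min_pts:
            return None
        return sorted(dist[p][q] for q in nbrs)[min_pts - 1]

    def update(p, nbrs, core_d, seeds):
        # Lower the reachability of unprocessed neighbors and push them on the heap.
        for q in nbrs:
            if processed[q]:
                continue
            new_r = max(core_d, dist[p][q])
            if new_r < reach[q]:
                reach[q] = new_r
                heapq.heappush(seeds, (new_r, q))  # older entries for q become stale

    for p in range(n):
        if processed[p]:
            continue
        processed[p] = True
        ordering.append(p)
        nbrs = neighbors(p)
        core_d = core_distance(p, nbrs)
        if core_d is None:
            continue  # not a core point: it cannot start expanding a cluster
        seeds = []
        update(p, nbrs, core_d, seeds)
        while seeds:
            r, q = heapq.heappop(seeds)
            if processed[q] or r > reach[q]:
                continue  # skip stale heap entries
            processed[q] = True
            ordering.append(q)
            q_nbrs = neighbors(q)
            q_core = core_distance(q, q_nbrs)
            if q_core is not None:
                update(q, q_nbrs, q_core, seeds)
    return ordering, [reach[p] for p in ordering]


# Toy example: six points on a line form two well-separated groups.
points = [0, 1, 2, 10, 11, 12]
toy_dist = [[abs(a - b) for b in points] for a in points]
order, reachability = optics(toy_dist, eps=3, min_pts=2)
print(order)         # [0, 1, 2, 3, 4, 5]
print(reachability)  # [inf, 1, 1, inf, 1, 1]
```

The two valleys of low reachability, separated by an infinite jump, correspond to the two groups. The heap with lazily skipped stale entries stands in for the "decrease key" operation on the ordered seed list; it keeps the sketch short without changing the resulting ordering logic.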


In this paper, SEQOPTICS (SEQuence clustering with OPTICS), a system for clustering protein sequences, is demonstrated. The system uses Smith-Waterman alignment as its protein distance measure and OPTICS at its core to perform the clustering. SEQOPTICS is tested on four data sets from different data sources. Visualization of the sequence clustering structure is also demonstrated.
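The abstract states that Smith-Waterman serves as the distance measure but not how alignment scores become distances. The following is a hypothetical sketch: a plain Smith-Waterman local alignment with toy scores (match +2, mismatch -1, linear gap -2) in place of a protein substitution matrix such as BLOSUM62 with affine gap penalties, plus an assumed normalization `1 - SW(a, b) / max(SW(a, a), SW(b, b))` that turns similarity scores into distances. The names `smith_waterman` and `sw_distance` are illustrative only.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b with a linear gap penalty.

    Toy scoring (assumption); real protein comparison typically uses a
    substitution matrix and affine gaps. Runs in O(len(a) * len(b)) time.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def sw_distance(a, b, **scoring):
    """Hypothetical normalization of SW similarity into a distance.

    With the default scores the result lies in [0, 1]; 0 means identical.
    """
    denom = max(smith_waterman(a, a, **scoring), smith_waterman(b, b, **scoring))
    if denom <= 0:
        return 1.0  # e.g. empty sequences: treat as maximally distant
    return 1.0 - smith_waterman(a, b, **scoring) / denom


print(smith_waterman("ACGT", "TACGTT"))         # 8
print(round(sw_distance("ACGT", "TACGTT"), 3))  # 0.333

# A full pairwise matrix is the kind of input a distance-based clusterer consumes.
seqs = ["HEAGAWGHEE", "PAWHEAE", "MKVLAT"]
matrix = [[sw_distance(x, y) for y in seqs] for x in seqs]
```

Normalizing by the larger self-score means a short sequence contained in a longer one is not treated as identical to it; dividing by the smaller self-score would instead give such pairs distance 0. Which behavior is preferable depends on the application, and the choice here is only an assumption.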


The system was evaluated by comparison with existing methods. Analysis of the results shows that SEQOPTICS performs better on several evaluation criteria, including the Jaccard coefficient, precision, and recall. It is a promising protein sequence clustering method, with possible future improvements through parallel computing and alternative protein distance measures.
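The abstract names the Jaccard coefficient, precision, and recall without defining them. A common way to compute them for a clustering is by counting pairs of sequences: a pair is a true positive (TP) when both the clustering and a reference classification put the two sequences together, a false positive (FP) when only the clustering does, and a false negative (FN) when only the reference does. Then Jaccard = TP / (TP + FP + FN), precision = TP / (TP + FP), and recall = TP / (TP + FN). The sketch below assumes this pair-counting definition, which may differ from the paper's exact formulation; the function names `pair_counts` and `evaluate` are hypothetical.

```python
from itertools import combinations


def pair_counts(predicted, reference):
    """Return (tp, fp, fn) over all unordered pairs of items.

    predicted[i] is item i's cluster label, reference[i] its true class label.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(predicted)), 2):
        same_pred = predicted[i] == predicted[j]
        same_ref = reference[i] == reference[j]
        if same_pred and same_ref:
            tp += 1
        elif same_pred:
            fp += 1  # clustered together but different classes
        elif same_ref:
            fn += 1  # same class but split across clusters
    return tp, fp, fn


def evaluate(predicted, reference):
    """Pair-counting Jaccard coefficient, precision, and recall (assumed definitions)."""
    tp, fp, fn = pair_counts(predicted, reference)
    return {
        "jaccard": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


pred = [0, 0, 0, 1, 1]
ref = ["a", "a", "b", "b", "b"]
print(pair_counts(pred, ref))  # (2, 2, 2)
```

Pair counting is quadratic in the number of sequences, which is fine for a sketch. If a clustering leaves some sequences unassigned (noise), giving each of them a unique label keeps them from being counted as one large cluster.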