Open Access Highly Accessed Methodology article

A comparison of common programming languages used in bioinformatics

Mathieu Fourment* and Michael R Gillings

BMC Bioinformatics 2008, 9:82  doi:10.1186/1471-2105-9-82

Sub-divide comparisons to IO, computing, etc.

Zhang Zhang   (2008-09-01 11:11)  Yale University

This paper made a valuable attempt to compare the performance of six programming languages used in bioinformatics. The comparison uses three common bioinformatics tasks: the Sellers algorithm, neighbor-joining (NJ) and parsing BLAST output. I think it would also be important to determine which language performs best when different kinds of operations are considered individually, such as I/O operations, computing operations (e.g., ML and MCMC) and sequence parsing. This may also guide us in choosing a more efficient language for a given bioinformatics programming task.

Competing interests

None declared

Regarding the use of the Python code

Peter Cock   (2008-02-18 17:42)  Biopython Project; University of Warwick

The entire thrust of this paper is a comparison of the performance of the different languages, yet the skill level of the programmer in each language varies dramatically, surely confounding the whole exercise.

For example, the authors confess to being inexperienced in Python, and it is clear from their code that they are beginners. Consider one of their observations:

"Perl clearly outperformed Python for I/O operations. Perl was three times as fast as Python when reading a FASTA file and needed half of the space to store the sequences in memory (Fig 4)."

The script concerned contains errors, for example attempting to remove trailing newline characters with line.rstrip('/n') rather than line.rstrip('\n').
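To make the difference between the two calls concrete, here is a minimal sketch. The sample line is invented for illustration and is not taken from the authors' script or data. Since str.rstrip treats its argument as a set of characters to strip, '/n' targets the characters '/' and 'n' rather than the newline.

```python
# Hypothetical sequence line as it would be read from a FASTA file
# (illustrative only, not from the authors' input files).
line = "ACGTACGT\n"

# rstrip('/n') strips trailing '/' and 'n' characters.
# The line ends with a newline, which is in neither set, so nothing is removed.
wrong = line.rstrip('/n')

# rstrip('\n') strips the trailing newline as intended.
right = line.rstrip('\n')

print(repr(wrong))
print(repr(right))
```

With the first call the newline stays attached to every line, so it would also end up inside any sequence built by concatenating those lines.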

More importantly, given their desire to look at performance metrics, is the way they have concatenated the sequences. The seq+=line idiom used is the most natural, but it is well known in the Python community that concatenating a list of strings using ''.join(str_list) is far more efficient.

It would appear that the reviewers of this manuscript were also Python novices, or at least missed this point.
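The two idioms can be sketched side by side as follows. This is an illustrative FASTA reader written for this comparison, not the authors' script. The function names and the sample records are hypothetical. How large the speed difference is depends on the interpreter and version, so it should be measured on real input rather than assumed.

```python
def parse_fasta_concat(lines):
    """Build each sequence with repeated += on a string."""
    records = {}
    name = None
    seq = ""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                records[name] = seq
            name = line[1:]
            seq = ""
        elif name is not None:
            seq += line  # may copy the growing string on each step
    if name is not None:
        records[name] = seq
    return records


def parse_fasta_join(lines):
    """Collect the lines in a list and join them once per record."""
    records = {}
    name = None
    parts = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(parts)
            name = line[1:]
            parts = []
        elif name is not None:
            parts.append(line)  # cheap append, no string copying here
    if name is not None:
        records[name] = "".join(parts)  # single concatenation per record
    return records


# Hypothetical input, standing in for the lines of a FASTA file.
example = [">seq1\n", "ACGT\n", "TTGA\n", ">seq2\n", "GGCC\n"]
print(parse_fasta_join(example))
```

Both functions return the same dictionary. Only the way each sequence string is assembled differs, which makes them suitable for a like-for-like timing with the timeit module.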

Finally as far as I can tell, the authors have not provided all the input files used for their benchmarks, making it difficult to verify their results.

Competing interests

I am a Python programmer, and contribute to the Biopython project (mentioned but not used in this paper).
