
This article is part of the supplement: Proceedings of the 12th Annual UT-ORNL-KBRIN Bioinformatics Summit 2013

Open Access Meeting abstract

Subgroup and outlier detection analysis

Gang Wu1, Iwona Pawlikowska2, Tanja Gruber3, James Downing4, Jinghui Zhang1 and Stan Pounds2*

Author Affiliations

1 Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA

2 Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA

3 Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA

4 Department of Pathology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA


BMC Bioinformatics 2013, 14(Suppl 17):A2  doi:10.1186/1471-2105-14-S17-A2


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/14/S17/A2


Published: 22 October 2013

© 2013 Wu et al; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

High-dimensional biological data present the opportunity to discover novel forms of biological heterogeneity, such as overexpression or suppressed expression of a particular gene in a subset of a cohort. Such heterogeneity appears in the data as outliers or distinct subgroups. Here, we describe and evaluate three procedures for subgroup and outlier detection analysis (SODA): a leave-one-out (LOO) procedure that is widely used for outlier detection in the bioinformatics literature, and two procedures from the statistics literature, the least median of squares (LMS) procedure and the dip test (DT). We also propose and evaluate the max spacing test (MST) as a novel SODA method.
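The abstract names the LOO and LMS procedures without defining them. The sketch below contrasts the two on the values of a single gene, under stated assumptions: LOO is taken to mean the z-score of each value against the mean and standard deviation of the other n - 1 values (a common form, not necessarily the statistic the authors evaluated), and LMS is Rousseeuw's univariate least median of squares location estimate with a 2.5 standardized-residual cutoff. The function names and the data are hypothetical, and the dip test and MST are not sketched. It is an illustration, not the authors' implementation.

```python
import statistics


def loo_zscores(values):
    """Leave-one-out z-score of each value against the other n - 1 values."""
    if len(values) < 3:
        raise ValueError("need at least 3 values")
    scores = []
    for i, x in enumerate(values):
        rest = values[:i] + values[i + 1:]
        scores.append((x - statistics.mean(rest)) / statistics.stdev(rest))
    return scores


def lms_location(values):
    """Univariate least median of squares (LMS) location and scale.

    The LMS location is the midpoint of the shortest interval that covers
    h = n // 2 + 1 of the sorted values. The scale follows Rousseeuw:
    1.4826 * (1 + 5 / (n - 1)) * sqrt(h-th smallest squared residual),
    and that square root equals half the width of the shortest interval.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least 2 values")
    h = n // 2 + 1
    # Start index of the shortest window of h consecutive sorted values.
    j = min(range(n - h + 1), key=lambda k: xs[k + h - 1] - xs[k])
    center = (xs[j] + xs[j + h - 1]) / 2
    half_width = (xs[j + h - 1] - xs[j]) / 2
    scale = 1.4826 * (1 + 5 / (n - 1)) * half_width
    return center, scale


def lms_outliers(values, cutoff=2.5):
    """Indices whose absolute LMS-standardized residual exceeds cutoff."""
    center, scale = lms_location(values)
    if scale == 0:
        raise ValueError("LMS scale is zero: more than half the values are tied")
    return [i for i, x in enumerate(values) if abs(x - center) / scale > cutoff]


if __name__ == "__main__":
    # Hypothetical expression values: 8 typical samples plus a 3-sample high subgroup.
    expr = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
    print(max(abs(z) for z in loo_zscores(expr)))  # about 2.07, below 2.5
    print(lms_outliers(expr))  # [8, 9, 10]
```

In this toy example the three high values inflate each leave-one-out standard deviation, so no single value looks extreme (masking). LMS fits only the tightest half of the data, so all three are flagged.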

Results

In simulation studies, we found that each of LMS, DT, and MST was the best-performing method in specific settings. In an example analysis, we found that LMS and MST effectively identified confirmed fusion genes as outliers, and that DT and MST effectively identified genes that distinguish between two confirmed subtypes of pediatric acute megakaryoblastic leukemia. We conclude that LMS, DT, and MST are robust and complementary methods for SODA.

Acknowledgements

We gratefully acknowledge funding from ALSAC, which raises funds for St. Jude.