Abstract
Background
The literature presents many different algorithms for classifying heartbeats from ECG signals. The performance of the classifier is normally presented in terms of sensitivity, specificity or other metrics describing the proportion of correct versus incorrect beat classifications. From the clinician's point of view, such metrics are however insufficient to rate the performance of a classifier.
Methods
We propose a new methodology for the presentation of classifier performance, based on Bayesian classification theory. Our proposition lets the investigators report their findings in terms of beat-by-beat comparisons, and defers the role of assessing the utility of the classifier to the statistician. Evaluation of the classifier's utility must be undertaken in conjunction with the set of relative costs applicable to the clinicians' application. Such evaluation produces a metric more tuned to the specific application, whilst preserving the information in the results.
Results
By way of demonstration, we propose a set of costs, based on clinical data from the literature, and examine the results of two published classifiers using our method. We make recommendations for reporting classifier performance, such that this method can be used for subsequent evaluation.
Conclusion
The proportion of misclassified beats contains insufficient information to fully evaluate a classifier. Performance reports should include a table of beat-by-beat comparisons, showing not only the number of misclassifications, but also the identity of the classes involved in each inaccurate classification.
Background
In recent years there has been a surge of interest in computer implementations of automatic beat classification algorithms. The impetus for this research stems partly from advances in, and miniaturisation of, electronics, which allow portable, wearable and implantable devices to provide greater functionality than was achievable in the past, but also from the desire to automate tasks currently performed by intensive care and operating room staff.
There are several principles on which classifiers operate, and many variations and implementations of each. With the large number of published algorithms comes the need to analyse and compare their performance. The literature to date compares techniques on a low-level, parameter-by-parameter basis [1-3]. ANSI standard EC57 [4] attempts to formalise methods for reporting such comparisons. These comparisons are of interest to those working on the development of new algorithms or the enhancement of existing ones, but are of little interest to a clinician when making a decision about which algorithm suits his purpose.
Problems with current classification methods
The contemporary method for reporting the performance of a beat classification algorithm involves beat-by-beat comparisons between the class as indicated by the algorithm and as indicated by some reference. The MIT-BIH database [5] is a commonly used reference source. Performance is reported either by a table giving counts of correctly and incorrectly classified beats, or by way of statistics inferred from such a table. Common statistics are sensitivity (Se), specificity (Sp) and positive predictivity (+P). These are defined as:
$$Se = \frac{TP}{TP + FN}, \qquad Sp = \frac{TN}{TN + FP}, \qquad +P = \frac{TP}{TP + FP}$$
where:
TP = number of true positives
FP = number of false positives
TN = number of true negatives
FN = number of false negatives.
However these values are defined only for binary classification, and do not readily lend themselves to problems involving more than two classes, {ω_{1}, ω_{2 }... ω_{n}}. Nevertheless, in the literature, one often sees beat classifier performance reports where sensitivity and specificity are freely quoted. Whilst the definitions of these measures for the context are normally not given, they appear to use the following extended definitions:
TP_{j }= number of beats correctly identified as belonging to class ω_{j}
FP_{j }= number of beats incorrectly identified as belonging to class ω_{j}
TN_{j }= number of beats correctly identified as belonging to a class other than ω_{j}
FN_{j }= number of beats incorrectly identified as belonging to a class other than ω_{j}
and hence Se_{j}, Sp_{j }and +P_{j }can be defined accordingly.
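As an illustration of the extended definitions, the sketch below derives Se_{j}, Sp_{j} and +P_{j} from an n × n table of beat-by-beat comparisons. The function name and the row/column convention (rows are reference classes, columns are algorithm labels) are assumptions made for this example. So is the reading of TN_{j} as every beat that is neither of class ω_{j} nor labelled ω_{j}, which is the usual one-versus-rest interpretation.

```python
from typing import Dict, List


def per_class_statistics(counts: List[List[int]]) -> List[Dict[str, float]]:
    """Per-class Se, Sp and +P from a table of beat-by-beat comparisons.

    counts[j][k] is the number of beats whose reference class is j and
    which the algorithm labelled as class k (assumed convention).
    """
    n = len(counts)
    total = sum(sum(row) for row in counts)
    stats = []
    for j in range(n):
        tp = counts[j][j]
        fn = sum(counts[j]) - tp                       # class j beats labelled as something else
        fp = sum(counts[i][j] for i in range(n)) - tp  # other beats labelled as class j
        tn = total - tp - fn - fp                      # neither class j nor labelled j
        stats.append({
            "Se": tp / (tp + fn) if tp + fn else float("nan"),
            "Sp": tn / (tn + fp) if tn + fp else float("nan"),
            "+P": tp / (tp + fp) if tp + fp else float("nan"),
        })
    return stats


if __name__ == "__main__":
    # Hypothetical three-class table, for illustration only.
    table = [[90, 10, 0],
             [5, 40, 5],
             [0, 2, 48]]
    for j, s in enumerate(per_class_statistics(table)):
        print(j, s)
```

Note that the three statistics per class discard the off-diagonal detail of the table: two very different tables can yield the same Se_{j}, Sp_{j} and +P_{j}.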
A number of problems become apparent when using such statistics to evaluate the performance of beat classifiers:
1. They do not take into account the a priori probabilities of the beat classes.
2. They do not take into account the relative costs of false classification.
3. They can be presented only as a multidimensional value, even where only two classes are being considered. There is no obvious single ordinal value.
Problem 1 has been recognised in the medical literature [6]. We are not aware of any previous attempt to deal with problem 2. Problem 3 makes these reports particularly unhelpful from the point of view of the clinician trying to compare systems with a view to adopting one for use. For an n class classifier, there are 2n scalar quantities, so ranking classifiers using these quantities is not possible.
We propose a new method, which overcomes these problems and aims to be generally useful for the quantitative comparison of beat classification schemes.
Proposed methodology
A system's utility as a prognostic medical tool is a measure of the benefit afforded by selecting it against other alternatives. Choosing a system involves maximising the benefit, or alternatively, minimising the risk. A measure for the overall risk associated with making a decision based upon the output of a beat classifier is a useful measure of its performance. Risk is characterised by the probability of error and the costs associated with making a decision based upon the erroneous classification. We have used Bayesian decision theory to determine a method of calculating the risk associated with a beat classifier.
Introduction to Bayesian risk
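Bayesian decision theory is presented in many texts on statistics and classification theory [7,8] and will be introduced here only briefly. In a system which is claimed to recognise n different classes of beats {ω_{1}, ω_{2} ... ω_{n}}, there are n possible outputs, {α_{1}, α_{2} ... α_{n}}. Bayes' rule states:
$$P(\omega_j \mid \alpha_k) = \frac{P(\alpha_k \mid \omega_j)\,P(\omega_j)}{P(\alpha_k)} \qquad (1)$$
Since the classes {ω_{1}, ω_{2} ... ω_{n}} are mutually exclusive and exhaustive, then
$$P(\alpha_k) = \sum_{i=1}^{n} P(\alpha_k \mid \omega_i)\,P(\omega_i) \qquad (2)$$
The quantity P(ω_{j}) is called the a priori probability, P(α_{k}|ω_{j}) the likelihood or class conditional probability and P(ω_{j}|α_{k}) the a posteriori probability. Note that P(α_{j}|ω_{j}) ≡ Se_{j}. From (1) and (2) we can write:
$$P(\omega_j \mid \alpha_k) = \frac{P(\alpha_k \mid \omega_j)\,P(\omega_j)}{\sum_{i=1}^{n} P(\alpha_k \mid \omega_i)\,P(\omega_i)} \qquad (3)$$
Let λ(α_{k}|ω_{j}) be the cost incurred for making decision α_{k} when ω_{j} is the true beat class. Therefore, the risk of making decision α_{k} based upon the classifier's output (the risk of reliance) is:
$$R(\alpha_k) = \sum_{j=1}^{n} \lambda(\alpha_k \mid \omega_j)\,P(\omega_j \mid \alpha_k) \qquad (4)$$
Combining the above gives
$$R(\alpha_k) = \frac{\sum_{j=1}^{n} \lambda(\alpha_k \mid \omega_j)\,P(\alpha_k \mid \omega_j)\,P(\omega_j)}{\sum_{i=1}^{n} P(\alpha_k \mid \omega_i)\,P(\omega_i)} \qquad (5)$$
The overall risk of relying on a classifier is
$$\bar{R} = \sum_{k=1}^{n} R(\alpha_k)\,P(\alpha_k) \qquad (6)$$
or equivalently
$$\bar{R} = \sum_{k=1}^{n} \sum_{j=1}^{n} \lambda(\alpha_k \mid \omega_j)\,P(\alpha_k \mid \omega_j)\,P(\omega_j) \qquad (7)$$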
We propose that {R(α_{1}), R(α_{2}) ... R(α_{n})} be used in the consideration of a classifier's utility, and that the overall risk $\bar{R}$ be used for overall rating of classifiers. $\bar{R}$ has the range (0, ∞) and its units are dollars (or whatever units have been chosen for λ(α_{k}|ω_{j})). We envisage that a unitless measure, having the range (0, 1), is more useful in many circumstances. Accordingly, we also propose a normalised metric:
$$\hat{R} = \frac{\bar{R}}{\bar{R}_{\max}} \qquad (8)$$
where $\bar{R}_{\max}$ is the value of the overall risk obtained when the class conditional probabilities are set to their worst-case values, that is, the values that maximise the overall risk.
Thus, a perfect classifier has a value of zero and, at the opposite extreme, the worst possible classifier has a value of unity.
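The risk calculation described in this section can be sketched in a few lines of code. This is a minimal illustration, not a reference implementation: the function names and the `[true class][decision]` indexing are choices made for the example. The worst-case bound in particular is an assumption, namely that the worst classifier always makes the most costly decision for every true class. The example numbers are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def risk_of_reliance(likelihood: Matrix, prior: Sequence[float], cost: Matrix) -> List[float]:
    """R(alpha_k) for every decision k.

    likelihood[j][k] = P(alpha_k | omega_j)   (each row sums to 1)
    prior[j]         = P(omega_j)
    cost[j][k]       = lambda(alpha_k | omega_j)
    """
    n = len(prior)
    risks = []
    for k in range(n):
        # P(alpha_k) by the law of total probability.
        p_alpha = sum(likelihood[i][k] * prior[i] for i in range(n))
        if p_alpha == 0.0:
            risks.append(0.0)  # convention for this sketch: decision k is never made
            continue
        weighted = sum(cost[j][k] * likelihood[j][k] * prior[j] for j in range(n))
        risks.append(weighted / p_alpha)
    return risks


def overall_risk(likelihood: Matrix, prior: Sequence[float], cost: Matrix) -> float:
    """Expected cost per beat: sum over all (true class, decision) pairs."""
    n = len(prior)
    return sum(cost[j][k] * likelihood[j][k] * prior[j]
               for j in range(n) for k in range(n))


def worst_case_risk(prior: Sequence[float], cost: Matrix) -> float:
    """ASSUMPTION: the worst classifier always picks the costliest decision."""
    return sum(p * max(row) for p, row in zip(prior, cost))


def normalised_risk(likelihood: Matrix, prior: Sequence[float], cost: Matrix) -> float:
    return overall_risk(likelihood, prior, cost) / worst_case_risk(prior, cost)


if __name__ == "__main__":
    # Hypothetical two-class problem, for illustration only.
    prior = [0.9, 0.1]
    likelihood = [[0.95, 0.05], [0.20, 0.80]]
    cost = [[0.0, 1.0], [10.0, 0.0]]
    print(risk_of_reliance(likelihood, prior, cost))
    print(overall_risk(likelihood, prior, cost))
    print(normalised_risk(likelihood, prior, cost))
```

Because the a priori probabilities and costs enter only as arguments, the same class conditional matrix can be re-evaluated for a different clinical setting without retesting the classifier.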
Results
ANSI EC57 section 4.3 identifies 5 classes of beats which are recommended in performance reports, viz: normal beats, SupraVentricular Ectopic Beats, Ventricular Ectopic Beats, fusions of normal and Ventricular Ectopic Beats and other unclassified beats. From secondary data sources, we derived a priori probabilities and costs of decisions for these classes. A detailed description of the derivation and of the data sources is given below.
Table 1 shows the a priori probabilities. Table 2 shows the costs. Together with the values of P(α_{k}|ω_{j}), these tables enable the risk to be calculated. Unfortunately, in many cases the literature presents neither the values for P(α_{k}|ω_{j}), nor the table of beat-by-beat comparisons from which they could be deduced. In the literature, we were able to find only two classifiers for which these data were reported. These are the classifiers of de Chazal et al. [9] and of Melo et al. [10]. Melo et al. publish separate results for aberrated atrial premature beats. For the purposes of comparison, we have regarded aberrated and non-aberrated atrial premature beats as a single class (ω_{SVEB}).
Table 1. A priori probabilities derived from the MIT-BIH Arrhythmia Database.
Table 2. Costs of false classification in 1000s of AUD
The results are shown in Table 3. In these results, the overall risk is significantly lower for the Melo classifier, and the risks of reliance R(α_{i}) are also lower for all i. In other words, this classifier dominates in all respects. In general, however, this may not be the case, and one classifier may have a lower risk of reliance for one decision whilst having a higher risk of reliance for another.
Table 3. Comparative performance of two classifiers
Discussion
We do not presume that the costs presented herein, or the propositions used in their calculation are universally applicable. Rather, we seek to demonstrate how, given the class conditional matrix, an ordinal measure for a beat classifier may be determined, applicable to any particular situation. Others might disagree with our cost calculation methods, or the application might demand consideration for classes other than those we have investigated. Whilst we have used monetary units to measure costs, we recognise the ethical issues raised by doing so, and our methodology imposes no requirement on the nature of the units of cost. Any unit acceptable to the community of interest may be used. In such cases, given the class conditional matrix, potential users may conduct their own studies and assess the performance of the classifier using the method we have described.
To facilitate such studies, we urge biomedical engineers to report not only the relative numbers of correctly versus incorrectly classified beats, but also the identity of the misclassified beats in the form of a class conditional matrix. ANSI EC57 describes how to compile such a matrix, but makes no recommendation for its publication. It is trivial to calculate sensitivity and specificity from such a matrix if desired, and the matrix also allows for more useful measures of performance as described herein. We recommend publication of the class conditional matrix and/or a table of beat-by-beat comparisons.
Conclusion
The utility of a beat classifier cannot be fully quantified in terms of the number of correct and incorrect beats. Instead, the number of misclassifications for each class is required. Together with the a priori probabilities and the costs of misclassification, quantitative measures of a classifier's utility can be determined.
A system which claims to classify beats into more than two classes is not a binary classifier, and performance should not be reported as if it were. Instead of reporting sensitivity/specificity/predictivity for each of the n classes, an n × n matrix of beat classifications (the class conditional frequencies) should be reported.
Clinicians wishing to assess a classifier need to obtain estimates for the costs of misclassification, and calculate the overall risk of reliance.
Methods
Equation (5) comprises the terms P(α_{k}|ω_{i}), P(ω_{j}) and λ(α_{k}|ω_{j}). P(α_{k}|ω_{i}) are parameters of the classifier and can be tested experimentally. P(ω_{j}) and λ(α_{k}|ω_{j}) are parameters of the classes of interest. They are respectively the a priori probabilities and the costs of making decisions. In this section we examine a number of secondary sources to determine values for P(ω_{j}) and λ(α_{k}|ω_{j}).
A priori probabilities
We used the MIT-BIH Arrhythmia Database to extract a priori probabilities. The records chosen were the first group of records (numbers 100–124) from the database. We omitted the second group (numbers 200–234), since these were deliberately selected by the authors of the database to contain "rare but clinically important phenomena", whereas the first group was randomly selected so as to "serve as a representative sample of the variety of waveforms and artifact that an arrhythmia detector might encounter in routine clinical use".
Table 1 shows the data extracted from the database. Class 0 was disregarded, since these annotations are not beats, but are used to mark other interesting features in the signal. c_{j} are the counts of beats of class ω_{j}. P(ω_{j}) was calculated by dividing c_{j} by the total beat count, Σ_{i} c_{i}. Note that c_{5} = 0 and we therefore conclude that beat classes other than 1–4 are sufficiently rare to have negligible effect on the utility of a system.
Extrapolation of data
Where possible, we referred to longitudinal studies, giving data gathered over a period of 10 years or longer. In many instances, the data was available only in graphical form in which case trapezoidal approximation of integrals was used. Where data over a 10 year period was not available, we used data gathered over a shorter period and extrapolated by the following method.
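We presume survival to be described by an exponential expression
$$s(t) = \kappa e^{\beta t} \qquad (10)$$
where s is the fraction of subjects surviving at time t (in years), and κ and β are constants for which β < 0 and 0 < κ ≤ 1. Equation (10) implies that the mortality fraction m is
$$m(t) = 1 - \kappa e^{\beta t} \qquad (11)$$
Integrating m over 10 years gives the total loss:
$$L = \int_{0}^{10} \left(1 - \kappa e^{\beta t}\right) dt \qquad (12)$$
which can be expressed as
$$L = 10 - \frac{e^{X}}{\beta}\left(e^{10\beta} - 1\right) \qquad (13)$$
where X = ln κ. Thus equation (10) implies
$$\ln s = X + \beta t \qquad (14)$$
Hence, X and β can be found from the data by linear regression of ln(s) against t, and the total expected mortality over 10 years from equation (13).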
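The extrapolation procedure can be sketched as follows, assuming survival of the form s(t) = κ·exp(βt) with β < 0, a least-squares fit of ln(s) on t, and a loss defined as the integral of 1 − s(t) over the horizon. The function names and the synthetic survival points are hypothetical; they are not data from any of the cited studies.

```python
import math
from typing import Sequence, Tuple


def fit_survival(times: Sequence[float], survival: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of ln(s) = X + beta * t; returns (X, beta)."""
    n = len(times)
    logs = [math.log(s) for s in survival]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    beta = sxy / sxx
    x = y_mean - beta * t_mean
    return x, beta


def expected_loss(x: float, beta: float, horizon: float = 10.0) -> float:
    """Integral over [0, horizon] of 1 - exp(x + beta * t); requires beta != 0."""
    return horizon - math.exp(x) * (math.exp(beta * horizon) - 1.0) / beta


if __name__ == "__main__":
    # Synthetic 24-month survival curve sampled every 6 months (illustrative only).
    t = [0.0, 0.5, 1.0, 1.5, 2.0]
    s = [0.9 * math.exp(-0.2 * ti) for ti in t]
    x, beta = fit_survival(t, s)
    print(x, beta, expected_loss(x, beta))
```

Extrapolating a two-year curve to ten years is sensitive to the fitted β, so the resulting loss is best treated as an order-of-magnitude estimate.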
Costs of incorrect classification
In determining costs for incorrect decisions, we have presumed standard clinical treatment of abnormal beats, or non-treatment of normal beats, according to the system's output. We have then investigated the costs, in monetary terms, of taking that course of action under each state of nature. We have endeavoured to report 'costs' as the general cost to society, rather than the cost to any particular entity. A summary of these figures is presented in Table 2. All costs have been normalised to the year 2006, and are in Australian dollars except where otherwise noted.
Each application, however, may have different ancillary parameters, and these may affect the costs involved. The methods and figures provided in this study reflect the most general situation, as best as we could determine. The propositions on which our cost calculations rest are stated below, and should be examined when applying the figures.
Proposition 1 The clinical treatment for fusions of normal and Ventricular Ectopic Beats (class ω_{F}) is identical to that for Ventricular Ectopic Beats.
We make this proposition on the basis that sustained Ventricular Ectopic Beats are potentially life threatening, and must be treated. To a clinician, the fact that the polarisation of the waveform coincides with the preceding beat is merely incidental.
Proposition 2 The maximum future projection which may affect costs of misclassification is 10 years.
10 years is chosen as a reasonable period beyond which the advances in medical technology can be expected to invalidate the results of future prediction.
Datum 1 The expected loss of life of a healthy subject, projected over the next 10 years, is 1.21 years.
This datum is from the control group of Benjamin et al. [11]. This was a longitudinal study which investigated the mortality of subjects who had developed Atrial Fibrillation. We integrated the survival results presented by Figure A of that paper to determine the expected number of years of life lost by a healthy subject.
Datum 2 The probability that a subject with SupraVentricular Ectopic Beats will develop Atrial Fibrillation is 0.324.
Datum 2 comes from the results of Frost et al. [12].
Datum 3 The expected loss of life due to a person suffering from Atrial Fibrillation is 2.69 years (projected over 10 years).
Datum 3 is determined from Benjamin et al. in a similar fashion to Datum 1. The calculations are presented in Table 4.
Table 4. Expected Loss of life due to Atrial Fibrillation
Datum 4 A person's contribution to society is $44,320 per annum.
Datum 4 is the mean average wage in Australia for the year 2006 [13].
Datum 5 The cost of misdiagnosing a SupraVentricular Ectopic Beat as a normal beat, λ(α_{N}|ω_{SVEB}), is $38,627.
This figure is the product of Data 2, 3 and 4.
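As a worked check of Datum 5, the product of Data 2, 3 and 4 can be written out explicitly (probability of developing Atrial Fibrillation, times years of life lost, times annual contribution in AUD, rounded to the nearest dollar):

```latex
\lambda(\alpha_{N} \mid \omega_{SVEB})
  = 0.324 \times 2.69 \times 44\,320
  \approx 38\,627 \ \text{(AUD)}
```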
Datum 6 The probability of initially surviving Ventricular Fibrillation is 146/886 ≈ 0.165.
From the text of Baum et al.[14], we know that, in a 3 year study of Ventricular Fibrillation cases, 146 patients out of 886 initially survived Ventricular Fibrillation.
Datum 7 The total expected loss of life by a person who initially survives Ventricular Fibrillation is 6.33 years (projected over 10 years).
To derive Datum 7 we used the results of Baum et al.[14]. In Figure 2 of that paper, survival curves are presented for subjects who initially survived Ventricular Fibrillation. Survival data are presented for only 24 months. We extrapolated survival data over a 10 year period by the method described above.
Datum 8 The expected loss of life, attributable to Ventricular Fibrillation, by a subject who suffers Ventricular Fibrillation is 8.19 years (projected over 10 years).
We know from Datum 1 that the expected loss of life of healthy subjects is 1.21 years. Hence we can calculate the loss due to Ventricular Fibrillation as per Table 5.
Table 5. Expected Loss of life due to Ventricular Fibrillation
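The contents of Table 5 are not reproduced here, but one reading of the calculation that reproduces the 8.19-year figure is sketched below. It assumes that subjects who do not initially survive Ventricular Fibrillation lose the full 10-year horizon, that initial survivors lose 6.33 years (Datum 7), and that the healthy baseline of 1.21 years (Datum 1) is then subtracted. The function name and this decomposition are assumptions, not taken from the table itself.

```python
def loss_attributable_to_vf(p_survive: float,
                            loss_if_survivor: float,
                            baseline_loss: float,
                            horizon: float = 10.0) -> float:
    """Expected years of life lost because of VF, over the given horizon.

    ASSUMPTION: non-survivors lose the whole horizon; survivors lose
    loss_if_survivor years; the healthy baseline loss is subtracted.
    """
    total = (1.0 - p_survive) * horizon + p_survive * loss_if_survivor
    return total - baseline_loss


if __name__ == "__main__":
    p = 146 / 886  # Datum 6: fraction initially surviving VF
    print(round(loss_attributable_to_vf(p, 6.33, 1.21), 2))  # 8.19
```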
Datum 9 The expected loss of life, attributable to Ventricular Tachycardia by a subject who suffers Ventricular Tachycardia is 2.59 years (projected over 10 years).
This datum was obtained from Doval et al.[15] by the data extrapolation method described previously. In that study of 516 subjects both with and without nonsustained Ventricular Tachycardia, the projected loss over 10 years for the group with Ventricular Tachycardia was 9.24 years, whereas the projected loss for the group without Ventricular Tachycardia was 6.65 years. The difference is 2.59 years.
Datum 10 The probability that a subject who experiences one or more episodes of Ventricular Ectopic Beats, will develop Ventricular Fibrillation is 0.34.
Datum 11 The probability that a subject who experiences one or more episodes of Ventricular Ectopic Beats, will develop Ventricular Tachycardia is 0.41.
Datum 10 and Datum 11 are implied from Carrim and Khan [16]. In that study of 44 subjects exhibiting Ventricular Ectopic Beats, VT is the set of subjects developing Ventricular Tachycardia and VF is the set of subjects developing Ventricular Fibrillation. From their data, we are given:
and
from which we can deduce that |VT ∩ VF| = 12, |VT| = 18 and |VF| = 15.
Thus,
$$P(VF) = \frac{15}{44} \approx 0.34$$
and
$$P(VT) = \frac{18}{44} \approx 0.41$$
Datum 12 The expected loss of life, when a Ventricular Ectopic Beat is misdiagnosed as a normal beat is 3.84 years.
Datum 12 is derived from Data 10, 11, 8 and 9. The derivation is shown in Table 6.
Table 6. Expected loss of life due to Ventricular Ectopic Beats
Datum 13 The cost of misdiagnosing a Ventricular Ectopic Beat as a normal beat, λ(α_{N}|ω_{VEB}), is $170,189.
This is the product of Datum 12 and Datum 4. By Proposition 1 we attribute the same cost to λ(α_{N}|ω_{F}).
Costs of misclassification of a normal beat as abnormal
Ventricular Ectopic Beats are of potential concern to a physician. If a system misclassifies a normal beat (ω_{N}), or a SupraVentricular Ectopic Beat (ω_{SVEB}), as a Ventricular Ectopic Beat or a fusion beat (decisions α_{VEB} and α_{F}), the likely result is that the patient will be unnecessarily detained in observation, pending further examination. Thus, the cost of misclassification, in this case, is the cost of retaining a patient in intensive care for one day.
Datum 14 The cost of retaining a patient in intensive care for 1 day is $2146.
This datum is the mean average of figures obtained from two independent Australian health insurance providers. A case study by Rechner and Lipman[17] for the year 2003 cites the figure $2670. This study however was conducted within a teaching hospital, and we therefore expect the costs to be higher than average.
We use the cost of retaining a patient in intensive care for one day as the penalty in such instances. Thus λ(α_{VEB}|ω_{N}) = λ(α_{F}|ω_{N}) = λ(α_{VEB}|ω_{SVEB}) = λ(α_{F}|ω_{SVEB}) = $2146.
From Proposition 1 we conclude that λ(α_{F}|ω_{VEB}) = λ(α_{VEB}|ω_{F}) = 0.
Erroneous classification of beats as SupraVentricular Ectopic Beats
Unlike Ventricular Ectopic Beats, SupraVentricular Ectopic Beats are typically not life threatening, and are therefore not normally treated unless recurrent [18]. Thus, the cost of misclassifying a normal beat as a SupraVentricular Ectopic Beat, λ(α_{SVEB}|ω_{N}), is zero. By the same token, misclassification of Ventricular Ectopic Beats or fusion beats as SupraVentricular Ectopic Beats bears the same penalty as an erroneous classification as a normal beat. Thus λ(α_{SVEB}|ω_{VEB}) = λ(α_{N}|ω_{VEB}) and λ(α_{SVEB}|ω_{F}) = λ(α_{N}|ω_{F}).
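The cost rules stated in this section can be collected into a single matrix, as in the sketch below. The dictionary layout, the constant names and the zero cost for correct classifications are assumptions made for the example. The fifth class (other unclassified beats) is omitted because no beats of that class were counted. Values are in AUD rather than the thousands of AUD used in Table 2.

```python
from typing import Dict

CLASSES = ("N", "SVEB", "VEB", "F")

ICU_DAY = 2146        # Datum 14: one day in intensive care
MISSED_SVEB = 38627   # Datum 5: SVEB taken for a normal beat
MISSED_VEB = 170189   # Datum 13: VEB (or fusion) taken for a normal beat


def build_cost_matrix() -> Dict[str, Dict[str, int]]:
    """cost[true_class][decision] in AUD.

    ASSUMPTION: a correct classification costs nothing (zero diagonal).
    """
    cost = {t: {d: 0 for d in CLASSES} for t in CLASSES}

    # Normal or SVEB beats flagged as VEB or fusion: one unnecessary ICU day.
    for true_class in ("N", "SVEB"):
        for decision in ("VEB", "F"):
            cost[true_class][decision] = ICU_DAY

    # SVEB taken for a normal beat.
    cost["SVEB"]["N"] = MISSED_SVEB

    # VEB or fusion taken for a normal beat or for an SVEB: same penalty.
    for true_class in ("VEB", "F"):
        for decision in ("N", "SVEB"):
            cost[true_class][decision] = MISSED_VEB

    # cost["N"]["SVEB"] stays 0 (SVEBs are not normally treated), and
    # VEB <-> F confusions stay 0 (identical treatment, Proposition 1).
    return cost


if __name__ == "__main__":
    for true_class, row in build_cost_matrix().items():
        print(true_class, row)
```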
Upper bound of risk
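From Tables 1 and 2, the upper bound of the overall risk (the denominator of the normalised metric) was calculated, as described in the section Introduction to Bayesian risk, as $52,817.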
Competing interests
The author(s) declare that they have no competing interests.
Authors' contributions
JMD conceived the study, participated in the acquisition of data, and drafted the manuscript. LH assisted with the design of the study, made substantial contribution to the acquisition of secondary data and the interpretation of data. Both authors read and approved the final manuscript.
Acknowledgements
The authors would like to express their thanks to Amitava Datta for his advice during this study.
References

1. Christov I, Bortolan G: Ranking of pattern recognition parameters for premature ventricular contractions classification by neural networks. Physiological Measurement 2004, 25:1281-1290.
2. de Chazal P, Reilly RB: A Comparison of the ECG Classification Performance of Different Feature Sets.
3. Christov I, Gómez-Herrero G, Krasteva V, Jekova I, Gotchev A, Egiazarian K: Comparative study of morphological and time-frequency ECG descriptors for heartbeat classification. Medical Engineering and Physics 2006, 28(9):876-887.
4. Testing and reporting performance results of cardiac rhythm and ST-segment measuring algorithms. Arlington, VA, USA; 1988. Published as American National Standard ANSI/AAMI EC57:1988.
5. Schluter P, Peterson S, Moody G, Siegal L, Jackson C, Perry D, Acarturk E, Aumiller J, Blake S, Blaustein A, Conrad C, Heller G, Malagold M, Mark R, Miklozek C: MIT-BIH Arrhythmia Database Directory. [http://www.physionet.org/physiobank/database/html/mitdbdir/mitdbdir.htm]
6. Altman DG, Bland JM: Statistics Notes: Diagnostic tests 2: predictive values. British Medical Journal 1994, 309(6947):102.
7. Schlesinger MI, Hlaváč V: Ten Lectures on Statistical and Structural Pattern Recognition. Volume 24. Kluwer Academic Publishers; 2002:1-22.
8. Duda RO, Hart PE: Pattern Classification and Scene Analysis. 1st edition. Kluwer Academic Publishers; 1973:10-39.
9. de Chazal P, O'Dwyer M, Reilly RB: Automatic Classification of Heartbeats Using ECG Morphology and Heartbeat Interval Features. IEEE Transactions on Biomedical Engineering 2004, 51(7):1196-1206.
10. Melo SL, Calôba LP, Nadal J: Arrhythmia Analysis Using Artificial Neural Network and Decimated Electrocardiographic Data.
11. Benjamin EJ, Wolf PA, D'Agostino RB, Silbershatz H, Kannel WB, Levy D: Impact of Atrial Fibrillation on the Risk of Death: The Framingham Heart Study. Circulation 1998, 98:946-952.
12. Frost LHM, Christiansen EH, Jacobsen CJ, Allermand H, Thomsen PEB: Low vagal tone and supraventricular ectopic activity predict atrial fibrillation and flutter after coronary artery bypass grafting. European Heart Journal 1995, 16:825-831.
13. Australian Bureau of Statistics: 6306.0 – Employee Earnings and Hours, Australia, May 2006. [http://www.abs.gov.au/AUSSTATS/abs@.nsf/ProductsbyCatalogue/27641437D6780D1FCA2568A9001393DF?OpenDocument#] Belconnen, ACT Australia; 2616.
14. Baum RS, Alvarez H III, Cobb LA: Survival after Resuscitation from Out-of-Hospital Ventricular Fibrillation. Circulation 1974, 50:1231-1235.
15. Doval H, Nul D, Grancelli H, Varini S, Soifer S, Corrado G, Dubner S, Scapin O, Perrone S: Nonsustained Ventricular Tachycardia in Severe Heart Failure: Independent Marker of Increased Mortality due to Sudden Death. Circulation 1996, 94(12):3198-3203.
16. Carrim ZI, Khan AA: Mean Frequency of Premature Ventricular Complexes and Predictor of Malignant Ventricular Arrythmias.
17. Rechner IJ, Lipman J: The costs of caring for patients in a tertiary referral Australian Intensive Care Unit.
18. Lundqvist CB: ACC/AHA/ESC Guidelines for the Management of Patients with Supraventricular Arrhythmias – executive summary: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines and the European Society of Cardiology Committee for Practice Guidelines (writing committee to develop guidelines for the management of patients with supraventricular arrhythmias). Journal of the American College of Cardiology 2003, 42(8):1493-1531. Developed in collaboration with NASPE-Heart Rhythm Society.