Open Access Research article

Hip fracture risk assessment: artificial neural network outperforms conditional logistic regression in an age- and sex-matched case control study

Wo-Jan Tseng1, Li-Wei Hung2, Jiann-Shing Shieh3, Maysam F Abbod4 and Jinn Lin2*

Author Affiliations

1 Department of Orthopaedic Surgery, National Taiwan University Hospital Hsin-Chu Branch, No.25, Ln. 442, Sec. 1, Jingguo Rd., East Dist., 300, Hsinchu, Taiwan

2 Department of Orthopaedic Surgery, National Taiwan University Hospital, No.7, Zhongshan S. Rd., Zhongzheng Dist., Taipei, Taiwan

3 Department of Mechanical Engineering, Yuan Ze University, No.135, Yuandong Rd., Zhongli, Taiwan

4 School of Engineering and Design, Brunel University, Kingston Lane, Uxbridge, Middlesex UB8 3PH, West London, United Kingdom

BMC Musculoskeletal Disorders 2013, 14:207  doi:10.1186/1471-2474-14-207

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2474/14/207


Received: 14 December 2012
Accepted: 12 July 2013
Published: 15 July 2013

© 2013 Tseng et al.; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Osteoporotic hip fractures, which carry significant morbidity and excess mortality among the elderly, have imposed huge health and economic burdens on societies worldwide. In this age- and sex-matched case control study, we examined the risk factors of hip fractures and assessed the fracture risk by conditional logistic regression (CLR) and ensemble artificial neural network (ANN). The performances of these two classifiers were compared.

Methods

The study population consisted of 217 pairs (149 women and 68 men) of fracture cases and controls older than 60 years. All the participants were interviewed with the same standardized questionnaire including questions on 66 risk factors in 12 categories. Univariate CLR analysis was initially conducted to examine the unadjusted odds ratio of all potential risk factors. The significant risk factors were then tested by multivariate analyses. For fracture risk assessment, the participants were randomly divided into modeling and testing datasets for 10-fold cross validation analyses. The prediction models built by CLR and ANN in modeling datasets were applied to testing datasets to study generalization. The performances, including discrimination and calibration, were compared with non-parametric Wilcoxon tests.

Results

In univariate CLR analyses, 16 variables reached statistical significance, and six of them remained significant in multivariate analyses: low T score, low BMI, low MMSE score, milk intake, walking difficulty, and significant fall at home. For discrimination, ANN outperformed CLR in both 16- and 6-variable analyses in modeling and testing datasets (p < 0.005). For calibration, ANN outperformed CLR only in 16-variable analyses in modeling and testing datasets (p = 0.013 and 0.047, respectively).

Conclusions

The risk factors of hip fracture are more personal than environmental. With adequate model construction, ANN may outperform CLR in both discrimination and calibration. ANN seems to have not been developed to its full potential and efforts should be made to improve its performance.

Keywords:
Hip fracture; Artificial neural network; Conditional logistic regression; Discrimination; Calibration

Background

With increased human life expectancy, osteoporosis has become more prevalent and may lead to disastrous fractures at most skeletal sites. Among them, hip fractures are of particular concern because they are associated with significant morbidity (functional recovery being limited to less than 50% [1]) and excess mortality (up to 18-33% in the first year and persisting for at least 5 years afterwards [2]). The number of hip fractures worldwide is projected to reach 2.6 million by 2025 and may rise to 4.5 million by 2050, imposing huge health and economic burdens upon societies as a whole [3]. To develop strategies for preventing this serious injury, it is crucial to better understand its risk factors and identify the patients at risk. Many potential risk factors contributing to hip fracture have been identified, such as low bone mineral density (BMD), old age, female gender, chronic health conditions, experience of fracture and falls, physical inactivity, heavy smoking and drinking, impaired vision, use of certain medicines, low calcium and vitamin intake, low body mass index (BMI), low muscle strength, etc. [4,5]. However, these risk factors may vary geographically, ethnically, and culturally, and their combined effects have not been well understood [6]. Several methods are available for building risk factor models for hip fracture evaluation; conditional logistic regression (CLR) and artificial neural network (ANN) are among the most popular.

The ANN, simulating high-level human brain functions, is a computational modeling tool that has become widely accepted for modeling complex real-world problems [7]. Although it has been explored in many areas of medicine, such as nephrology, microbiology, radiology, neurology, cardiology, etc. [8], its use in the orthopedic trauma field is still rare. Eller-Vainicher et al. identified the promising role of ANN in predicting osteoporotic fracture among postmenopausal women with osteoporosis [9]. Lin et al. found that an ANN algorithm could reliably predict mortality in patients with hip fracture and outperformed the logistic regression method [10]. The ANN, consisting of a set of highly interconnected processing units (neurons) tied together with weighted connections, includes an input layer, one or more hidden layers, and an output layer. The input layer comprises the data available for the analysis, and the output layer comprises the outcome. The ANN is trained on the basis of training data to correlate the input with the corresponding output over repeated training epochs to reduce the overall error. The stimulus of the input is propagated forward through each neuron layer until the output is produced. Then the ANN output is compared to the observed output, and an error signal is calculated. This error signal is then transmitted backwards across the neuron layers and the connection weights are updated to reduce the overall error. This describes the multilayer perceptron ANN model with feedforward backpropagation training. The training process is supervised by a group of validation data, which are not used in the training itself, and is terminated when the validation error reaches its minimum. ANN models derived from this training process are then applied to new datasets not used for training or validation.
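The stopping rule described above (halt training when the error on held-out validation data reaches its minimum) can be illustrated with a minimal Python sketch. The function name, the `patience` parameter, and the error values are assumptions made for illustration; they are not taken from the study.

```python
def early_stopping_epoch(validation_errors, patience=1):
    """Return the index of the epoch whose weights would be kept.

    Training halts once the validation error has failed to improve for
    `patience` consecutive epochs; the best epoch seen so far is kept.
    """
    best_epoch = 0
    best_error = float("inf")
    epochs_without_gain = 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                break
    return best_epoch


# Hypothetical validation errors per epoch (illustrative values only).
errors = [0.30, 0.24, 0.21, 0.20, 0.22, 0.25]
print(early_stopping_epoch(errors))
```

In a real training loop the errors would be computed epoch by epoch rather than supplied as a list; the list form simply keeps the rule easy to inspect.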

In the present age- and sex-matched case control study, we identified important risk factors for hip fractures and the results were further used to build hip fracture prediction models with CLR or ANN methods. Based on a fair comparison with the same dependent variables and analytical processes, we hypothesized that ANN with a more nonlinear approach outperforms CLR in both discrimination and calibration.

Methods

Participants

The inclusion criteria were non-institutionalized patients over 60 years of age who had first-time, low-energy hip fractures, defined as fractures of the proximal femur caused by injuries equal to or less than a fall from standing height. Patients with previous hip lesions or surgeries were excluded. The study was approved by the institutional review board of National Taiwan University Hospital. Between April 2004 and January 2006, a total of 366 patients older than 60 years were admitted to our institute with a diagnosis of hip fracture. Among them, 115 cases were excluded for the following reasons: previous hip fractures or surgeries (76), fractures not caused by low-energy trauma (25), fractures in institutionalized patients (13), and fracture treated without surgery (1). Of the 251 patients who met the inclusion criteria, 217 patients (149 women and 68 men) gave written informed consent and were enrolled in the current study. All patients were interviewed under stable conditions after their surgeries. The median time for completing the interview was 6 days after the fracture.

Hospital controls were simultaneously selected from patients of the Department of Family Medicine at the same hospital with a diagnosis of diseases or injuries unrelated to bone and without any history of hip fractures. The control group was individually matched to cases by age (within 4 to 6 years) and sex. Informed consent was obtained from all the participants.

Data measurements

Selection of the risk variables was based on the results of previous studies and other potential causes of hip fracture in an older population. Both cases and controls were interviewed by trained interviewers with the same standardized questionnaire including questions on 66 variables in 12 categories: 1) socio-demography (six variables: ethnicity, education, occupation, marriage, income, and living arrangement); 2) disease history (14 variables: hypertension, diabetes, stroke, heart disease, chronic respiratory disease, arthritis, osteoporosis, liver disease, cancer, cataract, Parkinson’s disease, constipation, weakness, and headache or migraine); 3) self-assessed health (three variables: current, comparison with 1 year ago, and comparison with same-aged people); 4) anthropometry (three variables: height, weight, BMI); 5) health habits (three variables: smoking, alcohol consumption, and regular exercise); 6) diet habits and medicine (15 variables: vegetarian diet, intake of milk, coffee, tea, calcium, vitamin, glucosamine, or anti-hypertensive, other cardiovascular, analgesic, anti-diabetic, psychotropic, gastrointestinal, and other drugs, and multiple medications); 7) injury-related experience (four variables: history of fall-induced fractures, fracture location, significant fall at home in the past year, and history of fall outdoors); 8) environmental hazards (seven variables: building type, multistory dwelling, number of stairs in a flight, stair height, stair lighting, outdoor lighting, and green light duration near their home); 9) physical functions (four variables: Activities of daily living (ADL) difficulty; Instrumental ADL (IADL) difficulty, walking difficulty, and pain at walking); 10) cognitive and other functioning (five variables: urinary incontinence, fecal incontinence, vision, hearing, and Mini-Mental State Examination (MMSE) score); 11) coordination function; and 12) total BMD. 
Height and weight were measured using electronic scales for BMI calculation. The physical functions were measured by questions on the level of difficulty in performing five ADL (eating, bathing, dressing, transferring, toileting), six IADL (using the telephone, managing medications, preparing meals, maintaining the home, shopping, managing finances), and walking. Cognitive function was measured with the MMSE. The coordination function was measured by the finger-to-nose test, in which the participants were asked to use their finger to alternately touch their own nose and the interviewer's finger as quickly as possible. BMD (T-score) was examined at the non-fractured side of the proximal femur for cases and the same side for matched controls, using the same dual-energy x-ray absorptiometry (DEXA) machine (Model: QDR4500A; Hologic, Waltham, MA), and read by the same radiologist. The reliability of interview and measurement results among the interviewers was checked by intraclass correlation coefficient (ICC), which showed moderate to high agreement.

Data processing and risk factor selection

The data were analyzed with conditional logistic regression to produce odds ratios and 95% confidence intervals using SPSS COXREG 17.0 statistical software (SPSS Inc., Chicago, IL). Univariate analysis was initially conducted to examine the unadjusted association of all potential risk factors with hip fracture. Continuous variables including monthly income, body weight, height, leisure-time physical activity, MMSE score, peak expiratory flow rate, average hand grip strength, and total BMD value were all categorized into two groups according to the cut-off point selected by the Youden index in the receiver operating characteristic (ROC) curves. A “Missing” category was created for BMD with missing data. Significant variables with p < 0.1 in univariate analyses were then tested by multivariate analyses with the forward stepwise approach, with the p value set at 0.05 for entry and 0.1 for removal. Categorical variables were contrasted with reference to the other category. All statistical tests performed were 2-tailed, and the final significance level was set at 0.05.
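As an illustration of how a Youden-index cut-off dichotomizes a continuous variable, the following minimal Python sketch scans every observed value as a candidate cut-off and keeps the one maximizing J = sensitivity + specificity - 1. The function name and the T-score values are hypothetical, and the sketch assumes that low values indicate risk and that both cases and controls are present.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A subject is called positive when value <= cutoff, which suits
    variables where low values indicate risk (for example a low T-score).
    Assumes at least one case (label 1) and one control (label 0).
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, labels) if y == 1 and v <= cutoff)
        true_neg = sum(1 for v, y in zip(values, labels) if y == 0 and v > cutoff)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical T-scores: four cases (label 1) and four controls (label 0).
t_scores = [-3.1, -2.8, -2.5, -1.9, -2.6, -1.5, -1.2, -0.8]
fracture = [1, 1, 1, 1, 0, 0, 0, 0]
cutoff, j = youden_cutoff(t_scores, fracture)
binary_predictor = [1 if t <= cutoff else 0 for t in t_scores]
```

The resulting `binary_predictor` is the kind of 0/1 variable that would then enter the regression or the network.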

The significant variables in univariate and multivariate analyses were used to compute the individual fracture risk with either CLR or ANN. The dependent variable, hip fracture, was a dichotomous variable (Yes = 1; No = 0). All predictors were binary variables, coded with 0 or 1 (missing = 2).

Participant partition

To assess generalization, a three-way data split method [11] (Figure 1) was used for construction of prediction models and internal cross validation. The 217 matched pairs were randomly divided into two separate groups for 10-fold cross validation analyses: 195–197 pairs (about 9/10 of the enrolled patients) as the modeling datasets and 20–22 pairs (about 1/10) as the testing datasets. The modeling group was used to build CLR and ANN models. The testing group was set aside for later tests of generalization.
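The partition can be sketched in Python as follows. This is only an illustration of a 10-fold split that keeps each matched pair together; the function name and the random seed are assumptions, and the even split used here yields testing folds of 21 or 22 pairs, so it does not reproduce the authors' exact partition.

```python
import random


def ten_fold_partition(n_pairs, n_folds=10, seed=0):
    """Yield (modeling_ids, testing_ids) lists of matched-pair indices.

    Indices refer to case-control pairs, so a case and its matched
    control always land in the same dataset.
    """
    ids = list(range(n_pairs))
    random.Random(seed).shuffle(ids)  # seed is an arbitrary assumption
    folds = [ids[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        testing = folds[k]
        modeling = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield modeling, testing


splits = list(ten_fold_partition(217))
for modeling, testing in splits:
    print(len(modeling), len(testing))
```

Every pair appears in exactly one testing dataset and in the modeling datasets of the other nine folds.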

Figure 1. The flowchart of data partition, neural network creation and generalization analyses by cross validation.

Conditional logistic regression model

In CLR analyses, the regression equations were derived from the significant variables in univariate and multivariate analyses in the modeling datasets. Risk scores calculated by regression equations as the summation of the products of the included independent variables and the regression coefficients of the variables were used to assess hip fracture risk [12]. The regression equations were then applied to the subjects in testing datasets for generalization analyses.
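The risk score described above is a plain weighted sum, which the following Python sketch makes explicit. The predictor names and coefficient values are hypothetical and are not the fitted coefficients of the study. In a matched design the stratum-specific intercepts cancel out of the conditional likelihood, so the score carries no intercept term.

```python
def risk_score(coefficients, subject):
    """Sum of (regression coefficient * predictor value) over the model."""
    return sum(beta * subject[name] for name, beta in coefficients.items())


# Hypothetical coefficients (log odds ratios), for illustration only.
coefficients = {"low_t_score": 1.2, "low_bmi": 0.8, "walking_difficulty": 0.9}

# Binary predictors coded 0 or 1, as in the study.
case = {"low_t_score": 1, "low_bmi": 0, "walking_difficulty": 1}
control = {"low_t_score": 0, "low_bmi": 0, "walking_difficulty": 1}

# A well-discriminating model scores the case above its matched control.
print(risk_score(coefficients, case) > risk_score(coefficients, control))
```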

Artificial neural network model

In ANN analyses, the participants in each modeling dataset were further randomly divided into two subsets: 9/10 as the training subsets and 1/10 as the validation subsets, also based on the principle of 10-fold cross validation. This procedure was performed twice, and thus 20 groups of training and validation subsets were obtained for ensemble analyses. In the training subsets, feed-forward back-propagation neural networks consisting of an input layer, hidden layers, and an output layer were constructed. A scaled conjugate-gradient algorithm [13] was used as a supervised learning algorithm to train the network. It adjusted the internal weights and biases of the network according to the second-order gradient information over repeated training epochs to reduce the overall error. One epoch consisted of a single presentation of each set of inputs followed by automatic adjustments of the weight connections to minimize the total error for all data that were used in the training. The estimation of error was based on the mean-squared error. The parameter that determined the change in the weight for the second derivative approximation (σ) was set to 5×10^-5. The parameter that regulated the indefiniteness of the Hessian (λ) was set to 5×10^-7. A logistic transformation of the weighted inputs to the output node was applied to determine the overall output of the network, which would range from 0 to 1. The training was terminated if the error in the validation subsets stopped dropping or started to rise (early stopping). The number of hidden neurons was determined by test runs with 5 to 25 neurons. In each group of training and validation subsets, 15 sets of different initial weights were analyzed, and the networks with the lowest validation errors were selected. Thus, 20 networks were obtained from the twice-repeated 10-fold cross validation, and these 20 networks were combined to generate the ensemble models by simple averaging of their outputs.
These ensemble models were applied to the testing datasets. The variables used in ANN analyses were the same as those in CLR analyses. The ANN analyses were run by Neural Network Toolbox in MATLAB 7.8 (R2009a, MathWorks, Natick, MA).
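The two selection steps described above (keeping, for each training/validation subset, the initialization with the lowest validation error, then averaging the outputs of the retained networks) can be sketched in Python. The stub predictors below are hypothetical stand-ins for trained networks, and all names and values are assumptions; the study itself used the MATLAB Neural Network Toolbox.

```python
def select_best_network(candidates):
    """Pick the predictor with the lowest validation error.

    `candidates` is a list of (validation_error, predict_function) pairs,
    one per random weight initialization.
    """
    return min(candidates, key=lambda pair: pair[0])[1]


def ensemble_predict(networks, features):
    """Average the outputs (each in [0, 1]) of all member networks."""
    outputs = [network(features) for network in networks]
    return sum(outputs) / len(outputs)


# Hypothetical stand-ins for trained networks: each maps a feature
# vector to a risk between 0 and 1. Real members would be trained MLPs.
def make_stub(offset):
    return lambda features: min(1.0, max(0.0, offset + 0.1 * sum(features)))


# One subset with three initializations; keep the best by validation error.
best = select_best_network([(0.21, make_stub(0.3)),
                            (0.18, make_stub(0.2)),
                            (0.25, make_stub(0.4))])
members = [best, make_stub(0.1), make_stub(0.3)]
print(ensemble_predict(members, [1, 0, 1]))
```

In the study's setting, `members` would hold 20 networks, one per training/validation subset, rather than the three used here.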

Comparison of performance of models

In both modeling and testing datasets, the validity was checked by discrimination, and the reliability was checked by calibration (goodness of fit [14]). The discriminatory power of the models was assessed using the area under the ROC curves (AUROC). Discrimination refers to the ability to distinguish positive from negative cases. A good discriminating model in the present study would assign a higher risk score to hip fracture cases. Sensitivity, specificity, and accuracy were calculated in modeling and testing datasets according to the cut-off points selected by the Youden index on ROC curves. The calibration power of the models was compared using Hosmer-Lemeshow (HL) statistics [15]. The HL statistic is a single summary measure of the calibration and is based on comparing the observed and estimated fractured cases. The smaller the HL statistic is, the better the fit, with a perfectly calibrated model having a value of zero. Meanwhile, calibration curves based on the deciles from the data calculated using observed and expected values were built. The relationship between the observed and expected values was evaluated by ICC. The performance of classifiers, including discrimination, calibration and other measures of accuracy, sensitivity and specificity on the 10 pairs of ANN and CLR datasets, was compared using Wilcoxon signed-rank tests (p < 0.05).
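Both performance measures can be written down compactly. The Python sketch below computes the AUROC as the proportion of case-control pairs in which the case receives the higher score (ties counted as one half), and the HL statistic by grouping subjects by predicted risk and summing (observed - expected)^2 / [expected × (1 - expected/n)] over the groups. It is a minimal sketch under these standard definitions, not the authors' implementation, and the function names and inputs are hypothetical.

```python
def auroc(scores, labels):
    """Probability that a random case outscores a random control."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def hosmer_lemeshow(probabilities, labels, groups=10):
    """HL chi-square over `groups` risk-ordered groups (deciles by default)."""
    ranked = sorted(zip(probabilities, labels), key=lambda pair: pair[0])
    n = len(ranked)
    statistic = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # predicted number of cases
        observed = sum(y for _, y in chunk)   # actual number of cases
        variance = expected * (1.0 - expected / size)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic
```

A value of zero from `hosmer_lemeshow` means the predicted and observed case counts agree in every group, matching the description of a perfectly calibrated model.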

Results

Of the 149 pairs of women and 68 pairs of men, the average age was 80.7 ± 7.8 (mean ± standard deviation) years for women and 80 ± 7.4 years for men in the fracture group and 77.8 ± 6.8 years for women and 78.4 ± 7.9 years for men in the control group. In univariate analyses among the 66 variables, 16 variables reached statistical significance (Table 1). Milk intake meant milk consumption at least six times a week. Walking difficulty meant inability to walk or walking with assistance of crutches or walkers. Significant fall at home meant major fall at home more than once in the past year. Low education level meant lower than junior middle school. Current smoking meant a smoking habit of more than half of a pack per day. Fecal incontinence meant experience of uncontrolled stool passage. Vision impairment was recorded according to patients’ subjective feeling of impaired vision during walking. ADL difficulty meant impairment of at least two of the five activities. IADL difficulty meant impairment of at least two of the six activities. Regular exercise meant exercise habit at least four times per week. Coordination abnormality meant under or over shooting of a target and impaired timing or integration of muscle activity during finger-to-nose examination. In multivariate analyses, six variables remained statistically significant (Table 1). BMD was the most important risk factor for hip fracture, with the highest odds ratio and statistical significance. The average T-score was much lower in hip fracture patients than in controls, -2.58 ± 1.06 vs. -1.85 ± 1.3. It was also lower in women than in men (-2.8 ± 1.02 vs. -1.9 ± 0.92 for fractured patients and -1.6 ± 1.18 vs. -0.6 ± 0.93 for controls). We also used BMD alone in CLR analyses to assess its ability to predict hip fractures, for comparison with its effect when combined with other risk factors.

Table 1. Results of univariate and multivariate analyses of CLR

The neural network with eight hidden neurons was selected in the training process. For discrimination in modeling datasets, ANN was significantly higher than CLR in AUROC and accuracy in 16- and 6-variable models (Table 2) (Figure 2). The sensitivity was not significantly different in the two models. For specificity, ANN was significantly higher than CLR only in the 16-variable model. In testing datasets, ANN was significantly higher than CLR in AUROC and accuracy in the 16- and 6-variable models. There was no significant difference for sensitivity and specificity. In some datasets, AUROC and accuracy were very close between ANN and CLR, e.g., testing datasets 3 (0.865 vs. 0.863) and 4 (0.807 vs. 0.801) in 6-variable models. The accuracy of CLR was even higher than that of ANN in testing datasets 6 (0.698 vs. 0.651) and 7 (0.698 vs. 0.697) in 6-variable models. As for calibration in modeling datasets, ANN had significantly lower HL chi-squares and was better calibrated than CLR in 16-variable models (Table 3) (Figure 3). There was no significant difference in 6-variable models. ICCs were not significantly different in the two models. In testing datasets, ANN was better calibrated than CLR, with significantly lower HL chi-squares and higher ICCs in 16-variable models. In 6-variable models, HL chi-squares were not significantly different, but ANN still had significantly higher ICCs (Figure 4).

Figure 2. Comparison of discrimination power. (a) ROC curves in the modeling dataset. (b) ROC curves in the testing dataset. Black dots indicate the cut-off points determined by Youden Index.

Figure 3. Comparison of calibration power in modeling datasets. (a) Calibration curves in ANN models. (b) Calibration curves in CLR models. Calibration curves were based on predictions determined by deciles.

Table 2. Discrimination of ANN and CLR in modeling and testing datasets with 16- and 6-variable models

Table 3. Calibration of ANN and CLR in modeling and testing datasets with 16- and 6-variable models

Figure 4. Comparison of calibration power in testing datasets. (a) Calibration curves in ANN models. (b) Calibration curves in CLR models. Calibration curves were based on predictions determined by deciles.

When BMD alone was used to assess the fracture risk by CLR in modeling datasets, the AUROC and HL chi-square were 0.723 ± 0.01 and 17.21 ± 4.523, respectively. In testing datasets, the AUROC and HL chi-square were 0.702 ± 0.056 and 12.86 ± 5.214. The discrimination and calibration of the BMD-only model were lower than those of the CLR model built from BMD and other risk factors (Table 2, Table 3).

Discussion

In the present study, univariate CLR analysis identified 16 significant factors, including low T-score, walking difficulty, low BMI, low MMSE score, low milk intake, significant fall at home, low education, smoking habit, fractures experienced after age 55 years, fecal incontinence, vision impairment, presence of major diseases, ADL difficulty, IADL difficulty, no regular exercise, and coordination abnormality. The first six factors remained statistically significant in stepwise multivariate analysis, with low T-score being the most important one among them. In comparison of ANN and CLR for fracture risk assessment, ANN provided statistically higher discrimination and calibration power in the modeling and testing datasets in cross validation analyses.

In the literature, various clinical risk factors have been reported for hip fractures [4], but their combined effects for fracture prediction vary. The present matched case control study investigated most of the different kinds of potential personal and environmental risk factors. The 16 significant factors left in univariate analysis were mostly personal and modifiable. This outcome supports the finding that at-home falls of old people are mainly due to impaired general health, rather than external hazards [16], and emphasizes the importance of improving bone strength and general health for fracture prevention. Milk supplementation has been reported to increase bone density in Chinese women [17,18], and low milk intake was associated with high fracture risk in our study. Low milk intake might also account for the association of low education level with high fracture risk [19]. Walking difficulty and low MMSE could account for vision impairment, poor coordination, low ADL and IADL. Low BMD, the most significant variable in our analyses, could account for smoking habit, associated diseases, lack of exercise, fecal incontinence and previous fractures. BMD measurement is an important tool for assessing osteoporosis. It can be used for diagnosis, monitoring of treatment, and fracture risk prediction. Hip fracture risk increased by 3.7 times per SD decrease in femoral neck BMD at the age of 50 years [20]. The present study supports the finding that combining BMD and clinical risk factors can further improve the predictability of hip fracture, and emphasizes a multidirectional approach for patients at risk.

Logistic regression and ANN are currently the most widely used models for diagnosis and prognosis studies in biomedicine. Logistic regression has the advantages of high interpretability of model parameters and ease of use, but the use of linear combinations of variables is not suitable for modeling highly nonlinear complex interactions as is demonstrated in biologic and epidemiologic systems [21]. ANN with its resemblance to the human brain is appealing because of flexible nonlinear systems that show robust performance in dealing with noisy, incomplete or missing data and have the ability to generalize. They may be better at predicting outcomes when the relationships between the variables are multidimensional as found in complex biological systems. The ANN model allows inclusion of a large number of variables and there are not many assumptions (such as normality) that need to be verified. However, the comparative performance of these two methods has been widely reported with great controversy in the literature. In a review of 28 major studies carried out by Sargent [22], the performance was superior for ANN in 10 studies (36%), was superior for logistic regression in 4 cases (14%), and was similar in the remaining 14 cases. In another review of 72 papers conducted by Dreiseitl and Ohno-Machado [15], with statistical tests, both models performed similarly in 42%, ANN better in 18%, and logistic regression better in 1%. By contrast, without statistical tests, ANN was better in 33% and logistic regression better in 6%. The authors also surveyed the quality of the methodology and found a shortage of reporting ANN model building details in 49%, lack of statistical testing in 39%, and lack of calibration information in 75%. ANN is theoretically more flexible than logistic regression because of multi-layer networks, but on the other hand, it is threatened by over-fitting and instability [23]. 
In particular, there are still no established methods for constructing ANN models [23], which may lead to the wide variation in the comparative results.

An over-fitted ANN model, trained too closely on the limited available data, loses its ability to generalize. A network that generalizes well offers reasonable outputs for new, unseen data. A commonly used method to improve generalization in data-mining is a three-way data split with cross validation [11], as in the present study. The modeling datasets were split into training and validation subsets. The error on the validation subset was monitored during each training epoch, and once the error had increased, the training was stopped (early stopping). The network with the lowest validation error was chosen. With this generalization property, good outputs may be obtained without training on all possible datasets. Another practical problem is ANN instability [23], which means that changes in the training data may produce very different models and consequently different performance on unseen data. The instability is caused by training getting caught in different local minima in the error surface. This instability problem can be fixed by building ANN ensembles and aggregating the results of the networks [24]. The aggregated outputs with diversified individual networks will have lower variance and smaller bias than a single network. Furthermore, the 10-fold cross splitting method used for building the ANN ensembles could ensure each datum was equally used for both training and validation. The present study showed that ANN significantly outperformed CLR in terms of discrimination and calibration in both 16- and 6-variable models. However, comparing ANN performance in its training or validation subsets with CLR models may bias the results in favor of ANN. Thus, we used the cross validation testing datasets for ANN and CLR generalization comparison. Besides, as shown in Table 2, comparison of discrimination on a single testing dataset might lead to no significant difference or even higher accuracy in CLR.
This might explain the high inconsistency in the comparisons of these two classifiers reported in the literature, especially if statistical testing was not performed [15,25]. In the present study, nonparametric tests for paired samples in 10 cross validation groups could detect the significant difference between the two classifiers in datasets with varied patterns.
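To illustrate the paired nonparametric comparison across 10 cross validation groups, the following Python sketch computes the Wilcoxon signed-rank statistic and an exact two-sided p value by enumerating all sign patterns. It is a minimal sketch, not the authors' procedure, and the AUROC values are hypothetical placeholders rather than results from the study.

```python
from itertools import product


def wilcoxon_signed_rank(a, b):
    """Return (W_plus, exact two-sided p) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share the mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)
    # Exact null distribution: every sign pattern is equally likely.
    extreme = sum(
        1 for signs in product((0, 1), repeat=len(ranks))
        if abs(sum(r for r, s in zip(ranks, signs) if s) - centre)
        >= observed - 1e-12
    )
    return w_plus, extreme / 2 ** len(ranks)


# Hypothetical per-fold AUROC values for two classifiers (illustrative only).
ann = [0.86, 0.81, 0.87, 0.81, 0.84, 0.79, 0.83, 0.88, 0.80, 0.85]
clr = [0.84, 0.80, 0.86, 0.80, 0.80, 0.78, 0.80, 0.85, 0.79, 0.81]
w_plus, p_value = wilcoxon_signed_rank(ann, clr)
```

With 10 pairs, the smallest two-sided exact p value this test can return is 2/1024 (about 0.002), reached only when one classifier is better in every fold.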

Sensitivity, specificity and accuracy determined according to a pre-specified cutoff point are also commonly used for comparing the performance of the classifiers [15]. Actually, the risk score computed by the classifiers may be affected by the disease prevalence; thus selection of the cutoff points is important for a fair comparison. In the present study, the Youden index defined by the point with the minimum of the summation of the false positive and false negative rates in the ROC curve best differentiates between subjects with disease and those without disease when equal weight is given to sensitivity and specificity. Using the Youden index as the cut-off point can be independent from the disease prevalence and makes the predicting models more applicable to different series of patients [26]. It has been reported that the use of a cut-off point arbitrarily determined at a risk score equal to 0.5 might lead to biased results and unfair comparisons [27].

The present study had limitations. First, as a matched case control study, age and sex were not included in the predictive models. This exclusion might lower the performance of the classifiers. Second, some clinical risk factors were not included, such as the geometry of the proximal femurs or maternal history of hip fractures, because the former is not a routine examination for the elderly and the latter might be subject to information or reporting bias. Third, all the continuous variables were converted to binary variables with a cut-off point of the Youden index. This method could maximize the difference between cases and controls and make the comparison fairer and clinical application easier. However, some important information might be lost if the distribution of the variables was complex [28]. Fourth, the comparison was not entirely fair to CLR because interaction terms and quadratic functions were not included. However, such interaction terms are not routinely examined in conventional analyses. Besides, no significant interaction between the input variables was found in the present study. Fifth, participant partition using the 10-fold cross validation method in the present study might result in a sample size too small for validation and testing and increase the variance [25]. Besides, this sample size was also not enough for a standard HL analysis, which requires at least 400 cases [29]. Bootstrap resampling might be another option to improve the efficiency of validation. Last, although considerable efforts, through much trial and error, were made to optimize the design of the neural networks, they still could be further improved in model topology or ensemble method [22].

Conclusions

The hip fracture risk in the elderly can be effectively assessed by neural networks and logistic regression analyses. The risk factors identified in the present study are more personal than environmental. Combining BMD and clinical risk factors can predict the fracture risk better than BMD alone. With adequate model construction and comparison, ANN may outperform CLR in both discrimination and calibration. However, ANN seems not to have been developed to its full potential, and more studies to further improve its performance are warranted. The models created in this study still need to be validated externally.

Abbreviations

CLR: Conditional logistic regression; ANN: Artificial neural network; BMD: Bone mineral density; BMI: Body mass index; ADL: Activities of daily living; IADL: Instrumental activities of daily living; MMSE: Mini-mental state examination; DEXA: Dual-energy x-ray absorptiometry; ICC: Intraclass correlation coefficient; ROC: Receiver operating characteristic; AUROC: Area under the ROC curve; HL statistics: Hosmer-Lemeshow statistics.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

WJT contributed to study design and data analysis and drafted the manuscript. LWH contributed to data analysis and drafted the manuscript. JSS, MFA and JL contributed to study design and manuscript review. All authors read and approved the final manuscript.

Acknowledgement

This study was supported by the National Health Research Institutes in Taiwan (Project GE-093-PP-05 & GE-094-PP-06).

References

  1. Wehren L, Magaziner J: Hip fracture: risk factors and outcomes.

    Curr Osteoporos Rep 2003, 1:78-85.

  2. Magaziner J, Lydick E, Hawkes W, Fox KM, Zimmerman SI, Epstein RS, Hebel JR: Excess mortality attributable to hip fracture in white women aged 70 years and older.

    Am J Public Health 1997, 87:1630-1636.

  3. Taylor BC, Schreiner PJ, Stone KL, Fink HA, Cummings SR, Nevitt MC, Bowman PJ, Ensrud KE: Long-term prediction of incident hip fracture risk in elderly white women: study of osteoporotic fractures.

    J Am Geriatr Soc 2004, 52:1479-1486.

  4. Marks R: Hip fracture epidemiological trends, outcomes, and risk factors, 1970–2009.

    Int J Gen Med 2010, 3:1-17.

  5. Robbins J, Aragaki AK, Kooperberg C, Watts N, Wactawski-Wende J, Jackson RD, LeBoff MS, Lewis CE, Chen Z, Stefanick ML, et al.: Factors associated with 5-year risk of hip fracture in postmenopausal women.

    JAMA-J Am Med Assoc 2007, 298(20):2389-2398.

  6. Lau EMC, Suriwongpaisal P, Lee JK, De D, Festin MR, Saw SM, Khir A, Torralba T, Sham A, Sambrook P: Risk factors for hip fracture in Asian men and women: the Asian osteoporosis study.

    J Bone Miner Res 2001, 16:572-580.

  7. Basheer IA, Hajmeer M: Artificial neural networks: fundamentals, computing, design, and application.

    J Microbiol Meth 2000, 43:3-31.

  8. Patel JL, Goyal RK: Applications of artificial neural networks in medical science.

    Curr Clin Pharmacol 2007, 2:217-226.

  9. Eller-Vainicher C, Chiodini I, Santi I, Massarotti M, Pietrogrande L, Cairoli E, Beck-Peccoz P, Longhi M, Galmarini V, Gandolini G, Bevilacqua M, Grossi E: Recognition of morphometric vertebral fractures by artificial neural networks: analysis from GISMO Lombardia Database.

    PLoS One 2011, 6(11):e27277.

  10. Lin CC, Ou YK, Chen SH, Liu YC, Lin J: Comparison of artificial neural network and logistic regression models for predicting mortality in elderly patients with hip fracture.

    Injury 2010, 41(8):869-873.

  11. Winham SJ, Slater AJ, Motsinger-Reif AA: A comparison of internal validation techniques for multifactor dimensionality reduction.

    BMC Bioinformatics 2010, 11(1):394.

  12. Lin CC, Bai YM, Chen JY, Hwang TJ, Chen TT, Chiu HW, Li YC: Easy and low-cost identification of metabolic syndrome in patients treated with second-generation antipsychotics.

    J Clin Psychiat 2010, 71(03):225-234.

  13. Møller MF: A scaled conjugate gradient algorithm for fast supervised learning.

    Neural Netw 1993, 6:525-533.

  14. Matheny M, Ohno-Machado L, Resnic F: Discrimination and calibration of mortality risk prediction models in interventional cardiology.

    J Biomed Inform 2005, 38(5):367-375.

  15. Dreiseitl S, Ohno-Machado L: Logistic regression and artificial neural network classification models: a methodology review.

    J Biomed Inform 2002, 35(5–6):352-359.

  16. Parker MJ, Twemlow TR, Pryor GA: Environmental hazards and hip fractures.

    Age Ageing 1996, 25:322-325.

  17. Lau EMC, Hong A, Lam V, Woo J: Milk supplementation of the diet of postmenopausal Chinese women on a low calcium intake retards bone loss.

    J Bone Miner Res 2001, 16:1704-1709.

  18. Ting G, Tan S, Chan S, Karuthan C, Zaitun Y, Suriah A, Chee W: A follow-up study on the effects of a milk supplement on bone mineral density of menopausal Chinese women in Malaysia.

    J Nutr Health Aging 2007, 11:69-73.

  19. Lofthus CM, Osnes EK, Meyer HE, Kristiansen IS, Nordsletten L, Falch JA: Young patients with hip fracture: a population-based study of bone mass and risk factors for osteoporosis.

    Osteoporosis Int 2006, 17(11):1666-1672.

  20. Kanis JA, Oden A, Johnell O, Johansson H, Laet C, Brown J, Burckhardt P, Cooper C, Christiansen C, Cummings S, et al.: The use of clinical risk factors enhances the performance of BMD in the prediction of hip and osteoporotic fractures in men and women.

    Osteoporosis Int 2007, 18(8):1033-1046.

  21. Ayer T, Chhatwal J, Alagoz O, Kahn CE, Woods RW, Burnside ES: Comparison of logistic regression and artificial neural network models in breast cancer risk estimation.

    Radiographics 2010, 30:13-22.

  22. Sargent DJ: Comparison of artificial neural networks with other statistical approaches.

    Cancer 2001, 91:1636-1642.

  23. Cunningham P, Carney J, Jacob S: Stability problems with artificial neural networks and the ensemble solution.

    Artif Intell Med 2000, 20:217-225.

  24. Santos-García G, Varela G, Novoa N, Jiménez MF: Prediction of postoperative morbidity after lung resection using an artificial neural network ensemble.

    Artif Intell Med 2004, 30(1):61-69.

  25. Schwarzer G, Vach W, Schumacher M: On the misuses of artificial neural networks for prognostic and diagnostic classification in oncology.

    Stat Med 2000, 19:541-561.

  26. Fluss R, Faraggi D, Reiser B: Estimation of the Youden index and its associated cutoff point.

    Biom J 2005, 47:458-472.

  27. Jiménez-Valverde A, Lobo J: Threshold criteria for conversion of probability of species presence to either–or presence–absence.

    Acta Oecol 2007, 31(3):361-369.

  28. Sakai S, Kobayashi K, Akazawa K, Kanda T, Mandai N, Toyabe SI: Comparison of the levels of accuracy of an artificial neural network model and a logistic regression model for the diagnosis of acute appendicitis.

    J Med Syst 2007, 31(5):357-364.

  29. Bewick V, Cheek L, Ball J: Statistics review 14: Logistic regression.

    Crit Care 2005, 9(1):112.

Pre-publication history

The pre-publication history for this paper can be accessed here:

http://www.biomedcentral.com/1471-2474/14/207/prepub