Application of Machine Learning Algorithms to Predict Osteoporotic Fractures in Women
Abstract
Background
Predicting the risk of osteoporotic fractures is vital for prevention. Traditional methods such as the Fracture Risk Assessment Tool (FRAX) model use clinical factors. This study examined the predictive power of the FRAX score and machine-learning algorithms trained on FRAX parameters.
Methods
We analyzed the data of 2,147 female participants from the Ansan cohort study. The FRAX parameters employed in this study included age, sex (female), height and weight, current smoking status, excessive alcohol consumption (>3 units/d of alcohol), and diagnosis of rheumatoid arthritis. Osteoporotic fracture was defined as one or more fractures of the hip, spine, or wrist during a 10-year observation period. Machine-learning algorithms, such as gradient boosting, random forest, decision tree, and logistic regression, were employed to predict osteoporotic fractures with a 70:30 training-to-test set ratio. We evaluated the area under the receiver operating characteristic curve (AUROC) scores to assess and compare the performance of these algorithms with the FRAX score.
Results
Of the 2,147 participants, 3.5% experienced osteoporotic fractures. Those with fractures were older, shorter in height, and had a higher prevalence of rheumatoid arthritis, as well as higher FRAX scores. The AUROC for the FRAX was 0.617. The machine-learning algorithms showed AUROC values of 0.662, 0.652, 0.648, and 0.637 for gradient boosting, logistic regression, decision tree, and random forest, respectively.
Conclusion
This study highlighted the immense potential of machine-learning algorithms to improve osteoporotic fracture risk prediction in women when complete FRAX parameter information is unavailable.
INTRODUCTION
The principal objective of osteoporosis management is to reduce fracture risk. Therefore, individuals at risk of fractures must be identified. According to the World Health Organization (WHO) diagnostic criteria for osteoporosis, treatment can be initiated when a person's bone mineral density (BMD) T-score falls below -2.5; however, treatment has traditionally also been started after identifying people at risk through clinical risk factors. Attempts have been made to improve fracture risk assessment by combining BMD and clinical risk factors [1]. The Fracture Risk Assessment Tool (FRAX) is one of the most widely used tools.
The FRAX, introduced by the WHO in 2008, is a computer-based algorithm that calculates the 10-year probability of hip and major osteoporotic fractures. It incorporates age, sex, height, weight, and a set of seven binary clinical risk factors: previous fracture, parent fractured hip, current smoking, glucocorticoid use, excessive alcohol consumption, diagnosis of rheumatoid arthritis, and other causes of secondary osteoporosis. It can also include femoral neck BMD data [2]. The FRAX fracture probability varies significantly from country to country. It was initially introduced in eight countries and has since been validated for use in more than 60 countries [3]. Over the course of 2 decades, the FRAX has become so widely used that it is included in clinical practice guidelines. However, it has certain limitations, primarily stemming from its acceptance of only binary inputs. These limitations include the absence of information regarding the recency of previous fractures, the lack of dose-response data for glucocorticoid exposure, and the questionable accuracy of self-reported family history information [4].
Recently, artificial intelligence and machine-learning algorithms have gained significant attention in the field of osteoporosis. They are recognized for their potential in exploring new research fields, including the investigation of novel risk factors and the prediction of osteoporosis, falls, and fractures by leveraging biological testing, imaging, and clinical data [5]. Currently, machine-learning algorithms, particularly supervised learning techniques, are the most extensively employed methods to predict outcomes. In this study, we aimed to construct a prediction model for osteoporotic fractures among female participants of a large-scale cohort study by employing machine-learning techniques and utilizing selected FRAX parameters, and to compare the performance of this model with conventional risk assessment tools.
METHODS
1. Study Population
This study analyzed data from the Ansan cohort study, an ongoing prospective cohort study of the Korean Genome Epidemiology Study (KoGES). The cohort comprised adults from the general population aged 40 years and above who were residents of the Ansan region. The baseline study was conducted from June 25, 2001, to January 29, 2003, followed by biennial on-site interviews and comprehensive health examinations. The analysis included data from the seventh phase of KoGES. This study aligned with previously published research in terms of its cohort inclusion criteria and methodology [6,7].
From 3,975 individuals, we selected 2,238 female participants who had responded to the questions on the FRAX parameters for our analyses. We excluded 53 individuals due to missing weight and height information. In accordance with the research objectives, the final analysis included the 2,147 participants who had answered “no” to the questions regarding parental fractures, glucocorticoid use, and previous fractures.
This study was approved by the Institutional Review Board of CHA Bundang Medical Center (IRB protocol no., 2023-08-048), and informed consent was waived because of the retrospective nature of the study.
2. FRAX Parameters and Scores
The FRAX parameters employed in this study included age, sex (female), height and weight, current smoking, excessive alcohol consumption (>3 units/d of alcohol), and the diagnosis of rheumatoid arthritis. FRAX scores were calculated for the 10-year major osteoporotic fracture risk using the Korea-specific FRAX tool available online from the University of Sheffield (https://www.sheffield.ac.uk/FRAX/tool.aspx?country=25), without including BMD information.
3. Osteoporosis Self-Assessment Tool for Asians
The Osteoporosis Self-Assessment Tool for Asians (OSTA) score is a traditional and straightforward screening tool for predicting the risk of osteoporosis based on age and weight [8,9]. The OSTA score is calculated by subtracting age (years) from weight (kg) and multiplying the difference by 0.2 [9]. Individuals are stratified as being at low (OSTA >-1), medium (-4≤ OSTA ≤-1), or high risk (OSTA <-4) of osteoporosis [10].
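As an illustration of the calculation described above, the following minimal Python sketch computes an OSTA score as 0.2 × (weight − age) and assigns a risk category using the cut-offs of -1 and -4. The function names are hypothetical and are not taken from the study. Some published versions of the OSTA truncate the result to an integer; because the text does not specify this, no truncation is applied here.

```python
def osta_score(weight_kg: float, age_years: float) -> float:
    """OSTA score: 0.2 * (weight in kg - age in years)."""
    return 0.2 * (weight_kg - age_years)


def osta_risk(score: float) -> str:
    """Map an OSTA score to a risk category (cut-offs: -1 and -4)."""
    if score > -1:
        return "low"
    if score >= -4:
        return "medium"
    return "high"


# Illustrative inputs only (not cohort data).
for weight, age in [(60, 55), (50, 60), (45, 75)]:
    score = osta_score(weight, age)
    print(f"weight={weight} kg, age={age} y -> OSTA={score:.1f} ({osta_risk(score)})")
```

Under this definition, a lower (more negative) score corresponds to a higher estimated risk, which is why older and lighter individuals fall into the high-risk group.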
4. Osteoporotic Fractures
An osteoporotic fracture was clinically defined as a fracture occurring spontaneously or following a minor trauma [11]. We collected 10 years of cumulative data on fractures reported by patients through face-to-face and telephone interviews. Among these, we defined those occurring in the wrist, femur, spine, and upper arm as osteoporotic fractures.
5. Machine-Learning Algorithms and Features
The implemented machine-learning algorithms included gradient boosting (GB), random forest (RF), decision tree (DT), and logistic regression (LR). All models were implemented with scikit-learn in the Python 3.11 environment (Python Software Foundation, Wilmington, DE, USA). GB and RF are common tree-based ensemble methods known for their accuracy across various datasets. DT is a schematic representation of several decisions, each with a different probability of occurrence. LR models are widely used for multivariate analysis [12,13]. During the training process, the hyperparameters were optimized using the grid-search approach.
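The grid-search step can be sketched with scikit-learn's GridSearchCV as follows. This is an illustration only: the hyperparameter grids, the number of cross-validation folds, the scoring metric, and the synthetic data are all assumptions, because the study does not report them.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Hypothetical search grids; the grids actually used are not reported.
MODEL_GRIDS = {
    "GB": (GradientBoostingClassifier(random_state=0),
           {"n_estimators": [50, 100], "max_depth": [2, 3]}),
    "RF": (RandomForestClassifier(random_state=0),
           {"n_estimators": [50, 100], "max_depth": [3, None]}),
    "DT": (DecisionTreeClassifier(random_state=0),
           {"max_depth": [2, 4], "min_samples_leaf": [5, 20]}),
    "LR": (LogisticRegression(max_iter=1000),
           {"C": [0.1, 1.0, 10.0]}),
}


def tune_models(X, y, cv=3):
    """Run a cross-validated grid search for each model, scored by AUROC."""
    searches = {}
    for name, (estimator, grid) in MODEL_GRIDS.items():
        search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=cv)
        search.fit(X, y)
        searches[name] = search
    return searches


# Synthetic stand-in for the training data (six features, binary outcome).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 6))
y_demo = (X_demo[:, 0] + rng.normal(size=300) > 1.0).astype(int)

searches = tune_models(X_demo, y_demo)
for name, search in searches.items():
    print(name, search.best_params_, round(search.best_score_, 3))
```

Each fitted search object exposes the refitted best model as `best_estimator_`, which can then be evaluated on a held-out test set.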
Machine-learning models were trained to predict osteoporotic fractures using the selected FRAX parameters as input features. Among these parameters, age, height, and weight were continuous variables, whereas the others were categorical variables. All data were scaled using the StandardScaler method, which transforms continuous features to have a mean of 0 and a standard deviation of 1. The data were randomly divided into training and test datasets with a 70:30 split, resulting in 1,502 participants in the training dataset and 645 in the test dataset. The prediction target of each model was a binary label, where “0” represented no fracture and “1” represented one or more fractures accumulated over 10 years.
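The split and scaling steps can be illustrated with the sketch below. The column names, random seed, and synthetic values are hypothetical and do not represent cohort data. The sketch also assumes that only the continuous columns are standardized and that the scaler is fitted on the training set alone; the text does not state either detail.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

CONTINUOUS = ["age", "height", "weight"]
BINARY = ["smoking", "alcohol", "rheumatoid_arthritis"]  # hypothetical names

# Synthetic stand-in with the same number of rows as the study (2,147).
rng = np.random.default_rng(42)
n = 2147
X = pd.DataFrame({
    "age": rng.normal(59, 8, n),       # arbitrary illustrative values
    "height": rng.normal(155, 5, n),
    "weight": rng.normal(58, 8, n),
    "smoking": rng.integers(0, 2, n),
    "alcohol": rng.integers(0, 2, n),
    "rheumatoid_arthritis": rng.integers(0, 2, n),
})
y = rng.binomial(1, 0.035, n)  # 0 = no fracture, 1 = one or more fractures

# Random 70:30 split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Standardize continuous columns; pass binary columns through unchanged.
scaler = ColumnTransformer(
    [("scale", StandardScaler(), CONTINUOUS)], remainder="passthrough"
)
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse training mean and SD

print(len(X_train), len(X_test))
```

With 2,147 rows and `test_size=0.30`, scikit-learn rounds the test set up to 645 rows and leaves 1,502 for training, which matches the dataset sizes reported above.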
6. Performance Evaluation
The area under the receiver operating characteristic curve (AUROC) was calculated to compare the performance of each model.
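A minimal sketch of this comparison is shown below, assuming that a model's predicted probabilities and a precomputed risk score (such as a FRAX score) are evaluated on the same test set with scikit-learn's `roc_auc_score`. The data are synthetic, and the variable `risk_score` is a hypothetical stand-in rather than an actual FRAX output.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data: three features and a rare binary outcome.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))
prob = 1.0 / (1.0 + np.exp(-(-3.0 + 0.8 * X[:, 0])))
y = rng.binomial(1, prob)

# Hypothetical precomputed risk score (stand-in for a tool such as FRAX).
risk_score = X[:, 0] + rng.normal(scale=2.0, size=n)

X_tr, X_te, y_tr, y_te, _, score_te = train_test_split(
    X, y, risk_score, test_size=0.30, random_state=1, stratify=y
)

model = LogisticRegression().fit(X_tr, y_tr)

# AUROC needs a continuous ranking, so use predicted probabilities,
# not the hard 0/1 class predictions.
auc_model = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
auc_score = roc_auc_score(y_te, score_te)

print(f"model AUROC: {auc_model:.3f}")
print(f"score AUROC: {auc_score:.3f}")
```

An AUROC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of participants with and without the outcome.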
7. Statistical Analysis
The baseline characteristics of individuals who experienced at least one fracture over a 10-year period were compared with those of individuals who did not experience any fracture during the same timeframe. Continuous variables were presented as means±standard deviations, while categorical variables were expressed as numbers (proportions). The FRAX and OSTA scores were presented as medians with interquartile ranges. To compare variables between the groups, we used the chi-square test or Fisher’s exact test for categorical variables. For continuous variables, the independent t-test and Mann-Whitney U test were used as appropriate. Statistical significance was defined as P-values <0.05. All statistical analyses were conducted using the IBM SPSS statistical package ver. 25.0 (IBM Corp., Armonk, NY, USA).
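The group comparisons described above were run in SPSS. As an illustration only, equivalent tests are available in SciPy, which is substituted here for SPSS. In this sketch, the group sizes (76 with and 2,071 without fractures) follow the reported cohort, whereas the simulated values and the 2×2 counts are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Simulated continuous variable (e.g., age) for the two groups.
age_fx = rng.normal(63, 8, 76)     # participants with fractures
age_no = rng.normal(59, 8, 2071)   # participants without fractures
t_stat, p_t = stats.ttest_ind(age_fx, age_no)

# Simulated skewed variable (e.g., a risk score) for the two groups.
score_fx = rng.lognormal(1.6, 0.5, 76)
score_no = rng.lognormal(1.4, 0.5, 2071)
u_stat, p_u = stats.mannwhitneyu(score_fx, score_no, alternative="two-sided")

# Hypothetical 2x2 table: rows = fracture yes/no, columns = condition yes/no.
table = np.array([[5, 71],
                  [40, 2031]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
odds_ratio, p_fisher = stats.fisher_exact(table)

# Common rule of thumb: prefer Fisher's exact test when any expected
# cell count is below 5.
use_fisher = bool((expected < 5).any())
p_categorical = p_fisher if use_fisher else p_chi2

print(f"t-test P={p_t:.4f}, Mann-Whitney P={p_u:.4f}")
print(f"categorical P={p_categorical:.4f} (Fisher used: {use_fisher})")
```

In this hypothetical table, the expected count for the smallest cell is below 5, so Fisher's exact test is selected.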
RESULTS
1. Baseline Characteristics
We included 2,147 participants in the analysis (Table 1). The patients’ mean age was 59.3 years. During the follow-up, osteoporotic fractures occurred in 76 patients (3.5%). Patients who experienced osteoporotic fractures were older (P<0.001) and had a shorter stature (P=0.009). Furthermore, these patients were more frequently diagnosed with rheumatoid arthritis (P=0.001). Additionally, their FRAX scores were higher (P=0.001), and their OSTA scores were lower (P=0.020).
2. Performance of Machine-Learning Models
The predictive results of each model are presented in Table 2. Among the four machine-learning models, GB showed the highest AUROC of 0.662. LR, DT, and RF yielded AUROC values of 0.652, 0.648, and 0.637, respectively (Figure 1). The AUROC values for the FRAX and OSTA were 0.617 and 0.579, respectively. When considering the AUROC for predicting osteoporotic fractures, the machine-learning models outperformed the FRAX and OSTA.
DISCUSSION
This study pioneered the use of machine-learning algorithms in predicting fractures in a community-based cohort of women using specific FRAX parameters. Although the FRAX is a widely adopted tool validated in over 60 countries, it has notable limitations. FRAX relies on a binary input for previous fractures, overlooking variations in fracture risk, such as the higher risk associated with two previous fractures compared to one. It does not account for the elevated recurrence risk of hip and vertebral fractures as opposed to distal fractures [14,15]. Additionally, FRAX lacks information regarding the recency of fractures [16-18]. Moreover, factors such as steroid exposure, smoking, and alcohol consumption are also likely to affect fractures in a dose-dependent manner [14,19-21].
Parental hip fracture is a well-established independent risk factor for osteoporotic fractures in offspring [4]. However, self-reported family history information can be negatively influenced by recall bias or lack of medical knowledge [4].
This study aimed to predict osteoporotic fractures by exclusively using responses to relatively objective and bias-free indicators, excluding the FRAX items that have recognized limitations.
In the field of osteoporosis, machine-learning algorithms have been applied for various purposes, including predicting the presence of osteoporosis, forecasting fractures and falls, and discovering novel risk factors [5]. Several studies have employed machine learning to predict fractures. In a study similar to ours that utilized Ansan cohort data, Kong et al. [7] found that the CatBoost algorithm outperformed FRAX with BMD in predicting total fractures, with AUROC scores of 0.688 and 0.666, respectively. The AUROC for predicting major osteoporotic fractures using the FRAX score without BMD was 0.638, similar to our findings. Kong et al. [7] trained their algorithm using additional clinical factors obtained from a cohort study and identified the top 20 risk factors for fractures. Their study also elucidated lesser-known novel factors such as arthralgia scores, homocysteine levels, and C-reactive protein levels, indicating their potential as new risk factors [7].
The reported AUROC for fracture prediction using FRAX varies from 0.6 to 0.79, and our study exhibited an AUROC of 0.617 [22,23]. The FRAX exhibits the best performance in predicting hip fractures in women when BMD information is available [23]. Hence, when predicting hip fractures without BMD information and relying solely on clinical risk factors, as analyzed in this study, machine-learning algorithms can demonstrate better performance than that of the FRAX model. Therefore, the application of machine-learning algorithms in clinical practice is a meaningful endeavor.
This study had several strengths. To our knowledge, it was the first study to use machine learning to predict fractures by training on FRAX parameters from a large prospective cohort and comparing it with the FRAX score to explore the clinical applicability of machine learning. Particularly noteworthy is the validation of the potential for predicting fractures using the relatively objective FRAX parameters in situations where clinical information regarding these parameters is limited, potentially leading to the development of more effective tools.
This study had certain limitations. First, the evaluation of fractures was based on patient reports rather than radiographs, which may introduce discrepancies between the reported and actual conditions. The relatively low rate of osteoporotic fractures may be attributed to the fact that only self-reported fractures were considered. However, this cohort is known for its well-structured research procedures, interviews, and surveys. Another limitation was the absence of BMD data in our analysis. The predictive accuracy could potentially have been higher if BMD had been included in the machine-learning training. This aspect should be investigated in future studies. Finally, this study aggregated all 10-year cumulative fractures into a single binary classification, which did not permit the prediction of fractures over time.
In conclusion, when obtaining complete FRAX information is challenging, machine-learning algorithms show promise for predicting osteoporotic fractures, particularly in women. This study highlights the advancements in the clinical application of machine learning.
Notes
CONFLICT OF INTEREST
No potential conflict of interest relevant to this article was reported.
FUNDING
This research was supported by the Korean Fund for Regenerative Medicine (KFRM) grant funded by the Korea government (the Ministry of Science and ICT, the Ministry of Health & Welfare) (code: KFRM 23C0102L1).