
Ultrasound-based deep learning radiomics model for differentiating benign, borderline, and malignant ovarian tumours: a multi-class classification exploratory study

Abstract

Background

Accurate preoperative identification of ovarian tumour subtypes is imperative, as it enables physicians to tailor precise, individualised management strategies. We therefore developed an ultrasound (US)-based multiclass prediction algorithm to differentiate benign, borderline, and malignant ovarian tumours.

Methods

We randomised data from 849 patients with ovarian tumours into training and testing sets at an 8:2 ratio. Regions of interest on the US images were segmented, and handcrafted radiomics features were extracted and screened. We applied the one-versus-rest method for multiclass classification. The best features were input into machine learning (ML) models to construct a radiomic signature (Rad_Sig). US images cropped to the maximum tumour section were input into a pre-trained convolutional neural network (CNN) model; after data augmentation and training, each sample’s predicted probability was generated as the deep transfer learning signature (DTL_Sig). Clinical baseline data were analysed, and statistically significant clinical parameters and US semantic features in the training set were used to construct a clinical signature (Clinic_Sig). The prediction results of Rad_Sig, DTL_Sig, and Clinic_Sig for each sample were fused as a new feature set to build the combined model, namely the deep learning radiomic signature (DLR_Sig). We used receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) to estimate the performance of the multiclass classification model.

Results

The training set included 440 benign, 44 borderline, and 196 malignant ovarian tumours; the testing set included 109 benign, 11 borderline, and 49 malignant ovarian tumours. The DLR_Sig three-class prediction model had the best overall and class-specific classification performance, with micro- and macro-average AUCs of 0.90 and 0.84, respectively, on the testing set. The class-specific AUCs were 0.84, 0.85, and 0.83 for benign, borderline, and malignant ovarian tumours, respectively. In the confusion matrix, the Clinic_Sig and Rad_Sig classifiers could not recognise borderline ovarian tumours, whereas DLR_Sig identified the highest proportions of borderline and malignant ovarian tumours, at 54.55% and 63.27%, respectively.

Conclusions

The three-class prediction model of US-based DLR_Sig can discriminate between benign, borderline, and malignant ovarian tumours. Therefore, it may guide clinicians in determining the differential management of patients with ovarian tumours.


Introduction

Ovarian tumours are of various histological types, including benign, borderline, and malignant lesions [1,2,3]. Benign tumours have a good prognosis and are managed conservatively with regular follow-up [2, 4]. Epithelial hyperplasia and nuclear atypia are more prominent in borderline ovarian tumours (BOTs) than in benign ovarian tumours; however, BOTs have no stromal invasion, unlike ovarian malignancies [5]. BOTs have a good prognosis, with a 10-year survival rate of > 95% for stages I, II, and III [6]. The primary treatment for BOTs is surgical intervention; however, more than one-third of BOT cases occur in women under 40 years of age who may wish to conceive in the future [1]. Therefore, prioritising fertility preservation in young women desiring children is crucial. Patients with malignant ovarian tumours should be referred to gynaecologic oncologists for further diagnosis and treatment; depending on the stage of cancer, debulking surgery and chemotherapy may be considered [7]. Different types of ovarian tumours have distinct clinical and pathological characteristics, treatment strategies, and prognoses. The early detection and treatment of ovarian malignancies can improve patient outcomes [8]. Therefore, the preoperative identification of the nature of ovarian tumours is critical for patients and can guide physicians in developing individualised and precise management plans.

Ultrasonography, especially transvaginal, is considered the primary method for evaluating adnexal tumours [9, 10]. Currently, subjective assessment by ultrasound (US) experts is a relatively good method of distinguishing the nature of ovarian tumours. However, US specialists are few, and differences in subjective diagnoses among US physicians with different experience levels exist [11, 12]. Therefore, objectively and quantitatively analysing the various imaging features that may reveal the potential biological characteristics of tumours in a reproducible manner is necessary.

Radiomics is an emerging field of quantitative imaging that can significantly impact personalised medicine. It mines quantitative features from medical images using high-throughput methods; these features are then transformed into objective, structured data through complex algorithms and applied to clinical decision support systems to improve diagnosis, prognosis assessment, and prediction accuracy [13, 14]. Previous studies on computed tomography (CT)/magnetic resonance imaging (MRI)/US-based radiomics for differentiating benign and malignant ovarian tumours achieved satisfactory diagnostic results [15,16,17,18]. However, radiomic features are predefined (morphology, intensity, texture, and wavelet features), superficial, and low-order, and cannot represent the heterogeneity of the entire tumour [19, 20]. Therefore, to accurately classify ovarian tumours, their deeper, higher-level features must be studied.

Deep learning (DL) is a branch of machine learning (ML) that allows computing models with multiple processing layers to learn data representations at numerous abstraction levels [21]. The convolutional neural network (CNN) is the most commonly used DL architecture in medical image analysis [22]. CNN-extracted features have been suggested to capture high-order image information applicable to specific clinical outcomes [20]. Successful application of DL requires large training datasets; however, medical datasets are often limited in size. Many practical applications therefore use CNNs pre-trained on ImageNet, an approach known as transfer learning (TL) [23, 24]. Research using deep transfer learning (DTL) to classify benign and malignant ovarian tumours has been successful [11, 12, 25]; however, those studies categorised BOTs as malignant ovarian tumours for statistical analysis. Combining DL classification networks with traditional hand-crafted radiomics frameworks is a recent development [26, 27], yet few reports exist on US-based combined DL radiomics (DLR) models as multiclass prediction models for classifying ovarian tumours as benign, borderline, or malignant. We hypothesised that DLR could differentiate between benign, borderline, and malignant ovarian tumours. Hence, this study aimed to develop an US-based DLR model to identify benign, borderline, and malignant ovarian lesions.

Materials and methods

Study design and participants

We enrolled 849 patients with ovarian tumours confirmed by histopathological examination after surgical removal from July 2014 to October 2022. The inclusion criteria were: (a) complete US examination within 1 month before surgery and (b) a clear and definite US image of the target lesion. The exclusion criteria were: (a) poor image quality, (b) absent or incomplete US and clinical data, (c) pregnancy, (d) history of tumours in other parts of the body and ovarian metastatic cancer, (e) previous treatment before US examination or surgery, and (f) pathological diagnosis obtained through biopsy and uncertain pathology results. A flowchart of the participants is shown in Fig. 1.

Fig. 1

Inclusion and exclusion criteria for patients with ovarian tumours for the training and testing sets. Abbreviation: BOTs = borderline ovarian tumours

The study population was categorised by pathological result, with benign ovarian tumours labelled “class 0”, BOTs “class 1”, and malignant ovarian tumours “class 2”. Participant data were randomised into training and testing sets at an 8:2 ratio using Python. We used stratified random partitioning to handle class imbalance, so that the proportions of patients with benign, borderline, and malignant ovarian tumours were similar across the total study population, training set, and testing set. No data overlapped between the training and testing sets, avoiding repeated use of data from the same patient [28].

The training set was used to learn the parameters and build the model, whereas the testing set was used to evaluate the generalisability of the selected model and prevent overfitting.
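The stratified 8:2 split described above can be sketched with scikit-learn; the class counts follow the paper, while the feature matrix here is a synthetic placeholder (the study used US images, not these values).

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Class counts from the paper: 549 benign (0), 55 borderline (1), 245 malignant (2).
y = np.array([0] * 549 + [1] * 55 + [2] * 245)
X = rng.normal(size=(len(y), 4))  # placeholder features

# stratify=y keeps the class proportions similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

for c in (0, 1, 2):
    print(c, round((y_train == c).mean(), 3), round((y_test == c).mean(), 3))
```

With stratification, the borderline class (about 6.5% of patients) is represented at nearly the same rate in both partitions, which is what makes the three-class evaluation on the testing set meaningful.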

Collecting clinical parameters

Preoperative clinical data of all patients, including age, menopausal status, height, weight, body mass index (BMI), carbohydrate antigen 125 (CA125), red blood cell count (RBC), white blood cell count (WBC), neutrophil count (N), lymphocyte count (L), monocyte count (M), platelet count (PLT), and haemoglobin were obtained from the patient’s electronic medical records. BMI and some inflammation-related risk factors, such as the neutrophil-to-lymphocyte ratio (NLR), derived neutrophil-to-lymphocyte ratio (dNLR), platelet-to-lymphocyte ratio (PLR), lymphocyte-to-monocyte ratio (LMR), and systemic immune-inflammation index (SII), were calculated using the following simple formulas:

$$ BMI=\frac{\text{weight (kg)}}{\text{height}^{2}\ (\text{m}^{2})}$$
$$ NLR=\frac{N\ ({10}^{9}/L)}{L\ ({10}^{9}/L)}$$
$$ dNLR=\frac{N\ ({10}^{9}/L)}{(WBC-N)\ ({10}^{9}/L)}$$
$$ PLR=\frac{PLT\ ({10}^{9}/L)}{L\ ({10}^{9}/L)}$$
$$ LMR=\frac{L\ ({10}^{9}/L)}{M\ ({10}^{9}/L)}$$
$$ SII=\frac{N\ ({10}^{9}/L)\times PLT\ ({10}^{9}/L)}{L\ ({10}^{9}/L)}$$
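The formulas above translate directly into code. This sketch computes all six derived indices for one patient; the input values are illustrative, not patient data (counts in 10⁹/L).

```python
def derived_indices(weight_kg, height_m, wbc, n, l, m, plt):
    """Compute BMI and the inflammation-related indices from the formulas above."""
    return {
        "BMI": weight_kg / height_m ** 2,
        "NLR": n / l,
        "dNLR": n / (wbc - n),   # derived NLR uses (WBC - neutrophils) as denominator
        "PLR": plt / l,
        "LMR": l / m,
        "SII": n * plt / l,
    }

# Illustrative values: 60 kg, 1.60 m, WBC 6.0, N 3.6, L 1.8, M 0.4, PLT 250.
idx = derived_indices(weight_kg=60, height_m=1.6, wbc=6.0, n=3.6,
                      l=1.8, m=0.4, plt=250)
print(idx)  # BMI ≈ 23.44, NLR 2.0, dNLR 1.5, PLR ≈ 138.9, LMR 4.5, SII 500.0
```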

Ultrasound data acquisition

All participants underwent transvaginal ultrasonography whenever possible. If a mass was too large to be fully displayed on transvaginal ultrasonography, it could be supplemented with a transabdominal US. Transrectal or transabdominal ultrasonography could be performed if a patient was unsuitable for transvaginal ultrasonography. The following US equipment was used in the study: GE Voluson E10, GE Voluson E8, GE Healthcare (GE Medical Systems, Zipf, Austria), and Mindray Resona R9 (Mindray Bio-Medical Electronics Co., Ltd., China), with RIC5-9-D, V11-3HU transvaginal US probes, and C1-5-D and SC6-1U abdominal US probes. Recorded US semantic features included: maximum diameter of the lesion (≤ 50, 50–100, and ≥ 100 mm), characteristics of the mass (cystic, cystic-solid mixed, solid), colour Doppler score (1, no blood flow signal; 2, low blood flow signal; 3, moderate blood flow signal; 4, rich blood flow signal), laterality of the mass (unilateral or bilateral), and ascites (present or absent). If a patient had more than one ovarian mass, we selected the mass with the most complex morphology or the largest for further assessment [12, 29, 30].

Specialised assessment of ultrasound images

Initially, each ultrasound image was assessed by Doctor A, a seasoned gynaecological and obstetric ultrasound specialist with ten years of experience, who provided the initial diagnosis. Subsequently, Doctor B, an expert with over 15 years of experience, confirmed the diagnosis. In cases of discordant opinions, a senior expert with more than two decades of experience was consulted, and consensus was reached through collaborative discussion. These doctors were unaware of the patients’ clinical and biochemical indicators and pathological results.

Image pre-processing and regions of interest (ROI) segmentation

The grey-level ranges and pixel spacing of two-dimensional images vary significantly across US devices. To address this, we employed a fixed-resolution resampling method.

The US images were imported into the ITK-SNAP 3.8.0 software (http://www.itksnap.org) for manual ROI segmentation. Segmentation of all ROIs was completed by A (an US expert with > 10 years of experience) and confirmed by B (an US expert with > 15 years of experience). When opinions differed, a senior physician (an US expert with > 20 years of experience) was consulted for joint decision-making. To ensure the robustness and repeatability of the extracted radiomics features, we randomly selected 50 US images from the dataset two weeks later; A re-delineated these ROIs, and C (an US expert with 12 years of experience) delineated them independently. All US experts were blinded to the clinical and pathological results of the study population.

For DTL, the US image slice with the largest tumour area was cropped to represent each patient. Grey values were normalised to the range [-1, 1] using a min-max transformation. Each cropped US subregion was then resized to 224 × 224 using nearest-neighbour interpolation and saved as a “.png” file to meet the input requirements of the CNN model.
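The per-image preprocessing described above can be sketched in pure NumPy: min-max scaling to [-1, 1] followed by nearest-neighbour resizing to 224 × 224. This is a minimal stand-in for the study's pipeline, whose exact implementation is not public.

```python
import numpy as np

def preprocess(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Min-max scale grey values to [-1, 1], then nearest-neighbour resize."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    img = 2.0 * (img - lo) / (hi - lo) - 1.0   # min-max transform to [-1, 1]
    # Nearest-neighbour resampling: pick the source pixel nearest each target cell.
    rows = (np.arange(size) * img.shape[0] / size).astype(int)
    cols = (np.arange(size) * img.shape[1] / size).astype(int)
    return img[np.ix_(rows, cols)]

demo = np.random.default_rng(1).integers(0, 256, size=(300, 400))  # fake US crop
out = preprocess(demo)
print(out.shape)  # (224, 224), values within [-1, 1]
```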

Hand-crafted radiomics feature extraction and selection

We employed PyRadiomics (http://pyradiomics.readthedocs.io) to extract the handcrafted radiomic features. Subsequently, Z-score normalisation was performed to eliminate differences in the value scales of the extracted features.

A total of 1476 handcrafted radiomics features were extracted from tasks 1 and 2, including the first-order features, shape features, gray-level dependence matrix (GLDM), gray-level size zone matrix (GLSZM), gray-level run length matrix (GLRLM), and gray-level co-occurrence matrix (GLCM). The number and proportion of handcrafted radiomics features are presented in Fig. 2. The P-values for all handcrafted features are shown in Fig. 3.

Fig. 2

The proportion of hand-crafted radiomics features. Abbreviation: GLDM = grey-level dependence matrices, GLSZM = grey-level size zone matrices, GLRLM = grey-level run length matrices, GLCM = grey-level co-occurrence matrices

Fig. 3

All hand-crafted radiomics features’ corresponding P-value results. Abbreviation: GLDM = grey-level dependence matrices, GLSZM = grey-level size zone matrices, GLRLM = grey-level run length matrices, GLCM = grey-level co-occurrence matrices

First, we retained hand-crafted radiomic features with intra-/inter-class correlation coefficients > 0.8, to ensure the robustness and repeatability of these features. Next, only the 1,444 features with P < 0.05 on a t-test or Mann–Whitney U-test were retained. Subsequently, Spearman correlation analysis was used to calculate the correlations between features; when the correlation coefficient between any two features exceeded 0.9, only one of them was retained, using a greedy recursive deletion strategy to keep the features most strongly correlated with the predicted target. This left 295 features. Finally, the least absolute shrinkage and selection operator (LASSO) regression algorithm was used for feature selection. Depending on the regularisation weight λ, LASSO shrinks all regression coefficients towards zero and sets the coefficients of irrelevant features precisely to zero. We employed 10-fold cross-validation with the minimum criteria to determine the optimal λ; the final value (λ = 0.016768) yielded the minimum cross-validation error, and the 53 features with non-zero coefficients were retained as the optimal features.
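The selection cascade above (univariate significance filter, correlation pruning, then LASSO with 10-fold cross-validation) can be sketched as follows. Synthetic data stands in for the 1,476 hand-crafted features; the feature counts and λ will differ from the study's.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))
y = rng.integers(0, 2, size=200)
X[:, :5] += y[:, None] * 1.5           # make the first 5 features informative

# 1) Univariate filter: keep features with Mann-Whitney P < 0.05.
keep = [j for j in range(X.shape[1])
        if mannwhitneyu(X[y == 0, j], X[y == 1, j]).pvalue < 0.05]
X = X[:, keep]

# 2) Correlation pruning: greedily drop one of any pair with |rho| > 0.9.
corr = np.abs(np.corrcoef(X, rowvar=False))
drop = set()
for a in range(corr.shape[0]):
    for b in range(a + 1, corr.shape[0]):
        if a not in drop and b not in drop and corr[a, b] > 0.9:
            drop.add(b)
X = X[:, [j for j in range(X.shape[1]) if j not in drop]]

# 3) LASSO with 10-fold CV; features with non-zero coefficients are retained.
lasso = LassoCV(cv=10, random_state=0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
print(len(selected), "features retained, lambda =", round(lasso.alpha_, 5))
```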

The deep transfer learning procedure

We used DTL, a CNN model pre-trained on the ImageNet dataset, to avoid overfitting owing to the limited size of the training dataset.

Data augmentation is often required to improve DTL’s prediction performance and generalisation ability in image classification because of imbalanced or insufficient data. Hence, we utilised horizontal flipping and random cropping for data augmentation, which helped increase the sample size and enhance the model performance.

To improve generalisation, we set the learning rate carefully, adopting a cosine-decay learning-rate schedule. The learning rates are presented in Additional file 1.
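The cosine-decay schedule mentioned above can be written as a standalone function: the rate follows half a cosine from the base value down to a floor. The base rate and step count here are illustrative; the study's values are given in its Additional file 1.

```python
import math

def cosine_decay_lr(step: int, total_steps: int, base_lr: float,
                    min_lr: float = 0.0) -> float:
    """Decay from base_lr to min_lr along a half cosine over total_steps."""
    t = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))

lrs = [cosine_decay_lr(s, total_steps=100, base_lr=1e-3) for s in range(101)]
print(lrs[0], lrs[50], lrs[100])  # 0.001, 0.0005, 0.0
```

The rate starts at the base value, halves at the schedule midpoint, and reaches the floor at the end, which gives large early updates and fine late adjustments.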

Signature building

The baseline clinical data were analysed in the training set. Clinical parameters and ultrasonic semantic features with P < 0.05 were selected, and Spearman correlation analysis was used to assess the linear relationships between these parameters. Parameters without a significant linear correlation were input into a support vector machine model to build the clinical signature (Clinic_Sig).

After LASSO regression feature screening, the optimal features were input into the Light Gradient Boosting Machine (LightGBM) model to construct the radiomic signature (Rad_Sig).

After the US image of the largest mass section was input into a ResNet50 model, the prediction probability of each sample was used as the deep transfer learning signature (DTL_Sig). Gradient-weighted class activation mapping (Grad-CAM) was applied to visualise the internal network and explain the decision basis of the CNN model.

We fused the prediction results of Rad_Sig, DTL_Sig, and Clinic_Sig for each sample as new features, input them into a Gradient Boosting model, and constructed a combined model on the training set, namely the deep learning radiomic signature (DLR_Sig).
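The decision-level fusion above can be sketched as follows: the per-sample class probabilities from the three signatures are concatenated into a nine-column feature matrix and fed to a Gradient Boosting combiner. The probabilities here are synthetic stand-ins for the real signature outputs.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n, n_classes = 300, 3
y = rng.integers(0, n_classes, size=n)

def fake_signature_probs(y, noise):
    """Synthetic stand-in for one signature's predicted class probabilities."""
    p = np.full((len(y), n_classes), noise)
    p[np.arange(len(y)), y] += 1.0            # boost the true class
    p += rng.random((len(y), n_classes)) * noise
    return p / p.sum(axis=1, keepdims=True)

# Concatenate Clinic_Sig, Rad_Sig, and DTL_Sig outputs: 3 x 3 = 9 features/sample.
fused = np.hstack([fake_signature_probs(y, 0.5),
                   fake_signature_probs(y, 0.8),
                   fake_signature_probs(y, 0.3)])

dlr = GradientBoostingClassifier(random_state=0).fit(fused, y)
print("training accuracy:", round(dlr.score(fused, y), 3))
```

Fusing at the decision level keeps the combiner's input small (nine columns rather than thousands of raw features), which is the overfitting argument the Discussion makes for this design.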

Model assessment

In this study, we employed the one-versus-rest method, which is often applied in multiclass classification. We evaluated model performance using receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC). Precision, recall, F1 score, macro-average, micro-average, and weighted average were used to assess one-versus-rest discrimination for each tumour class and overall. A confusion matrix was used to analyse the model errors.
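The one-versus-rest evaluation described above can be sketched with scikit-learn: labels are binarised per class, and micro-, macro-average, and class-specific AUCs are computed from the predicted probabilities. The labels and scores here are synthetic.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=200)               # 0 benign, 1 borderline, 2 malignant
scores = rng.random((200, 3))
scores[np.arange(200), y] += 0.8               # make the scores informative
scores /= scores.sum(axis=1, keepdims=True)    # rows behave like probabilities

y_bin = label_binarize(y, classes=[0, 1, 2])   # one-versus-rest indicator matrix
micro = roc_auc_score(y_bin, scores, average="micro")
macro = roc_auc_score(y_bin, scores, average="macro")
per_class = [roc_auc_score(y_bin[:, k], scores[:, k]) for k in range(3)]
print(round(micro, 3), round(macro, 3), [round(a, 3) for a in per_class])
```

Micro-averaging pools all class-versus-rest decisions before computing the AUC (so frequent classes dominate), whereas macro-averaging averages the three class-specific AUCs equally; reporting both, as the paper does, exposes performance on the rare borderline class.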

Statistical analysis

Statistical analysis was performed using Python (https://www.python.org/). Normally distributed variables are reported as mean ± standard deviation, whereas non-normally distributed variables are reported as median (interquartile range). Categorical variables are expressed as frequencies (percentages). One-way analysis of variance was used to compare the three groups when normality and homogeneity of variance held; otherwise, a rank-sum nonparametric test for multiple independent samples was adopted. Categorical data were analysed using the chi-square (χ²) test. A two-sided P < 0.05 was considered statistically significant.

Results

Patient characteristics

We included 849 patients in this study. Among them, 549 (64.66%), 55 (6.48%), and 245 (28.86%) had benign, borderline, and malignant ovarian tumours, respectively. The proportions of benign, borderline, and malignant ovarian tumours in the entire study group, training set, and testing set were approximately the same. The baseline characteristics are shown in Table 1.

Table 1 Training and testing sets of clinical parameters and semantic features of ultrasound

Ultrasound expert assessment of benign, borderline, and malignant ovarian tumours

Ultrasound specialists demonstrated a high level of accuracy in distinguishing between benign and malignant ovarian tumours, with rates of 95.80% and 82.80%, respectively. Conversely, the accuracy in identifying borderline ovarian tumours was notably lower at 34.50% (Table 2).

Table 2 Expert assessment of the ultrasound images

The confusion matrix of the three-class classification prediction model

We used confusion matrices to understand where each classifier made errors and in what proportions (Fig. 4; Table 3). The four multiclass prediction models correctly distinguished benign ovarian tumours at high rates (89.91%, 88.99%, 86.24%, and 82.57%, respectively; Fig. 4). Clinic_Sig and Rad_Sig showed relatively poor accuracy in identifying malignant ovarian tumours (16.33% and 38.78%, respectively), and neither could recognise BOTs. The proportion of BOTs identified by DLR_Sig was the highest, at 54.55%.

Fig. 4

Confusion matrix of three-class classification results based on the test set. (4a) Clinic_Sig; (4b) Rad_Sig; (4c) DTL_Sig; (4d) DLR_Sig. Class 0: benign ovarian tumours; class 1: BOT; class 2: malignant ovarian tumours. LightGBM, Light Gradient Boosting Machine

Table 3 The error analysis of the three-class classification prediction model

Classification performance

The DLR_Sig three-class prediction model had the best overall and class-specific classification performance, with micro- and macro-average AUCs of 0.90 and 0.84, respectively, on the testing set. The class-specific AUCs were 0.84 for benign, 0.85 for borderline, and 0.83 for malignant ovarian tumours (Fig. 5; Table 4).

Fig. 5

Three-class (one-vs-rest) ROC of the test set. (5a) Clinic_Sig; (5b) Rad_Sig; (5c) DTL_Sig; (5d) DLR_Sig. Class 0: benign ovarian tumours; class 1: BOT; class 2: malignant ovarian tumours. Micro- and macro-average ROC indicated the overall distinguishing ability of the three-class classification. LightGBM, Light Gradient Boosting Machine

Table 4 Overall and class-specific classification performance

Application of grad-CAM

Grad-CAM, which produces a coarse localisation map highlighting the regions critical to a classification target, has been proposed as a method for visualising the decisions of CNN models. The red areas of the heat map are crucial references for model decision-making [31]. The regions of interest for US diagnosis were consistent with the regions underpinning the CNN’s decisions (Fig. 6).

Fig. 6

The ResNet50 model with Grad-CAM was used on ovarian tumour patients. (6a–c) A solid hypoechoic mass in one patient’s pelvis, 100 mm in diameter. (6a) US image; (6b) Grad-CAM; the red area is the basis of decision-making for ResNet50; (6c) Histopathological results: theca cell tumour (40×). (6d–f) A cystic-solid mixed mass in a patient’s pelvis, 112 mm in diameter. (6d) US image; (6e) Grad-CAM; the red area is the basis of decision-making for ResNet50; (6f) Histopathological results: borderline ovarian tumour (40×)

In Fig. 6a–c, a solid hypoechoic mass, 100 mm in diameter, was present in one patient’s pelvis. Rich blood flow signals were observed in and around the mass, and the CA125 level was 8.67 U/ml. An US expert suggested a malignant ovarian tumour; however, DTL_Sig predicted a benign lesion with a probability of 97.35%. Pathology showed a benign theca cell tumour, so the DTL_Sig prediction was highly consistent with the pathological diagnosis.

In Fig. 6d–f, a cystic-solid mixed mass, 112 mm in diameter, was present in a patient’s pelvis, with a CA125 level of 206 U/ml. An US expert suggested a malignant ovarian tumour; however, DTL_Sig indicated a BOT with an 85.98% probability, and the pathological diagnosis was BOT. The DTL_Sig prediction was highly consistent with the pathological diagnosis.

Discussion

The accurate prediction of the category of ovarian tumours is critical for patient-centred care. Studies on the multiclass classification of DLR to classify ovarian tumours are relatively scarce. In this study, we constructed four multiclassification prediction models to classify benign, borderline, and malignant ovarian tumours. We found that the DLR prediction model had the optimum ability to classify ovarian tumours and generalise the testing set.

Ultrasonography is the primary method for screening ovarian tumours. Serum tumour markers are essential for discovering and treating ovarian cancer, and CA125 is the most important biomarker for evaluating ovarian cancer [32]. Inflammation is vital in the development and progression of ovarian cancer [33]. Therefore, we collected US semantic features, serum tumour markers, and related inflammatory factors from the study population. These US semantic features and clinical parameters are typically obtained during routine examinations and do not add additional burden to the patient. We selected some semantic elements, serum tumour markers, and related inflammatory factors to construct Clinic_Sig. The Clinic_Sig three-class prediction model had poor overall and class-specific classification performance and could not predict BOT; the precision, recall, and F1 scores were all zero.

US examination is subjective. US experts have higher diagnostic accuracy than less experienced doctors; however, US experts are few [11]. Recently, radiomics has become a powerful method for quantifying features from medical images, including potential pathophysiological information of the reference cancer tissue [34]. Some studies have used MRI/CT/US-based radiomics to differentiate benign from malignant ovarian tumours with high diagnostic performance [15, 35, 36]; however, these studies did not address the classification of BOTs. Qi et al. [16] established and validated US-based radiomics models to discriminate between benign, borderline, and malignant serous ovarian tumours and provided preoperative diagnostic information on the nature of ovarian tumours; however, theirs was a binary classification study. In our research, the Rad_Sig three-class prediction model could not predict BOTs, and its precision, recall, and F1 scores were all zero.

DL is becoming increasingly essential for image pattern recognition [21]. Considering the limited scale of medical datasets, we used TL in place of training from scratch; TL improves the performance of models built on small samples by utilising the knowledge learned in similar classification tasks [28]. Gao et al. [25] and Christiansen et al. [11] developed DTL models to identify benign and malignant ovarian tumours at a diagnostic level equivalent to that of an US specialist. Chen et al. [12] developed DTL algorithms to distinguish malignant from benign ovarian tumours, comparable to expert subjective assessment and the ovarian-adnexal reporting and data system. However, they classified BOTs as malignant ovarian tumours for statistical analysis. We used a ResNet50 model pre-trained on ImageNet [11, 37]. The DTL_Sig three-class prediction model had good overall and class-specific classification performance, with micro- and macro-average AUCs of 0.89 and 0.85, respectively, on the testing set. The class-specific AUCs were 0.87 for benign, 0.82 for borderline, and 0.84 for malignant ovarian tumours. Although DTL performs well in various classification tasks, it is a black-box algorithm that lacks interpretability, which restricts its application [31, 38]. Grad-CAM is employed to depict DL decision-making. In our study, as shown in Fig. 6, the regions US experts focused on when making the diagnosis were consistent with the regions underlying the CNN’s decisions in Grad-CAM, and the DTL_Sig predictions were highly consistent with the pathological diagnoses.

The combination of traditional handcrafted radiomics and DTL algorithms, namely DLR, can effectively improve the accuracy and reliability of model predictions and is currently a popular topic in ML-based tumour research. Many studies [20, 38,39,40] show that DLR models have better predictive efficacy than Rad_Sig or DTL_Sig alone. Fusing traditional radiomics with DTL can occur at the feature level or the decision level, and feature-level fusion often leads to overfitting because of the large number of features [38]. We therefore constructed a combined model on the training set by fusing the predicted probabilities of Clinic_Sig, Rad_Sig, and DTL_Sig for each sample. The combined three-class prediction model, DLR_Sig, had the best overall and class-specific classification performance, with micro- and macro-average AUCs of 0.90 and 0.84, respectively, on the testing set. The class-specific AUCs were 0.84 for benign, 0.85 for borderline, and 0.83 for malignant ovarian tumours. DLR_Sig also performed best in predicting BOTs, with the highest class-specific AUC, precision, recall, F1 score, and accuracy (0.85, 42.86%, 54.55%, 57.14%, and 93.31%, respectively). The proportion of BOTs correctly identified by DLR_Sig (54.55%) exceeded that of the ultrasound experts (34.50%).

This study had limitations. First, it was a retrospective single-centre study with a small sample size; larger prospective, multicentre studies are required to evaluate the applicability of the predictive models in clinical practice. Second, the strict inclusion and exclusion criteria could have introduced bias into the model’s training. Third, we extracted features from two-dimensional US images; future studies will include other modalities, such as colour Doppler flow imaging, spectral Doppler imaging, and contrast-enhanced US, to provide more predictive information. Lastly, ROI delineation and cropping of the maximum tumour section represented only one slice of the lesion and could not describe the heterogeneity of the entire tumour. In the future, we plan to store dynamic images of the whole tumour and input them into ML models to obtain more comprehensive information.

Conclusion

We developed a combined multiclass classification model that integrates clinical and traditional radiomics information with DTL at the decision level to discriminate the nature of ovarian tumours. The model’s performance and generalisability support its feasibility for distinguishing between benign, borderline, and malignant ovarian tumours.

Data availability

The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.

Abbreviations

US:

ultrasound

ML:

machine learning

CNN:

convolutional neural network

Clinic_Sig:

clinical signatures

Rad_Sig:

radiomic signature

DTL_Sig:

deep transfer learning signature

DLR_Sig:

deep learning radiomic signature

AUC:

area under the ROC curve

BOTs:

borderline ovarian tumours

CT:

computed tomography

MRI:

magnetic resonance imaging

DL:

deep learning

TL:

transfer learning

DTL:

deep transfer learning

DLR:

DL radiomics

BMI:

body mass index

CA125:

carbohydrate antigen 125

RBC:

red blood cell

WBC:

white blood cell

N:

neutrophil

L:

lymphocyte

M:

monocyte

PLT:

platelet count

NLR:

neutrophil-to-lymphocyte ratio

dNLR:

derived neutrophil-to-lymphocyte ratio

PLR:

platelet-to-lymphocyte ratio

LMR:

lymphocyte-to-monocyte ratio

SII:

systemic immune-inflammation index

ROI:

regions of interest

GLDM:

grey-level dependence matrices

GLSZM:

grey-level size zone matrices

GLRLM:

grey-level run length matrices

GLCM:

grey-level co-occurrence matrices

LASSO:

least absolute shrinkage and selection operator

LightGBM:

Light Gradient Boosting Machine

Grad-CAM:

Gradient-weighted class activation mapping

ROC:

receiver operating characteristic curve

References

  1. Maramai M, Barra F, Menada MV, Stigliani S, Moioli M, Costantini S, et al. Borderline ovarian tumours: management in the era of fertility-sparing surgery. Ecancermedicalscience. 2020;14:1031.


  2. Sayasneh A, Ekechi C, Ferrara L, Kaijser J, Stalder C, Sur S, et al. The characteristic ultrasound features of specific types of ovarian pathology (review). Int J Oncol. 2015;46(2):445–58.

  3. Jayson GC, Kohn EC, Kitchener HC, Ledermann JA. Ovarian cancer. Lancet. 2014;384(9951):1376–88.

  4. Meys EMJ, Jeelof LS, Achten NMJ, Slangen BFM, Lambrechts S, Kruitwagen R, et al. Estimating risk of malignancy in adnexal masses: external validation of the ADNEX model and comparison with other frequently used ultrasound methods. Ultrasound Obstet Gynecol. 2017;49(6):784–92.

  5. Prat J. Pathology of borderline and invasive cancers. Best Pract Res Clin Obstet Gynaecol. 2017;41:15–30.

  6. May J, Skorupskaite K, Congiu M, Ghaoui N, Walker GA, Fegan S, et al. Borderline ovarian tumors: fifteen years' experience at a Scottish tertiary cancer center. Int J Gynecol Cancer. 2018;28(9):1683–91.

  7. Fung-Kee-Fung M, Kennedy EB, Biagi J, Colgan T, D’Souza D, Elit LM, et al. The optimal organization of gynecologic oncology services: a systematic review. Curr Oncol. 2015;22(4):e282–93.

  8. Reid BM, Permuth JB, Sellers TA. Epidemiology of ovarian cancer: a review. Cancer Biol Med. 2017;14(1):9–32.

  9. Borrelli GM, de Mattos LA, Andres MP, Gonçalves MO, Kho RM, Abrão MS. Role of imaging tools for the diagnosis of borderline ovarian tumors: a systematic review and meta-analysis. J Minim Invasive Gynecol. 2017;24(3):353–63.

  10. Chen H, Qian L, Jiang M, Du Q, Yuan F, Feng W. Performance of IOTA ADNEX model in evaluating adnexal masses in a gynecological oncology center in China. Ultrasound Obstet Gynecol. 2019;54(6):815–22.

  11. Christiansen F, Epstein EL, Smedberg E, Åkerlund M, Smith K, Epstein E. Ultrasound image analysis using deep neural networks for discriminating between benign and malignant ovarian tumors: comparison with expert subjective assessment. Ultrasound Obstet Gynecol. 2021;57(1):155–63.

  12. Chen H, Yang BW, Qian L, Meng YS, Bai XH, Hong XW, et al. Deep learning prediction of ovarian malignancy at US compared with O-RADS and expert assessment. Radiology. 2022;304(1):106–13.

  13. Lambin P, Leijenaar RTH, Deist TM, Peerlings J, de Jong EEC, van Timmeren J, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol. 2017;14(12):749–62.

  14. Yip SS, Aerts HJ. Applications and limitations of radiomics. Phys Med Biol. 2016;61(13):R150–66.

  15. Zhang H, Mao Y, Chen X, Wu G, Liu X, Zhang P, et al. Magnetic resonance imaging radiomics in categorizing ovarian masses and predicting clinical outcome: a preliminary study. Eur Radiol. 2019;29(7):3358–71.

  16. Qi L, Chen D, Li C, Li J, Wang J, Zhang C, et al. Diagnosis of ovarian neoplasms using nomogram in combination with ultrasound image-based radiomics signature and clinical factors. Front Genet. 2021;12:753948.

  17. Song XL, Ren JL, Zhao D, Wang L, Ren H, Niu J. Radiomics derived from dynamic contrast-enhanced MRI pharmacokinetic protocol features: the value of precision diagnosis ovarian neoplasms. Eur Radiol. 2021;31(1):368–78.

  18. Yu XP, Wang L, Yu HY, Zou YW, Wang C, Jiao JW, et al. MDCT-based radiomics features for the differentiation of serous borderline ovarian tumors and serous malignant ovarian tumors. Cancer Manag Res. 2021;13:329–36.

  19. Gao W, Wang W, Song D, Yang C, Zhu K, Zeng M, et al. A predictive model integrating deep and radiomics features based on gadobenate dimeglumine-enhanced MRI for postoperative early recurrence of hepatocellular carcinoma. Radiol Med. 2022;127(3):259–71.

  20. Liu P, Liang X, Liao S, Lu Z. Pattern classification for ovarian tumors by integration of radiomics and deep learning features. Curr Med Imaging. 2022;18(14):1486–502.

  21. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44.

  22. Chartrand G, Cheng PM, Vorontsov E, Drozdzal M, Turcotte S, Pal CJ, et al. Deep learning: a primer for radiologists. Radiographics. 2017;37(7):2113–31.

  23. Bo L, Zhang Z, Jiang Z, Yang C, Huang P, Chen T, et al. Differentiation of brain abscess from cystic glioma using conventional MRI based on deep transfer learning features and hand-crafted radiomics features. Front Med. 2021;8:748144.

  24. Feng B, Huang L, Liu Y, Chen Y, Zhou H, Yu T, et al. A transfer learning radiomics nomogram for preoperative prediction of Borrmann type IV gastric cancer from primary gastric lymphoma. Front Oncol. 2021;11:802205.

  25. Gao Y, Zeng S, Xu X, Li H, Yao S, Song K, et al. Deep learning-enabled pelvic ultrasound images for accurate diagnosis of ovarian cancer in China: a retrospective, multicentre, diagnostic study. Lancet Digit Health. 2022;4(3):e179–87.

  26. Han W, Qin L, Bay C, Chen X, Yu KH, Miskin N, et al. Deep transfer learning and radiomics feature prediction of survival of patients with high-grade gliomas. AJNR Am J Neuroradiol. 2020;41(1):40–8.

  27. Hu X, Zhou J, Li Y, Wang Y, Guo J, Sack I, et al. Added value of viscoelasticity for MRI-based prediction of Ki-67 expression of hepatocellular carcinoma using a deep learning combined radiomics (DLCR) model. Cancers. 2022;14(11):2575.

  28. Zhang Y, Hong D, McClement D, Oladosu O, Pridham G, Slaney G. Grad-CAM helps interpret the deep learning models trained to classify multiple sclerosis types using clinical brain magnetic resonance imaging. J Neurosci Methods. 2021;353:109098.

  29. Cao L, Wei M, Liu Y, Fu J, Zhang H, Huang J, et al. Validation of American College of Radiology Ovarian-Adnexal Reporting and Data System Ultrasound (O-RADS US): analysis on 1054 adnexal masses. Gynecol Oncol. 2021;162(1):107–12.

  30. Van Calster B, Van Hoorde K, Valentin L, Testa AC, Fischerova D, Van Holsbeke C, et al. Evaluating the risk of ovarian cancer before surgery using the ADNEX model to differentiate between benign, borderline, early and advanced stage invasive, and secondary metastatic tumours: prospective multicentre diagnostic study. BMJ. 2014;349:g5920.

  31. Hsu ST, Su YJ, Hung CH, Chen MJ, Lu CH, Kuo CE. Automatic ovarian tumors recognition system based on ensemble convolutional neural network with ultrasound imaging. BMC Med Inform Decis Mak. 2022;22(1):298.

  32. Charkhchi P, Cybulski C, Gronwald J, Wong FO, Narod SA, Akbari MR. CA125 and ovarian cancer: a comprehensive review. Cancers. 2020;12(12):3730.

  33. Kisielewski R, Tołwińska A, Mazurek A, Laudański P. Inflammation and ovarian cancer–current views. Ginekologia Polska. 2013;84(4):293–7.

  34. Chiappa V, Bogani G, Interlenghi M, Salvatore C, Bertolina F, Sarpietro G, et al. The adoption of Radiomics and machine learning improves the diagnostic processes of women with ovarian MAsses (the AROMA pilot study). J Ultrasound. 2021;24(4):429–37.

  35. Chiappa V, Interlenghi M, Bogani G, Salvatore C, Bertolina F, Sarpietro G, et al. A decision support system based on radiomics and machine learning to predict the risk of malignancy of ovarian masses from transvaginal ultrasonography and serum CA-125. Eur Radiol Exp. 2021;5(1):28.

  36. Li S, Liu J, Xiong Y, Pang P, Lei P, Zou H, et al. A radiomics approach for automated diagnosis of ovarian neoplasm malignancy in computed tomography. Sci Rep. 2021;11(1):8730.

  37. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis. 2014;115:211–52.

  38. Gong J, Zhang W, Huang W, Liao Y, Yin Y, Shi M, et al. CT-based radiomics nomogram may predict local recurrence-free survival in esophageal cancer patients receiving definitive chemoradiation or radiotherapy: a multicenter study. Radiother Oncol. 2022;174:8–15.

  39. Zheng YM, Che JY, Yuan MG, Wu ZJ, Pang J, Zhou RZ, et al. A CT-based deep learning radiomics nomogram to predict histological grades of head and neck squamous cell carcinoma. Acad Radiol. 2023;30(8):1591–9.

  40. Zeng Q, Li H, Zhu Y, Feng Z, Shu X, Wu A, et al. Development and validation of a predictive model combining clinical, radiomics, and deep transfer learning features for lymph node metastasis in early gastric cancer. Front Med. 2022;9:986437.


Acknowledgements

We sincerely thank the Onekey AI platform for code consultation for this study.

Funding

This research was supported by the Natural Science Foundation of Guangxi Zhuang Autonomous Region (Grant No. 2023GXNSFDA026010), the Guangxi Medical “139” Project for Training High-level Backbone Talents (Grant No. G201903014), and the Guangxi Promotion of Appropriate Health Technologies Project (Grant No. S2021055).

Author information

Authors and Affiliations

Authors

Contributions

YD and JW contributed to the study design, data analysis, and manuscript preparation. WG and YX contributed to the quality control of data and algorithms. HC and JY contributed to data collection and investigation. The first draft of the manuscript was written and edited by YD. All authors critically reviewed the manuscript. All authors approved the final manuscript.

Corresponding author

Correspondence to Ji Wu.

Ethics declarations

Ethics approval and consent to participate

This retrospective study was approved by the ethics committee of the People’s Hospital of Guangxi Zhuang Autonomous Region (KY-SY-2021-21), which waived the requirement for informed consent. The procedures used in this study adhered to the tenets of the Declaration of Helsinki.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Du, Y., Guo, W., Xiao, Y. et al. Ultrasound-based deep learning radiomics model for differentiating benign, borderline, and malignant ovarian tumours: a multi-class classification exploratory study. BMC Med Imaging 24, 89 (2024). https://doi.org/10.1186/s12880-024-01251-2


Keywords