Variable | N | % |
---|---|---|
thy_classification | 915 | 79.6 |
albumin | 515 | 44.8 |
tsh_value | 413 | 35.9 |
monocyte | 363 | 31.6 |
lymphocytes | 359 | 31.2 |
size_nodule_mm | 319 | 27.7 |
family_history_thyroid_cancer | 281 | 24.4 |
hypertension | 126 | 11.0 |
exposure_radiation | 121 | 10.5 |
compressive_symptoms | 106 | 9.22 |
vocal_cord_paresis | 76 | 6.61 |
hashimotos_thyroiditis | 73 | 6.35 |
graves_disease | 67 | 5.83 |
palpable_nodule | 58 | 5.04 |
bta_u_classification | 50 | 4.35 |
ethnicity | 48 | 4.17 |
rapid_enlargement | 43 | 3.74 |
cervical_lymphadenopathy | 9 | 0.783 |
solitary_nodule | 8 | 0.696 |
incidental_nodule | 5 | 0.435 |
age_at_scan | 0 | 0 |
gender | 0 | 0 |
final_pathology | 0 | 0 |
1 Introduction
Thyroid nodules are common, and the main challenge in their management is differentiating benign from malignant nodules. Fine needle aspiration cytology (FNAC) still leaves around 20% of patients whose nodules cannot be clearly classified as either benign or malignant, a scenario that has traditionally led to diagnostic hemithyroidectomy to obtain definitive histology. Other clinical variables, such as patient demographics and clinical and biochemical factors, have been shown to be associated with thyroid cancer in patients with thyroid nodules, and they have been used in studies evaluating predictors of thyroid cancer with a view to creating models that aid prediction. Standard practice in the management of thyroid nodules does not use these non-ultrasound, non-cytological factors. Combining the variables found to be significant with ultrasound and cytological characteristics may improve the management of patients with thyroid nodules.
Thyroid nodules are increasingly detected incidentally as imaging is used more widely to evaluate non-thyroid pathologies. This leads to more investigation of thyroid nodules and, in turn, more thyroid operations in non-diagnostic cases. Thyroid surgery carries morbidity, including scarring, recurrent laryngeal nerve injury, hypothyroidism and hypoparathyroidism. We performed a systematic review to evaluate predictors of thyroid cancer specifically in patients presenting with thyroid nodules, which identified a number of potentially important variables for predicting thyroid cancer in these patients. The aim of this study was to evaluate these predictors with a view to improving the prediction of thyroid cancer using computer-age statistical inference techniques (Efron and Hastie (2016)).
2 Methods
This study is reported in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines.
2.1 Study design
This was a retrospective cohort study.
2.2 Setting
The study was conducted at Sheffield Teaching Hospitals NHS Foundation Trust, a tertiary referral centre for the management of thyroid cancer.
2.3 Participants
We included all consecutive patients who presented with thyroid nodule(s) or who were found to have thyroid nodule(s) on ultrasound performed either for thyroid pathology or for other, non-thyroid-related pathologies.
2.4 Variables
The variables evaluated were selected based on the findings of a systematic review of predictors of thyroid cancer in patients with thyroid nodules. Data on the following variables were collected: patient demographics (age, gender, ethnicity), nodule presentation (incidental nodule, palpable nodule, rapid enlargement, compressive symptoms, vocal cord paresis), past medical history (hypertension, Graves’ disease, Hashimoto’s thyroiditis, family history of thyroid cancer, exposure to neck radiation), biochemistry (thyroid stimulating hormone, lymphocytes, monocytes), ultrasound characteristics (British Thyroid Association ultrasound (BTA U) classification, nodule size, solitary nodule, nodule consistency, cervical lymphadenopathy), Royal College of Pathologists (RCP) FNAC classification, type of thyroid surgery, and histological diagnosis.
2.5 Data source
Data were collected from patients’ case notes and the electronic patient database using a standardised data collection proforma, which was piloted on 30 patients and then revised to improve data entry. A number of variables that are not routinely collected during the work-up of patients were not sought further; these include body mass index (BMI), serum thyroglobulin, serum triiodothyronine (T3), thyroxine (T4), thyroglobulin antibody (TgAb), thyroid peroxidase antibody (TPOAb) and urinary iodine.
2.6 Study size
We aimed for a large dataset containing at least 100 thyroid nodules with a cancer diagnosis, obtained by consecutive sampling, and targeted a total of 1500 patients with thyroid nodules to reach this number. We considered that, with modern statistical techniques, a sample of this size would be adequate to detect important variables if they exist.
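Taken together, the two targets imply an anticipated cancer prevalence among patients with thyroid nodules. As a worked check (our arithmetic, not a figure stated by the authors), this can be compared with the 100 cancers among the 1150 patients with known final pathology reported in Section 3:

```latex
\[
\hat{p}_{\text{anticipated}} = \frac{100}{1500} \approx 0.067,
\qquad
\hat{p}_{\text{observed}} = \frac{100}{1150} \approx 0.087
\]
```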
2.7 Data analysis
Data were cleaned and analysed using R (R Core Team (2023)) with the Tidyverse (Wickham et al. (2019)) and Tidymodels (Kuhn and Wickham (2020)) collections of packages.
2.8 Imputation
The dataset is incomplete, with missing observations across most variables to varying degrees. To maximise the sample available for analysis, imputation was used to infer missing values. Multivariate Imputation by Chained Equations (MICE), as implemented in the eponymous R package (Buuren and Groothuis-Oudshoorn (2011)), was employed; it assumes that data are missing at random, an assumption that is difficult to test formally. The approach takes each variable with missing data in turn and predicts its missing values from a statistical model fitted to the observed data, cycling through the variables over several iterations. In essence this is the same approach as the statistical methods used to predict thyroid cancer, and a range of statistical techniques is available for the imputation models; those investigated here were Predictive Mean Matching (PMM), Classification and Regression Trees (CART) and Random Forests (RF).
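To make the chained-equations idea concrete, here is a minimal Python sketch for just two numeric variables, using a deliberately simplified predictive mean matching (PMM) step. It is not the mice implementation: mice also draws the regression parameters to reflect their uncertainty and samples randomly among several close donors (5 by default), whereas this sketch always takes the single closest donor. All function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares intercept and slope for the simple regression ys ~ xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return my - slope * mx, slope


def pmm_step(target, predictor, missing):
    """Re-impute target[i] for i in `missing` by (deterministic) predictive mean matching."""
    observed = [i for i in range(len(target)) if i not in missing]
    a, b = fit_line([predictor[i] for i in observed], [target[i] for i in observed])
    for i in missing:
        predicted = a + b * predictor[i]
        # Donor: the observed case whose predicted mean is closest to this case's prediction.
        donor = min(observed, key=lambda j: abs(a + b * predictor[j] - predicted))
        target[i] = target[donor]  # borrow a value that was actually observed


def chained_equations(x, y, n_iter=5):
    """Two-variable chained-equations loop; None marks a missing value."""
    x, y = list(x), list(y)
    miss_x = {i for i, v in enumerate(x) if v is None}
    miss_y = {i for i, v in enumerate(y) if v is None}
    # Initialise the gaps with the observed mean of each variable.
    for col, miss in ((x, miss_x), (y, miss_y)):
        fill = mean(v for v in col if v is not None)
        for i in miss:
            col[i] = fill
    # Cycle: impute x given the current y, then y given the current x.
    for _ in range(n_iter):
        pmm_step(x, y, miss_x)
        pmm_step(y, x, miss_y)
    return x, y
```

For example, `chained_equations([1, 2, 3, 4, 5, None], [2, 4, 6, 8, 10, 12])` imputes the last `x` as 5, not 6: PMM only ever borrows observed values, so imputations stay within the range of the data.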
2.9 Modelling
We used a selection of statistical modelling techniques to evaluate the association between variables and thyroid cancer in patients with thyroid nodules. The patient population was split into training and testing cohorts in a 0.75:0.25 ratio, a split commonly used in machine learning, and each model was fitted using the training cohort. The training set was used to estimate the relationship between the variables and thyroid cancer; the larger the training set, the better the model can learn the underlying trends. The test set was used to assess the accuracy of the model in predicting thyroid cancer; the larger the test set, the more confidence we can have in the model's estimated predictive performance. The split was made by simple randomisation to avoid bias, and we ensured that there were no duplicate records across the two sets, so that no test observation was inadvertently used in training.
Cross-validation was also used to estimate the accuracy of the various machine learning models. k-fold cross-validation splits the data into k folds (10 in this study); the model is trained on all but one fold and tested on the held-out fold, and this is repeated with a different fold held out each time until every fold has been used once for testing and k - 1 times for training. Following this k-fold training process, we selected the model with the best predictive value for thyroid cancer in the test cohort.
We also used leave-one-out (LOO) cross-validation, in which all but one observation is used to train the model and the remaining observation is used to test it; this is repeated until every observation has been used once for testing. The model with the best predictive value was selected.
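The resampling schemes described above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: in tidymodels the analogous rsample functions are `initial_split()`, `vfold_cv()` and `loo_cv()`, but the source does not say which functions or settings were used. The helper names, the seed and `k = 10` are assumptions.

```python
import random


def train_test_split(n, prop=0.75, seed=2024):
    """Randomly assign row indices 0..n-1 to a training set (prop, rounded down) and a test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * prop)
    return sorted(idx[:cut]), sorted(idx[cut:])


def k_fold(indices, k=10, seed=2024):
    """Yield (analysis, assessment) index lists; every index is held out exactly once."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        analysis = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield analysis, held_out


def leave_one_out(indices):
    """Leave-one-out is k-fold cross-validation with k equal to the number of observations."""
    indices = list(indices)
    return k_fold(indices, k=len(indices))
```

With the 1150 patients who have a known final pathology, `train_test_split(1150)` gives 862 training and 288 test indices, and cross-validation then operates only within the 862 training indices, so the test set stays untouched until final evaluation.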
2.9.1 LASSO / Elastic Net
LASSO (Least Absolute Shrinkage and Selection Operator) and Elastic Net (Zou and Hastie (2005)) are penalised regression methods that perform variable selection. The original LASSO method, proposed by Tibshirani in “Regression Shrinkage and Selection via the Lasso” (1996), shrinks the coefficients of the independent (predictor) variables towards zero, setting some exactly to zero and thereby eliminating those variables from the model; this is often referred to as L1 regularisation. The Elastic Net (Zou and Hastie (2005)) extends the LASSO by combining L1 regularisation with the ridge-regression (L2) penalty, which helps avoid over-fitting and gives more stable selection when predictors are correlated.
Both methods avoid many of the pitfalls of stepwise variable selection (Thompson (1995); Smith (2018)) and have been shown to be more accurate for clinical prediction in small datasets with well-coded, externally selected variables (Steyerberg et al. (2001)).
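The way the penalty shrinks coefficients, and sets weak ones exactly to zero, can be seen in a small coordinate-descent sketch. For brevity it uses penalised least squares rather than the penalised logistic likelihood a thyroid cancer model would use, and it assumes the glmnet-style penalty λ[α‖β‖₁ + (1 − α)/2 ‖β‖₂²], where α = 1 gives the LASSO and α = 0 ridge regression. The source does not state which engine or parameterisation the authors used, and the function names are hypothetical.

```python
def soft_threshold(z, gamma):
    """S(z, gamma): shrink z towards zero by gamma, returning exactly 0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_sweeps=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2).

    Assumes each column of X is standardised (mean 0, mean of squares 1) and y is centred,
    so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual that excludes predictor j.
            z = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            beta[j] = soft_threshold(z, lam * alpha) / (1 + lam * (1 - alpha))
    return beta
```

With `X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]` and `y = [2.3, -1.7, 1.7, -2.3]` (true coefficients 2 and 0.3), the LASSO with `lam=0.5` keeps the strong predictor, shrunk to about 1.5, and removes the weak one entirely; with `lam=0.2` both are retained (about 1.8 and 0.1).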
2.9.2 Random Forest
TODO - Add reference.
The random forest is an extension of the decision tree methodology that reduces variance. Decision trees are very sensitive to the training data and can have high variance, which limits how well the model generalises. A random forest draws random (bootstrap) samples of observations from the dataset to grow many decision trees, and only a random subset of the predictor variables is considered at each split when each tree is trained. The outputs of the individual trees are then aggregated (by majority vote for classification) to give the final estimate.
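A minimal sketch of the random forest idea follows, assuming depth-one trees ("stumps") with one split each, so that drawing the random predictor subset per tree or per split amounts to the same thing here. Splits are chosen by misclassification count rather than the Gini impurity that full implementations typically use. It is illustrative Python, not the R implementation used in the analysis, and all names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Best single split (feature, threshold) among `features`, by misclassification count."""
    best = None
    for f in features:
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:  # no usable split: predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return lambda row: majority
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def fit_forest(rows, labels, n_trees=25, mtry=1, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]  # bootstrap sample
        feats = rng.sample(range(n_features), mtry)  # random subset of predictors
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return trees


def predict(trees, row):
    """Aggregate the trees by majority vote (an odd n_trees avoids ties for two classes)."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]
```

Each stump on its own depends heavily on which observations its bootstrap sample happened to contain; averaging many of them through the vote is what reduces the variance described above.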
2.9.3 Gradient Boosting
Gradient boosting is a machine learning algorithm that uses decision trees as its base models. A single shallow tree makes only weak predictions and is therefore termed a weak learner. Gradient boosting is iterative: a sequence of decision trees is added to the initial model, each new tree being fitted to the errors (more precisely, the negative gradient of the loss function) of the trees so far, so that the ensemble becomes progressively stronger and its error is reduced.
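A compact sketch of gradient boosting for a binary outcome is shown below, assuming a single predictor, regression stumps as weak learners and the log-loss. Each stump is fitted to the pseudo-residuals y - p (the negative gradient), and its leaf means are added with a learning rate. Production implementations such as XGBoost use deeper trees and Newton-style leaf values; this simplified version and its names are hypothetical.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def fit_regression_stump(xs, targets):
    """Single-split regression tree on one feature (needs two or more distinct xs), minimising SSE."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Binary gradient boosting on the log-odds scale with regression stumps as weak learners."""
    base_rate = sum(ys) / len(ys)  # assumes both classes are present
    f0 = math.log(base_rate / (1 - base_rate))  # start from the log-odds of the base rate
    trees = []

    def score(x):
        return f0 + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_rounds):
        # Pseudo-residuals: the negative gradient of the log-loss, y - p.
        residuals = [y - sigmoid(score(x)) for x, y in zip(xs, ys)]
        trees.append(fit_regression_stump(xs, residuals))
    return lambda x: sigmoid(score(x))


def log_loss(model, xs, ys):
    return -sum(
        y * math.log(model(x)) + (1 - y) * math.log(1 - model(x)) for x, y in zip(xs, ys)
    ) / len(ys)
```

On `xs = list(range(1, 11))` with the last five labelled 1, the boosted model's training log-loss falls below that of the constant 0.5 prediction (log 2), because each small step follows the negative gradient.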
2.9.4 SVM
Support vector machines (SVMs) separate observations belonging to two classes, here benign versus malignant nodules, using a hyperplane. Of the hyperplanes that separate the two classes, the SVM finds the one with the maximum margin of separation between them. The support vectors are the data points lying closest to the hyperplane, on the edge of the margin, and they are used to select the most appropriate hyperplane: they are the only data points that influence the maximum-margin hyperplane.
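The margin and support-vector ideas can be checked numerically for a given linear hyperplane w·x + b = 0. The sketch does not train an SVM. For the four toy points below, the maximum-margin hyperplane, w = (0.5, 0.5) and b = -2, was worked out by hand (it is the perpendicular bisector of the two closest opposite-class points). The data and function names are hypothetical.

```python
import math


def decision(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class (+1 or -1)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Distance between the margin hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))


def support_vectors(w, b, points, labels, tol=1e-9):
    """Points lying exactly on the margin, i.e. with y * (w.x + b) == 1."""
    return [x for x, y in zip(points, labels) if abs(y * decision(w, b, x) - 1) < tol]


points = [(0, 0), (1, 1), (3, 3), (4, 4)]
labels = [-1, -1, 1, 1]  # e.g. -1 = benign, +1 = malignant
w, b = (0.5, 0.5), -2.0
```

Here only (1, 1) and (3, 3) are support vectors, and the margin width equals their distance, √8. Because (0, 0) and (4, 4) lie strictly outside the margin, removing them would leave this hyperplane unchanged, which is what it means for the support vectors to be the only influential points.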
2.9.5 Comparison
3 Results
?@tbl-patient-demographics shows the demographics of patients included in this study. A total of 1364 patients were included, with a median (IQR) age of 55 (41-69) years. ?@tbl-clinical-characteristics compares the distribution of the clinical variables evaluated between benign and malignant thyroid nodules.
3.1 Data Description
A summary of the variables that are available in this data set can be found in Table 3.
The completeness of the original data is shown in tables ?@tbl-imputation-summary-pmm, ?@tbl-imputation-summary-cart and ?@tbl-imputation-summary-rf, along with summaries from five rounds of imputation for each of the three imputation methods. For continuous variables (e.g. age or size_nodule_mm), summary statistics are given as the median and inter-quartile range (IQR). For logical TRUE/FALSE variables (e.g. palpable_nodule), the number of TRUE observations and the percentage (of those with observed data for that variable) are shown, along with the number that are Unknown. For other categorical variables, such as gender, the number and percentage in each category are reported. For all variables the number of missing observations is also given; it is worth noting that final_pathology is unknown in 214 instances, which reduces the sample size to 1150.
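The "n (%)" convention for logical variables can be reproduced with a small helper. This is an illustrative Python sketch with a hypothetical function name; the published tables were produced in R. As a check, it reconstructs the palpable_nodule entry of the original data (441 TRUE, 58 Unknown among 1150 patients), where the percentage is taken over the 1092 patients with observed data.

```python
def summarise_logical(values):
    """Return (n TRUE, % TRUE among observed values, n Unknown) for a TRUE/FALSE variable.

    Missing values are represented as None.
    """
    observed = [v for v in values if v is not None]
    n_true = sum(1 for v in observed if v)
    pct_true = 100 * n_true / len(observed) if observed else float("nan")
    return n_true, pct_true, len(values) - len(observed)


# palpable_nodule in the original data: 441 TRUE, 651 FALSE, 58 Unknown (N = 1150)
values = [True] * 441 + [False] * 651 + [None] * 58
n_true, pct_true, n_unknown = summarise_logical(values)
```

This gives 441 (40%) with 58 Unknown, matching the tables. The 58 Unknown also correspond to the 5.04% missing reported for palpable_nodule in Table 1, which uses all 1150 patients as the denominator.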
3.1.1 Missing Data
More detailed tabulations of missing data are shown by variable in Table 1 (the number and percentage of missing observations for each variable) and by case in Table 2 (how many variables are missing for each case). These patterns are visualised in Figure 1.
NB - There is currently a bug in the stable release of Quarto that prevents rendering of the missing data figures. It is fixed in development version v1.6.1 (currently available as a pre-release, so if the figures don’t render, try upgrading).
Missing Variables | N | % |
---|---|---|
0 | 65 | 5.652 |
1 | 227 | 19.739 |
2 | 229 | 19.913 |
3 | 181 | 15.739 |
4 | 139 | 12.087 |
5 | 102 | 8.870 |
6 | 76 | 6.609 |
7 | 35 | 3.043 |
8 | 30 | 2.609 |
9 | 13 | 1.130 |
10 | 19 | 1.652 |
11 | 15 | 1.304 |
12 | 9 | 0.783 |
13 | 4 | 0.348 |
14 | 3 | 0.261 |
15 | 2 | 0.174 |
16 | 1 | 0.087 |
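The per-case tabulation above counts, for each patient, how many variables are missing and then tabulates those counts. A minimal Python sketch of that computation follows; the function name is hypothetical, and missing values are represented as None.

```python
from collections import Counter


def missing_per_case(rows):
    """Map number of missing (None) values per row -> (N rows, % of rows, 3 d.p.)."""
    counts = Counter(sum(v is None for v in row) for row in rows)
    total = len(rows)
    return {k: (n, round(100 * n / total, 3)) for k, n in sorted(counts.items())}


# The N column of the table above; it should account for all 1150 patients.
published_n = [65, 227, 229, 181, 139, 102, 76, 35, 30, 13, 19, 15, 9, 4, 3, 2, 1]
```

The counts in the table sum to 1150, and the first row's percentage is 100 × 65 / 1150 ≈ 5.652, so only 65 patients had no missing data at all.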
The MICE package also provides tools for visualising missing data, and these are shown in Figure 2, ?@fig-mice-vis-missing-biomarker and Figure 4.
The columns of these plots, labelled along the top, correspond to the variables: a blue cell indicates that data are present and a red cell that data are missing. The left-hand side shows the number of observations with that row's particular combination of observed and missing variables, and the right-hand side shows how many variables are missing in that combination. The first row shows that 604 observations have no missing data across the listed variables, and the second row shows that 166 observations are missing only family_history_thyroid_cancer; further rows show observations missing this variable together with others. The numbers along the bottom of the figure give the total number of missing observations for each variable (e.g. family_history_thyroid_cancer has 281 missing observations in total).
TODO - Work out why out-width: "80%" isn’t applied to these figures and/or how to make the All figure readable.
3.1.2 Imputation
The MICE package (Buuren and Groothuis-Oudshoorn (2011)) offers a number of different methods for imputing variables (see [documentation][mice_details]); we investigated Predictive Mean Matching (PMM), Classification and Regression Trees (CART) and Random Forests (RF). Five rounds of imputation were made using each method.
A comparison of distributions and proportions before and after imputation is presented below to allow assessment of the utility of each method.
The convergence of the imputation methods is shown in figures ?@fig-mice-convergence-pmm, ?@fig-mice-convergence-cart and ?@fig-mice-convergence-rf.
TODO - Extract the legends from individual plots and add them to the end of each row; see the cowplot shared legends article for pointers on how to do this. Should ideally also get the fill colours to align with those used by ggmice.
Characteristic | Original, N = 1,150¹ | Imputation 1, N = 1,150¹ | Imputation 2, N = 1,150¹ | Imputation 3, N = 1,150¹ | Imputation 4, N = 1,150¹ | Imputation 5, N = 1,150¹ |
---|---|---|---|---|---|---|
.id | 576 (288, 863) | 576 (288, 863) | 576 (288, 863) | 576 (288, 863) | 576 (288, 863) | 576 (288, 863) |
Age | 55 (41, 68) | 55 (41, 68) | 55 (41, 68) | 55 (41, 68) | 55 (41, 68) | 55 (41, 68) |
gender | ||||||
F | 903 (79%) | 903 (79%) | 903 (79%) | 903 (79%) | 903 (79%) | 903 (79%) |
M | 247 (21%) | 247 (21%) | 247 (21%) | 247 (21%) | 247 (21%) | 247 (21%) |
ethnicity | ||||||
A | 812 (74%) | 843 (73%) | 841 (73%) | 839 (73%) | 845 (73%) | 842 (73%) |
B | 3 (0.3%) | 4 (0.3%) | 4 (0.3%) | 4 (0.3%) | 3 (0.3%) | 3 (0.3%) |
C | 44 (4.0%) | 45 (3.9%) | 47 (4.1%) | 44 (3.8%) | 48 (4.2%) | 45 (3.9%) |
D | 2 (0.2%) | 2 (0.2%) | 2 (0.2%) | 2 (0.2%) | 2 (0.2%) | 2 (0.2%) |
F | 4 (0.4%) | 4 (0.3%) | 4 (0.3%) | 4 (0.3%) | 4 (0.3%) | 4 (0.3%) |
G | 7 (0.6%) | 7 (0.6%) | 7 (0.6%) | 7 (0.6%) | 7 (0.6%) | 7 (0.6%) |
H | 8 (0.7%) | 8 (0.7%) | 8 (0.7%) | 9 (0.8%) | 8 (0.7%) | 8 (0.7%) |
J | 46 (4.2%) | 50 (4.3%) | 49 (4.3%) | 51 (4.4%) | 50 (4.3%) | 49 (4.3%) |
K | 11 (1.0%) | 11 (1.0%) | 13 (1.1%) | 12 (1.0%) | 11 (1.0%) | 12 (1.0%) |
L | 16 (1.5%) | 19 (1.7%) | 16 (1.4%) | 18 (1.6%) | 16 (1.4%) | 17 (1.5%) |
M | 16 (1.5%) | 16 (1.4%) | 16 (1.4%) | 17 (1.5%) | 16 (1.4%) | 18 (1.6%) |
N | 18 (1.6%) | 19 (1.7%) | 18 (1.6%) | 18 (1.6%) | 19 (1.7%) | 20 (1.7%) |
P | 14 (1.3%) | 14 (1.2%) | 15 (1.3%) | 16 (1.4%) | 15 (1.3%) | 14 (1.2%) |
R | 6 (0.5%) | 7 (0.6%) | 6 (0.5%) | 8 (0.7%) | 7 (0.6%) | 7 (0.6%) |
S | 38 (3.4%) | 40 (3.5%) | 43 (3.7%) | 42 (3.7%) | 40 (3.5%) | 40 (3.5%) |
Z | 57 (5.2%) | 61 (5.3%) | 61 (5.3%) | 59 (5.1%) | 59 (5.1%) | 62 (5.4%) |
Unknown | 48 | 0 | 0 | 0 | 0 | 0 |
incidental_nodule | 620 (54%) | 623 (54%) | 622 (54%) | 623 (54%) | 624 (54%) | 623 (54%) |
Unknown | 5 | 0 | 0 | 0 | 0 | 0 |
palpable_nodule | 441 (40%) | 471 (41%) | 470 (41%) | 470 (41%) | 471 (41%) | 470 (41%) |
Unknown | 58 | 0 | 0 | 0 | 0 | 0 |
rapid_enlargement | 19 (1.7%) | 19 (1.7%) | 21 (1.8%) | 22 (1.9%) | 20 (1.7%) | 19 (1.7%) |
Unknown | 43 | 0 | 0 | 0 | 0 | 0 |
compressive_symptoms | 88 (8.4%) | 107 (9.3%) | 103 (9.0%) | 102 (8.9%) | 98 (8.5%) | 101 (8.8%) |
Unknown | 106 | 0 | 0 | 0 | 0 | 0 |
hypertension | 262 (26%) | 287 (25%) | 301 (26%) | 292 (25%) | 293 (25%) | 291 (25%) |
Unknown | 126 | 0 | 0 | 0 | 0 | 0 |
vocal_cord_paresis | 3 (0.3%) | 5 (0.4%) | 3 (0.3%) | 6 (0.5%) | 3 (0.3%) | 3 (0.3%) |
Unknown | 76 | 0 | 0 | 0 | 0 | 0 |
graves_disease | 17 (1.6%) | 20 (1.7%) | 20 (1.7%) | 19 (1.7%) | 19 (1.7%) | 18 (1.6%) |
Unknown | 67 | 0 | 0 | 0 | 0 | 0 |
hashimotos_thyroiditis | 7 (0.6%) | 7 (0.6%) | 7 (0.6%) | 8 (0.7%) | 9 (0.8%) | 7 (0.6%) |
Unknown | 73 | 0 | 0 | 0 | 0 | 0 |
family_history_thyroid_cancer | 8 (0.9%) | 10 (0.9%) | 14 (1.2%) | 11 (1.0%) | 11 (1.0%) | 16 (1.4%) |
Unknown | 281 | 0 | 0 | 0 | 0 | 0 |
exposure_radiation | 9 (0.9%) | 9 (0.8%) | 9 (0.8%) | 9 (0.8%) | 10 (0.9%) | 10 (0.9%) |
Unknown | 121 | 0 | 0 | 0 | 0 | 0 |
Albumin | 45.0 (43.0, 47.0) | 45.0 (43.0, 47.0) | 45.0 (43.0, 47.0) | 45.0 (43.0, 47.0) | 45.0 (43.0, 47.0) | 45.0 (43.0, 47.0) |
Unknown | 515 | 0 | 0 | 0 | 0 | 0 |
TSH value | 1.48 (0.85, 2.30) | 1.50 (0.89, 2.30) | 1.50 (0.86, 2.40) | 1.50 (0.87, 2.50) | 1.50 (0.89, 2.50) | 1.40 (0.85, 2.30) |
Unknown | 413 | 0 | 0 | 0 | 0 | 0 |
Lymphocytes | 1.94 (1.51, 2.43) | 1.96 (1.54, 2.46) | 1.95 (1.53, 2.45) | 1.94 (1.50, 2.42) | 1.95 (1.53, 2.42) | 1.93 (1.49, 2.42) |
Unknown | 359 | 0 | 0 | 0 | 0 | 0 |
Monocytes | 0.52 (0.42, 0.66) | 0.53 (0.43, 0.66) | 0.52 (0.42, 0.66) | 0.53 (0.42, 0.66) | 0.52 (0.43, 0.66) | 0.52 (0.42, 0.66) |
Unknown | 363 | 0 | 0 | 0 | 0 | 0 |
bta_u_classification | ||||||
U1 | 1 (<0.1%) | 1 (<0.1%) | 2 (0.2%) | 1 (<0.1%) | 1 (<0.1%) | 1 (<0.1%) |
U2 | 860 (78%) | 902 (78%) | 895 (78%) | 898 (78%) | 893 (78%) | 893 (78%) |
U3 | 210 (19%) | 214 (19%) | 222 (19%) | 220 (19%) | 223 (19%) | 224 (19%) |
U4 | 22 (2.0%) | 25 (2.2%) | 23 (2.0%) | 24 (2.1%) | 26 (2.3%) | 22 (1.9%) |
U5 | 7 (0.6%) | 8 (0.7%) | 8 (0.7%) | 7 (0.6%) | 7 (0.6%) | 10 (0.9%) |
Unknown | 50 | 0 | 0 | 0 | 0 | 0 |
solitary_nodule | 320 (28%) | 323 (28%) | 322 (28%) | 323 (28%) | 323 (28%) | 323 (28%) |
Unknown | 8 | 0 | 0 | 0 | 0 | 0 |
Nodule size (mm) | 14 (7, 28) | 13 (6, 27) | 13 (6, 27) | 13 (6, 27) | 12 (6, 26) | 13 (6, 27) |
Unknown | 319 | 0 | 0 | 0 | 0 | 0 |
cervical_lymphadenopathy | 26 (2.3%) | 27 (2.3%) | 27 (2.3%) | 26 (2.3%) | 27 (2.3%) | 26 (2.3%) |
Unknown | 9 | 0 | 0 | 0 | 0 | 0 |
thy_classification | ||||||
Thy1 | 34 (14%) | 362 (31%) | 174 (15%) | 242 (21%) | 179 (16%) | 220 (19%) |
Thy1c | 8 (3.4%) | 19 (1.7%) | 35 (3.0%) | 39 (3.4%) | 44 (3.8%) | 28 (2.4%) |
Thy2 | 63 (27%) | 403 (35%) | 412 (36%) | 360 (31%) | 373 (32%) | 360 (31%) |
Thy2c | 11 (4.7%) | 29 (2.5%) | 34 (3.0%) | 44 (3.8%) | 35 (3.0%) | 38 (3.3%) |
Thy3a | 18 (7.7%) | 36 (3.1%) | 44 (3.8%) | 60 (5.2%) | 61 (5.3%) | 69 (6.0%) |
Thy3f | 74 (31%) | 193 (17%) | 302 (26%) | 296 (26%) | 335 (29%) | 336 (29%) |
Thy4 | 10 (4.3%) | 17 (1.5%) | 42 (3.7%) | 24 (2.1%) | 37 (3.2%) | 29 (2.5%) |
Thy5 | 17 (7.2%) | 91 (7.9%) | 107 (9.3%) | 85 (7.4%) | 86 (7.5%) | 70 (6.1%) |
Unknown | 915 | 0 | 0 | 0 | 0 | 0 |
final_pathology | ||||||
Benign | 1,050 (91%) | 1,050 (91%) | 1,050 (91%) | 1,050 (91%) | 1,050 (91%) | 1,050 (91%) |
Cancer | 100 (8.7%) | 100 (8.7%) | 100 (8.7%) | 100 (8.7%) | 100 (8.7%) | 100 (8.7%) |
¹ Median (IQR); n (%)
Characteristic | Original, N = 1,150¹ | Imputation 1, N = 1,150¹ | Imputation 2, N = 1,150¹ | Imputation 3, N = 1,150¹ | Imputation 4, N = 1,150¹ | Imputation 5, N = 1,150¹ |
---|---|---|---|---|---|---|
.id | 576 (288, 863) | 576 (288, 863) | 576 (288, 863) | 576 (288, 863) | 576 (288, 863) | 576 (288, 863) |
Age | 55 (41, 68) | 55 (41, 68) | 55 (41, 68) | 55 (41, 68) | 55 (41, 68) | 55 (41, 68) |
gender | ||||||
F | 903 (79%) | 903 (79%) | 903 (79%) | 903 (79%) | 903 (79%) | 903 (79%) |
M | 247 (21%) | 247 (21%) | 247 (21%) | 247 (21%) | 247 (21%) | 247 (21%) |
ethnicity | ||||||
A | 812 (74%) | 843 (73%) | 842 (73%) | 843 (73%) | 845 (73%) | 845 (73%) |
B | 3 (0.3%) | 3 (0.3%) | 3 (0.3%) | 4 (0.3%) | 3 (0.3%) | 3 (0.3%) |
C | 44 (4.0%) | 46 (4.0%) | 46 (4.0%) | 47 (4.1%) | 47 (4.1%) | 47 (4.1%) |
D | 2 (0.2%) | 2 (0.2%) | 2 (0.2%) | 2 (0.2%) | 2 (0.2%) | 2 (0.2%) |
F | 4 (0.4%) | 4 (0.3%) | 4 (0.3%) | 5 (0.4%) | 4 (0.3%) | 4 (0.3%) |
G | 7 (0.6%) | 7 (0.6%) | 7 (0.6%) | 7 (0.6%) | 8 (0.7%) | 7 (0.6%) |
H | 8 (0.7%) | 8 (0.7%) | 9 (0.8%) | 8 (0.7%) | 8 (0.7%) | 8 (0.7%) |
J | 46 (4.2%) | 48 (4.2%) | 48 (4.2%) | 52 (4.5%) | 49 (4.3%) | 46 (4.0%) |
K | 11 (1.0%) | 11 (1.0%) | 11 (1.0%) | 11 (1.0%) | 12 (1.0%) | 11 (1.0%) |
L | 16 (1.5%) | 16 (1.4%) | 17 (1.5%) | 16 (1.4%) | 16 (1.4%) | 17 (1.5%) |
M | 16 (1.5%) | 17 (1.5%) | 17 (1.5%) | 17 (1.5%) | 18 (1.6%) | 19 (1.7%) |
N | 18 (1.6%) | 19 (1.7%) | 21 (1.8%) | 19 (1.7%) | 19 (1.7%) | 18 (1.6%) |
P | 14 (1.3%) | 16 (1.4%) | 14 (1.2%) | 15 (1.3%) | 14 (1.2%) | 15 (1.3%) |
R | 6 (0.5%) | 8 (0.7%) | 7 (0.6%) | 6 (0.5%) | 6 (0.5%) | 6 (0.5%) |
S | 38 (3.4%) | 41 (3.6%) | 39 (3.4%) | 38 (3.3%) | 41 (3.6%) | 42 (3.7%) |
Z | 57 (5.2%) | 61 (5.3%) | 63 (5.5%) | 60 (5.2%) | 58 (5.0%) | 60 (5.2%) |
Unknown | 48 | 0 | 0 | 0 | 0 | 0 |
incidental_nodule | 620 (54%) | 623 (54%) | 624 (54%) | 624 (54%) | 623 (54%) | 624 (54%) |
Unknown | 5 | 0 | 0 | 0 | 0 | 0 |
palpable_nodule | 441 (40%) | 470 (41%) | 472 (41%) | 468 (41%) | 472 (41%) | 468 (41%) |
Unknown | 58 | 0 | 0 | 0 | 0 | 0 |
rapid_enlargement | 19 (1.7%) | 19 (1.7%) | 20 (1.7%) | 19 (1.7%) | 19 (1.7%) | 21 (1.8%) |
Unknown | 43 | 0 | 0 | 0 | 0 | 0 |
compressive_symptoms | 88 (8.4%) | 102 (8.9%) | 105 (9.1%) | 104 (9.0%) | 101 (8.8%) | 99 (8.6%) |
Unknown | 106 | 0 | 0 | 0 | 0 | 0 |
hypertension | 262 (26%) | 291 (25%) | 293 (25%) | 292 (25%) | 291 (25%) | 292 (25%) |
Unknown | 126 | 0 | 0 | 0 | 0 | 0 |
vocal_cord_paresis | 3 (0.3%) | 4 (0.3%) | 3 (0.3%) | 3 (0.3%) | 3 (0.3%) | 3 (0.3%) |
Unknown | 76 | 0 | 0 | 0 | 0 | 0 |
graves_disease | 17 (1.6%) | 17 (1.5%) | 19 (1.7%) | 18 (1.6%) | 17 (1.5%) | 17 (1.5%) |
Unknown | 67 | 0 | 0 | 0 | 0 | 0 |
hashimotos_thyroiditis | 7 (0.6%) | 9 (0.8%) | 7 (0.6%) | 9 (0.8%) | 8 (0.7%) | 8 (0.7%) |
Unknown | 73 | 0 | 0 | 0 | 0 | 0 |
family_history_thyroid_cancer | 8 (0.9%) | 12 (1.0%) | 9 (0.8%) | 8 (0.7%) | 11 (1.0%) | 16 (1.4%) |
Unknown | 281 | 0 | 0 | 0 | 0 | 0 |
exposure_radiation | 9 (0.9%) | 10 (0.9%) | 9 (0.8%) | 11 (1.0%) | 11 (1.0%) | 10 (0.9%) |
Unknown | 121 | 0 | 0 | 0 | 0 | 0 |
Albumin | 45.0 (43.0, 47.0) | 45.0 (43.0, 47.0) | 45.0 (43.0, 47.0) | 45.0 (43.0, 47.0) | 45.0 (43.0, 47.0) | 45.0 (43.0, 47.0) |
Unknown | 515 | 0 | 0 | 0 | 0 | 0 |
TSH value | 1.48 (0.85, 2.30) | 1.50 (0.88, 2.39) | 1.50 (0.88, 2.50) | 1.50 (0.91, 2.50) | 1.50 (0.90, 2.40) | 1.42 (0.85, 2.30) |
Unknown | 413 | 0 | 0 | 0 | 0 | 0 |
Lymphocytes | 1.94 (1.51, 2.43) | 1.94 (1.54, 2.43) | 1.94 (1.51, 2.43) | 1.95 (1.53, 2.44) | 1.91 (1.50, 2.41) | 1.94 (1.50, 2.43) |
Unknown | 359 | 0 | 0 | 0 | 0 | 0 |
Monocytes | 0.52 (0.42, 0.66) | 0.53 (0.42, 0.66) | 0.53 (0.43, 0.66) | 0.53 (0.43, 0.66) | 0.52 (0.42, 0.66) | 0.52 (0.42, 0.65) |
Unknown | 363 | 0 | 0 | 0 | 0 | 0 |
bta_u_classification | ||||||
U1 | 1 (<0.1%) | 1 (<0.1%) | 1 (<0.1%) | 1 (<0.1%) | 1 (<0.1%) | 2 (0.2%) |
U2 | 860 (78%) | 894 (78%) | 899 (78%) | 903 (79%) | 897 (78%) | 897 (78%) |
U3 | 210 (19%) | 226 (20%) | 219 (19%) | 215 (19%) | 220 (19%) | 220 (19%) |
U4 | 22 (2.0%) | 22 (1.9%) | 24 (2.1%) | 23 (2.0%) | 24 (2.1%) | 24 (2.1%) |
U5 | 7 (0.6%) | 7 (0.6%) | 7 (0.6%) | 8 (0.7%) | 8 (0.7%) | 7 (0.6%) |
Unknown | 50 | 0 | 0 | 0 | 0 | 0 |
solitary_nodule | 320 (28%) | 322 (28%) | 322 (28%) | 322 (28%) | 322 (28%) | 323 (28%) |
Unknown | 8 | 0 | 0 | 0 | 0 | 0 |
Nodule size (mm) | 14 (7, 28) | 13 (6, 26) | 12 (6, 27) | 13 (6, 26) | 12 (6, 27) | 13 (6, 26) |
Unknown | 319 | 0 | 0 | 0 | 0 | 0 |
cervical_lymphadenopathy | 26 (2.3%) | 26 (2.3%) | 26 (2.3%) | 27 (2.3%) | 26 (2.3%) | 26 (2.3%) |
Unknown | 9 | 0 | 0 | 0 | 0 | 0 |
thy_classification | ||||||
Thy1 | 34 (14%) | 203 (18%) | 192 (17%) | 187 (16%) | 221 (19%) | 188 (16%) |
Thy1c | 8 (3.4%) | 46 (4.0%) | 35 (3.0%) | 36 (3.1%) | 59 (5.1%) | 63 (5.5%) |
Thy2 | 63 (27%) | 278 (24%) | 272 (24%) | 281 (24%) | 258 (22%) | 237 (21%) |
Thy2c | 11 (4.7%) | 45 (3.9%) | 70 (6.1%) | 58 (5.0%) | 70 (6.1%) | 76 (6.6%) |
Thy3a | 18 (7.7%) | 60 (5.2%) | 75 (6.5%) | 74 (6.4%) | 59 (5.1%) | 125 (11%) |
Thy3f | 74 (31%) | 347 (30%) | 321 (28%) | 317 (28%) | 325 (28%) | 264 (23%) |
Thy4 | 10 (4.3%) | 54 (4.7%) | 69 (6.0%) | 79 (6.9%) | 68 (5.9%) | 74 (6.4%) |
Thy5 | 17 (7.2%) | 117 (10%) | 116 (10%) | 118 (10%) | 90 (7.8%) | 123 (11%) |
Unknown | 915 | 0 | 0 | 0 | 0 | 0 |
final_pathology | ||||||
Benign | 1,050 (91%) | 1,050 (91%) | 1,050 (91%) | 1,050 (91%) | 1,050 (91%) | 1,050 (91%) |
Cancer | 100 (8.7%) | 100 (8.7%) | 100 (8.7%) | 100 (8.7%) | 100 (8.7%) | 100 (8.7%) |
¹ Median (IQR); n (%)
Characteristic | Original, N = 1,150¹ | Imputation 1, N = 1,150¹ | Imputation 2, N = 1,150¹ | Imputation 3, N = 1,150¹ | Imputation 4, N = 1,150¹ | Imputation 5, N = 1,150¹ |
---|---|---|---|---|---|---|
.id | 576 (288, 863) | 576 (288, 863) | 576 (288, 863) | 576 (288, 863) | 576 (288, 863) | 576 (288, 863) |
Age | 55 (41, 68) | 55 (41, 68) | 55 (41, 68) | 55 (41, 68) | 55 (41, 68) | 55 (41, 68) |
gender | ||||||
F | 903 (79%) | 903 (79%) | 903 (79%) | 903 (79%) | 903 (79%) | 903 (79%) |
M | 247 (21%) | 247 (21%) | 247 (21%) | 247 (21%) | 247 (21%) | 247 (21%) |
ethnicity | ||||||
A | 812 (74%) | 851 (74%) | 845 (73%) | 850 (74%) | 849 (74%) | 853 (74%) |
B | 3 (0.3%) | 3 (0.3%) | 3 (0.3%) | 4 (0.3%) | 3 (0.3%) | 3 (0.3%) |
C | 44 (4.0%) | 47 (4.1%) | 46 (4.0%) | 46 (4.0%) | 45 (3.9%) | 45 (3.9%) |
D | 2 (0.2%) | 2 (0.2%) | 2 (0.2%) | 2 (0.2%) | 2 (0.2%) | 2 (0.2%) |
F | 4 (0.4%) | 4 (0.3%) | 4 (0.3%) | 4 (0.3%) | 4 (0.3%) | 4 (0.3%) |
G | 7 (0.6%) | 7 (0.6%) | 7 (0.6%) | 7 (0.6%) | 7 (0.6%) | 8 (0.7%) |
H | 8 (0.7%) | 8 (0.7%) | 8 (0.7%) | 8 (0.7%) | 8 (0.7%) | 8 (0.7%) |
J | 46 (4.2%) | 47 (4.1%) | 49 (4.3%) | 47 (4.1%) | 49 (4.3%) | 48 (4.2%) |
K | 11 (1.0%) | 11 (1.0%) | 15 (1.3%) | 11 (1.0%) | 11 (1.0%) | 11 (1.0%) |
L | 16 (1.5%) | 16 (1.4%) | 16 (1.4%) | 16 (1.4%) | 17 (1.5%) | 16 (1.4%) |
M | 16 (1.5%) | 17 (1.5%) | 17 (1.5%) | 18 (1.6%) | 18 (1.6%) | 16 (1.4%) |
N | 18 (1.6%) | 19 (1.7%) | 19 (1.7%) | 19 (1.7%) | 19 (1.7%) | 20 (1.7%) |
P | 14 (1.3%) | 14 (1.2%) | 15 (1.3%) | 14 (1.2%) | 14 (1.2%) | 14 (1.2%) |
R | 6 (0.5%) | 6 (0.5%) | 6 (0.5%) | 6 (0.5%) | 6 (0.5%) | 6 (0.5%) |
S | 38 (3.4%) | 40 (3.5%) | 39 (3.4%) | 39 (3.4%) | 40 (3.5%) | 39 (3.4%) |
Z | 57 (5.2%) | 58 (5.0%) | 59 (5.1%) | 59 (5.1%) | 58 (5.0%) | 57 (5.0%) |
Unknown | 48 | 0 | 0 | 0 | 0 | 0 |
incidental_nodule | 620 (54%) | 623 (54%) | 623 (54%) | 624 (54%) | 624 (54%) | 622 (54%) |
Unknown | 5 | 0 | 0 | 0 | 0 | 0 |
palpable_nodule | 441 (40%) | 462 (40%) | 472 (41%) | 465 (40%) | 469 (41%) | 467 (41%) |
Unknown | 58 | 0 | 0 | 0 | 0 | 0 |
rapid_enlargement | 19 (1.7%) | 19 (1.7%) | 19 (1.7%) | 19 (1.7%) | 20 (1.7%) | 20 (1.7%) |
Unknown | 43 | 0 | 0 | 0 | 0 | 0 |
compressive_symptoms | 88 (8.4%) | 94 (8.2%) | 100 (8.7%) | 94 (8.2%) | 89 (7.7%) | 95 (8.3%) |
Unknown | 106 | 0 | 0 | 0 | 0 | 0 |
hypertension | 262 (26%) | 282 (25%) | 284 (25%) | 285 (25%) | 281 (24%) | 286 (25%) |
Unknown | 126 | 0 | 0 | 0 | 0 | 0 |
vocal_cord_paresis | 3 (0.3%) | 3 (0.3%) | 3 (0.3%) | 3 (0.3%) | 3 (0.3%) | 3 (0.3%) |
Unknown | 76 | 0 | 0 | 0 | 0 | 0 |
graves_disease | 17 (1.6%) | 17 (1.5%) | 17 (1.5%) | 17 (1.5%) | 17 (1.5%) | 17 (1.5%) |
Unknown | 67 | 0 | 0 | 0 | 0 | 0 |
hashimotos_thyroiditis | 7 (0.6%) | 7 (0.6%) | 7 (0.6%) | 7 (0.6%) | 7 (0.6%) | 7 (0.6%) |
Unknown | 73 | 0 | 0 | 0 | 0 | 0 |
family_history_thyroid_cancer | 8 (0.9%) | 9 (0.8%) | 8 (0.7%) | 11 (1.0%) | 9 (0.8%) | 9 (0.8%) |
Unknown | 281 | 0 | 0 | 0 | 0 | 0 |
exposure_radiation | 9 (0.9%) | 9 (0.8%) | 9 (0.8%) | 10 (0.9%) | 9 (0.8%) | 9 (0.8%) |
Unknown | 121 | 0 | 0 | 0 | 0 | 0 |
Albumin | 45.0 (43.0, 47.0) | 45.0 (43.0, 47.0) | 45.0 (43.0, 47.0) | 45.0 (43.0, 47.0) | 45.0 (43.0, 47.0) | 45.0 (43.0, 47.0) |
Unknown | 515 | 0 | 0 | 0 | 0 | 0 |
TSH value | 1.48 (0.85, 2.30) | 1.50 (0.87, 2.40) | 1.50 (0.90, 2.30) | 1.50 (0.88, 2.30) | 1.50 (0.87, 2.30) | 1.50 (0.85, 2.40) |
Unknown | 413 | 0 | 0 | 0 | 0 | 0 |
Lymphocytes | 1.94 (1.51, 2.43) | 1.94 (1.54, 2.43) | 1.94 (1.51, 2.44) | 1.95 (1.54, 2.44) | 1.95 (1.53, 2.42) | 1.94 (1.51, 2.44) |
Unknown | 359 | 0 | 0 | 0 | 0 | 0 |
Monocytes | 0.52 (0.42, 0.66) | 0.53 (0.42, 0.66) | 0.52 (0.42, 0.66) | 0.53 (0.43, 0.66) | 0.52 (0.42, 0.66) | 0.52 (0.42, 0.66) |
Unknown | 363 | 0 | 0 | 0 | 0 | 0 |
bta_u_classification | ||||||
U1 | 1 (<0.1%) | 1 (<0.1%) | 1 (<0.1%) | 1 (<0.1%) | 1 (<0.1%) | 1 (<0.1%) |
U2 | 860 (78%) | 902 (78%) | 904 (79%) | 904 (79%) | 904 (79%) | 899 (78%) |
U3 | 210 (19%) | 216 (19%) | 216 (19%) | 216 (19%) | 215 (19%) | 219 (19%) |
U4 | 22 (2.0%) | 23 (2.0%) | 22 (1.9%) | 22 (1.9%) | 22 (1.9%) | 24 (2.1%) |
U5 | 7 (0.6%) | 8 (0.7%) | 7 (0.6%) | 7 (0.6%) | 8 (0.7%) | 7 (0.6%) |
Unknown | 50 | 0 | 0 | 0 | 0 | 0 |
solitary_nodule | 320 (28%) | 321 (28%) | 321 (28%) | 320 (28%) | 323 (28%) | 321 (28%) |
Unknown | 8 | 0 | 0 | 0 | 0 | 0 |
Nodule size (mm) | 14 (7, 28) | 13 (6, 27) | 12 (6, 26) | 12 (6, 26) | 13 (6, 26) | 13 (6, 26) |
Unknown | 319 | 0 | 0 | 0 | 0 | 0 |
cervical_lymphadenopathy | 26 (2.3%) | 26 (2.3%) | 26 (2.3%) | 26 (2.3%) | 26 (2.3%) | 26 (2.3%) |
Unknown | 9 | 0 | 0 | 0 | 0 | 0 |
thy_classification | ||||||
Thy1 | 34 (14%) | 206 (18%) | 173 (15%) | 136 (12%) | 175 (15%) | 147 (13%) |
Thy1c | 8 (3.4%) | 46 (4.0%) | 38 (3.3%) | 69 (6.0%) | 36 (3.1%) | 32 (2.8%) |
Thy2 | 63 (27%) | 299 (26%) | 283 (25%) | 323 (28%) | 333 (29%) | 341 (30%) |
Thy2c | 11 (4.7%) | 57 (5.0%) | 78 (6.8%) | 71 (6.2%) | 64 (5.6%) | 40 (3.5%) |
Thy3a | 18 (7.7%) | 105 (9.1%) | 115 (10%) | 117 (10%) | 85 (7.4%) | 97 (8.4%) |
Thy3f | 74 (31%) | 291 (25%) | 299 (26%) | 306 (27%) | 325 (28%) | 359 (31%) |
Thy4 | 10 (4.3%) | 48 (4.2%) | 52 (4.5%) | 47 (4.1%) | 61 (5.3%) | 48 (4.2%) |
Thy5 | 17 (7.2%) | 98 (8.5%) | 112 (9.7%) | 81 (7.0%) | 71 (6.2%) | 86 (7.5%) |
Unknown | 915 | 0 | 0 | 0 | 0 | 0 |
final_pathology | ||||||
Benign | 1,050 (91%) | 1,050 (91%) | 1,050 (91%) | 1,050 (91%) | 1,050 (91%) | 1,050 (91%) |
Cancer | 100 (8.7%) | 100 (8.7%) | 100 (8.7%) | 100 (8.7%) | 100 (8.7%) | 100 (8.7%) |
¹ Median (IQR); n (%)
3.2 Modelling
TODO - In light of having removed ?@tbl-data-completness in favour of the imputed datasets, has this too been removed? (@ns-rse 2024-07-11)
TODO - This table feels like duplication of ?@tbl-data-completeness; perhaps have just one? (@ns-rse 2024-07-11)
The predictor variables selected to predict final_pathology are shown in ?@tbl-predictors.
The following section is output from a Tidymodels approach to logistic regression, used to try to work out why some variables are not being included.
A total of 1150 patients had complete data for the selected predictor variables (see ?@tbl-predictors). Because of the volume of missing data (a saturated model fitted to complete cases would include only ~350 people with complete data across all covariates), the imputed datasets were analysed instead.
3.2.1 Logistic Regression
3.2.1.1 Clinical Characteristics
Characteristic | OR¹ | 95% CI¹ | p-value |
---|---|---|---|
Age | 0.98 | 0.96, 0.99 | 0.005 |
gender | 1.97 | 1.11, 3.43 | 0.019 |
incidental_nodule | 0.99 | 0.48, 2.02 | >0.9 |
palpable_nodule | 2.76 | 1.29, 6.08 | 0.010 |
rapid_enlargement | 1.66 | 0.46, 5.37 | 0.4 |
compressive_symptoms | 0.98 | 0.43, 2.09 | >0.9 |
hashimotos_thyroiditis | 1.19 | 0.02, 13.0 | >0.9 |
family_history_thyroid_cancer | 3.80 | 0.48, 21.7 | 0.15 |
exposure_radiation | 0.00 | | >0.9 |
TSH value | 1.00 | 0.94, 1.05 | >0.9 |
Nodule size (mm) | 1.03 | 1.01, 1.05 | <0.001 |
solitary_nodule | 1.48 | 0.85, 2.51 | 0.2 |
cervical_lymphadenopathy | 6.74 | 2.37, 18.8 | <0.001 |
¹ OR = Odds Ratio, CI = Confidence Interval
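The odds ratios in these tables are exponentiated logistic regression coefficients. The base-R sketch below shows the arithmetic on simulated data (the variables `age` and `size_mm` and their effect sizes are made up). It uses Wald intervals from `confint.default()`, which may differ slightly from the interval method used to produce the tables.

```r
set.seed(42)

# Simulated data, not study data
n       <- 500
age     <- rnorm(n, mean = 50, sd = 12)
size_mm <- rgamma(n, shape = 4, scale = 6)
lin     <- -3 - 0.02 * (age - 50) + 0.04 * size_mm   # true log-odds
cancer  <- rbinom(n, size = 1, prob = plogis(lin))

fit <- glm(cancer ~ age + size_mm, family = binomial)

# OR = exp(beta); 95% Wald CI = exp(beta +/- 1.96 * SE)
ci <- confint.default(fit)
or_table <- data.frame(
  OR      = exp(coef(fit)),
  lower   = exp(ci[, 1]),
  upper   = exp(ci[, 2]),
  p.value = summary(fit)$coefficients[, "Pr(>|z|)"]
)
print(round(or_table[-1, ], 3))   # drop the intercept row
```

An OR of 1.03 per mm for nodule size, for example, corresponds to a coefficient of log(1.03), about 0.03, on the log-odds scale.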
3.2.1.2 Biomarkers
Characteristic | OR¹ | 95% CI¹ | p-value |
---|---|---|---|
Age | 0.98 | 0.97, 1.00 | 0.010 |
gender | 1.95 | 1.15, 3.26 | 0.012 |
TSH value | 1.01 | 0.96, 1.06 | 0.5 |
Albumin | 1.03 | 0.96, 1.12 | 0.4 |
Lymphocytes | 1.04 | 0.74, 1.43 | 0.8 |
Monocytes | 0.18 | 0.04, 0.72 | 0.019 |
¹ OR = Odds Ratio, CI = Confidence Interval
3.2.1.3 Ultrasound 1
Characteristic | OR¹ | 95% CI¹ | p-value |
---|---|---|---|
Age | 0.99 | 0.97, 1.01 | 0.2 |
gender | 1.23 | 0.63, 2.32 | 0.5 |
bta_u_classification | |||
U1 | — | — | |
U2 | 461,584 | 0.00, NA | >0.9 |
U3 | 7,642,048 | 0.00, NA | >0.9 |
U4 | 36,642,711 | 0.00, NA | >0.9 |
U5 | 126,534,639 | 0.00, NA | >0.9 |
thy_classification | |||
Thy1 | — | — | |
Thy1c | 1.63 | 0.19, 9.45 | 0.6 |
Thy2 | 0.36 | 0.09, 1.25 | 0.11 |
Thy2c | 0.00 | 0.00, 0.00 | >0.9 |
Thy3a | 2.24 | 0.66, 7.80 | 0.2 |
Thy3f | 3.01 | 1.21, 8.47 | 0.025 |
Thy4 | 7.06 | 1.93, 26.6 | 0.003 |
Thy5 | 9.52 | 3.19, 31.1 | <0.001 |
¹ OR = Odds Ratio, CI = Confidence Interval
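The very large odds ratios for U2 to U5, with confidence limits of 0.00 and NA, are the typical signature of (quasi-)complete separation. If a category contains no cancers (plausibly the reference level U1 here, though this is an assumption), the maximum likelihood estimate does not exist and `glm()` stops at an arbitrarily large coefficient with a huge standard error. The made-up data below reproduce the pattern. Penalised (Firth) logistic regression or merging sparse categories are common remedies, but whether either suits this dataset would need checking.

```r
# Made-up data: no cancers at all in the reference level U1
u_class <- factor(rep(c("U1", "U2", "U3"), times = c(10, 20, 20)))
cancer  <- c(rep(0, 10),                 # U1: 0 of 10
             rep(1, 2), rep(0, 18),      # U2: 2 of 20
             rep(1, 6), rep(0, 14))      # U3: 6 of 20

fit <- glm(cancer ~ u_class, family = binomial)

# Odds ratios relative to U1, and standard errors on the log-odds scale
or <- exp(coef(fit))[-1]
se <- summary(fit)$coefficients[-1, "Std. Error"]
print(signif(or, 3))   # enormous, like U2 to U5 above
print(signif(se, 3))   # thousands, so the Wald interval is uninformative
```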
3.2.1.4 Ultrasound 2
Characteristic | OR¹ | 95% CI¹ | p-value |
---|---|---|---|
Age | 0.99 | 0.97, 1.01 | 0.2 |
gender | 1.19 | 0.58, 2.34 | 0.6 |
incidental_nodule | 0.94 | 0.48, 1.84 | 0.9 |
TSH value | 0.95 | 0.89, 1.00 | 0.091 |
Nodule size (mm) | 1.04 | 1.02, 1.06 | <0.001 |
solitary_nodule | 1.49 | 0.77, 2.82 | 0.2 |
cervical_lymphadenopathy | 6.61 | 1.45, 29.3 | 0.013 |
bta_u_classification | |||
U1 | — | — | |
U2 | 239,169 | 0.00, NA | >0.9 |
U3 | 3,105,400 | 0.00, NA | >0.9 |
U4 | 14,770,696 | 0.00, NA | >0.9 |
U5 | 39,781,959 | 0.00, NA | >0.9 |
thy_classification | |||
Thy1 | — | — | |
Thy1c | 0.97 | 0.06, 7.66 | >0.9 |
Thy2 | 0.25 | 0.06, 0.97 | 0.049 |
Thy2c | 0.00 | 0.00, 1,045,453 | >0.9 |
Thy3a | 2.69 | 0.76, 9.91 | 0.13 |
Thy3f | 2.45 | 0.92, 7.39 | 0.089 |
Thy4 | 8.34 | 2.18, 33.3 | 0.002 |
Thy5 | 11.1 | 3.47, 39.6 | <0.001 |
¹ OR = Odds Ratio, CI = Confidence Interval
3.2.2 LASSO
## Specify the LASSO model using parsnip. The key here is the glmnet engine, the R package for
## fitting LASSO regression. Technically the package fits an Elastic Net, but with a mixture value of 1
## it is equivalent to a plain LASSO (a mixture value of 0 is equivalent to Ridge Regression).
tune_spec_lasso <- parsnip::logistic_reg(penalty = hardhat::tune(), mixture = 1) |>
  parsnip::set_engine("glmnet")

## Tune the LASSO penalty via cross-validation over a regular grid of 50 penalty values
lasso_grid <- tune::tune_grid(
  object = workflows::add_model(thyroid_workflow, tune_spec_lasso),
  resamples = cv_folds,
  grid = dials::grid_regular(dials::penalty(), levels = 50)
)
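For reference, the glmnet documentation gives the objective it minimises for a binomial model as the penalised negative log-likelihood below. The intercept $\beta_0$ is not penalised. parsnip's `penalty` corresponds to glmnet's $\lambda$ (`lambda`) and `mixture` to $\alpha$ (`alpha`), so `mixture = 1` leaves only the $\ell_1$ (LASSO) term and `mixture = 0` only the ridge term.

$$
\min_{\beta_0,\,\beta}\; -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\big(\beta_0 + x_i^\top\beta\big) - \log\big(1 + e^{\beta_0 + x_i^\top\beta}\big)\Big] + \lambda\Big[\frac{1-\alpha}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1\Big]
$$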
NB - We may wish to inspect the coefficients at each step of tuning. A related example of how to do this can be found in the Tidymodels documentation article "Tuning a glmnet model". This would be desirable because it looks like only two features are selected as important by this method, and rather than simply accepting this I would want to investigate how the coefficients change across the tuning grid. Another useful resource is the glmnet documentation, although note that because we are using the Tidymodels framework the glmnet fit is wrapped inside the workflow object (hence the article above on how to extract this information).
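Why LASSO can retain so few features is easiest to see in a simplified case. Assuming a Gaussian response and an orthonormal design, with the objective scaled as in glmnet (neither assumption holds for the logistic model above), the LASSO estimate of each coefficient is its least-squares estimate soft-thresholded at λ. Any coefficient smaller than λ in absolute value is set exactly to zero. The sketch traces these coefficients over a penalty grid, in the spirit of the coefficient-path inspection suggested in the note above. The feature names and coefficient values are hypothetical.

```r
# Soft-thresholding operator: S(z, g) = sign(z) * max(|z| - g, 0)
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Made-up least-squares coefficients for five hypothetical standardised features
beta_ls <- c(f1 = 2.0, f2 = -1.2, f3 = 0.6, f4 = 0.3, f5 = -0.1)

# Penalty grid, analogous to tuning over `penalty`
lambdas <- c(0, 0.25, 0.5, 1, 1.5, 2.5)

# Under an orthonormal design the LASSO solution is soft_threshold(beta_ls, lambda)
path <- t(sapply(lambdas, function(l) soft_threshold(beta_ls, l)))
rownames(path) <- paste0("lambda=", lambdas)
print(path)

# Number of features kept (non-zero) at each penalty value
n_selected <- rowSums(path != 0)
print(n_selected)
```

As λ grows, features drop out one at a time, so a cross-validated λ near the upper end of the grid can leave only one or two features in the model.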
3.2.3 Elastic Net
NB - As for the LASSO above, we may wish to inspect the coefficients at each step of tuning. The same Tidymodels article ("Tuning a glmnet model") and the glmnet documentation apply here, since the glmnet engine also fits the Elastic Net.
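How `mixture` trades variable selection against shrinkage can be sketched with a closed form. This relies on simplifying assumptions that the logistic model does not meet: a Gaussian response, an orthonormal design and glmnet's scaling of the objective. Under these assumptions, the Elastic Net estimate soft-thresholds the least-squares estimate at λα and then divides by 1 + λ(1 − α). The feature names and values below are hypothetical.

```r
# Soft-thresholding operator: S(z, g) = sign(z) * max(|z| - g, 0)
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Elastic Net estimate under an orthonormal design (simplifying assumption):
# soft-threshold at lambda * alpha, then shrink by 1 + lambda * (1 - alpha)
enet_orthonormal <- function(beta_ls, lambda, alpha) {
  soft_threshold(beta_ls, lambda * alpha) / (1 + lambda * (1 - alpha))
}

# Made-up least-squares coefficients for hypothetical standardised features
beta_ls <- c(f1 = 2.0, f2 = -1.2, f3 = 0.6, f4 = 0.3, f5 = -0.1)
lambda  <- 1

# alpha plays the role of parsnip's `mixture`
for (alpha in c(0, 0.5, 1)) {
  est <- enet_orthonormal(beta_ls, lambda, alpha)
  cat(sprintf("mixture = %.1f: %d non-zero;", alpha, sum(est != 0)),
      round(est, 3), "\n")
}
```

With `mixture = 0` every coefficient is kept but halved, while `mixture = 1` keeps only the two largest, so intermediate values tend to retain more features than the pure LASSO.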
3.2.4 Random Forest
3.2.5 Gradient Boosting
Length Class Mode
pre 3 stage_pre list
fit 2 stage_fit list
post 1 stage_post list
trained 1 -none- logical
# A tibble: 23 × 4
variable type role source
<chr> <list> <chr> <chr>
1 age_at_scan <chr [2]> predictor original
2 gender <chr [3]> predictor original
3 ethnicity <chr [3]> predictor original
4 incidental_nodule <chr [3]> predictor original
5 palpable_nodule <chr [3]> predictor original
6 rapid_enlargement <chr [3]> predictor original
7 compressive_symptoms <chr [3]> predictor original
8 hypertension <chr [3]> predictor original
9 vocal_cord_paresis <chr [3]> predictor original
10 graves_disease <chr [3]> predictor original
# ℹ 13 more rows
Length Class Mode
handle 1 xgb.Booster.handle externalptr
raw 95013 -none- raw
niter 1 -none- numeric
evaluation_log 2 data.table list
call 8 -none- call
params 10 -none- list
callbacks 1 -none- list
feature_names 45 -none- character
nfeatures 1 -none- numeric
Feature Gain Cover Frequency
<char> <num> <num> <num>
1: bta_u_classification_U2 0.8201 0.5114 0.3106
2: tsh_value 0.0900 0.0693 0.1677
3: size_nodule_mm 0.0332 0.3619 0.3634
4: age_at_scan 0.0314 0.0344 0.1025
5: albumin 0.0253 0.0230 0.0559
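In xgboost's importance table, Gain is a feature's share of the total loss reduction from all splits that use it, Cover is roughly its share of the observations affected by those splits, and Frequency is its share of all splits. Each column is normalised to sum to 1. The five rows above sum to 1 in every column (up to rounding), so these five appear to be the only features the fitted trees split on, with bta_u_classification_U2 alone contributing about 82% of the gain. The base-R sketch below recomputes Gain and Frequency from a hypothetical log of splits. The split gains are invented and the data frame is not xgboost's internal format.

```r
# Hypothetical split log: one row per split in the boosted trees (invented values)
splits <- data.frame(
  feature = c("size_nodule_mm", "tsh_value", "size_nodule_mm", "age_at_scan", "tsh_value"),
  gain    = c(4.0, 2.5, 1.5, 1.0, 1.0)
)

# Gain: a feature's total split gain divided by the total over all splits
gain_by_feature <- tapply(splits$gain, splits$feature, sum)
gain_share      <- gain_by_feature / sum(gain_by_feature)

# Frequency: a feature's number of splits divided by the total number of splits
freq_share <- table(splits$feature) / nrow(splits)

importance <- data.frame(
  Feature   = names(gain_share),
  Gain      = as.numeric(gain_share),
  Frequency = as.numeric(freq_share[names(gain_share)])
)
print(importance[order(-importance$Gain), ])
```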
[1] ".pred_Benign" ".pred_Cancer"
[3] ".pred_class" "age_at_scan"
[5] "gender" "ethnicity"
[7] "incidental_nodule" "palpable_nodule"
[9] "rapid_enlargement" "compressive_symptoms"
[11] "hypertension" "vocal_cord_paresis"
[13] "graves_disease" "hashimotos_thyroiditis"
[15] "family_history_thyroid_cancer" "exposure_radiation"
[17] "albumin" "tsh_value"
[19] "lymphocytes" "monocyte"
[21] "bta_u_classification" "solitary_nodule"
[23] "size_nodule_mm" "cervical_lymphadenopathy"
[25] "thy_classification" "final_pathology"
Variable | Description |
---|---|
age_at_scan | Age |
albumin | Albumin |
bta_u_classification | BTA U |
cervical_lymphadenopathy | Cervical Lymphadenopathy |
compressive_symptoms | Compressive symptoms |
consistency_nodule | Nodule consistency |
eligibility | Eligibility |
ethinicity | Ethinicity |
ethnicity | Ethnicity |
exposure_radiation | Exposure to radiation |
family_history_thyroid_cancer | Family history of thyroid cancer |
final_pathology | Final diagnosis |
fna_done | FNA done |
gender | Gender |
graves_disease | Graves’ disease |
hashimotos_thyroiditis | Hashimoto’s disease |
hypertension | Hypertension |
incidental_nodule | Incidental nodule |
lymphocytes | Lymphocytes |
monocyte | Monocytes |
palpable_nodule | Palpable nodule |
rapid_enlargement | Rapid enlargement |
repeat_bta_u_classification | Repeat BTA U |
repeat_fna_done | Repeat FNA |
repeat_thy_classification | Repeat Thy class |
repeat_ultrasound | Repeat ultrasound |
size_nodule_mm | Nodule size (mm) |
solitary_nodule | Solitary nodule |
study_id | Study ID |
thy_classification | Thy classification |
thyroid_histology_diagnosis | Histology |
thyroid_surgery | Thyroid surgery |
tsh_value | TSH value |
vocal_cord_paresis | Vocal cord paresis |
References
Citation
@online{edafe2024,
author = {Edafe, Ovie and Shephard, Neil and Sisley, Karen and P
Balasubramanian, Sabapathy},
title = {An Investigation of the Predictors of Thyroid Cancer in
Patients with Thyroid Nodules},
date = {2024-04-26},
langid = {en},
abstract = {An abstract summarising the work undertaken and the
overall conclusions can be placed here. Sub-headings are currently
removed because they conflict with those in the body of the text and
mess up the links in the Table of Contents.}
}