An investigation of the predictors of thyroid cancer in patients with thyroid nodules

Authors
Affiliations

Ovie Edafe

Department of Oncology & Metabolism, University of Sheffield

Neil Shephard

Research Software Engineer, Department of Computer Science, University of Sheffield

Karen Sisley

Senior Lecturer, Clinical Medicine, School of Medicine and Population Health, University of Sheffield

Sabapathy P Balasubramanian

Directorate of General Surgery, Sheffield Teaching Hospitals NHS Foundation Trust

Published

April 26, 2024

Abstract

An abstract summarising the work undertaken and the overall conclusions can be placed here. Sub-headings are currently removed because they conflict with those in the body of the text and mess up the links in the Table of Contents.

Keywords

Thyroid nodules, Thyroid cancer

1 Introduction

Thyroid nodules are common. The challenge in their management is differentiating between benign and malignant nodules. The use of fine needle aspiration and cytology (FNAC) still leaves around 20% of patients who cannot be clearly classified as either benign or malignant. This scenario traditionally leads to diagnostic hemithyroidectomy for definitive histology.

Other variables, such as patient demographics and clinical and biochemical factors, have been shown to be associated with thyroid cancer in patients with thyroid nodules. These have been used in studies evaluating predictors of thyroid cancer with a view to creating a model to aid prediction. Standard practice in the management of thyroid nodules does not use these non-ultrasound and non-cytological factors. Combining the variables considered to be significant with ultrasound and cytological characteristics may improve the management of patients with thyroid nodules.

Thyroid nodules are increasingly detected incidentally because of the increased use of imaging in the evaluation of non-thyroid pathologies. This leads to more investigation of thyroid nodules and, subsequently, more thyroid operations in non-diagnostic cases. Thyroid surgery carries morbidity, including scarring, recurrent laryngeal nerve injury, hypothyroidism and hypoparathyroidism.

We performed a systematic review to evaluate predictors of thyroid cancer specifically in patients presenting with thyroid nodules. The systematic review identified a number of potentially important variables that may be useful in the prediction of thyroid cancer in patients with thyroid nodules. The aim of this study was to evaluate the predictors of thyroid cancer with a view to improving prediction of thyroid cancer using computer age statistical inference techniques (Efron and Hastie (2016)).

2 Methods

This study is reported in line with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines.

2.1 Study design

This was a retrospective cohort study.

2.2 Setting

The study was conducted at the Sheffield Teaching Hospitals NHS Foundation Trust, a tertiary referral centre for the management of thyroid cancer.

2.3 Participants

We included all consecutive patients who presented with thyroid nodule(s) or who were found to have thyroid nodule(s) on ultrasound performed for thyroid pathology or for non-thyroid pathologies.

2.4 Variables

The variables evaluated were based on findings from a systematic review of predictors of thyroid cancer in patients with thyroid nodules. Data on the following variables were collected: patient demographics (age, gender, ethnicity), nodule presentation (incidental nodule, palpable nodule, rapid enlargement, compressive symptoms, vocal paresis), past medical history (hypertension, Graves’ disease, Hashimoto’s thyroiditis, family history of thyroid cancer, exposure to neck radiation), biochemistry (thyroid stimulating hormone, lymphocytes, monocytes), ultrasound characteristics (British Thyroid Association ultrasound (BTA U) classification, nodule size, solitary nodule, nodule consistency, cervical lymphadenopathy), Royal College of Pathologists (RCP) FNAC classification, type of thyroid surgery, and histological diagnosis.

2.5 Data source

Data were collected from patients’ case notes and the electronic patient database using a standardised data collection proforma. This was initially piloted on 30 patients and revised to improve data entry. In addition, a number of variables that were not routinely collected during the work-up of patients were not checked further; these include body mass index (BMI), serum thyroglobulin, serum triiodothyronine (T3), thyroxine (T4), thyroglobulin antibody (TgAb), thyroid peroxidase antibody (TPOAb), and urinary iodine.

2.6 Study size

We sought a large data set containing at least 100 thyroid nodules with a cancer diagnosis, using a consecutive sampling technique. We aimed for a total of 1500 patients with thyroid nodules to achieve this target. We proposed that, with the use of modern statistical techniques, such a number would be adequate to detect important variables if they exist.

2.7 Data analysis

Data were cleaned and analysed using the R Statistical Software (R Core Team (2023)) and the Tidyverse (Wickham et al. (2019)) and Tidymodels (Kuhn and Wickham (2020)) collections of packages.

2.8 Imputation

The dataset is incomplete, with missing observations across most variables to varying degrees. In order to maximise the sample available for analysis, imputation was used to infer missing values. Multivariate Imputation by Chained Equations (MICE, implemented in the eponymous R package; Buuren and Groothuis-Oudshoorn (2011)) was employed, which assumes data are missing at random (a difficult assumption to formally test). The approach takes each variable with missing data and attempts to predict it using statistical modelling based on the observed values. In essence it is the same approach as the statistical methods being employed to try to predict thyroid cancer, and there is a range of statistical techniques available, which include predictive mean matching (PMM), classification and regression trees (CART) and random forests (RF), the three methods investigated here.
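As an illustration of the idea of predicting a missing value from observed ones, the following is a minimal sketch of a single predictive mean matching step in base R. It is not the mice implementation (which, among other things, cycles over all incomplete variables and adds random draws of the model parameters); the data are simulated and the names `toy`, `tsh`, `age` and `k` are hypothetical.

```r
# Simplified, single-variable predictive mean matching (PMM) on simulated data.
# All names (toy, tsh, age) are hypothetical; this is not the mice implementation.
set.seed(1)
n <- 200
toy <- data.frame(age = round(runif(n, 20, 90)))
toy$tsh <- 0.5 + 0.02 * toy$age + rnorm(n, sd = 0.4)
toy$tsh[sample(n, 50)] <- NA            # make 50 values missing at random

observed <- !is.na(toy$tsh)

# 1. Model the incomplete variable from the complete one, using observed rows.
fit <- lm(tsh ~ age, data = toy[observed, ])

# 2. Predict for every row, observed or not.
pred <- predict(fit, newdata = toy)

# 3. For each missing row, find the k observed rows with the closest
#    predictions and copy the observed value of one of them, chosen at random.
k <- 5
donor_pool <- which(observed)
imputed <- toy$tsh
for (i in which(!observed)) {
  nearest <- donor_pool[order(abs(pred[donor_pool] - pred[i]))[seq_len(k)]]
  imputed[i] <- toy$tsh[nearest[sample.int(k, 1)]]
}
toy$tsh_imputed <- imputed
```

Because every imputed value is copied from a real observation, this kind of matching cannot produce values outside the observed range, which is one reason it is a common choice for skewed laboratory measurements.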

2.9 Modelling

We used a selection of statistical modelling techniques to evaluate the association between variables and thyroid cancer in patients with thyroid nodules. The patient population was split into training and testing cohorts in a ratio of 0.75:0.25 and each model was fitted using the training cohort. This split ratio is commonly used in machine learning. The training set was used to estimate the relationship between the variables and thyroid cancer; the larger the training set, the more reliably a model can learn these relationships. The test set was used to determine the accuracy of the model in predicting thyroid cancer; the larger the test set, the more confidence we can have in the estimate of the model's predictive performance. We used simple randomisation for the split to prevent bias. We ensured that there were no duplicates in the data so that no observation in the test set was also used for training.

Furthermore, cross-validation was used to estimate the accuracy of the various machine learning models. The k-fold technique splits the data into ?10 folds; the model is trained on all but one of the folds and the held-out fold is used to test it. This was repeated, holding out a different fold each time, until every fold had been used for both training and testing. Following k-fold cross-validation, we selected the model with the best predictive value for thyroid cancer in the test cohort. We also used leave-one-out (LOO) cross-validation. In this technique all but one observation is used to train the model and the remaining observation is used to test it; this is repeated until every observation has been used for testing. The model with the best predictive value was selected.
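The splitting scheme described above can be sketched in base R as follows. This is an illustration on simulated data rather than the analysis code: the data frame `cohort` is hypothetical, and `k = 10` is an assumption, since the number of folds is not yet confirmed in the text.

```r
# Simulated stand-in for the cohort; `final_pathology` mimics the outcome name
# used in this manuscript, everything else is hypothetical.
set.seed(42)
n <- 1150
cohort <- data.frame(
  id = seq_len(n),
  final_pathology = sample(c("Benign", "Cancer"), n, replace = TRUE,
                           prob = c(0.91, 0.09))
)

# 75:25 split by simple randomisation.
train_idx <- sample(n, size = floor(0.75 * n))
train <- cohort[train_idx, ]
test  <- cohort[-train_idx, ]

# k-fold cross-validation on the training cohort: every row is assigned to
# exactly one fold, and each fold is held out once.
k <- 10   # assumed value
fold <- sample(rep(seq_len(k), length.out = nrow(train)))
cv_sizes <- vapply(seq_len(k), function(j) {
  analysis   <- train[fold != j, ]   # used to fit the model
  assessment <- train[fold == j, ]   # used to evaluate it
  c(analysis = nrow(analysis), assessment = nrow(assessment))
}, numeric(2))
```

Leave-one-out cross-validation is the special case where `k` equals the number of rows in `train`. The Tidymodels collection cited above provides `rsample::initial_split()` and `rsample::vfold_cv()` for the same purpose; with an outcome as imbalanced as this one (about 9% cancer), stratifying the split on the outcome may be worth considering.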

2.9.1 LASSO / Elastic Net

LASSO (Least Absolute Shrinkage and Selection Operator) and Elastic Net (Zou and Hastie (2005)) are regression methods that perform variable selection. The original LASSO method, proposed by Tibshirani (1996) in “Regression Shrinkage and Selection via the Lasso”, allows the coefficients of independent/predictor variables to “shrink” towards zero, and in some cases to exactly zero, effectively eliminating those variables from the model; this is often referred to as L1 regularisation. The Elastic Net (Zou and Hastie (2005)) extends the LASSO by balancing L1 regularisation with ridge regression, or L2 regularisation, which helps avoid over-fitting.

Both methods avoid many of the shortcomings/pitfalls of stepwise variable selection (Thompson (1995), Smith (2018)) and have been shown to be more accurate in clinical decision making in small datasets with well-coded, externally selected variables (Steyerberg et al. (2001)).
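For reference, the penalised objective for a logistic model can be written as below. This follows the parameterisation used by the glmnet package, where $\lambda$ is the overall penalty and $\alpha$ is the mixing proportion (called `mixture` in Tidymodels); the symbols are introduced here for illustration and are not defined elsewhere in this manuscript.

```latex
\hat{\beta}_0, \hat{\beta} =
\arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{n}\sum_{i=1}^{n}
    \left[ y_i\,(\beta_0 + x_i^{\top}\beta)
           - \log\!\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right]
  + \lambda \left[ \alpha \lVert \beta \rVert_1
                 + \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 \right]
\right\}
```

Setting $\alpha = 1$ leaves only the L1 term (LASSO), $\alpha = 0$ leaves only the L2 term (ridge regression), and values in between give the Elastic Net.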

2.9.2 Random Forest

TODO - Add reference.

The random forest is an extension of the decision tree methodology that reduces variance. Decision trees are very sensitive to the training data set and can have high variance, which can limit how well the model generalises. The random forest draws random samples of observations from the data set to create multiple decision trees, and a random selection of variables is used in the training of each tree. The aggregated output of the generated decision trees is then used to create an estimate.

2.9.3 Gradient Boosting

Gradient boosting is a machine learning algorithm that uses a decision tree as its base model. The data are initially fitted with this decision tree, but its predictions are weak, hence the term weak base model (or weak learner). Gradient boosting is iterative: a sequence of decision trees is added to the initial tree. Each new tree learns from the errors of the prior tree(s), strengthening the model and reducing error.
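The iterative idea can be sketched in a few lines of base R. This is a teaching example under simplifying assumptions: it uses a simulated continuous outcome with squared-error loss rather than a binary outcome, the trees are single-split "stumps", and all names (`fit_stump`, `learning_rate`, `n_trees`) are hypothetical.

```r
# Gradient boosting with depth-one trees (stumps) and squared-error loss,
# on simulated data. All names are hypothetical.
set.seed(7)
n <- 300
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.3)

# Fit a stump: the single split on x that minimises squared error.
fit_stump <- function(x, r) {
  cuts <- sort(unique(x))
  cuts <- (head(cuts, -1) + tail(cuts, -1)) / 2      # midpoints between values
  sse <- vapply(cuts, function(cp) {
    left <- r[x <= cp]; right <- r[x > cp]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }, numeric(1))
  best <- cuts[which.min(sse)]
  list(cut = best, left = mean(r[x <= best]), right = mean(r[x > best]))
}
predict_stump <- function(stump, x) ifelse(x <= stump$cut, stump$left, stump$right)

n_trees <- 50
learning_rate <- 0.1
pred <- rep(mean(y), n)          # initial weak prediction: the overall mean
train_mse <- numeric(n_trees)
for (m in seq_len(n_trees)) {
  residual <- y - pred                         # negative gradient of squared error
  stump <- fit_stump(x, residual)              # new tree learns what is still wrong
  pred <- pred + learning_rate * predict_stump(stump, x)
  train_mse[m] <- mean((y - pred)^2)
}
```

The training error in `train_mse` falls with every added tree, which is also why boosting needs a held-out set or cross-validation to decide when to stop: error on the training data alone keeps improving even once the model starts to over-fit.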

2.9.4 SVM

Support Vector Machines (SVM) are an approach that allows observations with a binary classification to be separated using a hyperplane. The method finds the hyperplane that best separates the two classes, i.e. benign versus malignant nodules, namely the one with the maximum margin of separation between them. The support vectors are the data points positioned closest to the margin of the hyperplane, and these are used to select the most appropriate hyperplane. The support vectors are the only data points that influence the maximum margin in SVM.
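In symbols, and assuming for illustration that the two classes are coded $y_i \in \{-1, +1\}$ and can be separated perfectly, the maximum-margin hyperplane $w^{\top}x + b = 0$ solves the following problem (the notation is introduced here and is not used elsewhere in this manuscript):

```latex
\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i\,(w^{\top} x_i + b) \ge 1, \qquad i = 1, \dots, n
```

The width of the margin is $2 / \lVert w \rVert$, so minimising $\lVert w \rVert$ maximises the margin, and the support vectors are the observations for which the constraint holds with equality. When the classes overlap, as is likely with clinical data, a soft-margin version that tolerates some violations of the constraint is used instead.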

2.9.5 Comparison

3 Results

?@tbl-patient-demographics shows the demographics of patients included in this study. A total of 1364 patients were included, with a median (IQR) age of 55 (41-69). ?@tbl-clinical-characteristics shows the distribution of the clinical variables evaluated between benign and malignant thyroid nodules.

3.1 Data Description

A summary of the variables that are available in this data set can be found in Table 3.

The completeness of the original data is shown in tables ?@tbl-imputation-summary-pmm, ?@tbl-imputation-summary-cart, ?@tbl-imputation-summary-rf, along with summaries from four rounds of imputation for each of three imputation methods. Where variables are continuous (e.g. age or size_nodule_mm), basic summary statistics in the form of mean, standard deviation, median and inter-quartile range are given. For categorical variables that are logical TRUE/FALSE (e.g. palpable_nodule), the number of TRUE observations and the percentage (of those with observed data for that variable) are shown along with the number that are Unknown. For categorical variables such as gender, percentages in each category are reported. For all variables an indication of the number of missing observations is also given. It is worth noting that there are 214 instances where the final_pathology is not known, which reduces the sample size to 1150.

3.1.1 Missing Data

More detailed tabulations of missing data are shown by variable in Table 1, which gives the number and percentage of missing observations for each variable, and by case in Table 2, which shows how many variables are missing for each case. A visualisation of this is shown in Figure 1.
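Tabulations of this kind can be produced with a few lines of base R. The sketch below uses a small simulated data frame; the column names echo those in Table 1 but the values, and the object names `by_variable` and `by_case`, are hypothetical.

```r
# Toy data frame with missing values; column names mimic those in Table 1 but
# the values are simulated.
set.seed(3)
n <- 100
toy <- data.frame(
  age_at_scan    = round(runif(n, 20, 90)),
  tsh_value      = ifelse(runif(n) < 0.35, NA, rlnorm(n)),
  size_nodule_mm = ifelse(runif(n) < 0.25, NA, round(runif(n, 2, 60)))
)

# Missing data by variable: count and percentage, most incomplete first.
n_missing <- colSums(is.na(toy))
by_variable <- data.frame(
  variable = names(n_missing),
  n        = as.integer(n_missing),
  percent  = round(100 * unname(n_missing) / nrow(toy), 1)
)
by_variable <- by_variable[order(-by_variable$n), ]

# Missing data by case: how many people are missing 0, 1, 2, ... variables.
per_case <- rowSums(is.na(toy))
by_case <- as.data.frame(table(missing_variables = per_case),
                         responseName = "n")
by_case$percent <- round(100 * by_case$n / nrow(toy), 1)
```

The per-case table is the more informative of the two when deciding between complete-case analysis and imputation, because it shows directly how many people would be lost if any missing value excluded a case.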

NB - Currently there is a bug in the stable release of Quarto which prevents rendering of the missing data figures. It is fixed in development version v1.6.1 (currently available as pre-release, so if things don’t render try upgrading).

Table 1: Summary of missing data by variable.
Variable N %
thy_classification 915 79.6
albumin 515 44.8
tsh_value 413 35.9
monocyte 363 31.6
lymphocytes 359 31.2
size_nodule_mm 319 27.7
family_history_thyroid_cancer 281 24.4
hypertension 126 11.0
exposure_radiation 121 10.5
compressive_symptoms 106 9.22
vocal_cord_paresis 76 6.61
hashimotos_thyroiditis 73 6.35
graves_disease 67 5.83
palpable_nodule 58 5.04
bta_u_classification 50 4.35
ethnicity 48 4.17
rapid_enlargement 43 3.74
cervical_lymphadenopathy 9 0.783
solitary_nodule 8 0.696
incidental_nodule 5 0.435
age_at_scan 0 0
gender 0 0
final_pathology 0 0
Table 2: Summary of missing data by case, how much missing data is there per person?
Missing Variables N %
0 65 5.652
1 227 19.739
2 229 19.913
3 181 15.739
4 139 12.087
5 102 8.870
6 76 6.609
7 35 3.043
8 30 2.609
9 13 1.130
10 19 1.652
11 15 1.304
12 9 0.783
13 4 0.348
14 3 0.261
15 2 0.174
16 1 0.087

The MICE package also provides tools for visualising missing data, and these are shown in Figure 2, ?@fig-mice-vis-missing-biomarker and Figure 4.

The columns of these plots, labelled along the top, show the variables; a blue cell indicates that data is present and a red cell indicates that data is missing. The left-hand side shows the total number of observations with that row's particular combination of present and missing variables, and the number of missing variables is indicated on the right. The first row shows that there are 604 observations with no missing data across the listed variables; the second row indicates that there are 166 observations where only family_history_thyroid_cancer is missing, and further rows cover observations where it is missing along with other variables. The numbers along the bottom of the figure indicate the total number of missing observations for that variable (e.g. for family_history_thyroid_cancer there is a total of 281 missing observations).

TODO - Workout why out-width: "80%" isn’t applied to these figures and/or how to make the All figure readable.

3.1.2 Imputation

The MICE package (Buuren and Groothuis-Oudshoorn (2011)) offers a number of different methods for imputing variables (see the package documentation). We have investigated Predictive Mean Matching (PMM), Classification and Regression Trees (CART) and Random Forests (RF). Four rounds of imputation using each method were made.

A comparison of distributions/proportions before and after imputation are presented below to allow assessment of the utility of each method.

The convergence of the imputation methods is shown in figures ?@fig-mice-convergence-pmm, ?@fig-mice-convergence-cart, and ?@fig-mice-convergence-rf.

TODO - Extract the legends from individual plots and add them to the end of each row, see the cowplot shared legends article for pointers on how to do this. Should ideally also get the fill colours to align with those used by ggmice.

Characteristic | Original, N = 1,150¹ | 1, N = 1,150¹ | 2, N = 1,150¹ | 3, N = 1,150¹ | 4, N = 1,150¹ | 5, N = 1,150¹
.id 576 (288, 863) 576 (288, 863) 576 (288, 863) 576 (288, 863) 576 (288, 863) 576 (288, 863)
Age 55 (41, 68) 55 (41, 68) 55 (41, 68) 55 (41, 68) 55 (41, 68) 55 (41, 68)
gender
    F 903 (79%) 903 (79%) 903 (79%) 903 (79%) 903 (79%) 903 (79%)
    M 247 (21%) 247 (21%) 247 (21%) 247 (21%) 247 (21%) 247 (21%)
ethnicity
    A 812 (74%) 843 (73%) 841 (73%) 839 (73%) 845 (73%) 842 (73%)
    B 3 (0.3%) 4 (0.3%) 4 (0.3%) 4 (0.3%) 3 (0.3%) 3 (0.3%)
    C 44 (4.0%) 45 (3.9%) 47 (4.1%) 44 (3.8%) 48 (4.2%) 45 (3.9%)
    D 2 (0.2%) 2 (0.2%) 2 (0.2%) 2 (0.2%) 2 (0.2%) 2 (0.2%)
    F 4 (0.4%) 4 (0.3%) 4 (0.3%) 4 (0.3%) 4 (0.3%) 4 (0.3%)
    G 7 (0.6%) 7 (0.6%) 7 (0.6%) 7 (0.6%) 7 (0.6%) 7 (0.6%)
    H 8 (0.7%) 8 (0.7%) 8 (0.7%) 9 (0.8%) 8 (0.7%) 8 (0.7%)
    J 46 (4.2%) 50 (4.3%) 49 (4.3%) 51 (4.4%) 50 (4.3%) 49 (4.3%)
    K 11 (1.0%) 11 (1.0%) 13 (1.1%) 12 (1.0%) 11 (1.0%) 12 (1.0%)
    L 16 (1.5%) 19 (1.7%) 16 (1.4%) 18 (1.6%) 16 (1.4%) 17 (1.5%)
    M 16 (1.5%) 16 (1.4%) 16 (1.4%) 17 (1.5%) 16 (1.4%) 18 (1.6%)
    N 18 (1.6%) 19 (1.7%) 18 (1.6%) 18 (1.6%) 19 (1.7%) 20 (1.7%)
    P 14 (1.3%) 14 (1.2%) 15 (1.3%) 16 (1.4%) 15 (1.3%) 14 (1.2%)
    R 6 (0.5%) 7 (0.6%) 6 (0.5%) 8 (0.7%) 7 (0.6%) 7 (0.6%)
    S 38 (3.4%) 40 (3.5%) 43 (3.7%) 42 (3.7%) 40 (3.5%) 40 (3.5%)
    Z 57 (5.2%) 61 (5.3%) 61 (5.3%) 59 (5.1%) 59 (5.1%) 62 (5.4%)
    Unknown 48 0 0 0 0 0
incidental_nodule 620 (54%) 623 (54%) 622 (54%) 623 (54%) 624 (54%) 623 (54%)
    Unknown 5 0 0 0 0 0
palpable_nodule 441 (40%) 471 (41%) 470 (41%) 470 (41%) 471 (41%) 470 (41%)
    Unknown 58 0 0 0 0 0
rapid_enlargement 19 (1.7%) 19 (1.7%) 21 (1.8%) 22 (1.9%) 20 (1.7%) 19 (1.7%)
    Unknown 43 0 0 0 0 0
compressive_symptoms 88 (8.4%) 107 (9.3%) 103 (9.0%) 102 (8.9%) 98 (8.5%) 101 (8.8%)
    Unknown 106 0 0 0 0 0
hypertension 262 (26%) 287 (25%) 301 (26%) 292 (25%) 293 (25%) 291 (25%)
    Unknown 126 0 0 0 0 0
vocal_cord_paresis 3 (0.3%) 5 (0.4%) 3 (0.3%) 6 (0.5%) 3 (0.3%) 3 (0.3%)
    Unknown 76 0 0 0 0 0
graves_disease 17 (1.6%) 20 (1.7%) 20 (1.7%) 19 (1.7%) 19 (1.7%) 18 (1.6%)
    Unknown 67 0 0 0 0 0
hashimotos_thyroiditis 7 (0.6%) 7 (0.6%) 7 (0.6%) 8 (0.7%) 9 (0.8%) 7 (0.6%)
    Unknown 73 0 0 0 0 0
family_history_thyroid_cancer 8 (0.9%) 10 (0.9%) 14 (1.2%) 11 (1.0%) 11 (1.0%) 16 (1.4%)
    Unknown 281 0 0 0 0 0
exposure_radiation 9 (0.9%) 9 (0.8%) 9 (0.8%) 9 (0.8%) 10 (0.9%) 10 (0.9%)
    Unknown 121 0 0 0 0 0
Albumin 45.0 (43.0, 47.0) 45.0 (43.0, 47.0) 45.0 (43.0, 47.0) 45.0 (43.0, 47.0) 45.0 (43.0, 47.0) 45.0 (43.0, 47.0)
    Unknown 515 0 0 0 0 0
TSH value 1.48 (0.85, 2.30) 1.50 (0.89, 2.30) 1.50 (0.86, 2.40) 1.50 (0.87, 2.50) 1.50 (0.89, 2.50) 1.40 (0.85, 2.30)
    Unknown 413 0 0 0 0 0
Lymphocytes 1.94 (1.51, 2.43) 1.96 (1.54, 2.46) 1.95 (1.53, 2.45) 1.94 (1.50, 2.42) 1.95 (1.53, 2.42) 1.93 (1.49, 2.42)
    Unknown 359 0 0 0 0 0
Monocytes 0.52 (0.42, 0.66) 0.53 (0.43, 0.66) 0.52 (0.42, 0.66) 0.53 (0.42, 0.66) 0.52 (0.43, 0.66) 0.52 (0.42, 0.66)
    Unknown 363 0 0 0 0 0
bta_u_classification
    U1 1 (<0.1%) 1 (<0.1%) 2 (0.2%) 1 (<0.1%) 1 (<0.1%) 1 (<0.1%)
    U2 860 (78%) 902 (78%) 895 (78%) 898 (78%) 893 (78%) 893 (78%)
    U3 210 (19%) 214 (19%) 222 (19%) 220 (19%) 223 (19%) 224 (19%)
    U4 22 (2.0%) 25 (2.2%) 23 (2.0%) 24 (2.1%) 26 (2.3%) 22 (1.9%)
    U5 7 (0.6%) 8 (0.7%) 8 (0.7%) 7 (0.6%) 7 (0.6%) 10 (0.9%)
    Unknown 50 0 0 0 0 0
solitary_nodule 320 (28%) 323 (28%) 322 (28%) 323 (28%) 323 (28%) 323 (28%)
    Unknown 8 0 0 0 0 0
Nodule size (mm) 14 (7, 28) 13 (6, 27) 13 (6, 27) 13 (6, 27) 12 (6, 26) 13 (6, 27)
    Unknown 319 0 0 0 0 0
cervical_lymphadenopathy 26 (2.3%) 27 (2.3%) 27 (2.3%) 26 (2.3%) 27 (2.3%) 26 (2.3%)
    Unknown 9 0 0 0 0 0
thy_classification
    Thy1 34 (14%) 362 (31%) 174 (15%) 242 (21%) 179 (16%) 220 (19%)
    Thy1c 8 (3.4%) 19 (1.7%) 35 (3.0%) 39 (3.4%) 44 (3.8%) 28 (2.4%)
    Thy2 63 (27%) 403 (35%) 412 (36%) 360 (31%) 373 (32%) 360 (31%)
    Thy2c 11 (4.7%) 29 (2.5%) 34 (3.0%) 44 (3.8%) 35 (3.0%) 38 (3.3%)
    Thy3a 18 (7.7%) 36 (3.1%) 44 (3.8%) 60 (5.2%) 61 (5.3%) 69 (6.0%)
    Thy3f 74 (31%) 193 (17%) 302 (26%) 296 (26%) 335 (29%) 336 (29%)
    Thy4 10 (4.3%) 17 (1.5%) 42 (3.7%) 24 (2.1%) 37 (3.2%) 29 (2.5%)
    Thy5 17 (7.2%) 91 (7.9%) 107 (9.3%) 85 (7.4%) 86 (7.5%) 70 (6.1%)
    Unknown 915 0 0 0 0 0
final_pathology
    Benign 1,050 (91%) 1,050 (91%) 1,050 (91%) 1,050 (91%) 1,050 (91%) 1,050 (91%)
    Cancer 100 (8.7%) 100 (8.7%) 100 (8.7%) 100 (8.7%) 100 (8.7%) 100 (8.7%)
¹ Median (IQR); n (%)

Characteristic | Original, N = 1,150¹ | 1, N = 1,150¹ | 2, N = 1,150¹ | 3, N = 1,150¹ | 4, N = 1,150¹ | 5, N = 1,150¹
.id 576 (288, 863) 576 (288, 863) 576 (288, 863) 576 (288, 863) 576 (288, 863) 576 (288, 863)
Age 55 (41, 68) 55 (41, 68) 55 (41, 68) 55 (41, 68) 55 (41, 68) 55 (41, 68)
gender
    F 903 (79%) 903 (79%) 903 (79%) 903 (79%) 903 (79%) 903 (79%)
    M 247 (21%) 247 (21%) 247 (21%) 247 (21%) 247 (21%) 247 (21%)
ethnicity
    A 812 (74%) 843 (73%) 842 (73%) 843 (73%) 845 (73%) 845 (73%)
    B 3 (0.3%) 3 (0.3%) 3 (0.3%) 4 (0.3%) 3 (0.3%) 3 (0.3%)
    C 44 (4.0%) 46 (4.0%) 46 (4.0%) 47 (4.1%) 47 (4.1%) 47 (4.1%)
    D 2 (0.2%) 2 (0.2%) 2 (0.2%) 2 (0.2%) 2 (0.2%) 2 (0.2%)
    F 4 (0.4%) 4 (0.3%) 4 (0.3%) 5 (0.4%) 4 (0.3%) 4 (0.3%)
    G 7 (0.6%) 7 (0.6%) 7 (0.6%) 7 (0.6%) 8 (0.7%) 7 (0.6%)
    H 8 (0.7%) 8 (0.7%) 9 (0.8%) 8 (0.7%) 8 (0.7%) 8 (0.7%)
    J 46 (4.2%) 48 (4.2%) 48 (4.2%) 52 (4.5%) 49 (4.3%) 46 (4.0%)
    K 11 (1.0%) 11 (1.0%) 11 (1.0%) 11 (1.0%) 12 (1.0%) 11 (1.0%)
    L 16 (1.5%) 16 (1.4%) 17 (1.5%) 16 (1.4%) 16 (1.4%) 17 (1.5%)
    M 16 (1.5%) 17 (1.5%) 17 (1.5%) 17 (1.5%) 18 (1.6%) 19 (1.7%)
    N 18 (1.6%) 19 (1.7%) 21 (1.8%) 19 (1.7%) 19 (1.7%) 18 (1.6%)
    P 14 (1.3%) 16 (1.4%) 14 (1.2%) 15 (1.3%) 14 (1.2%) 15 (1.3%)
    R 6 (0.5%) 8 (0.7%) 7 (0.6%) 6 (0.5%) 6 (0.5%) 6 (0.5%)
    S 38 (3.4%) 41 (3.6%) 39 (3.4%) 38 (3.3%) 41 (3.6%) 42 (3.7%)
    Z 57 (5.2%) 61 (5.3%) 63 (5.5%) 60 (5.2%) 58 (5.0%) 60 (5.2%)
    Unknown 48 0 0 0 0 0
incidental_nodule 620 (54%) 623 (54%) 624 (54%) 624 (54%) 623 (54%) 624 (54%)
    Unknown 5 0 0 0 0 0
palpable_nodule 441 (40%) 470 (41%) 472 (41%) 468 (41%) 472 (41%) 468 (41%)
    Unknown 58 0 0 0 0 0
rapid_enlargement 19 (1.7%) 19 (1.7%) 20 (1.7%) 19 (1.7%) 19 (1.7%) 21 (1.8%)
    Unknown 43 0 0 0 0 0
compressive_symptoms 88 (8.4%) 102 (8.9%) 105 (9.1%) 104 (9.0%) 101 (8.8%) 99 (8.6%)
    Unknown 106 0 0 0 0 0
hypertension 262 (26%) 291 (25%) 293 (25%) 292 (25%) 291 (25%) 292 (25%)
    Unknown 126 0 0 0 0 0
vocal_cord_paresis 3 (0.3%) 4 (0.3%) 3 (0.3%) 3 (0.3%) 3 (0.3%) 3 (0.3%)
    Unknown 76 0 0 0 0 0
graves_disease 17 (1.6%) 17 (1.5%) 19 (1.7%) 18 (1.6%) 17 (1.5%) 17 (1.5%)
    Unknown 67 0 0 0 0 0
hashimotos_thyroiditis 7 (0.6%) 9 (0.8%) 7 (0.6%) 9 (0.8%) 8 (0.7%) 8 (0.7%)
    Unknown 73 0 0 0 0 0
family_history_thyroid_cancer 8 (0.9%) 12 (1.0%) 9 (0.8%) 8 (0.7%) 11 (1.0%) 16 (1.4%)
    Unknown 281 0 0 0 0 0
exposure_radiation 9 (0.9%) 10 (0.9%) 9 (0.8%) 11 (1.0%) 11 (1.0%) 10 (0.9%)
    Unknown 121 0 0 0 0 0
Albumin 45.0 (43.0, 47.0) 45.0 (43.0, 47.0) 45.0 (43.0, 47.0) 45.0 (43.0, 47.0) 45.0 (43.0, 47.0) 45.0 (43.0, 47.0)
    Unknown 515 0 0 0 0 0
TSH value 1.48 (0.85, 2.30) 1.50 (0.88, 2.39) 1.50 (0.88, 2.50) 1.50 (0.91, 2.50) 1.50 (0.90, 2.40) 1.42 (0.85, 2.30)
    Unknown 413 0 0 0 0 0
Lymphocytes 1.94 (1.51, 2.43) 1.94 (1.54, 2.43) 1.94 (1.51, 2.43) 1.95 (1.53, 2.44) 1.91 (1.50, 2.41) 1.94 (1.50, 2.43)
    Unknown 359 0 0 0 0 0
Monocytes 0.52 (0.42, 0.66) 0.53 (0.42, 0.66) 0.53 (0.43, 0.66) 0.53 (0.43, 0.66) 0.52 (0.42, 0.66) 0.52 (0.42, 0.65)
    Unknown 363 0 0 0 0 0
bta_u_classification
    U1 1 (<0.1%) 1 (<0.1%) 1 (<0.1%) 1 (<0.1%) 1 (<0.1%) 2 (0.2%)
    U2 860 (78%) 894 (78%) 899 (78%) 903 (79%) 897 (78%) 897 (78%)
    U3 210 (19%) 226 (20%) 219 (19%) 215 (19%) 220 (19%) 220 (19%)
    U4 22 (2.0%) 22 (1.9%) 24 (2.1%) 23 (2.0%) 24 (2.1%) 24 (2.1%)
    U5 7 (0.6%) 7 (0.6%) 7 (0.6%) 8 (0.7%) 8 (0.7%) 7 (0.6%)
    Unknown 50 0 0 0 0 0
solitary_nodule 320 (28%) 322 (28%) 322 (28%) 322 (28%) 322 (28%) 323 (28%)
    Unknown 8 0 0 0 0 0
Nodule size (mm) 14 (7, 28) 13 (6, 26) 12 (6, 27) 13 (6, 26) 12 (6, 27) 13 (6, 26)
    Unknown 319 0 0 0 0 0
cervical_lymphadenopathy 26 (2.3%) 26 (2.3%) 26 (2.3%) 27 (2.3%) 26 (2.3%) 26 (2.3%)
    Unknown 9 0 0 0 0 0
thy_classification
    Thy1 34 (14%) 203 (18%) 192 (17%) 187 (16%) 221 (19%) 188 (16%)
    Thy1c 8 (3.4%) 46 (4.0%) 35 (3.0%) 36 (3.1%) 59 (5.1%) 63 (5.5%)
    Thy2 63 (27%) 278 (24%) 272 (24%) 281 (24%) 258 (22%) 237 (21%)
    Thy2c 11 (4.7%) 45 (3.9%) 70 (6.1%) 58 (5.0%) 70 (6.1%) 76 (6.6%)
    Thy3a 18 (7.7%) 60 (5.2%) 75 (6.5%) 74 (6.4%) 59 (5.1%) 125 (11%)
    Thy3f 74 (31%) 347 (30%) 321 (28%) 317 (28%) 325 (28%) 264 (23%)
    Thy4 10 (4.3%) 54 (4.7%) 69 (6.0%) 79 (6.9%) 68 (5.9%) 74 (6.4%)
    Thy5 17 (7.2%) 117 (10%) 116 (10%) 118 (10%) 90 (7.8%) 123 (11%)
    Unknown 915 0 0 0 0 0
final_pathology
    Benign 1,050 (91%) 1,050 (91%) 1,050 (91%) 1,050 (91%) 1,050 (91%) 1,050 (91%)
    Cancer 100 (8.7%) 100 (8.7%) 100 (8.7%) 100 (8.7%) 100 (8.7%) 100 (8.7%)
¹ Median (IQR); n (%)

Characteristic | Original, N = 1,150¹ | 1, N = 1,150¹ | 2, N = 1,150¹ | 3, N = 1,150¹ | 4, N = 1,150¹ | 5, N = 1,150¹
.id 576 (288, 863) 576 (288, 863) 576 (288, 863) 576 (288, 863) 576 (288, 863) 576 (288, 863)
Age 55 (41, 68) 55 (41, 68) 55 (41, 68) 55 (41, 68) 55 (41, 68) 55 (41, 68)
gender
    F 903 (79%) 903 (79%) 903 (79%) 903 (79%) 903 (79%) 903 (79%)
    M 247 (21%) 247 (21%) 247 (21%) 247 (21%) 247 (21%) 247 (21%)
ethnicity
    A 812 (74%) 851 (74%) 845 (73%) 850 (74%) 849 (74%) 853 (74%)
    B 3 (0.3%) 3 (0.3%) 3 (0.3%) 4 (0.3%) 3 (0.3%) 3 (0.3%)
    C 44 (4.0%) 47 (4.1%) 46 (4.0%) 46 (4.0%) 45 (3.9%) 45 (3.9%)
    D 2 (0.2%) 2 (0.2%) 2 (0.2%) 2 (0.2%) 2 (0.2%) 2 (0.2%)
    F 4 (0.4%) 4 (0.3%) 4 (0.3%) 4 (0.3%) 4 (0.3%) 4 (0.3%)
    G 7 (0.6%) 7 (0.6%) 7 (0.6%) 7 (0.6%) 7 (0.6%) 8 (0.7%)
    H 8 (0.7%) 8 (0.7%) 8 (0.7%) 8 (0.7%) 8 (0.7%) 8 (0.7%)
    J 46 (4.2%) 47 (4.1%) 49 (4.3%) 47 (4.1%) 49 (4.3%) 48 (4.2%)
    K 11 (1.0%) 11 (1.0%) 15 (1.3%) 11 (1.0%) 11 (1.0%) 11 (1.0%)
    L 16 (1.5%) 16 (1.4%) 16 (1.4%) 16 (1.4%) 17 (1.5%) 16 (1.4%)
    M 16 (1.5%) 17 (1.5%) 17 (1.5%) 18 (1.6%) 18 (1.6%) 16 (1.4%)
    N 18 (1.6%) 19 (1.7%) 19 (1.7%) 19 (1.7%) 19 (1.7%) 20 (1.7%)
    P 14 (1.3%) 14 (1.2%) 15 (1.3%) 14 (1.2%) 14 (1.2%) 14 (1.2%)
    R 6 (0.5%) 6 (0.5%) 6 (0.5%) 6 (0.5%) 6 (0.5%) 6 (0.5%)
    S 38 (3.4%) 40 (3.5%) 39 (3.4%) 39 (3.4%) 40 (3.5%) 39 (3.4%)
    Z 57 (5.2%) 58 (5.0%) 59 (5.1%) 59 (5.1%) 58 (5.0%) 57 (5.0%)
    Unknown 48 0 0 0 0 0
incidental_nodule 620 (54%) 623 (54%) 623 (54%) 624 (54%) 624 (54%) 622 (54%)
    Unknown 5 0 0 0 0 0
palpable_nodule 441 (40%) 462 (40%) 472 (41%) 465 (40%) 469 (41%) 467 (41%)
    Unknown 58 0 0 0 0 0
rapid_enlargement 19 (1.7%) 19 (1.7%) 19 (1.7%) 19 (1.7%) 20 (1.7%) 20 (1.7%)
    Unknown 43 0 0 0 0 0
compressive_symptoms 88 (8.4%) 94 (8.2%) 100 (8.7%) 94 (8.2%) 89 (7.7%) 95 (8.3%)
    Unknown 106 0 0 0 0 0
hypertension 262 (26%) 282 (25%) 284 (25%) 285 (25%) 281 (24%) 286 (25%)
    Unknown 126 0 0 0 0 0
vocal_cord_paresis 3 (0.3%) 3 (0.3%) 3 (0.3%) 3 (0.3%) 3 (0.3%) 3 (0.3%)
    Unknown 76 0 0 0 0 0
graves_disease 17 (1.6%) 17 (1.5%) 17 (1.5%) 17 (1.5%) 17 (1.5%) 17 (1.5%)
    Unknown 67 0 0 0 0 0
hashimotos_thyroiditis 7 (0.6%) 7 (0.6%) 7 (0.6%) 7 (0.6%) 7 (0.6%) 7 (0.6%)
    Unknown 73 0 0 0 0 0
family_history_thyroid_cancer 8 (0.9%) 9 (0.8%) 8 (0.7%) 11 (1.0%) 9 (0.8%) 9 (0.8%)
    Unknown 281 0 0 0 0 0
exposure_radiation 9 (0.9%) 9 (0.8%) 9 (0.8%) 10 (0.9%) 9 (0.8%) 9 (0.8%)
    Unknown 121 0 0 0 0 0
Albumin 45.0 (43.0, 47.0) 45.0 (43.0, 47.0) 45.0 (43.0, 47.0) 45.0 (43.0, 47.0) 45.0 (43.0, 47.0) 45.0 (43.0, 47.0)
    Unknown 515 0 0 0 0 0
TSH value 1.48 (0.85, 2.30) 1.50 (0.87, 2.40) 1.50 (0.90, 2.30) 1.50 (0.88, 2.30) 1.50 (0.87, 2.30) 1.50 (0.85, 2.40)
    Unknown 413 0 0 0 0 0
Lymphocytes 1.94 (1.51, 2.43) 1.94 (1.54, 2.43) 1.94 (1.51, 2.44) 1.95 (1.54, 2.44) 1.95 (1.53, 2.42) 1.94 (1.51, 2.44)
    Unknown 359 0 0 0 0 0
Monocytes 0.52 (0.42, 0.66) 0.53 (0.42, 0.66) 0.52 (0.42, 0.66) 0.53 (0.43, 0.66) 0.52 (0.42, 0.66) 0.52 (0.42, 0.66)
    Unknown 363 0 0 0 0 0
bta_u_classification
    U1 1 (<0.1%) 1 (<0.1%) 1 (<0.1%) 1 (<0.1%) 1 (<0.1%) 1 (<0.1%)
    U2 860 (78%) 902 (78%) 904 (79%) 904 (79%) 904 (79%) 899 (78%)
    U3 210 (19%) 216 (19%) 216 (19%) 216 (19%) 215 (19%) 219 (19%)
    U4 22 (2.0%) 23 (2.0%) 22 (1.9%) 22 (1.9%) 22 (1.9%) 24 (2.1%)
    U5 7 (0.6%) 8 (0.7%) 7 (0.6%) 7 (0.6%) 8 (0.7%) 7 (0.6%)
    Unknown 50 0 0 0 0 0
solitary_nodule 320 (28%) 321 (28%) 321 (28%) 320 (28%) 323 (28%) 321 (28%)
    Unknown 8 0 0 0 0 0
Nodule size (mm) 14 (7, 28) 13 (6, 27) 12 (6, 26) 12 (6, 26) 13 (6, 26) 13 (6, 26)
    Unknown 319 0 0 0 0 0
cervical_lymphadenopathy 26 (2.3%) 26 (2.3%) 26 (2.3%) 26 (2.3%) 26 (2.3%) 26 (2.3%)
    Unknown 9 0 0 0 0 0
thy_classification
    Thy1 34 (14%) 206 (18%) 173 (15%) 136 (12%) 175 (15%) 147 (13%)
    Thy1c 8 (3.4%) 46 (4.0%) 38 (3.3%) 69 (6.0%) 36 (3.1%) 32 (2.8%)
    Thy2 63 (27%) 299 (26%) 283 (25%) 323 (28%) 333 (29%) 341 (30%)
    Thy2c 11 (4.7%) 57 (5.0%) 78 (6.8%) 71 (6.2%) 64 (5.6%) 40 (3.5%)
    Thy3a 18 (7.7%) 105 (9.1%) 115 (10%) 117 (10%) 85 (7.4%) 97 (8.4%)
    Thy3f 74 (31%) 291 (25%) 299 (26%) 306 (27%) 325 (28%) 359 (31%)
    Thy4 10 (4.3%) 48 (4.2%) 52 (4.5%) 47 (4.1%) 61 (5.3%) 48 (4.2%)
    Thy5 17 (7.2%) 98 (8.5%) 112 (9.7%) 81 (7.0%) 71 (6.2%) 86 (7.5%)
    Unknown 915 0 0 0 0 0
final_pathology
    Benign 1,050 (91%) 1,050 (91%) 1,050 (91%) 1,050 (91%) 1,050 (91%) 1,050 (91%)
    Cancer 100 (8.7%) 100 (8.7%) 100 (8.7%) 100 (8.7%) 100 (8.7%) 100 (8.7%)
¹ Median (IQR); n (%)

3.2 Modelling

TODO - And in light of having removed ?@tbl-data-completness in favour of the imputed datasets this too has been removed? (@ns-rse 2024-07-11).

TODO - This table feels like duplication of ?@tbl-data-completeness, perhaps have just one? (@ns-rse 2024-07-11).

The predictor variables selected to predict final_pathology are shown in ?@tbl-predictors.

Section that sets up the modelling

The following section is output from a Tidymodel approach to logistic regression to try and work out why variables are not being included.

A total of 1150 patients had complete data for the selected predictor variables (see ?@tbl-predictors). Because of the volume of missing data, a saturated model would include only ~350 people with complete data across all covariates, so imputed datasets were analysed instead.

3.2.1 Logistic Regression

3.2.1.1 Clinical Characteristics

Characteristic OR¹ 95% CI¹ p-value
Age 0.98 0.96, 0.99 0.005
gender 1.97 1.11, 3.43 0.019
incidental_nodule 0.99 0.48, 2.02 >0.9
palpable_nodule 2.76 1.29, 6.08 0.010
rapid_enlargement 1.66 0.46, 5.37 0.4
compressive_symptoms 0.98 0.43, 2.09 >0.9
hashimotos_thyroiditis 1.19 0.02, 13.0 >0.9
family_history_thyroid_cancer 3.80 0.48, 21.7 0.15
exposure_radiation 0.00 >0.9
TSH value 1.00 0.94, 1.05 >0.9
Nodule size (mm) 1.03 1.01, 1.05 <0.001
solitary_nodule 1.48 0.85, 2.51 0.2
cervical_lymphadenopathy 6.74 2.37, 18.8 <0.001
¹ OR = Odds Ratio, CI = Confidence Interval

3.2.1.2 Biomarkers

Characteristic OR¹ 95% CI¹ p-value
Age 0.98 0.97, 1.00 0.010
gender 1.95 1.15, 3.26 0.012
TSH value 1.01 0.96, 1.06 0.5
Albumin 1.03 0.96, 1.12 0.4
Lymphocytes 1.04 0.74, 1.43 0.8
Monocytes 0.18 0.04, 0.72 0.019
¹ OR = Odds Ratio, CI = Confidence Interval

3.2.1.3 Ultrasound 1

Characteristic OR¹ 95% CI¹ p-value
Age 0.99 0.97, 1.01 0.2
gender 1.23 0.63, 2.32 0.5
bta_u_classification
    U1
    U2 461,584 0.00, NA >0.9
    U3 7,642,048 0.00, NA >0.9
    U4 36,642,711 0.00, NA >0.9
    U5 126,534,639 0.00, NA >0.9
thy_classification
    Thy1
    Thy1c 1.63 0.19, 9.45 0.6
    Thy2 0.36 0.09, 1.25 0.11
    Thy2c 0.00 0.00, 0.00 >0.9
    Thy3a 2.24 0.66, 7.80 0.2
    Thy3f 3.01 1.21, 8.47 0.025
    Thy4 7.06 1.93, 26.6 0.003
    Thy5 9.52 3.19, 31.1 <0.001
¹ OR = Odds Ratio, CI = Confidence Interval

3.2.1.4 Ultrasound 2

Characteristic OR¹ 95% CI¹ p-value
Age 0.99 0.97, 1.01 0.2
gender 1.19 0.58, 2.34 0.6
incidental_nodule 0.94 0.48, 1.84 0.9
TSH value 0.95 0.89, 1.00 0.091
Nodule size (mm) 1.04 1.02, 1.06 <0.001
solitary_nodule 1.49 0.77, 2.82 0.2
cervical_lymphadenopathy 6.61 1.45, 29.3 0.013
bta_u_classification
    U1
    U2 239,169 0.00, NA >0.9
    U3 3,105,400 0.00, NA >0.9
    U4 14,770,696 0.00, NA >0.9
    U5 39,781,959 0.00, NA >0.9
thy_classification
    Thy1
    Thy1c 0.97 0.06, 7.66 >0.9
    Thy2 0.25 0.06, 0.97 0.049
    Thy2c 0.00 0.00, 1,045,453 >0.9
    Thy3a 2.69 0.76, 9.91 0.13
    Thy3f 2.45 0.92, 7.39 0.089
    Thy4 8.34 2.18, 33.3 0.002
    Thy5 11.1 3.47, 39.6 <0.001
¹ OR = Odds Ratio, CI = Confidence Interval

3.2.2 LASSO

## Specify the LASSO model using parsnip, the key here is the use of the glmnet engine which is the R package for
## fitting LASSO regression. Technically the package fits Elastic Net but with a mixture value of 1 it is equivalent to
## a plain LASSO (mixture value of 0 is equivalent to Ridge Regression in an Elastic Net)
tune_spec_lasso <- parsnip::logistic_reg(penalty = hardhat::tune(), mixture = 1) |>
  parsnip::set_engine("glmnet")

## Tune the LASSO parameters via cross-validation
lasso_grid <- tune::tune_grid(
  object = workflows::add_model(thyroid_workflow, tune_spec_lasso),
  resamples = cv_folds,
  grid = dials::grid_regular(dials::penalty(), levels = 50)
)
Figure 24: Autoplot of LASSO grid search
Figure 25: Importance of variables fitted using LASSO

NB - We may wish to inspect the coefficients at each step of tuning. A related example of how to do this can be found in the Tidymodels documentation under "Tuning a glmnet model". This would be desirable because only two features appear to be selected as important by this method; rather than simply accepting this, I would want to investigate how the coefficients changed over the tuning iterations. The glmnet documentation is another useful resource, although because we are using the Tidymodels framework the model fit is wrapped inside the workflow object (hence the article above on how to extract this information).
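
As a rough illustration of the kind of inspection described in the note above, the sketch below fits a LASSO path directly with glmnet and tabulates how many coefficients are non-zero at each penalty. It uses simulated data (not the study data), and the object names are hypothetical. For a fitted Tidymodels workflow, the underlying glmnet object could probably be reached with `workflows::extract_fit_engine()`, but that depends on how the final workflow is fitted and is an assumption here.

```r
library(glmnet)

# Simulated data only (NOT the study data): ten predictors, two of which matter
set.seed(42)
n <- 200
p <- 10
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("x", seq_len(p))))
y <- rbinom(n, size = 1, prob = plogis(1.5 * x[, "x1"] - 1.0 * x[, "x2"]))

# alpha = 1 is the LASSO; glmnet fits the whole sequence of penalties (lambda)
lasso_path <- glmnet(x, y, family = "binomial", alpha = 1)

# Number of non-zero coefficients at each penalty on the path
path_summary <- data.frame(lambda = lasso_path$lambda,
                           n_nonzero = lasso_path$df)
print(head(path_summary))

# Coefficients at one illustrative penalty value (0.05 is arbitrary)
print(coef(lasso_path, s = 0.05))

# Coefficient traces against log(lambda)
plot(lasso_path, xvar = "lambda", label = TRUE)
```

Reading the table from large to small penalties shows the order in which variables enter the model, which is a quick check on whether a two-variable solution is stable or simply a consequence of the penalty chosen by cross-validation.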

3.2.3 Elastic Net

Figure 26: Autoplot of Elastic Net grid search
Figure 27: Importance of variables fitted using Elastic Net

NB - As with the LASSO, we may wish to inspect the coefficients at each step of tuning. A related example of how to do this can be found in the Tidymodels documentation under "Tuning a glmnet model". This would be desirable because only two features appear to be selected as important by this method; rather than simply accepting this, I would want to investigate how the coefficients changed over the tuning iterations. The glmnet documentation is another useful resource, although because we are using the Tidymodels framework the model fit is wrapped inside the workflow object (hence the article above on how to extract this information).
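
The code for this section is not shown in the rendered output. Assuming it mirrors the LASSO specification in section 3.2.2, with the mixture parameter tuned as well as the penalty, a specification and grid might look like the sketch below. The names `tune_spec_elastic` and `elastic_grid_values` and the choice of five mixture levels are hypothetical and not taken from the authors' code.

```r
# Elastic Net: tune both the penalty (lambda) and the mixture (alpha).
# mixture = 1 corresponds to the LASSO and mixture = 0 to Ridge Regression.
tune_spec_elastic <- parsnip::logistic_reg(
  penalty = hardhat::tune(),
  mixture = hardhat::tune()
) |>
  parsnip::set_engine("glmnet")

# Regular grid: 50 penalty values crossed with 5 mixture values
elastic_grid_values <- dials::grid_regular(
  dials::penalty(),
  dials::mixture(),
  levels = c(50, 5)
)
print(dim(elastic_grid_values))
```

The resulting grid would then be passed to `tune::tune_grid()` in the same way as for the LASSO, in place of the penalty-only grid.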

3.2.4 Random Forest

3.2.5 Gradient Boosting

        Length Class      Mode   
pre     3      stage_pre  list   
fit     2      stage_fit  list   
post    1      stage_post list   
trained 1      -none-     logical
# A tibble: 23 × 4
   variable             type      role      source  
   <chr>                <list>    <chr>     <chr>   
 1 age_at_scan          <chr [2]> predictor original
 2 gender               <chr [3]> predictor original
 3 ethnicity            <chr [3]> predictor original
 4 incidental_nodule    <chr [3]> predictor original
 5 palpable_nodule      <chr [3]> predictor original
 6 rapid_enlargement    <chr [3]> predictor original
 7 compressive_symptoms <chr [3]> predictor original
 8 hypertension         <chr [3]> predictor original
 9 vocal_cord_paresis   <chr [3]> predictor original
10 graves_disease       <chr [3]> predictor original
# ℹ 13 more rows
               Length Class              Mode       
handle             1  xgb.Booster.handle externalptr
raw            95013  -none-             raw        
niter              1  -none-             numeric    
evaluation_log     2  data.table         list       
call               8  -none-             call       
params            10  -none-             list       
callbacks          1  -none-             list       
feature_names     45  -none-             character  
nfeatures          1  -none-             numeric    
                   Feature   Gain  Cover Frequency
                    <char>  <num>  <num>     <num>
1: bta_u_classification_U2 0.8201 0.5114    0.3106
2:               tsh_value 0.0900 0.0693    0.1677
3:          size_nodule_mm 0.0332 0.3619    0.3634
4:             age_at_scan 0.0314 0.0344    0.1025
5:                 albumin 0.0253 0.0230    0.0559
 [1] ".pred_Benign"                  ".pred_Cancer"                 
 [3] ".pred_class"                   "age_at_scan"                  
 [5] "gender"                        "ethnicity"                    
 [7] "incidental_nodule"             "palpable_nodule"              
 [9] "rapid_enlargement"             "compressive_symptoms"         
[11] "hypertension"                  "vocal_cord_paresis"           
[13] "graves_disease"                "hashimotos_thyroiditis"       
[15] "family_history_thyroid_cancer" "exposure_radiation"           
[17] "albumin"                       "tsh_value"                    
[19] "lymphocytes"                   "monocyte"                     
[21] "bta_u_classification"          "solitary_nodule"              
[23] "size_nodule_mm"                "cervical_lymphadenopathy"     
[25] "thy_classification"            "final_pathology"              
Table 3: Description of variables in the Sheffield Thyroid dataset.
Variable Description
age_at_scan Age
albumin Albumin
bta_u_classification BTA U
cervical_lymphadenopathy Cervical Lymphadenopathy
compressive_symptoms Compressive symptoms
consistency_nodule Nodule consistency
eligibility Eligibility
ethinicity Ethnicity
ethnicity Ethnicity
exposure_radiation Exposure to radiation
family_history_thyroid_cancer Family history of thyroid cancer
final_pathology Final diagnosis
fna_done FNA done
gender Gender
graves_disease Graves’ disease
hashimotos_thyroiditis Hashimoto’s disease
hypertension Hypertension
incidental_nodule Incidental nodule
lymphocytes Lymphocytes
monocyte Monocytes
palpable_nodule Palpable nodule
rapid_enlargement Rapid enlargement
repeat_bta_u_classification Repeat BTA U
repeat_fna_done Repeat FNA
repeat_thy_classification Repeat Thy class
repeat_ultrasound Repeat ultrasound
size_nodule_mm Nodule size (mm)
solitary_nodule Solitary nodule
study_id Study ID
thy_classification Thy classification
thyroid_histology_diagnosis Histology
thyroid_surgery Thyroid surgery
tsh_value TSH value
vocal_cord_paresis Vocal cord paresis

References

Buuren, Stef van, and Karin Groothuis-Oudshoorn. 2011. “mice: Multivariate Imputation by Chained Equations in R.” J. Stat. Soft. 45 (December): 1–67. https://doi.org/10.18637/jss.v045.i03.
Efron, Bradley, and Trevor Hastie. 2016. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Cambridge Core. Cambridge, England, UK: Cambridge University Press. https://doi.org/10.1017/CBO9781316576533.
Kuhn, Max, and Hadley Wickham. 2020. Tidymodels: A Collection of Packages for Modeling and Machine Learning Using Tidyverse Principles. https://www.tidymodels.org.
R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Smith, Gary. 2018. “Step Away from Stepwise.” J. Big Data 5 (1): 1–12. https://doi.org/10.1186/s40537-018-0143-6.
Steyerberg, Ewout W., Marinus J. C. Eijkemans, Frank E. Harrell, and J. Dik F. Habbema. 2001. “Prognostic Modeling with Logistic Regression Analysis.” Medical Decision Making 21 (1): 45–56. https://doi.org/10.1177/0272989x0102100106.
Thompson, Bruce. 1995. “Stepwise Regression and Stepwise Discriminant Analysis Need Not Apply Here: A Guidelines Editorial.” Educational and Psychological Measurement 55 (4): 525–34. https://doi.org/10.1177/0013164495055004001.
Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological) 58 (1): 267–88. https://www.jstor.org/stable/2346178.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection Via the Elastic Net.” J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 (2): 301–20. https://doi.org/10.1111/j.1467-9868.2005.00503.x.

Citation

BibTeX citation:
@online{edafe2024,
  author = {Edafe, Ovie and Shephard, Neil and Sisley, Karen and
    Balasubramanian, Sabapathy P},
  title = {An Investigation of the Predictors of Thyroid Cancer in
    Patients with Thyroid Nodules},
  date = {2024-04-26},
  langid = {en},
  abstract = {An abstract summarising the work undertaken and the
    overall conclusions can be placed here. Sub-headings are currently
    removed because they conflict with those in the body of the text and
    mess up the links in the Table of Contents.}
}
For attribution, please cite this work as:
Edafe, Ovie, Neil Shephard, Karen Sisley, and Sabapathy P Balasubramanian. 2024. “An Investigation of the Predictors of Thyroid Cancer in Patients with Thyroid Nodules.” Sheffield Study on Thyroid Nodules. April 26, 2024.