Start Date
9-10-2025 12:00 PM
End Date
9-10-2025 1:30 PM
Description
Objective:
(i) To introduce a methodology based on TabTransformers for modeling diabetes risk scores that can be updated continuously to better fit target populations. Benefits include enhanced accuracy, increased sensitivity to local risk factors, and good handling of missing or incomplete data.
(ii) To identify the accuracy and benefits of the methodology on existing datasets.
Methods:
The methodology employs a deep neural network architecture known as TabTransformer, which extends the self-attention mechanism, initially developed for translation and language tasks, to tabular data.
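As a rough illustration of how self-attention can be applied to tabular data, the sketch below embeds each categorical column value as a vector and lets every column attend to every other column. It is a minimal sketch under stated assumptions, not the model used in this work: the column names, category values, embedding size, and random embeddings are all hypothetical, and the learned query/key/value projections, multiple heads, and stacked layers of a real TabTransformer are omitted.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of equal-length vectors.

    Simplification: queries, keys, and values are the token embeddings
    themselves (identity projections); a trained model would learn them.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted average of all column embeddings.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, tokens)) for i in range(d)])
    return outputs, weights


# Hypothetical categorical risk-factor columns (illustrative only).
COLUMNS = {
    "smoking_status": ["never", "former", "current"],
    "insurance": ["none", "public", "private"],
    "education": ["high_school", "college", "graduate"],
}
DIM = 4  # hypothetical embedding size
rng = random.Random(0)

# One embedding vector per (column, value) pair; random here, learned in practice.
EMBEDDINGS = {
    (col, val): [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    for col, vals in COLUMNS.items()
    for val in vals
}


def contextualize(row):
    """Return contextual embeddings and attention weights for one table row."""
    tokens = [EMBEDDINGS[(col, row[col])] for col in COLUMNS]
    return self_attention(tokens)


row = {"smoking_status": "former", "insurance": "public", "education": "college"}
contextual, attention = contextualize(row)
```

Each row of `attention` shows how strongly one column draws on the others when forming its contextual embedding, which is the sense in which the mechanism carries over from token sequences to table columns.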
The model is first pretrained on 60% of the data available from the Jackson Heart Study, a cross-sectional study conducted in a sample population of African Americans (20 to 95 years) with and without diabetes (n=3,098). The selected risk factors associated with diabetes are based on the socioecological model. The model is then validated with 20% of the data, evaluated with ROC methodology, and finally adjusted with the remaining 20% to evaluate the continuous update capability of the TabTransformer.
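The 60/20/20 partition and the ROC-based evaluation described above can be sketched as follows. This is an illustrative sketch only: the function names, the random seed, the rounding of partition sizes, and the placeholder record IDs are assumptions, and no study data are used. The area under the ROC curve is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
import random


def split_60_20_20(records, seed=0):
    """Shuffle records and split them into 60% / 20% / 20% partitions.

    The last partition absorbs any rounding remainder.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_pretrain = int(0.6 * n)
    n_validate = int(0.2 * n)
    pretrain = shuffled[:n_pretrain]
    validate = shuffled[n_pretrain:n_pretrain + n_validate]
    update = shuffled[n_pretrain + n_validate:]
    return pretrain, validate, update


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a positive case, 0 for a negative case.
    scores: predicted risk scores, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Placeholder record IDs standing in for the n=3,098 participants.
record_ids = list(range(3098))
pretrain_ids, validate_ids, update_ids = split_60_20_20(record_ids)

# Tiny hand-made example of the AUC computation (not study results).
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

With 3,098 records this rounding yields partitions of 1,858, 619, and 621 records. The third partition is the one that would be fed to the model after validation to test whether the score can be adjusted to new data.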
Results:
The accuracy of the multi-logistic regression method was around 78%, while that of the deep neural network was around 80%.
Conclusion:
The proposed model complements the original multi-logistic regression approach with improved accuracy. Additional benefits include easier development of the score, better prediction of local factors, insensitivity to missing data, and online training capabilities. All models are appropriate for the development of apps and integration with IT tools.
Continuously Generated Diabetes Type 2 Risk scores with Deep Learning
