Document Type

Conference Proceedings

Abstract

Objective: To present the results of artificial intelligence models for estimating Type 2 diabetes risk and to identify the accuracy and benefits of each methodology on existing datasets.

Methods: The methods include multivariable logistic regression, a decision tree classifier, a random forest, a support vector machine (SVM), an artificial neural network (ANN), and TabTransformer. TabTransformer is a deep neural network architecture that extends the self-attention mechanism, originally developed for translation and other language tasks, to tabular data. The models were trained on 80% of the data from the Jackson Heart Study, a cross-sectional study of African Americans aged 20 to 95 years with and without diabetes (n = 3,098). The selected diabetes risk factors are based on the socioecological model. The models were then validated on the remaining 20% of the data and evaluated using receiver operating characteristic (ROC) analysis.

Results: Accuracy was approximately 78% for multivariable logistic regression, 77% for SVM, 75% for the decision tree, 75% for the random forest, 79% for the ANN, and 80% for TabTransformer. TabTransformer achieved the highest accuracy. It also offered better prediction of local factors, insensitivity to missing data, and online training capabilities.

Conclusion: The proposed models build on the original multivariable logistic regression approach and improve its accuracy. Additional benefits include easier score development, better prediction of local factors, insensitivity to missing data, and online training capabilities. All models are suitable for app development and integration with IT tools.
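The evaluation protocol described in the Methods can be sketched for the logistic regression baseline. That protocol is an 80/20 split into training and validation data, followed by ROC-based evaluation. This is a minimal, dependency-free illustration that rests on several assumptions:

* The data are synthetic, with two hypothetical standardized features. They are not Jackson Heart Study records.
* The model is fitted by plain batch gradient descent.
* Accuracy uses a 0.5 probability threshold.
* ROC AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.

All function names are illustrative and do not come from the paper. The printed numbers are not the study's results.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic(n=500, seed=1):
    # Hypothetical data: two standardized features, label drawn from a known logistic model.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        X.append(x)
        y.append(1 if rng.random() < sigmoid(1.5 * x[0] + 1.0 * x[1] - 0.5) else 0)
    return X, y


def train_test_split(X, y, test_fraction=0.2, seed=0):
    # Shuffle once, then hold out test_fraction of the rows (80/20 by default).
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def fit_logistic(X, y, lr=0.1, epochs=500):
    # Logistic regression fitted by batch gradient descent on the mean log-loss.
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    # Predicted probability of the positive class for each row.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def accuracy(y_true, probs, threshold=0.5):
    # Fraction of rows whose thresholded prediction matches the label.
    return sum((p >= threshold) == (t == 1) for t, p in zip(y_true, probs)) / len(y_true)


def roc_auc(y_true, scores):
    # AUC = P(score of a random positive > score of a random negative); ties count 1/2.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    X, y = make_synthetic()
    X_tr, y_tr, X_te, y_te = train_test_split(X, y)
    w, b = fit_logistic(X_tr, y_tr)
    probs = predict_proba(X_te, w, b)
    print(f"accuracy={accuracy(y_te, probs):.3f}  AUC={roc_auc(y_te, probs):.3f}")
```

Running the script prints the held-out accuracy and AUC for the synthetic data. Any other classifier that returns probabilities could replace `fit_logistic`/`predict_proba` and be scored on the same split with the same metrics. Doing so keeps comparisons across model types like for like.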

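The Methods attribute one core idea to TabTransformer: applying self-attention to tabular data. A single head of scaled dot-product attention over categorical-column embeddings illustrates that idea, computing softmax(QK^T / sqrt(d_k))V.

In this minimal sketch, each categorical column value is looked up in an embedding table and treated as one token. Attention then lets each column's representation depend on the values of the other columns.

The following are all hypothetical:

* the column names and vocabularies
* the embedding size
* the random weights

The sketch also omits several parts of TabTransformer:

* multiple heads
* layer normalization
* feed-forward sublayers
* training
* the step that concatenates the contextual embeddings with continuous features before a final MLP

```python
import math
import random


def matmul(A, B):
    # Plain-list matrix product: (n x m) @ (m x p) -> (n x p).
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(E, Wq, Wk, Wv):
    # Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Q, K, V = matmul(E, Wq), matmul(E, Wk), matmul(E, Wv)
    d_k = len(Wk[0])
    scores = matmul(Q, [list(c) for c in zip(*K)])  # Q @ K^T
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights


# Hypothetical categorical columns (illustrative, not the study's variable list).
VOCAB = {
    "sex": ["female", "male"],
    "smoking_status": ["never", "former", "current"],
    "physical_activity": ["low", "moderate", "high"],
}


def build_embeddings(vocab, d, rng):
    # One vector per (column, category) pair; these would be learned, here random.
    return {(col, cat): [rng.gauss(0, 1) for _ in range(d)]
            for col, cats in vocab.items() for cat in cats}


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]


def contextual_embeddings(record, table, Wq, Wk, Wv):
    # Each categorical value becomes one token; column order is fixed by sorting names.
    E = [table[(col, record[col])] for col in sorted(record)]
    return self_attention(E, Wq, Wk, Wv)


if __name__ == "__main__":
    rng = random.Random(0)
    d = 4
    table = build_embeddings(VOCAB, d, rng)
    Wq, Wk, Wv = (random_matrix(d, d, rng) for _ in range(3))
    record = {"sex": "female", "smoking_status": "never", "physical_activity": "high"}
    out, weights = contextual_embeddings(record, table, Wq, Wk, Wv)
    for col, row in zip(sorted(record), weights):
        print(col, [round(w, 3) for w in row])
```

One quick sanity check uses all-zero query and key projections. Every attention weight is then 1/3 for three columns, so each output row reduces to the mean of the value vectors.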