Abstract
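Su Wenyu, Dong Shihong, Ge Huaiju, Yu Qing, Ma Guifeng. Prediction of depression symptoms in seniors and analysis of influencing factors based on explainable machine learning[J]. Chinese Journal of Epidemiology, 2025, 46(2): 316-324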
Prediction of depression symptoms in seniors and analysis of influencing factors based on explainable machine learning
Received: August 9, 2024
DOI:10.3760/cma.j.cn112338-20240809-00488
Keywords: Elderly; Depressive symptoms; Influencing factors; Machine learning; Shapley additive explanations
Funding: National Food Safety Risk Center Joint Research Program (LH2022GG06)
Author Name | Affiliation | E-mail
Su Wenyu | School of Public Health, Shandong Second Medical University, Weifang 261053, China |
Dong Shihong | School of Public Health, Shandong Second Medical University, Weifang 261053, China |
Ge Huaiju | School of Public Health, Shandong Second Medical University, Weifang 261053, China |
Yu Qing | School of Public Health, Shandong Second Medical University, Weifang 261053, China |
Ma Guifeng | School of Public Health, Shandong Second Medical University, Weifang 261053, China | maguifeng10@126.com
Abstract:
      Objective To construct a machine learning model that predicts depressive symptoms in older adults and to identify the key influencing factors using the Shapley additive explanations (SHAP) method. Methods A sample of 5 954 older adults was selected from the 2018 China Health and Retirement Longitudinal Study database. Feature selection was performed with support vector machine recursive feature elimination, extreme gradient boosting (XGBoost) recursive feature elimination (RFE), and the Lasso algorithm. Each feature-selection method was combined with five classifiers (logistic regression, decision tree, random forest, support vector machine, and XGBoost) to compare classification performance for depressive symptoms in older adults. Finally, the SHAP method was used to interpret the model with the largest area under the receiver operating characteristic curve (AUC). Results Across the 15 prediction models (three feature-selection methods combined with five classifiers), accuracy ranged from 0.702 to 0.743, AUC from 0.730 to 0.795, sensitivity from 0.546 to 0.588, and specificity from 0.783 to 0.865. The XGBoost-RFE-XGBoost model had the highest AUC. Based on SHAP values, the top four factors influencing depressive symptoms in older adults were life satisfaction, duration of nighttime sleep, disability status, and self-rated health. Conclusion This study developed an efficient and interpretable risk prediction model for depressive symptoms in older adults, which could help identify high-risk individuals and deliver personalized interventions.
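To illustrate what a SHAP-style attribution measures, the following is a minimal sketch that computes exact Shapley values for a toy risk score by enumerating feature coalitions. It is not the study's XGBoost-RFE-XGBoost model: the function `toy_risk`, its weights, the binary feature coding, and the all-zero baseline are hypothetical choices made only for this example, with feature names loosely echoing the four factors named in the abstract. Practical SHAP implementations for tree models use faster algorithms and a background dataset rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial

# Hypothetical binary features (1 = risk factor present); names are illustrative only.
FEATURES = [
    "low_life_satisfaction",
    "short_sleep",
    "disability",
    "poor_self_rated_health",
]


def toy_risk(x):
    """Toy risk score with made-up weights and one interaction term.

    This is an assumption for illustration, not the model from the study.
    """
    score = 0.10
    score += 0.30 * x["low_life_satisfaction"]
    score += 0.15 * x["short_sleep"]
    score += 0.10 * x["disability"]
    score += 0.10 * x["poor_self_rated_health"]
    score += 0.05 * x["short_sleep"] * x["disability"]  # interaction
    return score


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature for one individual x.

    Features outside a coalition are set to their baseline value.
    """
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of f when added to coalition s
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# One hypothetical individual with all four risk factors present.
x = {f: 1 for f in FEATURES}
baseline = {f: 0 for f in FEATURES}
phi = shapley_values(toy_risk, x, baseline)

# Rank features by contribution, as a SHAP importance ranking would.
for name, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {contribution:+.3f}")
```

In this sketch the contributions sum to the difference between the individual's score and the baseline score, and the interaction term is split evenly between the two features involved. That additivity is what allows per-feature contributions to be ranked in the way the abstract reports.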