Article abstract
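Su Wenyu, Dong Shihong, Ge Huaiju, Yu Qing, Ma Guifeng. Prediction of depression symptoms in seniors and analysis of influencing factors based on explainable machine learning[J]. Chinese Journal of Epidemiology, 2025, 46(2): 316-324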
Prediction of depression symptoms in seniors and analysis of influencing factors based on explainable machine learning
Received: 2024-08-09  Published: 2025-02-14
DOI:10.3760/cma.j.cn112338-20240809-00488
Keywords: Elderly  Depressive symptoms  Influencing factors  Machine learning  Shapley additive explanations
Funding: Joint Research Program of the National Food Safety Risk Center (LH2022GG06)
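Author; Affiliation; E-mail
Su Wenyu; School of Public Health, Shandong Second Medical University, Weifang 261053;
Dong Shihong; School of Public Health, Shandong Second Medical University, Weifang 261053;
Ge Huaiju; School of Public Health, Shandong Second Medical University, Weifang 261053;
Yu Qing; School of Public Health, Shandong Second Medical University, Weifang 261053;
Ma Guifeng; School of Public Health, Shandong Second Medical University, Weifang 261053; maguifeng10@126.com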
Abstract:
      Objective To construct a machine learning model for predicting depressive symptoms in older adults and to identify the key influencing factors using the Shapley additive explanations (SHAP) method. Methods A sample of 5 954 older adults was selected from the 2018 China Health and Retirement Longitudinal Study database. Feature selection was performed with three methods: support vector machine recursive feature elimination (SVM-RFE), extreme gradient boosting (XGBoost) recursive feature elimination (XGBoost-RFE), and the Lasso algorithm. Each selected feature set was combined with five classifiers (logistic regression, decision tree, random forest, support vector machine, and XGBoost) to compare how well the resulting models classified depressive symptoms in older adults. Finally, the SHAP method was used to interpret the model with the largest area under the receiver operating characteristic curve (AUC). Results Across the 15 prediction models, accuracy ranged from 0.702 to 0.743, AUC from 0.730 to 0.795, sensitivity from 0.546 to 0.588, and specificity from 0.783 to 0.865. The XGBoost-RFE-XGBoost model had the highest AUC. Based on SHAP values, the top four factors influencing depressive symptoms in older adults were life satisfaction, nighttime sleep duration, disability status, and self-rated health. Conclusion This study developed an efficient and interpretable risk prediction model for depressive symptoms in older adults, which may help identify high-risk older adults and support personalized interventions.
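As a reading aid, the following minimal sketch shows how the 15 models arise (three feature-selection methods crossed with five classifiers) and how the four reported metrics are conventionally defined. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and no study data is used.

```python
from itertools import product

# Method names as listed in the abstract: 3 selectors x 5 classifiers = 15 models.
SELECTORS = ["SVM-RFE", "XGBoost-RFE", "Lasso"]
CLASSIFIERS = ["Logistic", "DecisionTree", "RandomForest", "SVM", "XGBoost"]


def model_grid():
    """Return one 'selector-classifier' label per pairing (hypothetical naming)."""
    return [f"{s}-{c}" for s, c in product(SELECTORS, CLASSIFIERS)]


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from binary labels (1 = depressive symptoms)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
    }


def auc(y_true, scores):
    """AUC as the chance that a random positive scores above a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for a toy example but not for large samples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

toy_metrics = confusion_metrics(toy_labels, toy_preds)
toy_auc = auc(toy_labels, toy_scores)
```

Sensitivity and specificity depend on the chosen threshold while AUC does not, which is one reason a study might select the model to interpret by AUC.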