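Article abstract
Jiang Zhen, Dou Zhi, Song Weilu, Xu Jie, Wu Zunyou. Study on missing-data imputation methods for a viral load sampling survey of HIV-infected MSM [J]. Chinese Journal of Epidemiology, 2017, 38(11): 1563-1568
Study on missing-data imputation methods for a viral load sampling survey of HIV-infected MSM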
Comparison of different methods in dealing with HIV viral load data with diversified missing value mechanism on HIV positive MSM
Received: 2017-01-20  Published: 2017-11-11
DOI:10.3760/cma.j.issn.0254-6450.2017.11.025
Keywords: HIV; Viral load; Missing data; Multiple imputation; Markov chain Monte Carlo
Funding: National Science and Technology Major Project of China (2012ZX10001-007-005)
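Authors, affiliations, and e-mail:
Jiang Zhen, Division of Prevention and Intervention, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, China
Dou Zhi, Division of Prevention and Intervention, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, China
Song Weilu, Division of Prevention and Intervention, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, China
Xu Jie, Division of Prevention and Intervention, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, China, e-mail: xujie@chinaaids.cn
Wu Zunyou, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, China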
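Abstract (translated from the Chinese):
      Objective To evaluate how well different imputation methods fill in missing viral load (VL) data for HIV-infected men who have sex with men (MSM). Methods Based on VL sampling test data from HIV-infected MSM in 16 large Chinese cities in 2013, SPSS 17.0 was used to simulate a complete data set and 5 different types of missing data sets. The expectation-maximization method (EM), regression imputation, mean imputation, deletion, and Markov chain Monte Carlo (MCMC) were applied to the 5 missing VL data sets, and the results were compared on three aspects: data distribution, accuracy, and precision. Results The VL data followed a skewed, non-continuous distribution and were difficult to transform effectively to a normal distribution. All methods performed well on data missing completely at random. For the other types of missing data, regression imputation and MCMC better preserved the main distributional features of the complete data. EM, regression imputation, mean imputation, and deletion generally underestimated the data mean, whereas MCMC mostly overestimated it. Conclusion MCMC can be the preferred method for imputing missing values in log-transformed VL data. Imputed data can serve as a reference for estimating the mean VL level of the surveyed population.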
English abstract:
      Objective To compare the performance of different methods for handling HIV viral load (VL) data under different missing-value mechanisms. Methods SPSS 17.0 was used to simulate a complete data set and missing data sets with different missing-value mechanisms from HIV viral load data collected from MSM in 16 cities in China in 2013. Maximum likelihood estimation using the expectation-maximization algorithm (EM), regression imputation, mean imputation, deletion, and Markov chain Monte Carlo (MCMC) were each used to handle the missing data. The results of the different methods were compared in terms of distribution characteristics, accuracy, and precision. Results HIV VL data could not be transformed to a normal distribution. All methods performed well in imputing data missing completely at random (MCAR). For the other types of missing data, the regression and MCMC methods better preserved the main characteristics of the original data. The means of the data sets imputed by the different methods were all close to the original mean. EM, regression imputation, mean imputation, and deletion underestimated VL, whereas MCMC overestimated it. Conclusion MCMC can be used as the main imputation method for missing HIV viral load data. The imputed data can be used as a reference for estimating mean HIV VL in the investigated population.
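The comparison design described in the abstract (simulate a complete data set, remove values under a known missingness mechanism, fill them in, then compare against the complete data) can be illustrated with a minimal Python sketch. This is not the authors' procedure: the study used SPSS 17.0 on real survey data, whereas the sketch below uses synthetic, normally distributed log10 VL values (the abstract reports that the real data are skewed), covers only the MCAR mechanism, and shows only deletion and mean imputation. All function names and parameters (`make_mcar`, `mean_impute`, the 20% missing rate) are assumptions chosen for illustration.

```python
import random
import statistics


def make_mcar(values, missing_rate, rng):
    """Blank out each value independently with probability missing_rate (MCAR)."""
    return [None if rng.random() < missing_rate else v for v in values]


def delete_missing(values):
    """Deletion: keep only the observed values."""
    return [v for v in values if v is not None]


def mean_impute(values):
    """Mean imputation: replace each missing value with the observed mean."""
    observed_mean = statistics.fmean(delete_missing(values))
    return [observed_mean if v is None else v for v in values]


def summarize(values):
    """Return (mean, sample standard deviation) of a complete list."""
    return statistics.fmean(values), statistics.stdev(values)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic log10 viral load values; hypothetical, NOT the study data.
complete = [rng.gauss(4.0, 1.0) for _ in range(1000)]
with_missing = make_mcar(complete, missing_rate=0.2, rng=rng)

deleted = delete_missing(with_missing)
imputed = mean_impute(with_missing)

results = {
    "complete": summarize(complete),
    "deletion": summarize(deleted),
    "mean imputation": summarize(imputed),
}
for name, (mean, sd) in results.items():
    print(f"{name:16s} mean={mean:.3f} sd={sd:.3f}")
```

In this sketch, mean imputation reproduces the mean of the observed values exactly but shrinks the standard deviation, because every filled-in value sits at the center of the distribution. That is one reason a comparison like the one in the abstract looks at the data distribution as well as the mean.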