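Jiang Zhen, Dou Zhi, Song Weilu, Xu Jie, Wu Zunyou. Comparison of different methods in dealing with HIV viral load data with diversified missing value mechanism on HIV positive MSM[J]. Chinese Journal of Epidemiology, 2017, 38(11): 1563-1568 |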
Imputation methods for missing data in a sampling survey of viral load among HIV-infected MSM |
Comparison of different methods in dealing with HIV viral load data with diversified missing value mechanism on HIV positive MSM |
Received: January 20, 2017 |
DOI: 10.3760/cma.j.issn.0254-6450.2017.11.025 |
Keywords: HIV; Viral load; Missing data; Multiple imputation; Markov Chain Monte Carlo |
Funding: National Science and Technology Major Project (2012ZX10001-007-005) |
Author Name | Affiliation | E-mail |
Jiang Zhen | Division of Prevention and Intervention, Chinese Center for Disease Control and Prevention, Beijing 102206, China | |
Dou Zhi | Division of Prevention and Intervention, Chinese Center for Disease Control and Prevention, Beijing 102206, China | |
Song Weilu | Division of Prevention and Intervention, Chinese Center for Disease Control and Prevention, Beijing 102206, China | |
Xu Jie | Division of Prevention and Intervention, Chinese Center for Disease Control and Prevention, Beijing 102206, China | xujie@chinaaids.cn |
Wu Zunyou | National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, China | |
Abstract: |
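Objective To evaluate how well different imputation methods handle missing viral load (VL) data among HIV-infected MSM. Methods Based on VL sampling test data from HIV-infected MSM in 16 large Chinese cities in 2013, SPSS 17.0 was used to simulate a complete data set and 5 data sets with different types of missingness. The missing VL data in these 5 data sets were handled with the expectation-maximization method (EM), regression, mean imputation, deletion, and Markov Chain Monte Carlo (MCMC), and the results were compared in terms of data distribution, accuracy, and precision. Results The VL data followed a skewed, non-continuous distribution and could not be effectively transformed to a normal distribution. All methods performed well on data missing completely at random. For the other types of missing data, regression and MCMC better preserved the main distributional features of the complete data; EM, regression, mean imputation, and deletion generally underestimated the mean, whereas MCMC mostly overestimated it. Conclusion MCMC can serve as the first-choice imputation method for missing log-transformed VL data. The imputed data can serve as a reference for estimating the mean VL level of the surveyed population. |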
English Abstract: |
Objective To compare how different methods handle missing HIV viral load (VL) data under different missing-value mechanisms. Methods Using SPSS 17.0, we simulated a complete data set and missing data sets with different missing-value mechanisms from HIV VL data collected from MSM in 16 cities in China in 2013. The expectation-maximization (EM) maximum likelihood method, regression, mean imputation, deletion, and Markov Chain Monte Carlo (MCMC) were each used to handle the missing data. Results were compared in terms of distribution characteristics, accuracy, and precision. Results HIV VL data could not be transformed into a normal distribution. All methods performed well in imputing data missing completely at random (MCAR). For the other types of missing data, regression and MCMC preserved the main characteristics of the original data. The means of the data sets imputed with the different methods were all close to the original mean. EM, regression, mean imputation, and deletion underestimated VL, whereas MCMC overestimated it. Conclusion MCMC can be used as the main imputation method for missing HIV VL data. The imputed data can serve as a reference for estimating mean HIV VL in the investigated population. |
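The simulation design described in the abstract (start from a complete data set, delete values under different missingness mechanisms, impute, then compare against the complete data) can be illustrated with a minimal Python sketch. This is not the authors' SPSS 17.0 procedure. The data are synthetic, the function names are hypothetical, and the value-dependent mechanism, in which higher values are more likely to be missing, is an assumed example of non-random missingness that the abstract does not specify. Only mean imputation is shown, for brevity.

```python
import random
import statistics

def simulate_log_vl(n, seed=0):
    """Synthetic log10 VL values (illustrative only, not study data)."""
    rng = random.Random(seed)
    return [rng.gauss(4.0, 1.0) for _ in range(n)]

def make_mcar(values, rate, seed=1):
    """Missing completely at random: each value is dropped with the same probability."""
    rng = random.Random(seed)
    return [None if rng.random() < rate else v for v in values]

def make_value_dependent(values, threshold, rate_high, seed=2):
    """Assumed non-random mechanism: only values above the threshold can go missing."""
    rng = random.Random(seed)
    return [None if (v > threshold and rng.random() < rate_high) else v
            for v in values]

def mean_impute(values):
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

complete = simulate_log_vl(5000)
true_mean = statistics.fmean(complete)

mcar = make_mcar(complete, rate=0.3)
dependent = make_value_dependent(complete, threshold=4.5, rate_high=0.6)

# Accuracy: bias of the imputed mean relative to the complete-data mean.
bias_mcar = statistics.fmean(mean_impute(mcar)) - true_mean
bias_dependent = statistics.fmean(mean_impute(dependent)) - true_mean

# Distribution: mean imputation shrinks the spread even under MCAR.
sd_complete = statistics.stdev(complete)
sd_mcar_imputed = statistics.stdev(mean_impute(mcar))

print(f"bias under MCAR: {bias_mcar:+.3f}")
print(f"bias under value-dependent missingness: {bias_dependent:+.3f}")
print(f"SD complete vs. mean-imputed (MCAR): {sd_complete:.3f} vs. {sd_mcar_imputed:.3f}")
```

In this toy setting the mean is recovered well under MCAR but underestimated when high values are preferentially missing, which matches the direction of bias the abstract reports for mean imputation. The reduced standard deviation shows why comparing the distribution, and not only the mean, matters when judging an imputation method.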