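Article abstract
Zhu Meng, Lyu Jun, Yu Canqing, Jin Guangfu, Guo Yu, Bian Zheng, Robin Walters, Iona Millwood, Chen Zhengming, Shen Hongbing, Hu Zhibin, Li Liming. Study on genetic structure differences and adjustment strategies in different areas of China[J]. Chinese Journal of Epidemiology, 2019, 40(1): 20-25
Study on genetic structure differences and adjustment strategies in different areas of China
Received: 2018-08-09  Published: 2019-01-14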
DOI:10.3760/cma.j.issn.0254-6450.2019.01.006
Keywords: Molecular epidemiology  Population genetic structure  Area differences  Linear mixed model
Funding: National Natural Science Foundation of China (81390540, 81390541, 81390543); National Key Research and Development Program of China, Precision Medicine Research special project (2016YFC0900500, 2016YFC0900501, 2016YFC0900504); Kadoorie Charitable Foundation, Hong Kong, China; Wellcome Trust, UK (202922/Z/16/Z, 088158/Z/09/Z, 104085/Z/14/Z)
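Author  Affiliation  E-mail
Zhu Meng  Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166
Lyu Jun  Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191; Key Laboratory of Molecular Cardiovascular Sciences (Peking University), Ministry of Education, Beijing 100191
Yu Canqing  Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191
Jin Guangfu  Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166
Guo Yu  Chinese Academy of Medical Sciences, Beijing 100730
Bian Zheng  Chinese Academy of Medical Sciences, Beijing 100730
Robin Walters  Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, UK, OX3 7LF
Iona Millwood  Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, UK, OX3 7LF
Chen Zhengming  Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, UK, OX3 7LF
Shen Hongbing  Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166
Hu Zhibin  Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166  zhibin_hu@njmu.edu.cn
Li Liming  Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191  lmlee@pumc.edu.cn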
Abstract:
      Objective To describe the population genetic structure of different areas of China, and to explore and evaluate how well different analysis strategies control confounding by population genetic structure in cohort samples. Methods Using genome-wide association study (GWAS) data on 4,500 samples from the 10 areas of the China Kadoorie Biobank (CKB) cohort, we performed principal component analysis, extracted the first and second principal components, plotted them in two dimensions, and compared the plot with each sample's area of origin to characterize the genetic structure of samples from different areas of China. Based on the CKB cohort data, we generated a simulated data set that retained cohort sample characteristics such as genetic structure differences and kinship, and evaluated how well different analysis strategies, including conventional analysis schemes and a mixed linear model, controlled the inflation factor (λ). Results Populations in different areas of China showed significant differences in genetic structure. The distribution of the principal components was broadly consistent with the geographical distribution of the project areas: the first principal component corresponded to latitude and the second to longitude. In the simulated data set, direct association analysis gave a high false positive rate (λ=1.16). Neither adjusting for the principal components of genetic structure nor performing area-specific subgroup analysis controlled λ effectively (λ>1.05). When a mixed linear model was used with the kinship matrix included as a random effect, λ was effectively controlled (λ=0.99) whether or not the principal components were further adjusted for. Conclusions Genetic structure differs substantially among populations in different areas of China, and bias caused by population genetic structure needs to be handled carefully in molecular epidemiology studies. For large cohort data with complex genetic structure and extensive kinship, a mixed linear model should be used for association analysis.
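The abstract reports the inflation factor λ without saying how it was computed. The sketch below assumes the conventional genomic-control definition: the median of the observed 1-df chi-square association statistics divided by the median expected under the null (about 0.455). The function names `p_to_chi2` and `genomic_inflation_factor` are hypothetical, chosen for illustration, and are not taken from the study.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# obtained as the square of the 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided p-value in (0, 1] to a 1-df chi-square statistic."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p-value must lie in (0, 1]")
    # Use the lower tail so that very small p-values keep their precision.
    z = _STD_NORMAL.inv_cdf(p / 2.0)
    return z * z


def genomic_inflation_factor(p_values):
    """Assumed definition: median observed chi-square / null median."""
    chi2_stats = [p_to_chi2(p) for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Hypothetical p-values, evenly spread over (0, 1) as expected under the null.
    n = 1001
    null_like = [(i + 0.5) / n for i in range(n)]
    print(round(genomic_inflation_factor(null_like), 3))
```

Under this definition, a value close to 1 indicates that the test statistics follow the null distribution, while values clearly above 1 indicate inflation of the kind the abstract attributes to population structure and kinship.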