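Zhu Meng, Lyu Jun, Yu Canqing, Jin Guangfu, Guo Yu, Bian Zheng, Robin Walters, Iona Millwood, Chen Zhengming, Shen Hongbing, Hu Zhibin, Li Liming. Study on genetic structure differences and adjustment strategies in different areas of China[J]. Chinese Journal of Epidemiology, 2019, 40(1): 20-25 |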
Study on genetic structure differences and adjustment strategies in different areas of China |
Received: 2018-08-09 Published: 2019-01-14 |
DOI:10.3760/cma.j.issn.0254-6450.2019.01.006 |
Keywords: Molecular epidemiology; Population genetic structure; Area differences; Linear mixed model |
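Funding: National Natural Science Foundation of China (81390540, 81390541, 81390543); National Key Research and Development Program of China, Precision Medicine key project (2016YFC0900500, 2016YFC0900501, 2016YFC0900504); Kadoorie Charitable Foundation, Hong Kong, China; Wellcome Trust, UK (202922/Z/16/Z, 088158/Z/09/Z, 104085/Z/14/Z) |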
|
Abstract: |
Objective To describe the population genetic structure of different areas of China, and to explore and evaluate how well different analysis strategies control confounding by population genetic structure in cohort samples. Methods Using genome-wide association study (GWAS) data on 4,500 samples from 10 areas of the China Kadoorie Biobank (CKB) cohort, we performed principal component analysis, extracted the first and second principal components, plotted them in two dimensions, and compared the plot with each sample's area of origin to characterize the genetic structure of samples from different areas of China. Based on the CKB cohort data, we generated simulated datasets with typical cohort sample characteristics, such as genetic structure differences and extensive kinship, and evaluated how well different analysis strategies, including traditional analysis schemes and a linear mixed model, controlled the inflation factor (λ). Results Populations from different areas of China differed significantly in genetic structure. The distribution of the principal components of population genetic structure was largely consistent with the geographical distribution of the project areas: the first principal component corresponded to latitude and the second to longitude. In the simulated datasets, direct association analysis gave a high false positive rate (λ=1.16); adjusting for the principal components of genetic structure or performing area-specific subgroup analysis still failed to control λ effectively (λ>1.05). With a linear mixed model that included the kinship matrix as a random effect, λ was effectively controlled (λ=0.99) whether or not the principal components were further adjusted for. Conclusions Genetic structure differs considerably among populations from different areas of China, so bias caused by population genetic structure needs careful handling in molecular epidemiology studies. For large cohort data with complex genetic structure and extensive kinship, a linear mixed model should be used for association analysis. |
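The results above are reported as the inflation factor λ. As a minimal sketch, and not the authors' code, λ is commonly computed as the median of the observed 1-degree-of-freedom chi-square association statistics divided by the median expected under the null (about 0.455). The function name `inflation_factor` and the toy data are assumptions made for illustration; the toy simulation only rescales null z-scores to mimic inflation and is not the CKB-based simulation described in the abstract.

```python
import random
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom:
# P(Z^2 <= m) = 0.5  <=>  Phi(sqrt(m)) = 0.75, so m is about 0.455.
CHI2_1DF_MEDIAN = STD_NORMAL.inv_cdf(0.75) ** 2


def inflation_factor(p_values):
    """Hypothetical helper: median observed chi-square (1 df) over its null median.

    Each p-value must lie in (0, 1]. A two-sided p-value p maps back to
    z = Phi^{-1}(p / 2), and the chi-square statistic is z squared.
    """
    chi2_stats = [STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def two_sided_p(z):
    """Two-sided p-value for a z-score, computed from the lower tail."""
    return 2.0 * STD_NORMAL.cdf(-abs(z))


rng = random.Random(2019)  # fixed seed so the toy example is repeatable
n_variants = 20000         # assumed number of tested variants

# Well-calibrated test: z-scores follow the standard normal distribution.
null_p = [two_sided_p(rng.gauss(0.0, 1.0)) for _ in range(n_variants)]

# Toy inflation: widening the z-scores by sqrt(1.16) scales every
# chi-square statistic by 1.16, which is what uncorrected structure
# or relatedness tends to do to association statistics.
inflated_p = [two_sided_p(rng.gauss(0.0, 1.16 ** 0.5)) for _ in range(n_variants)]

lam_null = inflation_factor(null_p)
lam_inflated = inflation_factor(inflated_p)
print(f"lambda (calibrated test): {lam_null:.3f}")
print(f"lambda (toy inflation):   {lam_inflated:.3f}")
```

Values of λ close to 1 suggest well-calibrated test statistics, while values clearly above 1 (such as the λ>1.05 reported for the principal-component and subgroup strategies) suggest residual confounding by population structure or relatedness.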