Abstract
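Zhu Meng, Lyu Jun, Yu Canqing, Jin Guangfu, Guo Yu, Bian Zheng, Robin Walters, Iona Millwood, Chen Zhengming, Shen Hongbing, Hu Zhibin, Li Liming. Study on genetic structure differences and adjustment strategies in different areas of China[J]. Chinese Journal of Epidemiology, 2019, 40(1): 20-25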
Study on genetic structure differences and adjustment strategies in different areas of China
Received: August 09, 2018
DOI:10.3760/cma.j.issn.0254-6450.2019.01.006
Keywords: Molecular epidemiology; Population genetic structure; Area differences; Linear mixed model
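Funding: National Natural Science Foundation of China (81390540, 81390541, 81390543); National Key Research and Development Program of China, Precision Medicine Research Key Special Project (2016YFC0900500, 2016YFC0900501, 2016YFC0900504); Kadoorie Charitable Foundation, Hong Kong, China; Wellcome Trust, UK (202922/Z/16/Z, 088158/Z/09/Z, 104085/Z/14/Z)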
Author Name: Affiliation (E-mail)
Zhu Meng: Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166, China
Lyu Jun: Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China; Key Laboratory of Molecular Cardiovascular Sciences, Ministry of Education, Peking University, Beijing 100191, China
Yu Canqing: Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
Jin Guangfu: Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166, China
Guo Yu: Chinese Academy of Medical Sciences, Beijing 100730, China
Bian Zheng: Chinese Academy of Medical Sciences, Beijing 100730, China
Robin Walters: Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
Iona Millwood: Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
Chen Zhengming: Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
Shen Hongbing: Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166, China
Hu Zhibin: Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166, China (zhibin_hu@njmu.edu.cn)
Li Liming: Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China (lmlee@pumc.edu.cn)
Abstract:
      Objective To describe the population genetic structure in different areas of China, and to explore and evaluate how well different analysis strategies control confounding by population genetic structure in cohort samples. Methods Using genome-wide association study (GWAS) data on 4,500 samples from 10 areas of the China Kadoorie Biobank (CKB) cohort, we performed principal component analysis, extracted the first and second principal components, plotted them in two dimensions, and compared the plot with the samples' areas of origin to characterize the genetic structure of samples from different areas of China. Based on the CKB cohort data, we generated simulated data sets with cohort sample characteristics such as genetic structure differences and kinship, and evaluated the effect of different analysis strategies, including traditional analysis schemes and the mixed linear model, on the inflation factor (λ). Results Populations in different areas of China showed significant differences in genetic structure. The distribution of the principal components of population genetic structure was basically consistent with the geographical distribution of the project areas: the first principal component corresponded to latitude and the second principal component to longitude. In the simulated data sets, direct association analysis gave a high false positive rate (λ=1.16), and λ could not be effectively controlled (λ>1.05) even after adjusting for the principal components of genetic structure or performing area-specific subgroup analysis. With a mixed linear model that included the kinship matrix as a random effect, λ was effectively controlled (λ=0.99) whether or not the principal components were further adjusted for. Conclusions Genetic structure differs considerably among populations in different areas of China, and bias caused by population genetic structure needs to be handled carefully in molecular epidemiology studies. For large cohort data with complex genetic structure and extensive kinship, a mixed linear model is needed for association analysis.
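The abstract does not state how the inflation factor λ was calculated. Under the conventional definition, λ is the median of the observed 1-degree-of-freedom chi-square association statistics divided by the median of that distribution under the null (about 0.455), so values near 1 indicate little inflation. The following is a minimal sketch under that assumption, using only the Python standard library; the function name `inflation_factor` and the toy p-values are illustrative and are not taken from the study.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
# A 1-df chi-square variable is a squared standard normal, so its median is
# the squared 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def inflation_factor(p_values):
    """Return lambda: median observed 1-df chi-square over its null median.

    Assumes two-sided p-values from 1-df association tests (an assumption
    of this sketch, not a detail given in the abstract).
    """
    chi2_stats = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range (0, 1]: {p}")
        # Convert the two-sided p-value back to a z-score, then square it.
        z = _STD_NORMAL.inv_cdf(p / 2.0)
        chi2_stats.append(z * z)
    return median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    n = 1001
    # Toy data: evenly spaced p-values mimic a well-calibrated null scan.
    null_p = [i / (n + 1) for i in range(1, n + 1)]
    # Toy data: squaring shifts p-values toward 0, mimicking inflation.
    inflated_p = [p * p for p in null_p]
    print("null lambda:", round(inflation_factor(null_p), 3))
    print("inflated lambda:", round(inflation_factor(inflated_p), 3))
```

In practice the p-values would come from the genome-wide scan under each analysis strategy (unadjusted, principal-component adjusted, area-specific subgroups, mixed linear model), and the resulting λ values would be compared against thresholds such as the λ>1.05 mentioned in the abstract.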