Study on genetic structure differences and adjustment strategies in different areas of China

Zhu Meng; Lyu Jun; Yu Canqing; Jin Guangfu; Guo Yu; Bian Zheng; Robin Walters; Iona Millwood; Chen Zhengming; Shen Hongbing; Hu Zhibin; Li Liming


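Zhu Meng, Lyu Jun, Yu Canqing, Jin Guangfu, Guo Yu, Bian Zheng, Robin Walters, Iona Millwood, Chen Zhengming, Shen Hongbing, Hu Zhibin, Li Liming. Study on genetic structure differences and adjustment strategies in different areas of China [J]. Chinese Journal of Epidemiology, 2019, 40(1): 20-25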


Received: August 9, 2018

DOI: 10.3760/cma.j.issn.0254-6450.2019.01.006

Keywords: Molecular epidemiology; Population genetic structure; Area differences; Linear mixed model

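Funding: National Natural Science Foundation of China (81390540, 81390541, 81390543); National Key Research and Development Program of China, Precision Medicine Research Key Special Project (2016YFC0900500, 2016YFC0900501, 2016YFC0900504); Kadoorie Charitable Foundation, Hong Kong, China; Wellcome Trust, UK (202922/Z/16/Z, 088158/Z/09/Z, 104085/Z/14/Z)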

Author Name	Affiliation	E-mail
Zhu Meng	Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166, China
Lyu Jun	Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China; Key Laboratory of Molecular Cardiovascular Sciences, Ministry of Education, Peking University, Beijing 100191, China
Yu Canqing	Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
Jin Guangfu	Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166, China
Guo Yu	Chinese Academy of Medical Sciences, Beijing 100730, China
Bian Zheng	Chinese Academy of Medical Sciences, Beijing 100730, China
Robin Walters	Clinical Trial Service Unit and Epidemiological Studies Unit(CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
Iona Millwood	Clinical Trial Service Unit and Epidemiological Studies Unit(CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
Chen Zhengming	Clinical Trial Service Unit and Epidemiological Studies Unit(CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
Shen Hongbing	Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166, China
Hu Zhibin	Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166, China	zhibin_hu@njmu.edu.cn
Li Liming	Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China	lmlee@pumc.edu.cn


Abstract:

Objective To describe the genetic structure of populations in different areas of China, and to explore and evaluate how well different analysis strategies control confounding by population genetic structure in cohort samples. Methods Using genome-wide association study (GWAS) data on 4 500 samples from 10 areas of the China Kadoorie Biobank (CKB) cohort, we performed principal component analysis, extracted the first and second principal components, plotted them in two dimensions, and compared the plot with each sample's area of origin to characterize the genetic structure of samples from different areas of China. Based on the CKB cohort data, we generated a simulated data set with cohort sample characteristics such as genetic structure differences and extensive kinship, and evaluated the effect of different analysis strategies, including traditional analysis schemes and a mixed linear model, on the inflation factor (λ). Results There were significant differences in genetic structure among populations in different areas of China. The distribution of the principal components of population genetic structure was basically consistent with the geographical distribution of the project areas: the first principal component corresponded to latitude, and the second principal component corresponded to longitude. Direct association analysis of the simulated data set gave a high false positive rate (λ=1.16). Adjusting for the principal components of genetic structure, or performing area-specific subgroup analysis, still did not control λ effectively (λ>1.05). With a mixed linear model that included the kinship matrix as a random effect, λ was effectively controlled (λ=0.99) regardless of whether the principal components of genetic structure were further adjusted. Conclusions There are large differences in genetic structure among populations in different areas of China. In molecular epidemiology studies, bias caused by population genetic structure needs to be handled carefully. For large cohort data with complex genetic structure and extensive kinship, a mixed linear model should be used for association analysis.
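
The abstract reports the inflation factor (λ) without stating how it was computed. As a minimal sketch, assuming the conventional definition (the median of the observed 1-degree-of-freedom chi-square association statistics divided by the expected median under the null, about 0.455), λ could be calculated as below. The function names and the simulated statistics are hypothetical illustrations and are not taken from the study; the `scale` parameter only mimics uniform inflation and does not reproduce the CKB simulation.

```python
import random
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Return lambda = median(observed 1-df chi-square statistics) / expected median."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def simulate_chi2(n_tests, scale=1.0, seed=0):
    """Hypothetical helper: draw 1-df chi-square statistics as squared
    standard normals, multiplied by `scale` (scale > 1 mimics inflation)."""
    rng = random.Random(seed)
    return [scale * rng.gauss(0.0, 1.0) ** 2 for _ in range(n_tests)]


# Well-calibrated statistics should give lambda close to 1.
lambda_null = genomic_inflation_factor(simulate_chi2(20000))

# Statistics inflated by a factor of 1.16 should give lambda close to 1.16.
lambda_inflated = genomic_inflation_factor(simulate_chi2(20000, scale=1.16))

print(f"lambda (null): {lambda_null:.3f}")
print(f"lambda (inflated): {lambda_inflated:.3f}")
```

Under this definition, λ near 1 indicates well-calibrated test statistics, while values clearly above 1 (such as the 1.16 reported for the unadjusted analysis) suggest inflation from sources such as population structure or relatedness.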
