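Zhao Zhenping, Li Yan, Wang Limin, Zhang Mei, Huang Zhengjing, Zhang Detao, Liu Jiangmei, Mao Fan, Zhou Yuchang, Liu Yaning, Nie Chao, Zhou Maigeng. Application of self-organizing maps in the design of longevity genetic research: sample selection in a nested case-control study[J]. Chinese Journal of Epidemiology, 2023, 44(2): 326-334 |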
Application of self-organizing maps in the design of longevity genetic research: sample selection in a nested case-control study |
Received: June 16, 2022 |
DOI:10.3760/cma.j.cn112338-20220616-00536 |
Key Words: Longevity; Cohort; Nested case-control; Genome-wide association studies |
Fund Project: Special Program of the National Natural Science Foundation of China (81941025); National Major Public Health Service Project |
Author Name | Affiliation | E-mail |
Zhao Zhenping | National Center for Chronic and Non-communicable Diseases Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 100050, China | |
Li Yan | BGI Shenzhen, Shenzhen 518083, China | |
Wang Limin | National Center for Chronic and Non-communicable Diseases Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 100050, China | |
Zhang Mei | National Center for Chronic and Non-communicable Diseases Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 100050, China | |
Huang Zhengjing | National Center for Chronic and Non-communicable Diseases Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 100050, China | |
Zhang Detao | BGI Shenzhen, Shenzhen 518083, China | |
Liu Jiangmei | National Center for Chronic and Non-communicable Diseases Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 100050, China | |
Mao Fan | National Center for Chronic and Non-communicable Diseases Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 100050, China | |
Zhou Yuchang | National Center for Chronic and Non-communicable Diseases Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 100050, China | |
Liu Yaning | National Center for Chronic and Non-communicable Diseases Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 100050, China | |
Nie Chao | BGI Shenzhen, Shenzhen 518083, China | |
Zhou Maigeng | National Center for Chronic and Non-communicable Diseases Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 100050, China | zhoumaigeng@ncncd.chinacdc.cn |
Abstract: |
Objective To improve the design of longevity genetic research by applying self-organizing maps to select a control group for a longevity study.
Methods This study was based on a natural population cohort formed by linking the 2013 China Chronic Disease and Risk Factor Surveillance with the national death surveillance data. It included Han Chinese participants who were aged ≥90 years, or who had died before the age of 80 years (control group). Those who died of injury, infectious diseases, parasitic diseases, or malignant tumors were excluded. Using self-organizing maps with multiple iterations and self-organizing clustering, participants aged ≥90 years and controls who were similar in demographic characteristics, diseases, living habits, social behaviors, and mental and psychological factors were selected for whole genome sequencing. PLINK 1.9 software was used to evaluate the quality of the sequencing data and to conduct logistic regression of autosomal single nucleotide polymorphisms (SNPs) on longevity. Q-Q plots were used to visualize the P values of the associations between SNPs and longevity.
Results A total of 1 019 samples were selected from 177 099 baseline survey participants for whole genome sequencing, including 517 in the longevity group and 502 in the control group. The longevity and control groups were generally similar in smoking, drinking, diet, sleep duration, blood lipid level, and self-rated oral health status, but differed considerably in socioeconomic status, physical activity time, BMI, and self-rated health status. After quality control of the whole genome sequencing results, 4 618 216 SNPs entered the association analysis. The Q-Q plot of the P values from the longevity association analysis showed an enrichment of observed P values clearly smaller than expected in the region around P=1e-4, and significant signals were also detected in the P<1e-7 region.
Conclusions Self-organizing maps can take the influence of socioeconomic status and lifestyle behaviors into account when selecting longevity controls from samples with actual age at death and cause of death in a large-scale natural population cohort, which improves the statistical power of longevity genome association analysis. This study provides a methodological reference for selecting samples from large-scale natural population cohorts for nested case-control studies. |
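The selection step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the map size, the covariate encoding, or how participants were drawn from each cluster. The sketch assumes standardized numeric covariates, a small rectangular map, and an equal number of cases and controls kept within each map node; the function names `train_som`, `best_matching_unit`, and `select_matched` and the synthetic data are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(
        range(len(weights)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(weights[k], x)),
    )


def train_som(data, rows, cols, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM; returns one weight vector per node."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Initialise every node with a randomly chosen observation.
    weights = [list(rng.choice(data)) for _ in range(rows * cols)]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighbourhood radius
        x = rng.choice(data)
        br, bc = divmod(best_matching_unit(weights, x), cols)
        for k, w in enumerate(weights):
            r, c = divmod(k, cols)
            grid_d2 = (r - br) ** 2 + (c - bc) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
            for j in range(dim):
                w[j] += lr * h * (x[j] - w[j])
    return weights


def select_matched(cases, controls, weights, seed=0):
    """Assumed rule: within each node keep equal numbers of cases and controls."""
    rng = random.Random(seed)
    by_node = {}
    for label, group in (("case", cases), ("control", controls)):
        for i, x in enumerate(group):
            node = best_matching_unit(weights, x)
            by_node.setdefault(node, {"case": [], "control": []})[label].append(i)
    picked_cases, picked_controls = [], []
    for members in by_node.values():
        n = min(len(members["case"]), len(members["control"]))
        picked_cases += rng.sample(members["case"], n)
        picked_controls += rng.sample(members["control"], n)
    return sorted(picked_cases), sorted(picked_controls)


# Synthetic, standardized two-covariate data (illustration only).
demo_rng = random.Random(1)
cases = [[demo_rng.gauss(0.0, 1.0), demo_rng.gauss(0.0, 1.0)] for _ in range(60)]
controls = [[demo_rng.gauss(0.5, 1.0), demo_rng.gauss(0.5, 1.0)] for _ in range(200)]
som = train_som(cases + controls, rows=3, cols=3)
case_idx, control_idx = select_matched(cases, controls, som)
print(len(case_idx), len(control_idx))
```

Matching within nodes means that cases and controls are compared only when they fall in the same region of covariate space, which is one plausible way to obtain groups that are similar on the clustered factors before sequencing.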