Standard development and data sharing for disease-specific cohort studies of respiratory diseases

Sun Yixin; Pei Zhengcun; Zhan Siyan

Abstract

Sun Yixin, Pei Zhengcun, Zhan Siyan. Data harmonization and sharing in study cohorts of respiratory diseases[J]. Chinese Journal of Epidemiology, 2018, 39(2): 233-239

Data harmonization and sharing in study cohorts of respiratory diseases

Received: July 10, 2017

DOI: 10.3760/cma.j.issn.0254-6450.2018.02.019

Keywords: Cohort study of respiratory diseases; Common data model; Data harmonization; Data sharing

Funding: National Key Research and Development Program of China (2016YFC0901100)

Author Name	Affiliation	E-mail
Sun Yixin	Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
Pei Zhengcun	Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
Zhan Siyan	Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China	siyan-zhan@bjmu.edu.cn

Abstract:

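Objective: Chronic obstructive pulmonary disease, asthma, interstitial lung disease, and pulmonary thromboembolism are major respiratory diseases that seriously endanger the health of Chinese residents. Integrating existing cohorts and conducting large-scale population cohort studies helps in observing the exposure, incidence, and outcomes of these diseases. Given that community and clinical cohort resources in China come from multiple sources and are heterogeneous, this study developed a data standard for disease-specific cohorts of respiratory diseases (respiratory cohorts). The aim was to offer ideas and methods for overcoming the sharing barriers caused by multi-source heterogeneous data and for allowing the project to exchange, integrate, share, store, and use data to the greatest possible extent.

Methods: The respiratory cohort data standard was developed in three steps: ① studying and referring to international standards, including the CDASH model of the Clinical Data Interchange Standards Consortium (CDISC) and the Common Data Model (CDM) of the Observational Medical Outcomes Partnership (OMOP); ② collating and summarizing the four included respiratory cohort resources and assessing their homogeneity and the possibility of integrating them; ③ establishing the respiratory cohort data standard through expert discussion.

Results: The variable modules of the existing respiratory cohorts included in the study were fairly homogeneous and similar in basic structure, so data integration was feasible. With reference to international standards and after expert discussion, the project team built a conceptual framework for the respiratory cohort data standard. It consists of two parts: a common data standard and disease-specific data standards. The common data standard covers questions or study variables that appear in every cohort and can be standardized; the disease-specific data standards cover questions unique to each disease. Mapping showed that the standard matched the variable modules of the existing cohorts well and that the standard was feasible.

Conclusion: Once the data standard is established, existing cohort resources can be integrated retrospectively while different projects conduct long-term follow-up under the same definitions and standards and collect a core dataset. This removes the data-sharing barriers that inconsistent data standards would otherwise create for future multicenter studies and facilitates the integration and sharing of multi-source data.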

English Abstract:

Objective: Chronic obstructive pulmonary disease, asthma, interstitial lung disease and pulmonary thromboembolism are the most common and severe respiratory diseases, which seriously jeopardize the health of Chinese citizens. Large-scale prospective cohort studies are needed to explore the relationships between potential risk factors and respiratory disease outcomes and to observe disease prognoses through long-term follow-up. We aimed to develop a common data model (CDM) for cohort studies on respiratory diseases, in order to harmonize data from multiple sources and facilitate their exchange, pooling, sharing, and storage, so that the follow-up data collected in the cohorts can be reused and standardized.

Methods: The CDM was developed in three steps: ① reviewing international standards, including the Clinical Data Interchange Standards Consortium (CDISC) Clinical Data Acquisition Standards Harmonization (CDASH) model and the Observational Medical Outcomes Partnership (OMOP) CDM; ② summarizing the four cohort studies of respiratory diseases included in this research and assessing data availability; ③ developing a CDM for respiratory diseases.

Results: The included cohorts shared a few similar domains but used different schemas. The cohorts also had homogeneous data collection purposes for future follow-up studies, making the harmonization of current and future data feasible. The resulting CDM comprises two parts: ① thirteen common domains shared by all four cohorts, with variables derived from disparate questions under a common schema; ② additional domains designed for disease-specific research needs, together with disease-specific variables not initially included in the common domains.

Conclusion: Data harmonization is essential for sharing, comparing, and pooling data in analyses, both retrospectively and prospectively. A CDM is needed to convert heterogeneous data from multiple studies into one harmonized dataset. Using a CDM in multicenter respiratory cohort studies would make the continuous collection of uniform data possible, thereby supporting data exchange and sharing in the future.
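
As an illustration only, the two-part structure described in the Results (common domains plus disease-specific additions) can be sketched in a few lines of Python. Everything in this sketch is an assumption made for the example: the cohort names, the variable names (`sex_code`, `height_cm`, `fev1_l`, and so on), the codings, and the `harmonize` helper are hypothetical and are not taken from the study's actual data standard.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical common variables; the real standard defines its own domains.
COMMON_VARIABLES = ("sex", "height_m", "ever_smoked")

# A mapping sends a source variable to (common variable, converter).
Mapping = Dict[str, Tuple[str, Callable[[Any], Any]]]

# Hypothetical cohort A: coded sex, height in centimetres, Y/N smoking flag.
COHORT_A_MAP: Mapping = {
    "sex_code": ("sex", lambda v: {1: "male", 2: "female"}.get(v, "unknown")),
    "height_cm": ("height_m", lambda v: v / 100),
    "smoker": ("ever_smoked", lambda v: v == "Y"),
}

# Hypothetical cohort B: free-text sex, height in metres, smoking category.
COHORT_B_MAP: Mapping = {
    "gender": ("sex", lambda v: str(v).lower()),
    "height_m": ("height_m", float),
    "smoking_status": ("ever_smoked", lambda v: v in {"current", "former"}),
}


def harmonize(record: Dict[str, Any], mapping: Mapping) -> Dict[str, Dict[str, Any]]:
    """Split one source record into common and disease-specific parts."""
    common: Dict[str, Any] = {name: None for name in COMMON_VARIABLES}
    specific: Dict[str, Any] = {}
    for source_name, value in record.items():
        if source_name in mapping:
            target, convert = mapping[source_name]
            common[target] = convert(value)
        else:
            # Unmapped variables are kept as disease-specific items.
            specific[source_name] = value
    return {"common": common, "disease_specific": specific}


record_a = {"sex_code": 2, "height_cm": 165, "smoker": "N", "fev1_l": 2.1}
record_b = {"gender": "Male", "height_m": "1.80", "smoking_status": "former"}
harmonized_a = harmonize(record_a, COHORT_A_MAP)
harmonized_b = harmonize(record_b, COHORT_B_MAP)
```

After mapping, both records expose the same common keys, so they could be pooled for analysis, while a variable that only one cohort collects (here the hypothetical `fev1_l`) stays in the disease-specific part.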
