A risk prediction model for large for gestational age based on machine learning algorithms

Bai Xi; Luo Yunyun; Zhou Zhibo; Su Mingliang; Yang Liuqing; Chen Shi; Yang Hongbo; Zhu Huijuan; Pan Hui

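Bai Xi, Luo Yunyun, Zhou Zhibo, Su Mingliang, Yang Liuqing, Chen Shi, Yang Hongbo, Zhu Huijuan, Pan Hui. Development and evaluation of a machine learning prediction model for large for gestational age[J]. Chinese Journal of Epidemiology, 2021, 42(12): 2143-2148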

Development and evaluation of a machine learning prediction model for large for gestational age

Received: August 24, 2021

DOI: 10.3760/cma.j.cn112338-20210824-00677

Keywords: Machine learning; Large for gestational age; Risk prediction model

Author Name	Affiliation	E-mail
Bai Xi	Department of Endocrinology, Key Laboratory of Endocrinology of National Health Commission/State Key Laboratory of Complex Severe and Rare Diseases/Peking Union Medical College Hospital/Chinese Academy of Medical Science and Peking Union Medical College, Beijing 100730, China
Luo Yunyun	Department of Endocrinology, Key Laboratory of Endocrinology of National Health Commission/State Key Laboratory of Complex Severe and Rare Diseases/Peking Union Medical College Hospital/Chinese Academy of Medical Science and Peking Union Medical College, Beijing 100730, China
Zhou Zhibo	Department of Endocrinology, Key Laboratory of Endocrinology of National Health Commission/State Key Laboratory of Complex Severe and Rare Diseases/Peking Union Medical College Hospital/Chinese Academy of Medical Science and Peking Union Medical College, Beijing 100730, China
Su Mingliang	DHC Mediway Technology Co., Ltd, Beijing 100190, China
Yang Liuqing	DHC Mediway Technology Co., Ltd, Beijing 100190, China
Chen Shi	Department of Endocrinology, Key Laboratory of Endocrinology of National Health Commission/State Key Laboratory of Complex Severe and Rare Diseases/Peking Union Medical College Hospital/Chinese Academy of Medical Science and Peking Union Medical College, Beijing 100730, China
Yang Hongbo	Department of Endocrinology, Key Laboratory of Endocrinology of National Health Commission/State Key Laboratory of Complex Severe and Rare Diseases/Peking Union Medical College Hospital/Chinese Academy of Medical Science and Peking Union Medical College, Beijing 100730, China
Zhu Huijuan	Department of Endocrinology, Key Laboratory of Endocrinology of National Health Commission/State Key Laboratory of Complex Severe and Rare Diseases/Peking Union Medical College Hospital/Chinese Academy of Medical Science and Peking Union Medical College, Beijing 100730, China
Pan Hui	Department of Endocrinology, Key Laboratory of Endocrinology of National Health Commission/State Key Laboratory of Complex Severe and Rare Diseases/Peking Union Medical College Hospital/Chinese Academy of Medical Science and Peking Union Medical College, Beijing 100730, China	panhui20111111@163.com

Abstract:

Objective: To develop and validate machine learning (ML) models for predicting the risk of large for gestational age (LGA) during pregnancy, and to compare their performance with that of a traditional logistic regression model.

Methods: Data were obtained from the National Free Preconception Health Examination Project in China, which was carried out in 220 counties across 31 provinces from 2010 to 2012 and covered all rural couples planning a pregnancy. This study included all couples of childbearing age who had a singleton live birth at a gestational age of 24-42 weeks, together with their newborns. Ten ML algorithms were each used to build an LGA prediction model, and the predictive performance of each model was evaluated.

Results: A total of 104 936 newborns were included: 54 856 boys (52.3%) and 50 080 girls (47.7%). The incidence of LGA was 11.7% (12 279 newborns). After the class imbalance was addressed by undersampling, the overall performance of the ML models improved markedly. The CatBoost model performed best, with an area under the receiver operating characteristic curve (AUC) of 0.932; the logistic regression model performed worst, with an AUC of only 0.555.

Conclusions: Compared with the traditional logistic regression method, ML algorithms can produce more effective models for predicting LGA risk in pregnancy and have potential practical value. Among the ML algorithms compared, CatBoost performed best and deserves further investigation.
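The two techniques named in the Results, undersampling to balance the classes and AUC as the performance measure, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the abstract does not say how the undersampling was done or at which stage of the pipeline it was applied, so the random one-to-one sampling below, the function names `undersample` and `roc_auc`, and the toy data are all assumptions made for illustration.

```python
import random


def undersample(labels, seed=0):
    """Return sorted indices that keep every positive (label 1) and an
    equal-sized random subset of negatives (label 0).

    Assumption: positives are the minority class, as with LGA (11.7%).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    return sorted(pos + kept_neg)


def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) formula.

    Equals the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative; ties count as 0.5.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy data only: 12 positives among 100 records, close to the reported prevalence.
toy_labels = [1] * 12 + [0] * 88
kept = undersample(toy_labels)
print(len(kept))  # 24: all 12 positives plus 12 sampled negatives

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking, which helps put the reported values in context: 0.555 for logistic regression is barely above chance, whereas 0.932 for CatBoost indicates strong discrimination. A common precaution, not confirmed by the abstract, is to undersample only the training data and to evaluate on data with the original class distribution.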

View Fulltext Html FullText View/Add Comment Download reader