Abstract
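Bai Xi, Luo Yunyun, Zhou Zhibo, Su Mingliang, Yang Liuqing, Chen Shi, Yang Hongbo, Zhu Huijuan, Pan Hui. Development and evaluation of a machine learning prediction model for large for gestational age[J]. Chinese Journal of Epidemiology, 2021, 42(12): 2143-2148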
Development and evaluation of a machine learning prediction model for large for gestational age
Received: August 24, 2021
DOI:10.3760/cma.j.cn112338-20210824-00677
Keywords: Machine learning; Large for gestational age; Risk prediction model
Author Name: Affiliation; E-mail
Bai Xi: Department of Endocrinology, Key Laboratory of Endocrinology of National Health Commission/State Key Laboratory of Complex Severe and Rare Diseases/Peking Union Medical College Hospital/Chinese Academy of Medical Science and Peking Union Medical College, Beijing 100730, China
Luo Yunyun: Department of Endocrinology, Key Laboratory of Endocrinology of National Health Commission/State Key Laboratory of Complex Severe and Rare Diseases/Peking Union Medical College Hospital/Chinese Academy of Medical Science and Peking Union Medical College, Beijing 100730, China
Zhou Zhibo: Department of Endocrinology, Key Laboratory of Endocrinology of National Health Commission/State Key Laboratory of Complex Severe and Rare Diseases/Peking Union Medical College Hospital/Chinese Academy of Medical Science and Peking Union Medical College, Beijing 100730, China
Su Mingliang: DHC Mediway Technology Co., Ltd, Beijing 100190, China
Yang Liuqing: DHC Mediway Technology Co., Ltd, Beijing 100190, China
Chen Shi: Department of Endocrinology, Key Laboratory of Endocrinology of National Health Commission/State Key Laboratory of Complex Severe and Rare Diseases/Peking Union Medical College Hospital/Chinese Academy of Medical Science and Peking Union Medical College, Beijing 100730, China
Yang Hongbo: Department of Endocrinology, Key Laboratory of Endocrinology of National Health Commission/State Key Laboratory of Complex Severe and Rare Diseases/Peking Union Medical College Hospital/Chinese Academy of Medical Science and Peking Union Medical College, Beijing 100730, China
Zhu Huijuan: Department of Endocrinology, Key Laboratory of Endocrinology of National Health Commission/State Key Laboratory of Complex Severe and Rare Diseases/Peking Union Medical College Hospital/Chinese Academy of Medical Science and Peking Union Medical College, Beijing 100730, China
Pan Hui: Department of Endocrinology, Key Laboratory of Endocrinology of National Health Commission/State Key Laboratory of Complex Severe and Rare Diseases/Peking Union Medical College Hospital/Chinese Academy of Medical Science and Peking Union Medical College, Beijing 100730, China; E-mail: panhui20111111@163.com
Abstract:
      Objective To develop and validate a machine learning (ML) based model for predicting the risk of large for gestational age (LGA) during pregnancy, and to compare its performance with that of a traditional logistic regression model. Methods Data were obtained from the National Free Preconception Health Examination Project in China, which was carried out in 220 counties of 31 provinces from 2010 to 2012 and covered all rural couples planning a pregnancy. This study included all couples of childbearing age with a singleton live birth at 24-42 weeks of gestation, together with their newborns. Ten ML algorithms were used to build LGA prediction models, and the predictive performance of each model was evaluated. Results A total of 104 936 newborns were included: 54 856 boys (52.3%) and 50 080 girls (47.7%). The incidence of LGA was 11.7% (12 279 cases). After the class imbalance between LGA and non-LGA newborns was addressed by undersampling, the overall performance of the ML models improved markedly. The CatBoost model performed best, with an area under the receiver operating characteristic curve (AUC) of 0.932; the logistic regression model performed worst, with an AUC of only 0.555. Conclusions Compared with traditional logistic regression, ML algorithms can produce more effective models for predicting LGA risk during pregnancy and have potential application value. Among the ML algorithms evaluated, CatBoost performed best and deserves further investigation.
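The abstract does not say which undersampling variant was used, how the data were split, or how the AUC was computed. As a minimal sketch, assuming plain random undersampling of the majority (non-LGA) class and the rank-based definition of AUC, the two steps could look like the following. The function names `undersample_indices` and `auc` are hypothetical and are not taken from the study; only the cohort counts come from the abstract.

```python
import random


def undersample_indices(labels, seed=0):
    """Return sorted indices of a class-balanced subset.

    Assumption: random undersampling, i.e. keep every minority-class
    record and an equally sized random draw from the majority class.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive scores
    higher than a randomly chosen negative (ties count as 0.5).

    Pairwise comparison is quadratic, so this is for small inputs only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Cohort sizes reported in the abstract: 12 279 LGA among 104 936 newborns.
n_total, n_lga = 104_936, 12_279
labels = [1] * n_lga + [0] * (n_total - n_lga)

balanced = undersample_indices(labels)
balanced_labels = [labels[i] for i in balanced]
```

Under this assumption the balanced subset would hold 12 279 LGA and 12 279 non-LGA records. A common design choice, which the abstract neither confirms nor rules out, is to undersample only the training data and to evaluate on data that keep the original 11.7% incidence, since an AUC measured on a rebalanced set can differ from performance in the real population.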