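Article abstract
Zhang Yongjing, Chen Kun, Jin Mingjuan, Fan Chunhong. Study on the application of classification tree model in screening the risk factors of malignant tumor [J]. Chinese Journal of Epidemiology, 2006, 27(6): 540-543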
Study on the application of classification tree model in screening the risk factors of malignant tumor
Received: 2005-11-03  Published: 2014-09-12
DOI:
Keywords: Classification tree model  Breast neoplasm  Risk factor  Exhaustive chi-square automatic interaction detection (Exhaustive CHAID) method
Funding: Supported by the National Natural Science Foundation of China (30471492)
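Author  Affiliation  E-mail
Zhang Yongjing  Department of Epidemiology and Health Statistics, School of Public Health, Zhejiang University, Hangzhou 310031  
Chen Kun  Department of Epidemiology and Health Statistics, School of Public Health, Zhejiang University, Hangzhou 310031  ck@zju.edu.cn
Jin Mingjuan  Department of Epidemiology and Health Statistics, School of Public Health, Zhejiang University, Hangzhou 310031  
Fan Chunhong  Department of Epidemiology and Health Statistics, School of Public Health, Zhejiang University, Hangzhou 310031  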
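Chinese abstract (translated):
      Objective To introduce the basic principles, algorithm, and application value of the classification tree model for screening risk factors of malignant tumors. Methods Using field survey data on breast cancer from Jiashan County, Zhejiang Province, as an example, a classification tree model was built with the Exhaustive CHAID method to screen risk factors from the survey results. The model was evaluated with the misclassification probability (Risk value) and the area under the ROC curve. Results The classification tree model selected 9 risk factors from all 105 candidate variables. Occupation was the most important factor: workers, teachers, and retirees had a significantly higher probability of breast cancer than other people. The model also showed that the effect of regular physical exercise on breast cancer differed between subgroups. The misclassification Risk value of the model was 0.174, and the area under the ROC curve plotted from the predicted probabilities was 0.872, which differed significantly from 0.5, indicating that the model fit well. Conclusion The classification tree model can not only mine and screen out the main influencing factors effectively, but can also define well-founded cut-points for the study variables and display complex interactions among variables. It has considerable application value in epidemiological research.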
English abstract:
      Objective To introduce the partitioning algorithm of the classification tree model and to explore the value of this data mining technique in the analysis of data on multifactorial diseases such as malignant tumors. Methods Data were analyzed from a survey conducted on 84 breast cancer patients and 273 cancer-free controls selected randomly in Jiashan County. The classification tree model was constructed using the Exhaustive CHAID method and evaluated by the Risk statistic and the area under the ROC curve. Results Nine of 105 candidate variables were selected as risk factors. Occupation was the most important factor: workers, teachers, and retirees were at much higher risk than others. The number of pregnancies, breast examination, reasons for menopause, age at menarche, and intake of shrimp, crab, kipper, kelp, laver, etc. were also risk factors for breast cancer. Physical exercise, however, played different roles in different subgroups. The Risk statistic of the model was 0.174, and the area under the ROC curve was 0.872, which was significantly different from 0.5, suggesting that the classification tree model fit the data very well. Conclusion The classification tree model could screen out the major influencing factors quickly and effectively, could identify cut-points for continuous and ordinal variables, and could reveal the complex interactions among the factors at many levels. This model might become a powerful tool for exploring the complexity of disease risk.
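The two evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It assumes that the Risk statistic is the proportion of subjects the tree misclassifies (as the Chinese abstract describes it) and that the area under the ROC curve is computed from the predicted probabilities. The function names, the 0.5 classification threshold, and the toy data are hypothetical; they are not the study's data or software.

```python
def misclassification_risk(y_true, y_pred):
    """Proportion of subjects whose predicted class differs from the observed class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case receives a higher
    predicted probability than a randomly chosen control (ties count 0.5).
    """
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Made-up illustration data: 1 = case, 0 = control (NOT the Jiashan survey).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical predicted probabilities, e.g. the case proportion in each leaf.
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.2, 0.1, 0.4]
# Assumed rule: classify as a case when the predicted probability is >= 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

risk = misclassification_risk(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"Risk = {risk:.3f}, AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a model that ranks cases and controls no better than chance, which is why the abstract compares the observed 0.872 against 0.5.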