Article abstract
ZHOU Jiangjie, WANG Shengfeng, LI Liming. Application of Python web crawler technology in infodemiology[J]. Chinese Journal of Epidemiology, 2020, 41(6): 952-956
Application of Python web crawler technology in infodemiology
Received: 2019-09-01  Published: 2020-06-16
DOI:10.3760/cma.j.cn112338-20190901-00643
Keywords: Python web crawler technology; Infodemiology; Public health surveillance; Health intervention; Smart doctor seeking
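Authors, affiliation and e-mail:
ZHOU Jiangjie, Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191
WANG Shengfeng, Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191
LI Liming, Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191; E-mail: lmlee@bjmu.edu.cn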
Abstract:
      Python web crawler technology automatically extracts large volumes of information from the Internet by mimicking users' browsing behavior. It is a key foundation for collecting and integrating the multi-source heterogeneous data used in infodemiology research. Python web crawlers fall into two types, simple crawlers and large-scale crawlers, and both combine data collection and database construction in a single workflow. The technique offers concise syntax, high flexibility, and low learning and maintenance costs. It suits the full range of infodemiology application scenarios: by analyzing health-related information on the Internet, it supports public health surveillance, the implementation and evaluation of health interventions, and the optimization of smart doctor-seeking strategies. In recent years, the Chinese government has begun to encourage the integration and use of multi-source big data, including Internet information. Against this background, the application scenarios for Python web crawler technology are bound to multiply, and we recommend incorporating the corresponding talent cultivation and technical innovation into public health education and research systems.
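The following is a minimal sketch of a "simple crawler" as described above. It uses only the Python standard library and makes three assumptions: a browser-like User-Agent header stands in for "mimicking browsing behavior", hyperlinks are the item being extracted, and a SQLite table stands in for the database. With these choices, collection and database construction happen in one pass. The function names (`fetch`, `extract_links`, `save_links`, `crawl`), the header string and the table schema are hypothetical and for illustration only. This is not the authors' code.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser

# Hypothetical browser-like header; some sites reject urllib's default agent.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; infodemiology-demo/0.1)"}


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download one page with a browser-like request (needs network access)."""
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class LinkParser(HTMLParser):
    """Collect (link text, href) pairs from <a href=...> tags."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (text, href) pairs
        self._href = None  # href of the currently open <a>, if it has one
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html: str):
    """Parse an HTML string and return its (text, href) pairs in page order."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def save_links(conn, source_url, links):
    """Create the table if needed and insert links, skipping duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        "source TEXT, text TEXT, href TEXT, UNIQUE(source, href))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO links VALUES (?, ?, ?)",
        [(source_url, text, href) for text, href in links],
    )
    conn.commit()


def crawl(url, conn):
    """Fetch one page, store its links, and return how many were extracted."""
    links = extract_links(fetch(url))
    save_links(conn, url, links)
    return len(links)
```

Usage would look like `crawl("https://example.org/", sqlite3.connect("pages.db"))`, where the URL and file name are placeholders. A large-scale crawler typically adds URL scheduling, rate limiting and `robots.txt` checks (the standard library provides `urllib.robotparser`), and is often built on a framework such as Scrapy rather than on hand-written loops like this one.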