Online News Crawler Development and Hot News Event Analysis (cited by: 2)
Abstract: This paper develops a web news crawler on the Python platform. By experimentally comparing common methods of acquiring web page data, it proposes a parsing approach suited to news pages. The approach overcomes drawbacks such as incomplete source code when fetching dynamic pages and the low efficiency of parsing pages with a single method, and it meets the needs of fetching, parsing, structuring, and storing news pages in a database. Taking the Sina News Center as the data collection target, the paper analyzes how Sina News has developed in recent years. In addition, word frequency statistics and other techniques are applied to news about the 19th CPC National Congress, giving an intuitive picture of the congress's core figures, key parties, and shifts in hot topics.
Authors: CHEN Siwen (陈思雯); LIU Haiyan (刘海砚) (Institute of Geospatial Information, Information Engineering University, Zhengzhou 450001, China)
Source: Geomatics & Spatial Information Technology (《测绘与空间地理信息》), 2019, No. 3, pp. 100-103, 108 (5 pages)
Funding: National Natural Science Foundation of China (41501446); Open Fund of the State Key Laboratory of Geo-Information Engineering (SKLGIE2015-M-4-3)
Keywords: web crawler; online news; event analysis; the 19th CPC National Congress
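
The abstract describes a pipeline of parsing, structuring, and storing news pages, followed by word frequency analysis. The following is a minimal sketch of that idea and not the authors' implementation. It assumes the page HTML has already been downloaded (the fetching step and dynamic-page handling are left out) and uses only the Python standard library. The names `NewsParser`, `parse_news`, `store`, `word_frequencies`, and `SAMPLE`, the table layout, and the example URL are all hypothetical. The tokenizer is a simple regular expression for English text; Chinese news text would first need a word segmenter.

```python
import re
import sqlite3
from collections import Counter
from html.parser import HTMLParser

# Hypothetical sample page standing in for a downloaded news article.
SAMPLE = """<html><head><title>Congress opens</title></head>
<body><p>The congress opens today.</p>
<p>Delegates discuss reform. Reform leads the agenda. Reform matters.</p>
</body></html>"""


class NewsParser(HTMLParser):
    """Collects the <title> text and the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self._tag = None          # "title" or "p" while inside that element
        self.title = ""
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._tag = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        if self._tag == "title":
            self.title += data
        elif self._tag == "p":
            self.paragraphs[-1] += data


def parse_news(url, html):
    """Turn raw HTML into a structured record (url, title, body)."""
    parser = NewsParser()
    parser.feed(html)
    body = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    return {"url": url, "title": parser.title.strip(), "body": body}


def store(conn, record):
    """Insert one record into an assumed 'news' table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news "
        "(url TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO news VALUES (:url, :title, :body)", record
    )
    conn.commit()


def word_frequencies(conn, top_n=5):
    """Count words across all stored article bodies (English-only tokenizer)."""
    counts = Counter()
    for (body,) in conn.execute("SELECT body FROM news"):
        counts.update(re.findall(r"[a-z]+", body.lower()))
    return counts.most_common(top_n)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, parse_news("http://example.com/news/1", SAMPLE))
    print(word_frequencies(conn, top_n=3))
```

Keeping parsing, storage, and counting as separate functions mirrors the stages named in the abstract, so any one stage could be swapped (for example, a different parser for dynamic pages) without touching the others.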

共引文献98

同被引文献18

引证文献2

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部