Optimization of a Web Information Keyword Crawling Method Based on Chaotic Sequences
Abstract: Conventional methods for crawling web page keywords obtain page information by extracting the page's Uniform Resource Locator (URL), and keyword extraction is limited to text fields, which results in low crawling accuracy. To address this, a web information keyword crawling method based on chaotic sequences is proposed. First, the information crawling process is analyzed so that more detailed and complete information is extracted. Second, the web page information extraction sections are divided according to their different extraction principles. Finally, the chaotic sequence of the web page information is analyzed and the required keywords are extracted. Experimental results show that the proposed method achieves a crawling accuracy of about 96.8%, which is 6.92% higher than that of the traditional method, indicating comparatively high accuracy.
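Authors: WANG Xiaoyu (王晓宇); WANG Pei (王培) (School of Computer & Software Engineering, SIAS University, Xinzheng, Henan 451150, China)
Source: Information & Computer (《信息与电脑》), 2023, No. 23, pp. 69-71 (3 pages)
Funding: Henan Province 2021 Discipline and Specialty Construction Funding Project for Private Regular Higher Education Institutions (Project No.: 教办政法[2020]179号, Software Engineering).
Keywords: Python; web page information; information crawling; keyword extraction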
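
The abstract's second step, dividing a page into extraction sections rather than reading a single text field, can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: the section names, the `WEIGHTS` values, the `SectionParser` and `rank_keywords` helpers, and the English-only tokenizer are hypothetical and are not taken from the paper. The chaotic-sequence analysis is not reproduced, because the abstract does not specify it; a simple weighted frequency count stands in for the final ranking step.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Hypothetical section weights; the paper's actual scoring is not given in the abstract.
WEIGHTS = {"title": 3.0, "meta": 2.0, "heading": 2.0, "body": 1.0}


class SectionParser(HTMLParser):
    """Collect text per page section instead of treating the page as one text field."""

    SECTION_TAGS = {"title": "title", "h1": "heading", "h2": "heading", "h3": "heading"}

    def __init__(self):
        super().__init__()
        self.sections = {"title": [], "meta": [], "heading": [], "body": []}
        self._stack = []  # currently open title/heading tags
        self._skip = 0    # depth inside <script>/<style>, whose text is ignored

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta":
            attr_map = dict(attrs)
            if (attr_map.get("name") or "").lower() in ("keywords", "description"):
                self.sections["meta"].append(attr_map.get("content") or "")
        elif tag in self.SECTION_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if self._skip or not data.strip():
            return
        section = self.SECTION_TAGS[self._stack[-1]] if self._stack else "body"
        self.sections[section].append(data.strip())


def rank_keywords(html, top_n=5):
    """Return the top_n candidate keywords by section-weighted frequency."""
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    scores = Counter()
    for section, chunks in parser.sections.items():
        # Simplifying assumption: lowercase ASCII words of 3+ letters only.
        for word in re.findall(r"[a-z]{3,}", " ".join(chunks).lower()):
            scores[word] += WEIGHTS[section]
    return [word for word, _ in scores.most_common(top_n)]


SAMPLE = """<html><head><title>Chaotic crawler</title>
<meta name="keywords" content="crawler, keyword">
<script>var crawler = 1;</script></head>
<body><h1>Keyword extraction</h1>
<p>The crawler extracts each keyword from the page.</p></body></html>"""

if __name__ == "__main__":
    print(rank_keywords(SAMPLE, top_n=2))
```

In this sketch a word that appears in the title or meta keywords outranks one that appears only in body text, which is one plausible reading of why section-aware extraction could improve on text-field-only extraction. Chinese pages would additionally need a word segmenter in place of the regular expression.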