Abstract
Data on civilian transport capacity for military transportation are difficult to obtain. To address this problem, this paper collects the relevant data with a topic-focused web crawler. Building on the Shark-Search algorithm, the method accounts for the influence of URL structure on topic relevance and computes topic relevance with a naive Bayes model combined with the TF-IDF algorithm. Experiments show that the method is reliable and effective, and that it can serve as a supplementary means of acquiring civilian capacity data.
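The abstract names the ingredients but not the exact formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the classic Shark-Search link scoring (a decayed "inherited" score from the parent page plus a "neighborhood" score from the anchor text and its surrounding context). Topic relevance is taken as the posterior of a two-class multinomial naive Bayes model whose term counts are replaced by TF-IDF weights, with IDF smoothed as ln((1 + n) / (1 + df)) + 1. The URL-structure term (the relevance of host and path tokens, blended in with weight `lam`) is a hypothetical stand-in for the paper's URL factor. The names `TfidfNaiveBayes` and `child_priority`, the weights `delta`, `beta`, `gamma`, `lam`, the 0.5 threshold, and the toy training data are all illustrative.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    # Lowercase alphanumeric tokens; Chinese pages would need word segmentation first.
    return TOKEN_RE.findall(text.lower())


class TfidfNaiveBayes:
    """Multinomial naive Bayes in which raw term counts are replaced by TF-IDF weights."""

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        n = len(tokenized)
        df = Counter(t for toks in tokenized for t in set(toks))
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        self.vocab = set(df)
        self.classes = sorted(set(labels))
        self.log_prior, self.log_cond = {}, {}
        for c in self.classes:
            members = [toks for toks, y in zip(tokenized, labels) if y == c]
            self.log_prior[c] = math.log(len(members) / n)
            weight = Counter()
            for toks in members:
                for t, tf in Counter(toks).items():
                    weight[t] += tf * self.idf[t]
            # Laplace (add-one) smoothing over the vocabulary.
            total = sum(weight.values()) + len(self.vocab)
            self.log_cond[c] = {t: math.log((weight[t] + 1.0) / total) for t in self.vocab}
        return self

    def relevance(self, text, topic):
        # Posterior P(topic | text) in [0, 1], used as the topic-relevance score.
        counts = Counter(t for t in tokenize(text) if t in self.vocab)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for t, tf in counts.items():
                s += tf * self.idf[t] * self.log_cond[c][t]
            scores[c] = s
        top = max(scores.values())
        norm = sum(math.exp(s - top) for s in scores.values())
        return math.exp(scores[topic] - top) / norm


def child_priority(model, topic, parent_rel, parent_inherited, anchor, context, url,
                   delta=0.5, beta=0.8, gamma=0.5, lam=0.3, threshold=0.5):
    """Return (potential score, inherited score) for a child link; weights are illustrative."""
    # Inherited score: decayed parent relevance if the parent is on-topic,
    # otherwise the decayed inherited score of the parent itself.
    if parent_rel > threshold:
        inherited = delta * parent_rel
    else:
        inherited = delta * parent_inherited
    # Neighborhood score from the anchor text and the text around the anchor.
    anchor_score = model.relevance(anchor, topic)
    context_score = 1.0 if anchor_score > threshold else model.relevance(context, topic)
    neighborhood = beta * anchor_score + (1 - beta) * context_score
    # Hypothetical URL-structure term: relevance of the host and path tokens.
    parsed = urlparse(url)
    url_score = model.relevance(parsed.netloc + " " + parsed.path, topic)
    local = (1 - lam) * neighborhood + lam * url_score
    return gamma * inherited + (1 - gamma) * local, inherited


if __name__ == "__main__":
    train_docs = [
        "truck fleet freight transport capacity logistics",
        "shipping vessel cargo capacity port logistics",
        "rail freight wagon capacity transport",
        "movie celebrity gossip entertainment",
        "football match score league",
        "recipe cooking dinner kitchen",
    ]
    train_labels = ["topic"] * 3 + ["other"] * 3
    model = TfidfNaiveBayes().fit(train_docs, train_labels)
    score, inherited = child_priority(
        model, "topic", parent_rel=0.9, parent_inherited=0.0,
        anchor="freight truck capacity", context="logistics fleet",
        url="http://example.com/logistics/freight-capacity")
    print(f"priority={score:.3f} inherited={inherited:.3f}")
```

Because every sub-score is a probability in [0, 1] and each step is a convex combination, the link priority also stays in [0, 1]. This keeps it directly comparable across the crawl frontier.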
Authors
WANG Peng; ZHENG Guixing; GUO Qiang; JIA Bei (General Courses Department, Army Military Transportation University, Tianjin 300161, China)
Source
Journal of Military Transportation University, 2020, No. 1, pp. 87-90 (4 pages)