Data Acquisition of Civil Capacity Based on Web Crawler

Times cited: 1
Abstract: To address the difficulty of obtaining civil transport capacity data for military transportation, this paper collects the relevant data with a topic-focused (topical) web crawler. Building on the Shark-Search algorithm, it accounts for the influence of URL structure on topic relevance and computes topic relevance with a naive Bayes model combined with the TF-IDF algorithm. Experiments show that the method is reliable and effective and can serve as a supplementary means of acquiring civil capacity data.
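The abstract names three ingredients (Shark-Search link scoring, a URL-structure term, and naive Bayes with TF-IDF for topic relevance) but not their exact formulas. The Python below is a minimal sketch of how they might fit together, assuming pre-tokenized pages and a two-class training set labeled `relevant`/`irrelevant`. It follows the standard Shark-Search scoring (decay `delta`, anchor/context weight `beta`, inherited/neighborhood weight `gamma`); the TF-IDF-weighted naive Bayes variant, the helper `url_tokens`, the URL weight `lam`, the threshold `tau`, and every default value are hypothetical choices, not the authors' method.

```python
import heapq
import math
import re
from collections import Counter


def tfidf(docs):
    """TF-IDF weights (term frequency x smoothed IDF) for pre-tokenized documents."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    weights = [{t: (k / len(d)) * idf[t] for t, k in Counter(d).items()} for d in docs]
    return weights, idf


class TfidfNaiveBayes:
    """Multinomial naive Bayes with TF-IDF weights in place of raw counts (assumed variant)."""

    def fit(self, docs, labels, alpha=1.0):
        weights, self.idf = tfidf(docs)
        self.vocab = set(self.idf)
        self.classes = sorted(set(labels))
        self.log_prior, self.log_like = {}, {}
        for c in self.classes:
            rows = [w for w, y in zip(weights, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(docs))
            totals = Counter()
            for w in rows:
                totals.update(w)  # accumulate TF-IDF mass per term within class c
            denom = sum(totals.values()) + alpha * len(self.vocab)  # Laplace smoothing
            self.log_like[c] = {t: math.log((totals[t] + alpha) / denom) for t in self.vocab}
        return self

    def relevance(self, tokens, positive="relevant"):
        """Posterior P(positive | tokens); tokens outside the training vocabulary are ignored."""
        tf = Counter(t for t in tokens if t in self.vocab)
        n = max(sum(tf.values()), 1)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for t, k in tf.items():
                s += (k / n) * self.idf[t] * self.log_like[c][t]
            scores[c] = s
        top = max(scores.values())  # subtract the max before exp() for numerical stability
        z = sum(math.exp(v - top) for v in scores.values())
        return math.exp(scores[positive] - top) / z


def url_tokens(url):
    """Hypothetical URL-structure feature: split host and path into lowercase word tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def child_potential(parent, anchor, context, url, clf,
                    delta=0.5, beta=0.8, gamma=0.5, lam=0.7, tau=0.5):
    """Shark-Search potential score of a child link, plus a hypothetical URL term.

    parent: {"relevance": thresholded parent-page relevance, "inherited": parent's inherited score}.
    lam (URL-term weight) and tau (relevance threshold) are illustrative assumptions.
    """
    def sim(tokens):
        p = clf.relevance(tokens)
        return p if p > tau else 0.0  # at or below tau counts as off-topic

    # Inherited score: decayed parent relevance, or decayed ancestor score if the parent is off-topic.
    if parent["relevance"] > 0:
        inherited = delta * parent["relevance"]
    else:
        inherited = delta * parent["inherited"]
    anchor_score = sim(anchor)
    context_score = 1.0 if anchor_score > 0 else sim(context)
    neighborhood = beta * anchor_score + (1 - beta) * context_score
    local = lam * neighborhood + (1 - lam) * sim(url_tokens(url))  # hypothetical URL blend
    return gamma * inherited + (1 - gamma) * local


class Frontier:
    """Best-first crawl frontier: pop() returns the queued URL with the highest score."""

    def __init__(self):
        self._heap, self._seen, self._count = [], set(), 0

    def push(self, url, score):
        if url in self._seen:
            return  # simplification: a rediscovered URL keeps its first score
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score; the counter breaks ties FIFO.
        heapq.heappush(self._heap, (-score, self._count, url))
        self._count += 1

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score
```

In use, one would train `TfidfNaiveBayes` on a small set of hand-labeled, word-segmented pages, then push every link extracted from a fetched page with `frontier.push(url, child_potential(...))` to obtain a best-first crawl order. Because a naive Bayes posterior is never exactly zero, the sketch thresholds it at `tau` so that Shark-Search's "parent is relevant" and "anchor matches" tests stay meaningful; the toy frontier also keeps a rediscovered URL's first score, whereas a fuller crawler might keep the higher one.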
Authors: 王鹏 (WANG Peng), 郑贵省 (ZHENG Guixing), 郭强 (GUO Qiang), 贾蓓 (JIA Bei) (General Courses Department, Army Military Transportation University, Tianjin 300161, China)
Source: Journal of Military Transportation University (《军事交通学院学报》), 2020, No. 1, pp. 87-90 (4 pages)
Keywords: civil capacity data; topic web crawler; topic relevance
Related literature

References: 2

Co-citing literature: 212

Co-cited literature: 6

Citing literature: 1

Second-level citing literature: 4
