Design and realization of terrorism information retrieve module based on Lucene search engine (cited by: 3)
Abstract: A large amount of terrorism-related information exists on the Internet, and better organizing and using it plays an important role in preventing and countering terrorism. Because this information is scattered across the web, the system collects it with a web crawler and builds a terrorism-related information database. On this basis, a Chinese word segmenter is introduced to segment text at a reasonable granularity, and Lucene is used to build a full-text search engine that improves retrieval efficiency. In addition, when the index is built, each document's weight is adjusted according to the number of terrorism-related feature words it contains, so that documents containing more feature words rank higher in query results. The system uses Python for information collection and data structuring, MySQL for the terrorism-related information database, and Lucene for the full-text search engine. Tests show that the engine completes retrieval quickly and accurately.
Authors: Peng Shiliang; Zhou Xin; Qing Linbo; Xiong Shuhua; He Xiaohai (College of Electronic Information, Sichuan University, Chengdu 610065, China; China Information Technology Security Evaluation Center, Beijing 100085, China)
Source: Information Technology and Network Security, 2019, No. 11, pp. 23-28 (6 pages)
Keywords: Lucene; search engine; word segmentation; terrorism-related
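
The index-time weighting described in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than against the Lucene API, and it assumes documents have already been segmented into tokens. The feature vocabulary, the `step` parameter, the scoring formula, and every class and function name below are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical feature vocabulary (placeholders); the paper's word list is not given.
FEATURE_WORDS = {"featA", "featB", "featC"}


def feature_boost(tokens, step=0.5):
    """Return a boost that grows with the number of distinct feature words."""
    hits = len(FEATURE_WORDS.intersection(tokens))
    return 1.0 + step * hits


class TinyIndex:
    """Toy inverted index that stores a per-document boost at index time."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.boosts = {}                   # doc_id -> index-time boost

    def add(self, doc_id, tokens):
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf
        self.boosts[doc_id] = feature_boost(tokens)

    def search(self, query_tokens):
        n_docs = len(self.boosts)
        scores = defaultdict(float)
        for term in query_tokens:
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Simple TF-IDF style relevance (an assumed formula, not Lucene's).
            idf = math.log(1 + n_docs / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += math.sqrt(tf) * idf
        # Multiply the relevance score by the boost stored at index time.
        ranked = [(d, s * self.boosts[d]) for d, s in scores.items()]
        return sorted(ranked, key=lambda item: item[1], reverse=True)


index = TinyIndex()
index.add("d1", ["report", "featA"])
index.add("d2", ["report", "featA", "featB", "featC"])
index.add("d3", ["report", "weather"])
results = index.search(["report"])
print([doc_id for doc_id, _ in results])
```

All three documents match the query term equally, so the ordering here is decided only by the stored boost: the document with the most feature words comes first.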