Research and Design of a Web-Based Vertical Search Engine for DCI (cited by: 7)
Abstract: To let users search for information about digital works on the Internet quickly and accurately, this paper analyzes and designs a vertical search engine for digital works identified by the Digital Copyright Identifier (DCI), the unique identifier for the copyright of a digital work. First, using Heritrix web crawler technology, data on digital works is collected from the Internet, the main-text information is extracted, and the extracted data is saved locally. Then, using the Lucene full-text retrieval toolkit, the local data is processed through word segmentation, inverted indexing, index retrieval, and an improved relevance ranking, yielding a general and extensible DCI vertical search engine. The experimental results show that the search engine substantially improves the accuracy of web page information extraction and the efficiency of data retrieval.
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal, 2013, No. 4, pp. 1481-1487 (7 pages)
Funding: National Science and Technology Support Program project of the Ministry of Science and Technology of China (2012BAH04f03)
Keywords: data acquisition; inverted index; vertical search engine; information extraction; relevance ranking
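
The indexing and ranking stages named in the abstract (word segmentation, inverted index, index retrieval, relevance ranking) can be illustrated with a minimal sketch. This is not the paper's implementation: the paper builds on Lucene and an improved ranking method that the abstract does not specify. The sketch below assumes a plain whitespace tokenizer in place of a real Chinese word segmenter and a generic TF-IDF score in place of the improved ranking; the function names `tokenize`, `build_inverted_index`, and `search` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (assumption: stands in for a real word segmenter)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Build an inverted index mapping each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, num_docs, query):
    """Rank documents by summed TF-IDF over the query terms (generic, not the paper's method)."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(num_docs / len(postings)) + 1.0
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical sample records, invented for illustration only.
    docs = {
        1: "digital copyright identifier search",
        2: "vertical search engine design",
        3: "digital works digital music",
    }
    index = build_inverted_index(docs)
    for doc_id, score in search(index, len(docs), "digital copyright"):
        print(doc_id, round(score, 3))
```

The inverted index lets a query touch only the postings of its own terms instead of scanning every stored document, which is the usual reason a retrieval toolkit such as Lucene builds one before ranking.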

