Research of Vertical Search Engine Incorporating with Ontology Filtering and Text Mining

Cited by: 10
Abstract: To address key technical problems in vertical search engine research, this paper proposes an approach to building a vertical search engine that combines ontology filtering with text mining. It first examines ontology and text mining, the technologies underlying the work, and discusses the role of each. It then describes the key techniques for building the engine: an intelligent crawler based on ontology filtering, Web page analysis and information extraction combined with text mining, and the construction of the indexer and query processor. Finally, the approach is validated by implementing a prototype vertical search engine for college graduate recruitment.
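Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2008, Issue 2, pp. 188-190 (3 pages)
Funding: National Natural Science Foundation of China (grant no. 60573084); Weapons and Equipment Pre-research Fund (9140A15050106HK0114)
Keywords: vertical search, ontology, ontology filtering, text mining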
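
The abstract names three stages: ontology-based page filtering, page analysis, and indexing with query processing. The following is a minimal sketch of how such stages could fit together, not the paper's implementation. This record does not give the paper's ontology, scoring method, or index structure, so the `ONTOLOGY` dictionary, the coverage score, the `threshold` value, and all function names are assumptions made for illustration. Plain tokenization stands in for the text mining and extraction step.

```python
import re
from collections import defaultdict

# Hypothetical mini-ontology for the recruitment domain: concept -> terms.
ONTOLOGY = {
    "job": {"job", "position", "vacancy", "recruitment"},
    "employer": {"company", "employer"},
    "candidate": {"graduate", "student", "resume"},
}


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ontology_score(text, ontology=ONTOLOGY):
    """Fraction of ontology concepts with at least one term in the text."""
    tokens = set(tokenize(text))
    hits = sum(1 for terms in ontology.values() if tokens & terms)
    return hits / len(ontology)


def filter_pages(pages, threshold=0.5):
    """Keep only pages whose ontology score reaches the threshold."""
    return {url: text for url, text in pages.items()
            if ontology_score(text) >= threshold}


def build_index(pages):
    """Build an inverted index mapping each token to the URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Made-up example pages; the URLs are placeholders.
pages = {
    "http://example.com/a": "Company X has a vacancy for a graduate software engineer",
    "http://example.com/b": "Recipe for tomato soup",
}
kept = filter_pages(pages)
index = build_index(kept)
```

With these example pages, only the recruitment page passes the filter, so a query such as `search(index, "graduate engineer")` can match it, while a query for "soup" finds nothing because the off-topic page was never indexed.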