
Enterprise Search Engine Based on a Keyword-Selection Word Segmentation Algorithm

Published English title: Enterprise Search Engine Based on Keyword Selected Split-word Algorithm
Abstract: With the continued development of computer technology and database research, digital information has become the primary choice for storing data, and large-scale search engines let users quickly find the information they need. Efficient search engines for enterprise use have therefore become an important research topic. This paper proposes a search engine mechanism based on Keyword Selection (KWS), targeting the data structures of intelligent management systems for power grids and large power plants. It builds a double-character hash dictionary, segments query strings using double-character coupling-degree disambiguation, and applies semantic filtering to the segmentation results. The filtered results are passed to Sphinx and a MySQL database for full-text search and are cached, which improves both search speed and search accuracy.
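Authors: 吴亮 (Wu Liang), 李树广 (Li Shuguang)
Source: 《微型电脑应用》 (Microcomputer Applications), 2010, No. 7, pp. 37-40, 5 (4 pages in total)
Funding: Key research project of the Shanghai electric power grid system (No. 08100542)
Keywords: enterprise database; enterprise search engine; hash structure; coupling degree of double characters; word segmentation algorithm; cache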
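
The abstract mentions a double-character hash dictionary used for word segmentation but gives no implementation details. The following is a minimal sketch of one common way such a dictionary can work, assuming words are bucketed by their first two characters and matched with forward maximum matching. The class name `TwoCharHashDictionary`, the function `segment`, and the sample word list are hypothetical and are not taken from the paper. The coupling-degree disambiguation, semantic filtering, and caching steps are not modeled here.

```python
from collections import defaultdict


class TwoCharHashDictionary:
    """Hypothetical word list indexed by the first two characters of each word."""

    def __init__(self, words):
        self.buckets = defaultdict(set)
        self.max_len = 2
        for word in words:
            if len(word) >= 2:
                # The two-character prefix is the hash key for the bucket.
                self.buckets[word[:2]].add(word)
                self.max_len = max(self.max_len, len(word))

    def longest_match(self, text, start):
        """Return the longest dictionary word beginning at `start`, or None."""
        candidates = self.buckets.get(text[start:start + 2])
        if not candidates:
            return None
        limit = min(self.max_len, len(text) - start)
        # Try the longest candidate length first, down to two characters.
        for length in range(limit, 1, -1):
            piece = text[start:start + length]
            if piece in candidates:
                return piece
        return None


def segment(text, dictionary):
    """Forward maximum matching; unmatched characters become single tokens."""
    tokens = []
    i = 0
    while i < len(text):
        word = dictionary.longest_match(text, i)
        if word is None:
            word = text[i]
        tokens.append(word)
        i += len(word)
    return tokens


# Assumed sample vocabulary for a power-plant management domain.
sample_words = ["发电", "发电厂", "电厂", "管理", "管理系统", "系统", "智能"]
dictionary = TwoCharHashDictionary(sample_words)
print(segment("发电厂智能管理系统", dictionary))
# prints ['发电厂', '智能', '管理系统']
```

Bucketing by a two-character prefix keeps each lookup to one hash probe plus a few set membership checks, which is one plausible reason to prefer it over scanning a flat word list. Resolving overlapping candidates such as 发电 / 电厂 would still need a separate disambiguation step, which the abstract attributes to the coupling degree of double characters.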
