Abstract
This paper proposes a word library organized into groups with a three-level index structure. To address expansion of the library, it considers a method for automatic keyword extraction and addition of small word entries that is based on word-frequency statistics and includes a filtering function. Simulation experiments show that the method considerably improves the speed of Chinese text segmentation while ensuring that the system maintains high recall and precision in information retrieval.
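The abstract does not specify the index layout or the filtering rule, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the three levels are word-length group, first character, and second character; it uses forward maximum matching for segmentation; and it treats adjacent unmatched characters that recur often enough as candidate entries, filtered by a stop-character set. The names `WordLibrary`, `segment`, `extend_library`, `stop_chars`, and `min_freq` are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


class WordLibrary:
    """Hypothetical word library with an assumed three-level index:
    word-length group -> first character -> second character -> set of words."""

    def __init__(self, words=()):
        self.index = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
        self.max_len = 1
        for word in words:
            self.add(word)

    def add(self, word):
        if len(word) < 2:
            return  # single characters are handled as fallback tokens
        self.index[len(word)][word[0]][word[1]].add(word)
        self.max_len = max(self.max_len, len(word))

    def __contains__(self, word):
        if len(word) < 2:
            return False
        group = self.index.get(len(word))       # level 1: length group
        if group is None:
            return False
        bucket = group.get(word[0])             # level 2: first character
        if bucket is None:
            return False
        return word in bucket.get(word[1], ())  # level 3: second character


def segment(text, lib):
    """Forward maximum matching: try the longest candidate first."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(lib.max_len, len(text) - i), 1, -1):
            if text[i:i + n] in lib:
                tokens.append(text[i:i + n])
                i += n
                break
        else:
            tokens.append(text[i])  # no entry matched: emit one character
            i += 1
    return tokens


def extend_library(texts, lib, stop_chars=frozenset(), min_freq=2):
    """Assumed expansion rule: count pairs of adjacent unmatched characters,
    drop pairs containing a stop character, add pairs seen >= min_freq times."""
    counts = Counter()
    for text in texts:
        tokens = segment(text, lib)
        for a, b in zip(tokens, tokens[1:]):
            if len(a) == 1 and len(b) == 1:
                if a not in stop_chars and b not in stop_chars:
                    counts[a + b] += 1
    new_words = [w for w, c in counts.items() if c >= min_freq]
    for w in new_words:
        lib.add(w)
    return new_words


lib = WordLibrary(["中文", "全文检索", "检索", "速度", "方法"])
print(segment("中文全文检索", lib))                 # ['中文', '全文检索']
print(extend_library(["切词速度", "切词方法"], lib))  # ['切词']
print(segment("切词速度", lib))                     # ['切词', '速度']
```

Grouping by length lets the matcher skip whole groups when no entry of that length exists, and the two character levels narrow each lookup to a small set. Whether this mirrors the paper's actual structure cannot be determined from the abstract alone.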
Source
《计算机与数字工程》
2007, No. 7, pp. 47-49 (3 pages)
Computer & Digital Engineering
Funding
Supported by the National Natural Science Foundation of China (Grant No. 40344022)
Keywords
word library
index structure
Chinese word segmentation
full-text retrieval