Abstract
This paper proposes a lexicon organization scheme that groups entries and uses a three-level index structure, and gives a suitable index density interval. To address the expansion of the system's base lexicon, it considers a method for automatic keyword extraction and short-entry addition that is based on word-frequency statistics and includes a filtering step. Extensive simulation experiments show that the method substantially improves the segmentation speed of Chinese text as well as the recall and precision of information retrieval.
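The abstract does not spell out the three-level index, so the following is only a minimal sketch of forward maximum matching (the segmentation strategy named in the keywords) over a lexicon grouped by first character. The single-level grouping, the function names `build_index` and `fmm_segment`, and the sample lexicon are illustrative assumptions, not the paper's actual data structure.

```python
from collections import defaultdict


def build_index(words):
    """Group lexicon entries by first character (a one-level stand-in
    for the paper's three-level index, which is not described here)."""
    index = defaultdict(set)
    max_len = defaultdict(int)  # longest entry length per first character
    for w in words:
        if not w:
            continue
        index[w[0]].add(w)
        max_len[w[0]] = max(max_len[w[0]], len(w))
    return index, max_len


def fmm_segment(text, index, max_len):
    """Forward maximum matching: at each position, emit the longest
    lexicon entry starting there, or a single character if none matches."""
    tokens = []
    i = 0
    while i < len(text):
        head = text[i]
        longest = max_len.get(head, 1)
        match = head  # fallback: single character
        # Try candidate lengths from longest to shortest (down to 2).
        for size in range(min(longest, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in index.get(head, ()):
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical sample lexicon for illustration only.
lexicon = ["中文", "文本", "全文", "检索", "全文检索", "切词"]
index, max_len = build_index(lexicon)
print(fmm_segment("中文全文检索", index, max_len))  # → ['中文', '全文检索']
```

Grouping by first character bounds the candidate lengths tried at each position, which is the general reason an indexed lexicon can speed up matching; how the paper tunes its index density is not shown here.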
Source
《计算机应用研究》 (Application Research of Computers)
CSCD
PKU Core Journal
2006, Issue 8, pp. 49-51 (3 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60473051)
Keywords
Chinese Word Segmentation
Forward Maximum Matching
Lexicon
Index Density
Full-text Retrieval