Abstract
This paper examines word-processing techniques in Chinese text mining in some depth and, taking the characteristics of the Chinese language into account, proposes an algorithm for discovering all maximal frequent sequences in Chinese text. The algorithm follows the principle of "finding the longest character co-occurrences" and can accurately segment the words in a text.
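The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of maximal frequent sequences, not the authors' method. It assumes (as hypotheses for illustration) that sequences are contiguous runs of at least two characters, that support is the total occurrence count across all texts, and that a frequent sequence is maximal when no longer frequent sequence contains it. The names `frequent_substrings`, `maximal_frequent_sequences`, `min_support` and `max_len` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def frequent_substrings(texts: List[str], min_support: int, max_len: int = 8) -> Dict[str, int]:
    """Count every contiguous character substring of length 2..max_len
    (overlapping occurrences included) and keep those seen at least
    `min_support` times across all texts. Assumed support definition."""
    counts: Counter = Counter()
    for text in texts:
        n = len(text)
        for i in range(n):
            for j in range(i + 2, min(n, i + max_len) + 1):
                counts[text[i:j]] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def maximal_frequent_sequences(texts: List[str], min_support: int, max_len: int = 8) -> Dict[str, int]:
    """Keep only frequent substrings that are not contained in any longer
    frequent substring (the 'longest co-occurrence' idea, as assumed here)."""
    freq = frequent_substrings(texts, min_support, max_len)
    # Process longest candidates first; containment is transitive, so checking
    # against already-accepted maximal strings is enough.
    ordered = sorted(freq, key=len, reverse=True)
    maximal: List[str] = []
    for s in ordered:
        if not any(s in longer for longer in maximal if len(longer) > len(s)):
            maximal.append(s)
    return {s: freq[s] for s in maximal}


if __name__ == "__main__":
    texts = ["中文文本挖掘", "文本挖掘技术", "中文文本处理"]
    print(maximal_frequent_sequences(texts, min_support=2))
```

On the three sample phrases with `min_support=2`, the shorter frequent strings such as 文本 and 挖掘 are absorbed into the longer co-occurring runs, and the sketch returns `{'中文文本': 2, '文本挖掘': 2}`. A real segmenter would still need a rule for choosing among overlapping maximal sequences, which this sketch does not attempt.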
Source
Journal of Minzu University of China (Natural Sciences Edition) (《中央民族大学学报(自然科学版)》)
2004, No. 1, pp. 36-42 (7 pages)