Abstract
Word-processing techniques for Chinese text mining are examined in some depth, and a dictionary-free word segmentation algorithm tailored to the characteristics of the Chinese language is proposed. The algorithm follows the principle of "finding the longest character co-occurrence", that is, finding all maximal frequent character sequences in the text, and can accurately segment the words in a text.
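The idea of finding maximal frequent character sequences can be illustrated with a minimal Python sketch. It is not the algorithm proposed here, only one simple reading of the principle: count every character n-gram, keep those that recur, and discard any that sit inside a longer recurring one. The function name `maximal_frequent_sequences` and the parameters `min_count` and `max_len` are assumptions introduced for illustration.

```python
import re
from collections import Counter


def maximal_frequent_sequences(text, min_count=2, max_len=6):
    """Return {sequence: count} for frequent character sequences that are
    not contained in any longer frequent sequence.

    min_count and max_len are illustrative thresholds, not values taken
    from the paper.
    """
    # Split on punctuation and whitespace so that no candidate sequence
    # spans a natural break in the text.
    chunks = re.findall(r"\w+", text)

    # Count every character n-gram of length 2..max_len.
    counts = Counter()
    for chunk in chunks:
        for n in range(2, max_len + 1):
            for i in range(len(chunk) - n + 1):
                counts[chunk[i:i + n]] += 1

    # A sequence is "frequent" if it occurs at least min_count times.
    frequent = {s for s, c in counts.items() if c >= min_count}

    # A frequent sequence is "maximal" if no longer frequent sequence
    # contains it.
    maximal = {
        s for s in frequent
        if not any(s != t and s in t for t in frequent)
    }
    return {s: counts[s] for s in maximal}


sample = "文本挖掘很有用，研究文本挖掘的人很多，中文分词是基础，中文分词也很难"
result = maximal_frequent_sequences(sample)
print(sorted(result))
```

On this small sample the sketch keeps 文本挖掘 and 中文分词, each seen twice, and drops their shorter fragments such as 文本 or 分词. A brute-force count like this grows quickly with text length and `max_len`, so it is only meant to show the principle on short inputs.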
Keywords
text mining
Chinese word segmentation
dictionary-free word segmentation