Abstract
This paper presents the design and implementation of a word segmentation system for Web text mining. It describes in detail how documents in HTML format are converted to plain text (TXT), and how the maximum matching (MM) method is used to perform automatic Chinese word segmentation on the resulting text. Overlapping-ambiguity strings whose overlap is one character long are handled by maximum matching combined with a one-character back-off.
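The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `html_to_text`, `longest_match`, and `segment`, the toy dictionary, and the maximum word length of 4 are all hypothetical. The abstract does not give the exact disambiguation rule, so the back-off condition used here is an assumption: give back the last character of a match when the remaining prefix is still a word (or a single character), the text after the full match starts with no dictionary word, and the returned character starts one.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Convert an HTML string to plain text, one text run per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def longest_match(text, start, dictionary, max_len):
    """Longest dictionary word beginning at `start`, else the single character there."""
    for length in range(min(max_len, len(text) - start), 1, -1):
        candidate = text[start:start + length]
        if candidate in dictionary:
            return candidate
    return text[start]


def segment(text, dictionary, max_len=4, backoff=True):
    """Forward maximum matching, optionally with a one-character back-off.

    The back-off rule is an assumed heuristic, not taken from the paper.
    """
    words = []
    i = 0
    while i < len(text):
        word = longest_match(text, i, dictionary, max_len)
        end = i + len(word)
        if backoff and len(word) >= 2 and end < len(text):
            prefix = word[:-1]
            prefix_ok = len(prefix) == 1 or prefix in dictionary
            after_full = longest_match(text, end, dictionary, max_len)
            after_backoff = longest_match(text, end - 1, dictionary, max_len)
            # Overlap of length 1: the last character of `word` also starts
            # a dictionary word, while the text after `word` starts none.
            if prefix_ok and len(after_full) == 1 and len(after_backoff) >= 2:
                word = prefix
        words.append(word)
        i += len(word)
    return words


toy_dictionary = {"研究", "研究生", "生命", "起源"}  # hypothetical word list
page = "<html><head><style>p{}</style></head><body><p>研究生命起源</p></body></html>"
plain = html_to_text(page)
print(segment(plain, toy_dictionary, backoff=False))
print(segment(plain, toy_dictionary))
```

On this toy input, plain maximum matching splits the text as 研究生 / 命 / 起源, while the back-off variant yields 研究 / 生命 / 起源. A real system would need a full dictionary and a more careful rule for choosing between the competing segmentations.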
Source
Journal of Sanming University (《三明学院学报》)
2005, No. 2, pp. 197-200 (4 pages)
Keywords
text mining
Chinese automatic word segmentation
disambiguation