Abstract
To address the automatic classification of Chinese text, a reverse matching algorithm is proposed. The basic idea is to first construct a weighted list of classification subject terms, then match the keywords in that list against the document to be classified and sum the weights of the keywords that match successfully; the category with the largest weight sum is taken as the classification result. The algorithm avoids the difficulty of Chinese word segmentation and its influence on classification results. Theoretical analysis and experimental results indicate that both the accuracy and the time efficiency of the algorithm are relatively high, and that its overall performance reaches the level of current mainstream techniques.
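The matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term lists, the weights, and the names `TERM_LISTS` and `classify` are hypothetical, and the sketch assumes that each keyword contributes its weight once if it occurs anywhere in the document (the abstract does not say whether repeated occurrences are counted). Keywords are looked up directly as substrings, so the document is never segmented into words.

```python
from typing import Dict, Optional, Tuple

# Hypothetical weighted subject-term lists: category -> {keyword: weight}.
# The keywords and weights are made up for illustration only.
TERM_LISTS: Dict[str, Dict[str, float]] = {
    "sports": {"足球": 3.0, "比赛": 1.0, "球员": 2.0},
    "finance": {"股票": 3.0, "银行": 2.0, "比赛": 0.5},
}


def classify(
    document: str, term_lists: Dict[str, Dict[str, float]]
) -> Tuple[Optional[str], Dict[str, float]]:
    """Return (best_category, scores) for an unsegmented document.

    Each keyword of each category is searched for in the raw text;
    the weights of the keywords found are summed per category.
    Assumption: a keyword counts once, however often it occurs.
    """
    scores: Dict[str, float] = {}
    for category, terms in term_lists.items():
        scores[category] = sum(
            weight for keyword, weight in terms.items() if keyword in document
        )
    best = max(scores, key=lambda c: scores[c]) if scores else None
    # If no keyword matched at all, report no category rather than guess.
    if best is not None and scores[best] == 0:
        best = None
    return best, scores


if __name__ == "__main__":
    label, scores = classify("昨晚的足球比赛中,球员表现出色", TERM_LISTS)
    print(label, scores)
```

Because the lookup runs from the term list to the document, the cost grows with the number of keywords rather than with the quality of a segmenter, which is the property the abstract emphasizes.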
Source
《计算机应用》
CSCD
Peking University Core Journal
2008, No. 4, pp. 945-947 (3 pages)
Journal of Computer Applications
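Funding
National Natural Science Foundation of China (60673193)
General Project of the Education Department of Hunan Province (07C750)
Block-Grant Project of the Education Department of Hunan Province (06C870)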
Keywords
text categorization
reverse matching algorithm
gain weight
subject terms list