Abstract
Text classification is an important means of processing machine-readable text. This paper presents a text classification approach based on text titles. The main idea is as follows: given the importance and conciseness of titles, concept expansion is performed using a Chinese semantic classification tree, and association expansion is performed using an association matrix derived from a corpus. These expansions enrich the meaning of a title with synonymy and collocation relationships. The similarity between the expanded feature vector of each class and that of the title is then used to determine the class to which a text belongs, yielding relatively high classification accuracy. The method does not depend on a Chinese parser or domain-specific knowledge bases, runs quickly, and is widely applicable.
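The pipeline outlined in the abstract (expand the title, then compare it with class vectors) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy dictionaries `SEMANTIC_CLASSES`, `ASSOCIATION`, and `CLASS_VECTORS`, the `concept_weight` parameter, and the choice of cosine similarity are hypothetical stand-ins, since the abstract specifies neither the actual semantic tree, the association matrix, the weighting scheme, nor the similarity measure.

```python
import math
from collections import Counter

# Hypothetical toy resources (NOT from the paper): stand-ins for the semantic
# classification tree, the corpus-derived association matrix, and class vectors.
SEMANTIC_CLASSES = {  # word -> words sharing its node in the semantic tree
    "football": ["soccer"],
    "stock": ["share"],
}
ASSOCIATION = {  # word -> {associated word: association strength}
    "football": {"match": 0.8, "league": 0.6},
    "stock": {"market": 0.9, "price": 0.7},
}
CLASS_VECTORS = {  # class -> expanded feature vector
    "sports": {"football": 1.0, "soccer": 1.0, "match": 0.8, "league": 0.6},
    "finance": {"stock": 1.0, "share": 1.0, "market": 0.9, "price": 0.7},
}


def expand_title(words, concept_weight=0.5):
    """Build a title vector, then add concept and association expansions."""
    vec = Counter()
    for w in words:
        vec[w] += 1.0  # the title word itself
        for related in SEMANTIC_CLASSES.get(w, []):
            vec[related] += concept_weight  # concept expansion (assumed weight)
        for assoc, strength in ASSOCIATION.get(w, {}).items():
            vec[assoc] += strength  # association expansion
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(words):
    """Return the most similar class, or None if no class overlaps the title."""
    title_vec = expand_title(words)
    scores = {c: cosine(title_vec, v) for c, v in CLASS_VECTORS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, `classify(["football", "final"])` selects the hypothetical "sports" class because the expanded title shares "football", "soccer", "match", and "league" with that class vector, while a title with no overlapping terms returns `None`.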
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2005, No. 5, pp. 732-734 (3 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60373095)
Keywords
text classification
concept expansion
association expansion
vector space model