Abstract
Text categorization plays an important role in organizing and managing large volumes of text. This article presents a text categorization mechanism based on passage matching. The basic idea is to apply concept and association expansion to text feature vectors in order to reduce the correlation between feature terms and strengthen their ability to represent the theme of a text. In addition, the mechanism takes text passages as the basic elements of categorization: the passage-matching constraint prevents the false correlations between texts and classes that divergent features would otherwise cause. As a result, the mechanism achieves high categorization precision. It does not depend on a Chinese parser or on domain knowledge bases, so it is fast and easy to apply across a wide range of settings.
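The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes a hypothetical concept table (`CONCEPTS`), a cosine similarity measure, passages separated by blank lines, and an arbitrary threshold; all names and values are illustrative only.

```python
import math
from collections import Counter

# Hypothetical concept table (an assumption): term -> related concept terms.
CONCEPTS = {
    "football": ["sport"],
    "goal": ["sport"],
    "stock": ["finance"],
    "bank": ["finance"],
}

def expand(tokens, concepts=CONCEPTS, weight=0.5):
    """Build a term-weight vector and add concept terms at a reduced weight."""
    vec = Counter(tokens)
    for tok in tokens:
        for concept in concepts.get(tok, []):
            vec[concept] += weight
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(text, class_profiles, threshold=0.3):
    """Return the class whose profile best matches a single passage.

    A class is accepted only if at least one passage reaches the
    threshold, so terms scattered thinly across the whole text cannot
    produce a match on their own.
    """
    passages = [p.split() for p in text.lower().split("\n\n") if p.strip()]
    scores = {}
    for label, profile in class_profiles.items():
        best = max((cosine(expand(p), profile) for p in passages), default=0.0)
        if best >= threshold:
            scores[label] = best
    return max(scores, key=scores.get) if scores else None
```

Scoring by the best single passage rather than by the whole document is one simple way to model the passage-matching constraint; the published method may combine passage scores differently.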
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal
2004, Issue 28, pp. 174-176 (3 pages)
Keywords
text categorization
conceptual expansion
passage matching