Text Categorization Based on Statistic Semantic Theory (基于统计语义方法的文本分类)

Abstract: Text categorization is the task of automatically determining which category a text belongs to, based on its content, under a given classification scheme. In contrast to current text categorization techniques, the statistic semantic method describes the relationships between semantic elements and defines quantities such as the affinity between semantic elements and the loose degree (looseness) of a set of semantic elements. Based on these definitions, the paper gives a method for selecting key-word groups, builds a key-word group tree from the selected groups, and classifies a text of unknown category by mapping its word set onto the key-word group tree.

Authors: Qian Min (钱民), Guo Xiangwen (郭祥文)

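Affiliations: Department of Computer and Information Engineering, Kunming Metallurgy College; Science and Technology Division, Yunnan Branch, Agricultural Bank of China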

Source: Journal of Kunming Metallurgy College (《昆明冶金高等专科学校学报》), CAS, 2004, Issue 4, pp. 11-15 (5 pages)

Keywords: text categorization; statistic semantic theory; affinity; loose degree; key-word group; key-word group tree

Classification code: TP274 [Automation and Computer Technology: Detection Technology and Automatic Equipment]