Abstract
To strengthen the semantic descriptiveness of terms in the Vector Space Model (VSM) and to overcome the VSM's treatment of semantic units as mutually independent, this paper proposes a phrase-based method for describing feature granularity. Starting from how a text is represented and how its feature items are organized, the method recognizes basic phrases with syntactic rules, builds a relationship tree that links feature items to the head verb, and uses basic phrases in place of the individual words of the bag-of-words (BOW) model. Experimental results show that the phrase-based text representation improves classification performance, strengthens the links between terms, overcomes the mutual independence of feature items, and still classifies texts well when the number of features is small.
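The abstract does not give the paper's actual syntactic rules, tag set, or head-verb selection, so the following is only a minimal sketch of the general idea under stated assumptions. It uses a hypothetical tag set ("n" noun, "a" adjective, "v" verb, anything else ignored), a hypothetical rule that a basic phrase is a maximal run of adjectives and nouns ending in a noun, and the assumption that the head verb is the first verb in the sentence. The functions `base_noun_phrases`, `relationship_tree`, `phrase_features`, and `bow_features` are illustrative names, not the authors' code, and the toy input is English rather than segmented Chinese.

```python
from collections import Counter

NOMINAL = {"a", "n"}  # hypothetical tags: "a" = adjective, "n" = noun


def base_noun_phrases(tagged):
    """Return basic phrases from a list of (word, tag) pairs.

    Hypothetical rule: a basic phrase is a maximal run of adjectives/nouns,
    trimmed so that it ends with a noun.
    """
    phrases, run = [], []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the last run
        if tag in NOMINAL:
            run.append((word, tag))
            continue
        while run and run[-1][1] != "n":  # drop trailing adjectives
            run.pop()
        if run:
            phrases.append(" ".join(w for w, _ in run))
        run = []
    return phrases


def relationship_tree(tagged):
    """Attach every basic phrase in one sentence to its head verb.

    Assumption: the head verb is the first token tagged "v"; sentences
    without a verb hang from a placeholder root "<none>".
    """
    head = next((w for w, t in tagged if t == "v"), "<none>")
    return {head: base_noun_phrases(tagged)}


def phrase_features(sentences):
    """Phrase-based term frequencies: each basic phrase is one feature."""
    counts = Counter()
    for tagged in sentences:
        counts.update(base_noun_phrases(tagged))
    return counts


def bow_features(sentences):
    """Word-based (BOW) frequencies over the same nominal words, for contrast."""
    return Counter(w for tagged in sentences for w, t in tagged if t in NOMINAL)


s1 = [("the", "u"), ("classifier", "n"), ("uses", "v"),
      ("semantic", "a"), ("text", "n"), ("features", "n")]
s2 = [("semantic", "a"), ("text", "n"), ("features", "n"),
      ("improve", "v"), ("accuracy", "n")]

print(relationship_tree(s1))
print(phrase_features([s1, s2]))
print(bow_features([s1, s2]))
```

In this toy run, the BOW model counts "semantic", "text", and "features" as three unrelated terms, while the phrase-based model keeps them together as the single feature "semantic text features", attached in the relationship tree to the head verb "uses". That is the kind of inter-term linkage the abstract describes, though the paper's real rules are likely richer than this one pattern.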
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journal (北大核心)
2011, No. 3, pp. 58-60 (3 pages)
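Funding
National Natural Science Foundation of China (Grant No. 60873247)
Shandong Province Special Project Fund for High-Tech Independent Innovation (Grant No. 2008ZZ28)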
Keywords
feature item
phrase
syntactic rules
relationship tree
text representation