Abstract
Existing vector space models ignore the semantic relationships between terms when representing documents. To address this shortcoming, this paper proposes a new document vector representation method based on association rules. Within the generalized vector space model, the frequent co-occurrence of terms is analyzed to obtain term co-occurrence semantics; association rules are then used to analyze the association correlation between terms and to mine the latent association semantics between terms in a document. Documents are represented by a linear weighting of the co-occurrence semantics and the association semantics. Experimental results show that, compared with the BOW model and the GVSM model, document clustering based on the association-rule document vector representation is more accurate.
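The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption: cosine similarity between term columns stands in for the co-occurrence semantics, rule confidence filtered by a minimum support stands in for the association semantics, and the function names and the parameters `alpha` and `min_support` are hypothetical.

```python
import numpy as np

def term_cooccurrence(X):
    """Assumed co-occurrence semantics: cosine similarity between term columns."""
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0              # avoid division by zero for unused terms
    Xn = X / norms
    return Xn.T @ Xn

def term_association(X, min_support=0.5):
    """Assumed association semantics: confidence of the rule t_i -> t_j,
    kept only when the pair (t_i, t_j) meets the minimum support."""
    B = (X > 0).astype(float)            # binary term occurrence per document
    n_docs = B.shape[0]
    pair_count = B.T @ B                 # number of documents containing both terms
    term_count = np.diag(pair_count).copy()
    term_count[term_count == 0] = 1.0
    confidence = pair_count / term_count[:, None]
    frequent = (pair_count / n_docs) >= min_support
    return np.where(frequent, confidence, 0.0)

def represent(X, alpha=0.5, min_support=0.5):
    """Linearly weight the two term-term matrices and project the documents."""
    S = alpha * term_cooccurrence(X) + (1 - alpha) * term_association(X, min_support)
    return X @ S

# Toy document-term matrix: rows are documents, columns are terms.
X = np.array([[2.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 3.0]])
docs = represent(X, alpha=0.6)
```

The resulting rows of `docs` could then be passed to any clustering algorithm. The weight `alpha` controls the balance between the two kinds of term semantics; its value here is arbitrary.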
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal
2016, No. 7, pp. 86-90 (5 pages)
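Funding
National Science Foundation for Young Scientists of China (No. 61003162)
General Project of the Education Department of Liaoning Province (No. L2013131)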
Keywords
document clustering
association relationship
term co-occurrence
document similarity
latent semantics