Abstract
[Objective] This paper uses semantic increment to improve text classification based on the vector space model (VSM) and verifies the improvement experimentally. [Methods] The paper reviews existing work on introducing and improving semantic vectors in text representation and proposes an implementation framework for the semantic vector representation of texts. Based on the mappings from subject terms and from words to concepts in a domain ontology, it builds a concept hierarchy tree, locates words within it, computes the semantic similarity of concepts, and combines this with semantic increment to construct semantic vectors for texts. [Results] Comparative text classification experiments show that the proposed method is feasible and effective, and that it outperforms the other methods compared in macro-averaged precision, macro-averaged recall, and macro-averaged F1. [Limitations] Because the method is still an improvement built on VSM, it does not fully express semantic information, and truly semantic approaches to text modeling remain to be explored. Experiments on more types of data are also needed to improve the method's applicability. [Conclusions] Exploring how to make the original VSM semantic has practical significance for current research on text classification and semantic association.
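The macro-averaged metrics named in the results can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `macro_scores` and the toy labels are hypothetical, and macro F1 is assumed here to be the mean of per-class F1 values (the abstract does not say which macro-F1 convention the paper uses).

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) for single-label data.

    Assumption: macro F1 is the unweighted mean of per-class F1 scores.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # Per-class scores; a class with no predictions or no instances scores 0.
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    # Macro averaging gives every class equal weight, regardless of its size.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy data with two classes.
macro_p, macro_r, macro_f1 = macro_scores(["a", "a", "b", "b"],
                                          ["a", "b", "b", "b"])
```

Because each class contributes equally, macro averages are sensitive to performance on small categories, which is one reason they are commonly reported for multi-class text classification.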
Source
《现代图书情报技术》
CSSCI
Peking University Core Journals (北大核心)
2014, No. 10, pp. 49-55 (7 pages)
New Technology of Library and Information Service
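Funding
National Natural Science Foundation of China, Young Scientists Fund project "Research on Information Recommendation Based on User-Resource Association in the Social Network Environment" (Grant No. 71303178)
Wuhan University Humanities and Social Sciences Research Project "Research on User Modeling Based on Relational Community Discovery in the Social Network Environment" (Grant No. 274013); this paper is one of the research outcomes of these projects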
Keywords
Text modeling
Semantic vector space model
Semantic increment
Semantic similarity