
Short Text Classification Based on Knowledge Graph Extension

Cited by: 5
Abstract The Concept Graph is a large-scale knowledge graph that Microsoft built from statistical analysis of user search logs. To address the data sparsity, sensitivity to noise, and unclear topics that affect short texts in text classification, this paper proposes a semantic extension representation for short texts based on the Concept Graph. First, the relevance between the feature words of a text and the concepts in the Concept Graph is computed, and the top k concepts with the highest relevance are selected as the concept dictionary of that text. Then, the concept dictionary is added to the set of feature words to obtain the semantically extended representation of the short text. Classification experiments on short texts from Twitter, run before and after extension, cover 5 classification algorithms and 6 relevance measures. The results show that the conceptualized semantic extension improves short text classification, and that the more extensible feature words a text contains, the greater the improvement.
Authors DING Lianhong; SUN Bin; ZHANG Hongwei (School of Information, Beijing Wuzi University, Beijing 101149, China)
Source Technology Intelligence Engineering (《情报工程》), 2018, No. 5, pp. 38-46 (9 pages)
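Funding Beijing Social Science Foundation Youth Project "Research on the Evolution Mechanism of Consumer Behavior in Social E-commerce and Guiding Measures" (17GLC066); Beijing Wuzi University High-Level Cultivation Project (GJB20162002)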
Keywords short text classification; semantic extension; knowledge graph; knowledge inference
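
The two-step procedure summarized in the abstract (score concepts against a text's feature words, keep the top k as a concept dictionary, then add that dictionary to the feature words) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy table `TOY_CONCEPT_GRAPH`, its scores, the function names, and the choice to sum per-word relevance scores. The abstract does not specify the 6 relevance measures or how scores are combined across feature words, so this is not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical toy stand-in for the Concept Graph: term -> {concept: relevance}.
# The numbers are invented for illustration, not taken from the paper.
TOY_CONCEPT_GRAPH = {
    "apple": {"fruit": 0.6, "company": 0.4},
    "iphone": {"smartphone": 0.7, "device": 0.2, "company": 0.1},
    "banana": {"fruit": 0.9, "food": 0.1},
}


def build_concept_dictionary(feature_words, concept_graph, k=2):
    """Return the k concepts most relevant to the given feature words.

    Assumption: the relevance of a concept to a text is the sum of its
    relevance to each feature word; other aggregations are possible.
    """
    scores = defaultdict(float)
    for word in feature_words:
        # Words missing from the graph contribute no concepts.
        for concept, relevance in concept_graph.get(word, {}).items():
            scores[concept] += relevance
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [concept for concept, _ in ranked[:k]]


def extend_short_text(feature_words, concept_graph, k=2):
    """Append the concept dictionary to the original feature words."""
    concepts = build_concept_dictionary(feature_words, concept_graph, k)
    return list(feature_words) + concepts


if __name__ == "__main__":
    print(extend_short_text(["apple", "iphone", "launch"], TOY_CONCEPT_GRAPH))
```

In this sketch a text with no feature words found in the graph is returned unchanged, which matches the abstract's observation that the benefit depends on how many feature words can be extended. The extended word list would then be fed to whichever classifier is under test.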