Text Categorization Method Based on Relation Extraction of Word Vector and Term
Cited by: 1
Abstract: To address the incomplete extraction of feature words from Chinese text, and the loss of classification accuracy caused by the new characteristics of massive Internet text, this paper proposes a text categorization method based on word vectors and term relation extraction. First, taking the semantic associations between words into account, the vector space model obtained by neural network training is combined with the chi-square test to form a word-vector-based feature selection method, which expands the feature word set into a candidate term network. Second, the internal cohesion between feature words is assessed from their positional relationships and lexical information features. Third, terms are extracted using left-entropy or right-entropy rules on words, yielding a feature word extraction method that reflects text representation within a specific subject domain. Finally, a convolutional neural network determines the category of the text. Experiments show that expanding the feature words according to certain rules makes the expanded feature word set more representative and provides more accurate information during classification, and that extracting terms according to the internal cohesion of feature words expresses the topic of the text more effectively and improves classification accuracy.
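The first step in the abstract, chi-square feature selection followed by word-vector expansion, might look roughly like the sketch below. This is a minimal illustration and not the paper's implementation: the function names, the positive-association filter, the cosine-similarity threshold of 0.8, and the toy two-dimensional vectors are all assumptions made here for the example.

```python
import math
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Chi-square score of each word with respect to the class `target`.

    docs is a list of token lists; labels is a parallel list of class labels.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (df_pos if y == target else df_neg).update(set(tokens))

    scores = {}
    for w in set(df_pos) | set(df_neg):
        a = df_pos[w]   # in-class documents containing w
        b = df_neg[w]   # out-of-class documents containing w
        c = n_pos - a   # in-class documents without w
        d = n_neg - b   # out-of-class documents without w
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        if denom == 0 or a * d <= b * c:
            # Assumption of this sketch: ignore words that are not
            # positively associated with the target class.
            scores[w] = 0.0
        else:
            scores[w] = n * (a * d - b * c) ** 2 / denom
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def expand_features(seeds, vectors, threshold=0.8):
    """Add every word whose vector is close to some seed word's vector."""
    expanded = set(seeds)
    for s in seeds:
        if s not in vectors:
            continue
        for w, vec in vectors.items():
            if w not in expanded and cosine(vectors[s], vec) >= threshold:
                expanded.add(w)
    return expanded


# Toy data, invented for illustration only.
docs = [["stock", "market", "rise"], ["stock", "fund", "fall"],
        ["match", "goal", "team"], ["team", "coach", "win"]]
labels = ["finance", "finance", "sports", "sports"]
scores = chi_square_scores(docs, labels, "finance")
seeds = [w for w, s in scores.items() if s >= 3.0]
vectors = {"stock": [1.0, 0.0], "share": [0.9, 0.1], "team": [0.0, 1.0]}
print(expand_features(seeds, vectors))
```

In practice the vectors would come from a trained word-embedding model rather than a hand-written dictionary, and the thresholds would need tuning on real data.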
Author: HOU Qinglin (侯庆霖) (GCI Science & Technology Co., Ltd., Guangzhou 510310, China)
Source: Mobile Communications (《移动通信》), 2018, No. 7, pp. 12-17, 23 (7 pages)
Keywords: text categorization; feature selection; word vector; relation extraction of term
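The left-entropy and right-entropy rule mentioned in the abstract is commonly computed as the Shannon entropy of the words that appear immediately before and after a candidate term. The sketch below shows that calculation under stated assumptions: the function names and the toy token sequence are hypothetical, and the abstract does not say which threshold or decision rule the paper applies to the resulting values.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a Counter of neighbor frequencies."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def branching_entropy(tokens, candidate):
    """Return (left_entropy, right_entropy) for `candidate`, a tuple of tokens.

    High entropy on a side means the candidate is followed (or preceded) by
    many different words, which suggests a term boundary on that side.
    """
    n = len(candidate)
    left, right = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == candidate:
            if i > 0:
                left[tokens[i - 1]] += 1
            if i + n < len(tokens):
                right[tokens[i + n]] += 1
    return entropy(left), entropy(right)


# Toy token sequence, invented for illustration only.
tokens = "a x y b c x y d a x y e".split()
left_h, right_h = branching_entropy(tokens, ("x", "y"))
print(left_h, right_h)
```

A candidate such as `("x",)` is always followed by `y` in this sequence, so its right entropy is zero, which is the signal that it is only a fragment of a longer term.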