Headlines automatic classification method based on hierarchical category information (基于层级类别信息的标题自动分类研究)

Cited by: 6
Abstract: Headline texts have few features, high feature dimensionality, and unevenly distributed samples, all of which degrade classification performance. To address this, the paper proposes a bidirectional feature selection algorithm that exploits the structure of the classification system, and implements headline classification on top of that algorithm. The method presupposes a classification system with a strict hierarchy. It uses the co-occurrence and distribution relationships between categories and words to select feature words and candidate categories in both directions, and from these builds a category vector space. It then analyzes the category semantic information revealed by the distribution of a headline's feature words across the hierarchical category vector space, and uses it to determine the level at which the text sits and the candidate categories at that level. Finally, a classifier handles the headlines that could not be classified in this way. Experiments on a manually indexed data set show that, without corpus expansion or external knowledge bases, the method can still effectively determine the level of a text and classify across multiple levels of disciplines. By identifying category semantic information, it also reduces the number of candidate categories and improves classification efficiency.
Source: Application Research of Computers (《计算机应用研究》), CSCD, PKU Core (北大核心), 2016, No. 7, pp. 2030-2033 (4 pages)
Funding: provincial/ministerial-level laboratory open fund project (B2014)
Keywords: headlines classification; feature selection; hierarchical classification system; co-occurrence analysis; vector space
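
The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a two-level hierarchy, a toy training set, a "share of occurrences" criterion for keeping a feature word in a category, and a summed-weight score for picking candidate categories. All names (`TRAIN`, `build_category_vectors`, `candidates`, `locate`) and both thresholds are hypothetical. Levels are numbered from 0 (top).

```python
from collections import Counter, defaultdict

# Toy labelled titles: (tokens, path in a strict two-level hierarchy).
# All data, names and thresholds here are illustrative assumptions.
CS, BIO = "computer science", "biology"
TRAIN = [
    (["text", "classification", "svm"], (CS, "natural language processing")),
    (["short", "text", "feature", "selection"], (CS, "natural language processing")),
    (["image", "segmentation", "network"], (CS, "computer vision")),
    (["image", "feature", "detection"], (CS, "computer vision")),
    (["protein", "folding", "simulation"], (BIO, "molecular biology")),
    (["gene", "expression", "network"], (BIO, "molecular biology")),
]


def build_category_vectors(train, level, min_share=0.6):
    """Category -> {word: weight} at one level of the hierarchy.

    Assumed criterion: a word is kept for a category only if at least
    `min_share` of the titles containing it belong to that category.
    """
    cooc = defaultdict(Counter)  # category -> word -> co-occurrence count
    total = Counter()            # word -> number of titles containing it
    for tokens, path in train:
        for word in set(tokens):
            cooc[path[level]][word] += 1
            total[word] += 1
    return {
        cat: {w: c / total[w] for w, c in counts.items()
              if c / total[w] >= min_share}
        for cat, counts in cooc.items()
    }


def candidates(tokens, vectors, min_score=1.0):
    """Categories whose summed word weights reach `min_score`, best first."""
    scores = {cat: sum(vec.get(w, 0.0) for w in set(tokens))
              for cat, vec in vectors.items()}
    kept = [(cat, s) for cat, s in scores.items() if s >= min_score]
    return [cat for cat, _ in sorted(kept, key=lambda cs: -cs[1])]


def locate(tokens, train, depth=2):
    """Deepest level (0 = top) that still has candidates, with those candidates.

    Returns (None, []) when no level matches, i.e. the title would be
    handed to a fallback classifier (not implemented in this sketch).
    """
    result = (None, [])
    for level in range(depth):
        found = candidates(tokens, build_category_vectors(train, level))
        if not found:
            break
        result = (level, found)
    return result
```

In this toy setup, a title whose words are specific to one second-level category is located at level 1, a title whose words are shared across sibling categories stops at level 0, and a title with no known feature words is left for the fallback classifier. Narrowing the candidate list before any classifier runs is the step the abstract credits with improving efficiency.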
