Abstract
To address the characteristics of short text, this paper proposes a method for automatically identifying feature words in short texts. A graph is built over the nouns or verbs in a short text, and its adjacency matrix is constructed from the semantic similarity between those words. Based on this adjacency matrix, a method for computing the feature degree of each candidate feature word is proposed, and the words with the highest feature degrees are selected as feature words. Experimental results show that the proposed feature extraction method is better suited to short text classification than traditional feature extraction methods.
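The pipeline described in the abstract (similarity graph, adjacency matrix, feature degree, selection) can be illustrated with a minimal sketch. The abstract does not give the paper's feature-degree formula, so the sketch assumes a simple stand-in: the weighted degree of each word, that is, the sum of its row in the adjacency matrix. The function names, the `threshold` and `top_k` parameters, and the toy similarity table are all hypothetical; a real system would take the similarity scores from a lexical resource.

```python
from typing import Callable, Dict, List, Tuple

Similarity = Callable[[str, str], float]


def adjacency_matrix(words: List[str], sim: Similarity,
                     threshold: float = 0.0) -> List[List[float]]:
    """Build a symmetric adjacency matrix from pairwise semantic similarity."""
    n = len(words)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = sim(words[i], words[j])
            # Keep an edge only when the similarity exceeds the threshold.
            if score > threshold:
                matrix[i][j] = matrix[j][i] = score
    return matrix


def feature_degrees(matrix: List[List[float]]) -> List[float]:
    """ASSUMPTION: feature degree = weighted degree (row sum).

    This is a stand-in, not the formula from the paper.
    """
    return [sum(row) for row in matrix]


def select_feature_words(words: List[str], sim: Similarity,
                         top_k: int = 2, threshold: float = 0.0) -> List[str]:
    """Return the top_k candidate words ranked by feature degree."""
    degrees = feature_degrees(adjacency_matrix(words, sim, threshold))
    ranked = sorted(zip(words, degrees), key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:top_k]]


# Hypothetical similarity scores for candidate nouns from one short text.
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("phone", "battery"): 0.8,
    ("phone", "screen"): 0.7,
    ("battery", "screen"): 0.5,
    ("phone", "weather"): 0.1,
}


def toy_similarity(a: str, b: str) -> float:
    """Look up a pair in either order; unknown pairs have similarity 0."""
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), 0.0))


candidates = ["phone", "battery", "screen", "weather"]
selected = select_feature_words(candidates, toy_similarity, top_k=2)
```

In this toy example, words that are strongly related to many other candidates rank highest, while an isolated word such as "weather" receives a low score and is dropped.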
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
Peking University Core Journal (北大核心)
2014, Issue 6, pp. 162-164, 212 (4 pages)
Funding
Henan Province Science and Technology Research Program (102102210509)
Yunnan Province Science and Technology Program (2011FZ074)
Keywords
Short text
Feature extraction
Connection strength
Adjacency matrix
Feature degree