
Research on different feature extraction and algorithms for ultra-short text classification
Cited by: 2
Abstract: In recent years, artificial intelligence technology centered on big data has developed rapidly, and natural language processing has become one of the most prominent frontier research areas of the artificial intelligence era. In short text classification, however, results differ markedly depending on which feature extraction method is paired with which machine learning algorithm. To address the low accuracy of short text classification, this paper combines different feature extraction methods with different machine learning algorithms and evaluates each combination against preset evaluation metrics, exploring their effect on ultra-short text classification in order to find the best-performing combined model. Experimental results show that, among the four best combinations selected, the model that uses term frequency-inverse document frequency (TF-IDF) for feature extraction and logistic regression as the classifier achieves the best results on a public dataset, with an accuracy of 92.13% and a recall of 90.12%, making it suitable for ultra-short text classification scenarios.
Authors: Liu Xiaopeng (刘晓鹏); Yang Jiajia (杨嘉佳); Lu Kai (卢凯); Tian Changhai (田昌海); Tang Qiu (唐球) (National Computer System Engineering Research Institute of China, Beijing 100083, China; Information Research Center of Military Science, PLA Academy of Military Science, Beijing 100142, China)
Source: Information Technology and Network Security (《信息技术与网络安全》), 2019, No. 5, pp. 48-52 (5 pages)
Keywords: natural language processing; text classification; ultra-short text
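The abstract names TF-IDF weighting and reports two evaluation figures (accuracy and recall) without defining them. The following is a minimal Python sketch of the standard textbook definitions, not the authors' implementation. The function names `tfidf` and `accuracy_and_recall`, the unsmoothed IDF variant log(N/df), and the toy data are all assumptions made for illustration, and the logistic regression classifier itself is omitted.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Libraries often apply smoothing, so their values will differ.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def accuracy_and_recall(y_true, y_pred, positive=1):
    """Accuracy over all samples and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = correct / len(pairs) if pairs else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


if __name__ == "__main__":
    # Hypothetical toy corpus of already-tokenized ultra-short texts.
    toy_docs = [["cheap", "pills"], ["meeting", "today"], ["cheap", "meeting"]]
    for vector in tfidf(toy_docs):
        print(vector)
    # Hypothetical labels and predictions.
    print(accuracy_and_recall([1, 1, 1, 0], [1, 1, 0, 0]))
```

Terms that appear in fewer documents receive a larger weight, which is why this kind of weighting can help when each text contains only a handful of words.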