Abstract
Because short texts have sparse features and high dimensionality, traditional text classification methods fall short of the desired classification speed and accuracy on massive short-text collections. To address this problem, we propose a short text classification algorithm that combines Topic N-Gram (TNG) feature extension with a multi-level fuzzy min-max neural network (MLFM-MN). First, an improved TNG model is used to build a feature extension library and extend the features; the library can infer not only the word distribution but also the phrase distribution of each topic. Next, the topic tendency of each short text is computed from its original features, and appropriate candidate words and phrases are selected from the feature extension library according to that tendency and added to the original text. Finally, the MLFM-MN algorithm classifies the extended texts, and precision, recall, and F1 score are used to evaluate the classification results. Experimental results show that the proposed algorithm significantly improves text classification performance.
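The feature extension and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the extension library, the word-to-topic probabilities, and the function names `topic_tendency`, `extend`, and `precision_recall_f1` are all hypothetical, and a simple average of per-word topic probabilities stands in for the improved TNG model. The MLFM-MN classifier itself is not shown.

```python
from collections import Counter

# Hypothetical feature extension library: topic -> (candidate word or phrase, weight).
# In the paper this library is built with the improved TNG model; here it is hard-coded.
EXTENSION_LIBRARY = {
    "sports": [("football match", 0.30), ("league", 0.25), ("coach", 0.20)],
    "finance": [("stock market", 0.35), ("interest rate", 0.25), ("bank", 0.15)],
}

# Hypothetical per-word topic probabilities (assumed, for illustration only).
WORD_TOPIC = {
    "goal": {"sports": 0.9, "finance": 0.1},
    "team": {"sports": 0.8, "finance": 0.2},
    "shares": {"sports": 0.05, "finance": 0.95},
}


def topic_tendency(tokens):
    """Average the topic probabilities of the tokens found in WORD_TOPIC."""
    totals = Counter()
    known = 0
    for tok in tokens:
        if tok in WORD_TOPIC:
            known += 1
            for topic, prob in WORD_TOPIC[tok].items():
                totals[topic] += prob
    if known == 0:
        return {}
    return {topic: value / known for topic, value in totals.items()}


def extend(tokens, top_k=2):
    """Append the top_k highest-weighted candidates of the dominant topic."""
    tendency = topic_tendency(tokens)
    if not tendency:
        return list(tokens)  # no known words: leave the text unchanged
    best_topic = max(tendency, key=tendency.get)
    ranked = sorted(EXTENSION_LIBRARY[best_topic], key=lambda c: c[1], reverse=True)
    extra = [cand for cand, _ in ranked[:top_k] if cand not in tokens]
    return list(tokens) + extra


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(extend(["goal", "team"]))
    print(precision_recall_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"], "a"))
```

In this toy setup a short text is extended only with candidates from its single most likely topic; weighting candidates across several topics would be an equally plausible reading of "topic tendency".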
Authors
文武
李培强
郭有庆
WEN Wu; LI Pei-qiang; GUO You-qing (School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065; Research Center of New Communication Technology Applications, Chongqing University of Posts and Telecommunications, Chongqing 400065; Chongqing Xinke Design Co. Ltd., Chongqing 401121, China)
Source
《计算机工程与科学》
CSCD
Peking University Core Journals (北大核心)
2019, No. 11, pp. 2071-2078 (8 pages)
Computer Engineering & Science