Abstract
To address the problem of spam accounts posting fake online reviews and thereby undermining the credibility of review websites, this paper proposes a short-text author identification algorithm based on natural language processing and machine learning. The algorithm combines natural language processing (NLP) techniques with several machine-learning classifiers and uses multiple linguistic features to identify the authors of short, noisy review texts. Experimental results show that, compared with the baseline model, introducing NLP techniques clearly improves classification accuracy for both N-gram models considered (unigram only, and unigram combined with bigram), demonstrating the effectiveness of the proposed algorithm.
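As a rough illustration of the unigram and unigram-plus-bigram features mentioned in the abstract, the following is a minimal sketch and not the paper's implementation. The abstract does not name the classifiers or the full feature set, so the cosine-similarity author profiles, the function names (`ngram_counts`, `cosine`, `attribute`), and the toy reviews are all assumptions made purely for illustration.

```python
from collections import Counter
import math


def ngram_counts(text, n_values=(1, 2)):
    """Count word n-grams for every n in n_values (lowercased, whitespace tokens)."""
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(review, profiles, n_values=(1, 2)):
    """Return the author whose n-gram profile is most similar to the review.

    This nearest-profile rule is a stand-in classifier (an assumption);
    the paper's actual classifiers are not specified in the abstract.
    """
    feats = ngram_counts(review, n_values)
    return max(profiles, key=lambda author: cosine(feats, profiles[author]))


# Invented toy reviews, for illustration only.
train = {
    "author_a": ["great food and great service", "the food was great"],
    "author_b": ["terrible room and rude staff", "the staff was rude"],
}

# One aggregated unigram+bigram profile per author.
profiles = {
    author: sum((ngram_counts(t) for t in texts), Counter())
    for author, texts in train.items()
}

predicted = attribute("great service and food", profiles)
print(predicted)
```

Passing `n_values=(1,)` restricts the features to unigrams, which corresponds to the unigram-only setting; the default `(1, 2)` corresponds to the combined unigram and bigram setting.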
Author
吴桂玲
WU Gui-ling (College of Information Engineering, Xinyang Agriculture and Forestry University, Xinyang, Henan 464007, China)
Source
《西南师范大学学报(自然科学版)》
CAS
2021, No. 1, pp. 32-37 (6 pages)
Journal of Southwest China Normal University (Natural Science Edition)
Funding
Henan Provincial Science and Technology Research Program project (182102210533).