Abstract
To automate the detection of whether Chinese online reviews in e-commerce and social networks are effective, a logistic-regression-based spam review detection model for a single-topic setting is proposed. Detecting the effectiveness of Chinese online reviews can be treated as a classification problem. Based on the characteristics of Chinese online reviews, nine features are extracted to build the classification model. To obtain the core feature, topic relevance, an association-rule-based review noun pattern is used to optimize topic identification in ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System). A cross language model is then used to compute the topic relevance of online reviews. In the experiment, 1 000 manually labeled reviews serve as the sample, a support vector machine (SVM) classification model serves as the baseline, and all computations are performed with the data mining tool Weka. The results show that the proposed logistic-regression model, using the optimized review noun pattern, reaches an accuracy of 83.54%, which is 2.10% higher than that of the SVM classification model.
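The abstract treats review effectiveness as a binary classification problem solved with logistic regression. As a rough illustration of that idea, the sketch below trains a logistic regression classifier by batch gradient descent in plain Python. It is a hypothetical sketch, not the authors' Weka setup. The abstract does not list the nine features, so the three features used here (`topic_relevance`, `length_score`, `symbol_ratio`) and the toy data are invented. The function names are also illustrative.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear score w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # derivative of log-loss w.r.t. the score
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float],
            threshold: float = 0.5) -> int:
    """Return 1 for a spam (ineffective) review, 0 for an effective one."""
    return 1 if sigmoid(score(w, b, x)) >= threshold else 0


def accuracy(w: Sequence[float], b: float,
             X: List[List[float]], y: List[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y))
    return correct / len(y)


# Hypothetical toy features per review: [topic_relevance, length_score, symbol_ratio]
X_TRAIN = [
    [0.90, 0.60, 0.05],  # effective reviews (label 0)
    [0.80, 0.70, 0.10],
    [0.85, 0.50, 0.02],
    [0.70, 0.80, 0.08],
    [0.10, 0.20, 0.60],  # spam reviews (label 1)
    [0.20, 0.10, 0.50],
    [0.05, 0.30, 0.70],
    [0.15, 0.15, 0.40],
]
Y_TRAIN = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    w, b = train_logreg(X_TRAIN, Y_TRAIN)
    print("training accuracy:", accuracy(w, b, X_TRAIN, Y_TRAIN))  # → 1.0
```

In this sketch the learned weight on the hypothetical `topic_relevance` feature becomes negative, so off-topic reviews are pushed toward the spam class. That mirrors why the abstract treats topic relevance as the core feature. A real evaluation would measure accuracy on held-out labeled reviews rather than on the training set.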
Source
Journal of Southeast University (Natural Science Edition)
EI
CAS
CSCD
Peking University Core Journal
2015, No. 3, pp. 433-437 (5 pages)
Funding
National Natural Science Foundation of China (60803057)
National High Technology Research and Development Program of China (863 Program) (2015AA015904)
Keywords
effectiveness of online reviews
logistic regression
association rule