Abstract
With new technologies and new modes of commercial transaction continually emerging, online transactions have surged, and online transaction reviews have grown explosively. To address the high dimensionality of the text space of online reviews in a big-data environment, this paper proposes a feature representation method for online review text that performs a double screening with the help of product titles and product descriptions. The method describes text features with seed words rather than a topic dictionary, which lowers document dimensionality, reduces the number of iterations, and speeds up classification of online review text. In addition, the text mapping passes through two screening stages, direct mapping and indirect mapping, which reduces omissions in text classification and improves classification accuracy.
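The abstract does not spell out how the two mapping stages work, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that seed words are taken from the product title and description, that "direct mapping" means a review token is itself a seed word, and that "indirect mapping" means a review token is linked to a seed word through some word-clustering step. The function names `seed_words` and `represent`, the `related` lookup table, and the whitespace tokenization are all hypothetical; Chinese text would first need a word segmenter.

```python
from collections import Counter


def seed_words(title, description, stopwords=frozenset()):
    """Collect candidate seed words from a product title and description.

    Assumption: whitespace tokenization stands in for real word segmentation.
    """
    tokens = (title + " " + description).lower().split()
    return sorted({t for t in tokens if t not in stopwords})


def represent(review, seeds, related):
    """Map a review onto seed-word counts in two passes.

    Direct pass: a token that is itself a seed word counts for that seed.
    Indirect pass: a token found in `related` counts for the seed it is
    linked to (a stand-in for the output of a word-clustering step).
    Tokens matched by neither pass are dropped, so the vector length
    equals the number of seed words rather than the vocabulary size.
    """
    seed_set = set(seeds)
    counts = Counter()
    for token in review.lower().split():
        if token in seed_set:                      # direct mapping
            counts[token] += 1
        elif related.get(token) in seed_set:       # indirect mapping
            counts[related[token]] += 1
    return [counts[s] for s in seeds]


# Hypothetical toy data, for illustration only.
seeds = seed_words("wireless phone charger", "fast charging pad",
                   stopwords={"fast"})
related = {"recharge": "charging", "cordless": "wireless"}
vector = represent("cordless charger is great and quick to recharge",
                   seeds, related)
```

In this toy run the review is reduced to a five-dimensional vector over the seed words, and the indirect pass recovers two tokens ("cordless", "recharge") that a direct match alone would have missed, which is the kind of omission the double screening is said to reduce.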
Authors
WANG Qian-qian (王倩倩)
CHEN Kang (陈康)
Jinling Institute of Technology, Nanjing 210038, China
Source
Journal of Jinling Institute of Technology (Social Sciences Edition) (《金陵科技学院学报(社会科学版)》)
2019, Issue 1, pp. 56-60 (5 pages)
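Funding
Jinling Institute of Technology Doctoral Research Start-up Fund Project (jit-b-201622)
Jiangsu Universities Philosophy and Social Science Research Fund Project (2017SJB0488)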
Keywords
text representation
seed words
word clustering
text classification
dimension reduction
online reviews
text features