Abstract
With the rapid development of e-commerce, text mining of users' online reviews has become a popular research topic. When making a purchase decision, a user wants to know not only the overall evaluation of a product but also the sentiment orientation toward each of its individual features. This paper therefore studies the automatic extraction of product features from online reviews. N-Grams that match the BNP (base noun phrase) pattern are selected as candidate feature items. The candidates are then filtered using the boundary average entropy of each N-Gram and the substring dependency relationships among the candidates, which yields the final product features. Compared with the baseline method, which directly takes BNP phrases as product features, the proposed filtering conditions effectively improve the accuracy of product feature extraction. The method does not rely on an external domain corpus and requires no manual intervention. Its output is hierarchical, organized as a tree by substring dependency, so it can serve as a useful reference data structure for the further construction of domain knowledge ontologies.
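The abstract names three steps: BNP-pattern N-Gram candidates, filtering by boundary average entropy, and filtering by substring dependency. The sketch below illustrates one way to combine them. It is not the paper's implementation, and every detail is an assumption: the input is already word-segmented and POS-tagged with a hypothetical simplified tag set ("n" noun, "a" adjective); the BNP pattern is approximated as 1 to 3 adjectives/nouns ending in a noun; boundary average entropy is taken as the mean of the left- and right-neighbour entropies; and a shorter candidate is treated as substring-dependent when a longer kept candidate contains it with the same frequency. The function names (`is_bnp`, `candidates`, `extract`), the `threshold` parameter and the sample data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical simplified tag set: "n" = noun, "a" = adjective; any other tag
# (verb, determiner, ...) breaks a base noun phrase.
def is_bnp(tags):
    """Assumed BNP pattern: adjectives/nouns only, ending in a noun."""
    return tags[-1] == "n" and all(t in ("n", "a") for t in tags)

def candidates(sentences, max_n=3):
    """Count BNP-matching N-grams and their left/right neighbouring words."""
    freq = Counter()
    left, right = defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        words = [w for w, _ in sent]
        tags = [t for _, t in sent]
        for n in range(1, max_n + 1):
            for i in range(len(sent) - n + 1):
                if not is_bnp(tags[i:i + n]):
                    continue
                gram = tuple(words[i:i + n])
                freq[gram] += 1
                # Sentence boundaries are collapsed into one marker each.
                left[gram][words[i - 1] if i > 0 else "<s>"] += 1
                right[gram][words[i + n] if i + n < len(sent) else "</s>"] += 1
    return freq, left, right

def entropy(counter):
    """Shannon entropy (bits) of a neighbour-count distribution."""
    total = sum(counter.values())
    h = 0.0
    for c in counter.values():
        p = c / total
        h -= p * math.log2(p)
    return h

def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

def extract(sentences, threshold, max_n=3):
    freq, left, right = candidates(sentences, max_n)
    # Assumed definition: mean of left and right boundary entropies.
    scores = {g: (entropy(left[g]) + entropy(right[g])) / 2 for g in freq}
    kept = [g for g, s in scores.items() if s >= threshold]
    # Assumed substring-dependency rule: drop g if a longer kept candidate
    # contains it with the same frequency, i.e. g appears only inside it.
    result = [g for g in kept
              if not any(len(h) > len(g) and contains(h, g) and freq[h] == freq[g]
                         for h in kept)]
    return sorted(result), scores

# Hypothetical pre-tagged review sentences.
REVIEWS = [
    [("the", "d"), ("battery", "n"), ("life", "n"), ("is", "v"), ("great", "a")],
    [("poor", "a"), ("battery", "n"), ("life", "n"), ("sadly", "d")],
    [("screen", "n"), ("is", "v"), ("bright", "a")],
    [("i", "r"), ("like", "v"), ("the", "d"), ("screen", "n")],
]

features, _ = extract(REVIEWS, threshold=0.5)
print(features)  # [('battery', 'life'), ('screen',)]
```

In this toy run, "battery" and "life" pass the entropy threshold (0.5 each) but are removed because they only ever occur inside "battery life". Meanwhile "poor battery" and "poor battery life" have zero boundary entropy, since each has a single fixed neighbour on at least one side, and are removed by the entropy filter. The kept phrases and the substrings they absorbed naturally form the kind of hierarchy the abstract describes.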
Source
Systems Engineering-Theory & Practice
EI
CSSCI
CSCD
PKU Core (Peking University core journal list)
2016, Issue 9, pp. 2416-2423 (8 pages)
Keywords
online reviews
product feature
boundary average entropy