Abstract
Product feature extraction is one of the key research topics in text opinion extraction and sentiment analysis. This paper proposes an unsupervised method for extracting product features automatically. Text patterns are extracted from product review sentences and used as features, so that every noun and noun phrase in the reviews (except product names) is represented as a vector over these patterns. A clustering algorithm then groups these vectors into two clusters, and, with product names serving as external knowledge, text patterns that express the whole-part (part-of) relation are used to identify which cluster is the product feature set. Experimental results show that the method performs well on a corpus of electronic product reviews.
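The pipeline summarized above (pattern features, vectors, two-way clustering, then choosing the feature cluster via whole-part patterns) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption rather than the paper's implementation: the toy pattern instances, the two hypothetical whole-part patterns `<P> 's _` and `_ of <P>`, L2-normalised count vectors, and plain k-means with k = 2 (the abstract does not name the clustering algorithm). Pattern extraction from raw, segmented sentences is assumed to have happened upstream, and the rule "pick the cluster with more whole-part pattern hits" is only one reading of how product names serve as external knowledge.

```python
from collections import Counter, defaultdict
import math

# Hypothetical pattern instances: (candidate noun, text pattern) pairs, assumed to be
# extracted upstream from segmented review sentences. "_" marks the candidate slot
# and "<P>" marks a product name.
OCCURRENCES = [
    ("screen", "<P> 's _"), ("screen", "<P> 's _"), ("screen", "the _ is"),
    ("battery", "<P> 's _"), ("battery", "_ of <P>"),
    ("battery", "the _ is"), ("battery", "the _ is"),
    ("camera", "_ of <P>"), ("camera", "the _ is"),
    ("friend", "my _ said"), ("friend", "my _ said"), ("friend", "my _ bought"),
    ("wife", "my _ bought"), ("wife", "my _ bought"), ("wife", "my _ said"),
    ("son", "my _ bought"),
]

# Hypothetical whole-part patterns linking a product name to one of its parts.
PART_OF_PATTERNS = {"<P> 's _", "_ of <P>"}


def vectorize(occurrences):
    """Represent each candidate as an L2-normalised vector of pattern counts."""
    counts = defaultdict(Counter)
    for noun, pattern in occurrences:
        counts[noun][pattern] += 1
    vocab = sorted({p for _, p in occurrences})
    nouns = sorted(counts)
    vectors = []
    for noun in nouns:
        v = [float(counts[noun][p]) for p in vocab]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v])
    return nouns, vectors, counts


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(vectors, max_iter=100):
    """Plain k-means with k=2, seeded with the first vector and the one farthest from it."""
    first = vectors[0]
    second = max(vectors, key=lambda v: sq_dist(v, first))
    centroids = [first, second]
    labels = None
    for _ in range(max_iter):
        # Assign every vector to its nearest centroid.
        new_labels = [min(range(2), key=lambda k: sq_dist(v, centroids[k])) for v in vectors]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for k in range(2):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def product_features(occurrences, part_of=PART_OF_PATTERNS):
    """Return the cluster whose members co-occur most often with whole-part patterns."""
    nouns, vectors, counts = vectorize(occurrences)
    labels = two_means(vectors)
    hits = [0, 0]
    for noun, lab in zip(nouns, labels):
        hits[lab] += sum(counts[noun][p] for p in part_of)
    best = max(range(2), key=lambda k: hits[k])
    return [n for n, lab in zip(nouns, labels) if lab == best]


print(product_features(OCCURRENCES))  # ['battery', 'camera', 'screen']
```

In this toy data the part nouns share slots such as `the _ is` while the person nouns share `my _ said` / `my _ bought`, so the two clusters separate cleanly; only the first cluster co-occurs with the product name through whole-part patterns, which is what singles it out as the feature set. Real review data would need a far larger pattern inventory and a more careful clustering setup.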
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD (Chinese Science Citation Database)
2012, No. 10, pp. 160-163 (4 pages)
Funding
National Science and Technology Major Project (No. 2008ZX06315-001)