
Application of secondary pruning algorithm in commentary feature extraction (cited by: 2)
Abstract: To address the low accuracy of the generalized sequential pattern (GSP) algorithm in extracting product features from Chinese online reviews, a secondary pruning algorithm is proposed. The GSP algorithm first generates a candidate feature set, which is then filtered further using the term pair co-occurrence weight (TPCW) as a threshold to improve accuracy. A customized crawler was used to collect Chinese reviews of camera products from the Jingdong website; 1,000 of these reviews were selected as experimental data, and the segmentation tool ICTCLAS was used for word segmentation and data preprocessing. The proposed algorithm was compared with the GSP algorithm, the cross language model (CLM), and the likelihood ratio test (LRT). The results show that the proposed algorithm achieves a feature extraction accuracy of 76.37% on Chinese product reviews, which is higher than that of the GSP algorithm, CLM, and LRT by 2.94%, 5.77%, and 7.57%, respectively.
Source: Journal of Southeast University (Natural Science Edition); indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2016, No. 3, pp. 513-517 (5 pages)
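Funding: Fundamental Research Funds for the Central Universities; National High Technology Research and Development Program of China (863 Program) (2015AA015904)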
Keywords: feature extraction; secondary pruning; term pair co-occurrence weight; likelihood ratio test; cross language model
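
The second pruning stage described in the abstract can be illustrated with a minimal sketch. The abstract does not define how the term pair co-occurrence weight is computed, so everything below is an assumption: the Dice-style score, the pairing of each candidate with opinion words, the function names `cooccurrence_score` and `second_prune`, the threshold value, and the toy data (English tokens standing in for segmented Chinese words). The candidate list is taken as given, as if it were the output of a GSP run.

```python
def cooccurrence_score(term_a, term_b, review_sets):
    """Assumed Dice-style score: 2 * |A and B| / (|A| + |B|), counted over reviews."""
    has_a = sum(1 for r in review_sets if term_a in r)
    has_b = sum(1 for r in review_sets if term_b in r)
    both = sum(1 for r in review_sets if term_a in r and term_b in r)
    if has_a + has_b == 0:
        return 0.0
    return 2.0 * both / (has_a + has_b)


def second_prune(candidates, opinion_words, reviews, threshold):
    """Keep candidates whose best co-occurrence score with any opinion word
    reaches the threshold (hypothetical second-stage filter)."""
    review_sets = [set(r) for r in reviews]
    kept = []
    for cand in candidates:
        best = max(
            (cooccurrence_score(cand, w, review_sets) for w in opinion_words),
            default=0.0,
        )
        if best >= threshold:
            kept.append(cand)
    return kept


# Toy segmented reviews (illustrative only, not data from the paper).
reviews = [
    ["picture", "clear"],
    ["picture", "good"],
    ["install", "easy"],
    ["item", "received"],
]
candidates = ["picture", "install", "item"]   # assumed first-stage (GSP) output
opinion_words = ["clear", "good", "easy"]     # assumed opinion lexicon

kept = second_prune(candidates, opinion_words, reviews, threshold=0.5)
print(kept)
```

In this toy run, "item" is frequent enough to appear as a candidate but never co-occurs with an opinion word, so the filter drops it; this is the kind of false positive a co-occurrence threshold is meant to remove.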











