Feature Selection for Cross-Domain Sentiment Classification (面向跨领域情感分类的特征选择方法)

Cited by: 3
Abstract: Labeled data is hard to obtain in practice, which makes cross-domain adaptation an effective approach. Sentiment classification, however, is strongly domain-dependent: a feature space built in the source domain with traditional feature selection methods does not capture what the domains have in common and transfers poorly to the target domain. To address this, a feature selection method for cross-domain sentiment classification, Log-Likelihood Ratio-Term Frequency (LLRTF), is proposed. The log-likelihood ratio (LLR) of each feature is computed in the source domain to pick out discriminative features. Term-frequency statistics from both domains are then combined with the LLR, so that the features carrying more weight in the target domain are selected. The common feature space built this way reduces the difference in data distribution between the source and target domains. Experimental results show that LLRTF outperforms the baseline algorithms.
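The two-step selection described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' formulation: the abstract does not give the exact LLRTF formula, so the way term frequency is combined with the LLR here (scaling the LLR by the term's share of relative frequency in the target domain) is an assumption, and the names `llr` and `select_features` are hypothetical. The LLR itself is the standard G-squared statistic over a 2x2 table of term presence against class label.

```python
import math
from collections import Counter


def llr(n_pos_with, n_neg_with, n_pos, n_neg):
    """G^2 log-likelihood ratio of a term against the class label.

    n_pos_with / n_neg_with: positive / negative source documents containing the term.
    n_pos / n_neg: total positive / negative source documents (both assumed > 0).
    """
    table = [
        [n_pos_with, n_neg_with],                  # documents containing the term
        [n_pos - n_pos_with, n_neg - n_neg_with],  # documents without the term
    ]
    total = n_pos + n_neg
    row_sums = [sum(row) for row in table]
    col_sums = [n_pos, n_neg]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # empty cells contribute nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def select_features(source_docs, source_labels, target_docs, k):
    """Rank source-domain terms by LLR scaled by a target-domain frequency weight.

    ASSUMPTION: the weight tf_target / (tf_source + tf_target) is an illustrative
    choice; the paper's actual combination of LLR and term frequency may differ.
    Documents are lists of tokens; labels are 1 (positive) or 0 (negative).
    """
    n_pos = sum(1 for y in source_labels if y == 1)
    n_neg = len(source_labels) - n_pos

    # Document frequency of each term per class, in the labeled source domain.
    df_pos, df_neg = Counter(), Counter()
    for doc, y in zip(source_docs, source_labels):
        (df_pos if y == 1 else df_neg).update(set(doc))

    # Relative term frequency in each domain (no labels needed for the target).
    tf_src = Counter(tok for doc in source_docs for tok in doc)
    tf_tgt = Counter(tok for doc in target_docs for tok in doc)
    src_total = sum(tf_src.values())
    tgt_total = sum(tf_tgt.values())

    scores = {}
    for term in tf_src:
        rel_src = tf_src[term] / src_total
        rel_tgt = tf_tgt[term] / tgt_total if tgt_total else 0.0
        weight = rel_tgt / (rel_src + rel_tgt)  # 0 when the term never occurs in the target
        scores[term] = llr(df_pos[term], df_neg[term], n_pos, n_neg) * weight

    # Highest score first; ties broken alphabetically for a deterministic result.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

Under this weighting, a term that is discriminative in the source domain but never appears in the unlabeled target data scores zero and drops out, which mirrors the abstract's goal of keeping features that matter in the target domain.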
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2013, No. 11, pp. 1068-1072 (5 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
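Funding: Supported by the National Natural Science Foundation of China (No. 61273292, 61273297), the National 863 Program of China (No. 2012AA011005), and the Natural Science Foundation of Anhui Province (No. 1208085QF122).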
Keywords: Feature Selection, Cross-Domain, Sentiment Classification

