
Relevancy Coefficient and Its Application Adapted to Short Texts (cited by: 7)
Abstract: To address the Web spam that floods blog communities and BBS forums, a relevancy coefficient vector space model named cVSM is proposed. The components of cVSM serve as features of comments, and a support vector machine classification algorithm is used to identify comment spam automatically. cVSM includes a relevancy coefficient suited to short texts, which measures the semantic relevance between a comment and the post it responds to. Experiments on a Chinese blog test set and a Chinese BBS test set show that, compared with methods that use only comment text features, applying the model improves F1 by at least 6%.
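Author: 何海江 (He Haijiang)
Source: Computer Engineering (《计算机工程》), 2009, No. 6, pp. 88-90, 96 (4 pages); indexed in CAS, CSCD, and the Peking University Core Journals list
Funding: Research Fund of Changsha University (CDJJ-07010110)
Keywords: blog; comment spam; support vector machine; text mining; relevancy coefficient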
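The abstract describes scoring how relevant a comment is to its post and evaluating the spam classifier by F1, but it does not give the definition of the relevancy coefficient. The sketch below is therefore only an illustration under stated assumptions: it substitutes plain cosine similarity over character bigrams for the paper's measure, and the function names `char_bigrams`, `cosine_relevance`, and `f1_score` are hypothetical. It is not the cVSM model itself, and it omits the support vector machine step.

```python
from collections import Counter
from math import sqrt


def char_bigrams(text):
    """Split text into overlapping character bigrams.

    Assumption: a crude tokenizer that works for Chinese without a word
    segmenter; the paper's own preprocessing is not stated in the abstract.
    """
    chars = [c for c in text if not c.isspace()]
    return [a + b for a, b in zip(chars, chars[1:])]


def cosine_relevance(post, comment):
    """Cosine similarity of bigram counts between a post and a comment.

    This is a stand-in for the paper's relevancy coefficient, not the
    measure the author defines.
    """
    p = Counter(char_bigrams(post))
    c = Counter(char_bigrams(comment))
    dot = sum(p[g] * c[g] for g in p.keys() & c.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in c.values()))
    return dot / norm if norm else 0.0


def f1_score(y_true, y_pred):
    """F1 for the positive (spam) class, labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

In a pipeline like the one the abstract outlines, a score such as `cosine_relevance(post, comment)` would be one component of a comment's feature vector, and a low score would suggest that the comment is unrelated to the post.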