
Authorship Attribution of Chinese Microblogs (中文微博作者身份识别研究) Cited by: 9
Abstract: Chinese microblog posts are short and written without word delimiters. To address these characteristics, we build a stylistic feature model of Chinese microblog authors that combines three feature sets: lexical features, shallow syntactic features, and deep syntactic features. We evaluate the model on a public microblog corpus using four classifiers: a support vector machine (LibSVM), a Sequential Minimal Optimization SVM, Naive Bayes, and a decision tree. Three groups of experiments are reported: a comparison of the algorithms, combinations of the feature sets, and authorship attribution with each group of stylistic features. The results confirm the contribution of each feature dimension and show that the model performs well in precision, recall, and computing time on the Chinese microblog authorship attribution task.
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), indexed in CSSCI, CSCD, and the Peking University Core Journals list; 2017, Issue 1, pp. 72-78 (7 pages)
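Funding: National Social Science Fund of China, General Program (15BYY028); Ministry of Education Scientific Research Start-up Fund for Returned Overseas Scholars (教外司[2015]1098); Ministry of Education Humanities and Social Sciences Youth Fund (11YJCZH131); Dalian University of Foreign Languages Research Projects (2013XJQN20, 2014XJQN15)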
Keywords: Chinese; microblog; authorship attribution
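
Illustrative sketch (not from the paper): the abstract describes attributing short, unsegmented Chinese posts to authors from stylistic features with classifiers such as Naive Bayes. This record does not list the paper's actual features or settings, so the code below is only a minimal stand-in under stated assumptions: it uses overlapping character bigrams as a lexical feature that needs no word segmenter, and a hand-written multinomial Naive Bayes with add-one smoothing. The names `char_bigrams` and `NaiveBayesAuthor`, the author labels, and the four toy posts are all hypothetical and invented for illustration.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Overlapping character bigrams; whitespace is dropped first.

    Assumption: a stand-in for the paper's lexical features, chosen
    because it needs no Chinese word segmentation.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


class NaiveBayesAuthor:
    """Multinomial Naive Bayes over bigram counts (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                  # additive smoothing constant
        self.counts = defaultdict(Counter)  # author -> bigram counts
        self.docs = Counter()               # author -> number of posts
        self.vocab = set()                  # all bigrams seen in training

    def fit(self, posts, authors):
        for text, author in zip(posts, authors):
            feats = char_bigrams(text)
            self.counts[author].update(feats)
            self.docs[author] += 1
            self.vocab.update(feats)
        return self

    def predict(self, text):
        # Bigrams never seen in training carry no evidence and are skipped.
        feats = [f for f in char_bigrams(text) if f in self.vocab]
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best, best_score = None, -math.inf
        for author in self.docs:
            total = sum(self.counts[author].values())
            # Log prior plus smoothed log likelihood of each bigram.
            score = math.log(self.docs[author] / total_docs)
            for f in feats:
                num = self.counts[author][f] + self.alpha
                score += math.log(num / (total + self.alpha * vocab_size))
            if score > best_score:
                best, best_score = author, score
        return best


# Toy data invented for this sketch; not drawn from any real corpus.
posts = [
    "今天天气真好哈哈哈！",
    "哈哈哈这个电影太好笑了！",
    "会议推迟至下周，请各位知悉。",
    "报告已提交，请各位查收。",
]
authors = ["a", "a", "b", "b"]

model = NaiveBayesAuthor().fit(posts, authors)
print(model.predict("哈哈哈真好笑！"))
print(model.predict("请各位查收报告。"))
```

With this toy data, the informal post shares bigrams only with author `a` and the formal one only with author `b`, so the two predictions differ. A realistic replication would need the paper's shallow and deep syntactic features and a real corpus, neither of which is available from this record.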
