Abstract
Microblog messages often reveal, to varying degrees, the gender of their author. Starting from message content mining, this paper proposes a rough-set-based algorithm for identifying the gender of microblog users. A Tolerance Rough Set based Representation Model (TRSRM) is designed to capture the gender characteristics of microblog messages effectively. Experiments on a test set of messages from 1000 real microblog users show that the accuracy of the proposed model is on average 7% higher than that of the term-frequency representation model, giving better recognition performance.
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2014, Issue 8, pp. 2209-2211 (3 pages)
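Funding
Youth Fund Project of Humanities and Social Sciences Research, Ministry of Education (12YJCZH074)
Science and Technology Project of the Fujian Provincial Department of Education (JA13077)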
Keywords
microblog mining
gender identification
rough set
k-Nearest Neighbor (kNN) classifier
network security