
Personality Prediction and Group Profiling Method Based on Web Log
Abstract A method for user personality prediction and group profiling is proposed. The method combines data mining, machine learning, and profiling techniques. First, it improves the traditional TF-IDF algorithm, which does not take article structure into account, and thereby raises the accuracy of web page topic mining. Second, it builds a "personality-theme-keywords" (PTK) model based on the Big Five personality traits, derives each user's interest attributes and personality attributes, and combines them with the user's basic attributes to form a comprehensive user profile. Third, it applies the K-means method to cluster users who share the same attribute characteristics, describing the collective profile of groups with similar characteristics in society. Finally, experiments show that the improved TF-IDF method mines web page text more effectively than the LDA topic model, and that the method can effectively predict user personality and build group profiles.
Authors KANG Haiyan; LI Hao (School of Information Management, Beijing Information Science and Technology University, Beijing 100192, China; School of Computer Science, Beijing Information Science and Technology University, Beijing 100192, China)
Source Journal of Zhengzhou University: Natural Science Edition, 2020, No. 1, pp. 39-46 (8 pages). Indexed in CAS and the Peking University Core Journals list.
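Funding Beijing Information Science and Technology University Research Level Improvement Project (5211910933); National Natural Science Foundation of China (61370139)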
Keywords Web logs; data mining; user profile; personality prediction; TF-IDF; K-means
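
The abstract says the traditional TF-IDF algorithm is improved so that article structure is taken into account, but it does not give the weighting scheme. The following is a minimal Python sketch of one common way to do this, assuming each page is split into a title and a body and that title terms count more than body terms. The field names, the `FIELD_WEIGHTS` values, the smoothed IDF formula, and the sample pages are all illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical field weights (an assumption, not taken from the paper):
# one occurrence in the title counts as much as three in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def weighted_tf(doc):
    """Term frequency in which each occurrence is scaled by its field weight."""
    counts = Counter()
    for field, text in doc.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for token in text.lower().split():
            counts[token] += weight
    total = sum(counts.values())
    if not total:
        return {}
    return {term: value / total for term, value in counts.items()}


def structure_tfidf(docs):
    """Return one {term: score} dict per document."""
    n_docs = len(docs)
    tfs = [weighted_tf(doc) for doc in docs]
    # Document frequency: number of documents that contain the term.
    doc_freq = Counter(term for tf in tfs for term in tf)
    # Smoothed IDF, so a term found in every document keeps a positive score.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf[t] * idf[t] for t in tf} for tf in tfs]


def top_keywords(scores, k=3):
    """Highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Made-up sample pages, for illustration only.
pages = [
    {"title": "football league results",
     "body": "the match ended late and fans went home"},
    {"title": "stock market news",
     "body": "the football club shares rose on the market"},
]
scores = structure_tfidf(pages)
for page, page_scores in zip(pages, scores):
    print(page["title"], "->", top_keywords(page_scores))
```

The per-page score dictionaries could then be turned into fixed-length vectors and passed to a clustering step such as K-means, which is the grouping stage the abstract describes.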
