Abstract
The Internet has become the preferred channel for publishing, accessing, and sharing information, and the large volume of multilingual media content reflects the hot topics and sentiment that people care about. Multilingual text clustering is therefore important for understanding public opinion and guiding public discourse. This paper proposes a composite multilingual text clustering algorithm that incorporates a time influence factor, in order to study how the time dimension affects cluster analysis in the Internet environment. Experiments on more than 4,000 news items in English, Spanish, German, and French, collected from authoritative online media, confirm that the proposed algorithm achieves fairly good clustering results.
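The abstract does not spell out how the time influence factor enters the clustering, so the following is only an illustrative sketch of one common way to do it, not the paper's method. It assumes documents have already been mapped into a shared term space (for example by translation), and it multiplies cosine similarity by an exponential decay over the gap between publication dates. The function names, the `decay` parameter, and its default value are all hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def time_weighted_similarity(doc_a: Counter, doc_b: Counter,
                             day_a: float, day_b: float,
                             decay: float = 0.1) -> float:
    """Hypothetical time-aware similarity (an assumption, not the paper's formula).

    Content similarity is scaled by exp(-decay * gap), where gap is the
    absolute difference in publication day, so articles published far
    apart are treated as less likely to belong to the same topic cluster.
    """
    gap = abs(day_a - day_b)
    return cosine(doc_a, doc_b) * math.exp(-decay * gap)


# Two toy articles with identical wording, published 10 days apart.
a = Counter("election results announced today".split())
b = Counter("election results announced today".split())
same_day = time_weighted_similarity(a, b, day_a=0, day_b=0)
ten_days = time_weighted_similarity(a, b, day_a=0, day_b=10)
```

With this weighting, `same_day` stays close to the plain cosine score while `ten_days` is reduced by a factor of `exp(-1)`. A pairwise score of this kind could be fed to any similarity-based clustering routine, and the decay rate would need tuning on real data.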
Source
Information Security and Communications Privacy (《信息安全与通信保密》)
2014, No. 5, pp. 103-107, 110 (6 pages)
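Funding
National Key Technology R&D Program of China (Grant No. 2012BAH38B04)
Open Project of the State Key Laboratory for Manufacturing Systems Engineering, Xi'an Jiaotong University (Grant No. sklms2012005)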
Keywords
multi-language text
text clustering
time variable
composite clustering algorithm