
Multi-Language Text Clustering Model for Internet Public Opinion Based on Swarm Intelligence

Cited by: 4
Abstract: Multi-language Internet text is ubiquitous in China, a country made up of many ethnic groups. However, existing text clustering models mainly target a single language, and there are few studies on multi-language text mining. Swarm intelligence algorithms are self-organizing, heuristic, adaptive and robust. This paper proposes SI-Cluster (swarm-intelligence-based text clustering model), a multi-language text clustering model for Internet public opinion, and applies three optimization strategies: a gradient descent method gradually weakens the agents' ability to pick up texts, which helps avoid falling into local optima; pheromone is added to guide agent movement, in a way that effectively prevents the pheromone from evaporating too quickly; and when choosing its next position from the current one, an agent takes into account both the sensed pheromone concentration and a direction weight factor. Experiments were conducted on Chinese, English and Tibetan text datasets. In terms of clustering accuracy, SI*-Cluster, the improved algorithm with the optimization strategies applied, reaches an F-measure of 0.862, which is 44.1% higher than that of the k-means algorithm. In terms of convergence, SI*-Cluster converges after 500 iterations while still producing clearly good clustering results, which is faster than SI-Cluster, which needs 900 iterations to converge. A simulation shows the iterative text clustering process of SI-Cluster and SI*-Cluster, and the results demonstrate the effectiveness of the proposed optimization strategies.
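The abstract names the three optimization strategies but gives no formulas, so the following is only a minimal sketch under stated assumptions, not SI-Cluster itself. It assumes a Lumer-Faieta-style ant clustering grid with Deneubourg-style pick/drop probabilities, a common basis for swarm text clustering that the abstract does not confirm. It also assumes texts are already mapped to vectors in a shared feature space (the abstract does not say how Chinese, English and Tibetan texts are represented). All names (`k1_at`, `p_pick`, `p_drop`, `evaporate`, `choose_next`, `run`) and constants are hypothetical. Strategy 1 is approximated by a linearly shrinking pick constant k1, a plain substitute for the paper's gradient descent rule. Strategy 2 is approximated by a pheromone grid with a deposit on every drop and evaporation clamped at a floor `TAU_MIN`. Strategy 3 is approximated by sampling the next cell with weight tau^alpha * w_dir^beta, where the direction weight favors the agent's current heading.

```python
import math
import random

# All constants are hypothetical; the abstract gives no parameter values.
K1_START, K1_MIN, ETA = 0.3, 0.05, 0.0005  # pick constant: start, floor, decrease per iteration
K2 = 0.15                                  # drop constant
RHO, TAU_MIN, DEPOSIT = 0.05, 0.1, 1.0     # evaporation rate, pheromone floor, deposit per drop
ALPHA_PHER, BETA_DIR = 1.0, 1.0            # exponents for pheromone and direction weight
MOVES = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb) if na and nb else 1.0


def local_similarity(grid, size, pos, vec, alpha=0.5):
    """Lumer-Faieta-style density f of `vec` among the 8 neighbours of `pos` (toroidal grid)."""
    total = 0.0
    for dr, dc in MOVES:
        other = grid.get(((pos[0] + dr) % size, (pos[1] + dc) % size))
        if other is not None:
            total += 1.0 - cosine_distance(vec, other) / alpha
    return max(0.0, total / 9.0)


def k1_at(t):
    """Strategy 1 stand-in: the pick constant shrinks linearly with iteration t."""
    return max(K1_MIN, K1_START - ETA * t)


def p_pick(f, t):
    k1 = k1_at(t)
    return (k1 / (k1 + f)) ** 2


def p_drop(f):
    return (f / (K2 + f)) ** 2


def evaporate(pher):
    """Strategy 2 stand-in: evaporate, but clamp at TAU_MIN so trails never vanish."""
    for cell in pher:
        pher[cell] = max(TAU_MIN, (1.0 - RHO) * pher[cell])


def choose_next(pos, heading, pher, size, rng):
    """Strategy 3 stand-in: P(move) ~ tau(cell)**ALPHA_PHER * w_dir**BETA_DIR."""
    targets, weights = [], []
    for move in MOVES:
        cell = ((pos[0] + move[0]) % size, (pos[1] + move[1]) % size)
        w_dir = 2.0 if move == heading else 1.0  # hypothetical: favour the current heading
        targets.append((cell, move))
        weights.append(pher[cell] ** ALPHA_PHER * w_dir ** BETA_DIR)
    return rng.choices(targets, weights=weights, k=1)[0]


def run(docs, size=10, n_agents=5, iters=2000, seed=0):
    """Move agents that pick up and drop document vectors on a size x size grid."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    grid = dict(zip(rng.sample(cells, len(docs)), docs))  # cell -> document vector
    pher = {cell: TAU_MIN for cell in cells}
    agents = [{"pos": rng.choice(cells), "load": None, "heading": rng.choice(MOVES)}
              for _ in range(n_agents)]
    for t in range(iters):
        for ag in agents:
            pos = ag["pos"]
            if ag["load"] is None and pos in grid:
                if rng.random() < p_pick(local_similarity(grid, size, pos, grid[pos]), t):
                    ag["load"] = grid.pop(pos)
            elif ag["load"] is not None and pos not in grid:
                if rng.random() < p_drop(local_similarity(grid, size, pos, ag["load"])):
                    grid[pos], ag["load"] = ag["load"], None
                    pher[pos] += DEPOSIT  # reinforce cells where a text was dropped
            ag["pos"], ag["heading"] = choose_next(pos, ag["heading"], pher, size, rng)
        evaporate(pher)
    return grid, agents


if __name__ == "__main__":
    # Toy 3-d vectors standing in for two topics; real input would be text features.
    toy = [[1.0, 0.1, 0.0] for _ in range(6)] + [[0.0, 0.1, 1.0] for _ in range(6)]
    placed, agents = run(toy)
    print(len(placed), "texts on the grid,", sum(a["load"] is not None for a in agents), "carried")
```

Because k1 shrinks over time, the pick probability for a text in a given neighbourhood falls as iterations pass, so late in the run agents disturb formed groups less often. This is one plausible reading of "weakening the agents' ability to pick up texts", not the paper's formula. When `run` returns, some agents may still be carrying a text, so the grid plus the carried loads together account for all input documents.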
Authors: HAN Nan (韩楠), QIAO Shaojie (乔少杰), HUANG Ping (黄萍), PENG Jing (彭京), ZHOU Kai (周凯). Affiliations: School of Management, Chengdu University of Information Technology, Chengdu 610225, China; School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China; Sichuan Provincial Department of Public Security, Chengdu 610014, China
Source: Journal of Chongqing University of Technology: Natural Science (《重庆理工大学学报(自然科学)》), 2019, No. 9, pp. 99-108 (10 pages). Indexed: CAS; Peking University Core Journal (北大核心)
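Funding: National Natural Science Foundation of China (61802035, 61772091, 61962006); Sichuan Provincial Science and Technology Program (2019YFG0106, 2018JY0448, 2019YFS0067); Scientific Research Innovation Team Construction Program of Sichuan Universities (18TD0027); Chengdu Soft Science Research Project (2017-RK00-00053-ZF); Guangxi Natural Science Foundation (2018GXNSFDA138005); Research Fund for Young and Middle-aged Academic Leaders of Chengdu University of Information Technology (J201701); Research Fund of Chengdu University of Information Technology (KYTZ201715, KYTZ201750)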
Keywords: swarm intelligence; multi-language; text clustering; Internet public opinion; optimization
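  • Related literature

References (3)

Secondary references (33)

Co-citing literature (27)

Co-cited literature (28)

Citing literature (4)

Secondary citing literature (9)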
