Chinese Text Deception Detection Based on Ensemble Learning (cited by: 7)


Abstract: Deception detection is an important research topic in information security. Existing studies show that one third of interpersonal communication involves potential deception, and large volumes of deceptive messages circulate across all kinds of communication media. When deception threatens people's lives, the survival of enterprises, or the stability of a country, overlooking it can cause incalculable losses. In massive Web data, deceptive texts are usually far fewer than non-deceptive ones, so existing methods detect deception neither accurately nor efficiently, and an automated method that flags potentially deceptive messages is needed. To handle this class imbalance, this paper proposes a deception detection method based on ensemble learning. An improved bisecting k-means partitioning method first decomposes the training set; an independent classifier is then trained on each pair of positive and negative subsets; each classifier computes a class output value for the test sample; and finally a min-max modular approach that incorporates the classification accuracy of the individual classifiers integrates the individual decisions. Experimental results verify the effectiveness of the method.
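The decompose-train-combine pipeline described in the abstract can be illustrated with a minimal sketch of the standard min-max modular rule: take the MIN over all classifiers that share a positive subset, then the MAX over the positive subsets. Several things here are assumptions rather than the paper's method: the subsets are taken as already partitioned (the improved bisecting k-means step is not reproduced), a nearest-centroid scorer stands in for the SVM base classifiers, the accuracy-based weighting of individual classifiers is omitted, and all function names are hypothetical.

```python
def centroid(points):
    """Mean vector of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_pair(pos_subset, neg_subset):
    """Stand-in base classifier (assumption: the paper trains SVMs here).

    Returns a scoring function whose output is positive when the sample
    lies closer to the positive centroid than to the negative centroid.
    """
    cp, cn = centroid(pos_subset), centroid(neg_subset)
    return lambda x: sq_dist(x, cn) - sq_dist(x, cp)


def train_m3(pos_subsets, neg_subsets):
    """Train one classifier per (positive subset, negative subset) pair."""
    return [[train_pair(p, n) for n in neg_subsets] for p in pos_subsets]


def min_max_combine(scores):
    """scores[i][j] comes from the classifier for positive subset i
    versus negative subset j: MIN across j, then MAX across i."""
    return max(min(row) for row in scores)


def m3_predict(grid, x):
    """Label x as +1 (deceptive) or -1 (non-deceptive)."""
    scores = [[clf(x) for clf in row] for row in grid]
    return 1 if min_max_combine(scores) > 0 else -1


# Toy data: two small deceptive clusters, two non-deceptive clusters.
pos_subsets = [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
neg_subsets = [[(3, 0), (3, 1)], [(2, 5), (2, 6)]]
grid = train_m3(pos_subsets, neg_subsets)
label_a = m3_predict(grid, (0, 0.5))   # sits on the first deceptive cluster
label_b = m3_predict(grid, (3, 0.5))   # sits on the first non-deceptive cluster
```

With K positive and L negative subsets, this scheme trains K x L small classifiers, each on a roughly balanced pair, which is how the decomposition sidesteps the imbalance between deceptive and non-deceptive texts.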

Authors: Zhang Hu (张虎), Tan Hongye (谭红叶), Qian Yuhua (钱宇华), Li Ru (李茹), Chen Qian (陈千)

Affiliation: School of Computer and Information Technology, Shanxi University

Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2015, No. 5, pp. 1005-1013 (9 pages)

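Funding: National Natural Science Foundation of China (61005053, 61100138, 61373082, 61322211); National High Technology Research and Development Program of China (863 Program) (2015AA015407); Program for New Century Excellent Talents in University (20121401110013); Research Project for Returned Overseas Scholars of Shanxi Province (2013-022); Science and Technology Innovation Project of Higher Education Institutions of Shanxi Province (2015104); Open Project Fund of the Information Security Evaluation Center of Civil Aviation University of China (CAAC-ISECCA-201402)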

Keywords: deception; deception detection; ensemble learning; sample partitioning; min-max modular support vector machine (M3-SVM)

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]



