Abstract
The rapid growth of the Internet in recent years has sharply increased the number of Internet users in China, and some of them post messages indicating suicidal tendencies as text, audio, video, or images on microblogs, forums, Tieba (post bars), BBS, and similar applications. This paper proposes a technique for detecting suicidal tendencies from users' online posts. The TF-IDF algorithm is applied to text posted by users with suicidal tendencies to extract characteristic keywords; these keywords are then used to crawl the web for matching text, and machine learning algorithms are used to identify the users with suicidal tendencies from that text. A comparison of SVM, KNN, NB, LR, DT, and RF models shows that RF performs best, with a precision of 85.1%. The study also finds that additional information about at-risk users, such as the times at which they post sensitive content, is useful for detection. With these indicators added, a further round of comparative experiments reached a detection precision of 86.8%.
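The keyword-extraction step and the reported metric can be illustrated with a minimal sketch. It assumes the posts are already tokenized (Chinese text would first need word segmentation) and uses the plain weighting tf × log(N/df); the abstract does not say which TF-IDF variant or tokenizer the authors used. The names `tfidf_keywords` and `precision` are hypothetical helpers written for this illustration, not the authors' code.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Rank terms by summed TF-IDF weight over a list of tokenized documents.

    Assumption: weight = (count / doc length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        if not doc:  # skip empty documents to avoid dividing by zero
            continue
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])
            scores[term] += tf * idf
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def precision(y_true, y_pred):
    """Precision = TP / (TP + FP) for binary labels (1 = flagged as at risk)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Placeholder tokens stand in for segmented words from real posts.
docs = [["the", "a", "x"], ["the", "b", "x"], ["the", "c"]]
keywords = tfidf_keywords(docs, top_k=3)
```

A term that occurs in every document (here `"the"`) gets an IDF of zero and so never ranks as a keyword, which is why TF-IDF suits picking out vocabulary that is distinctive of a subset of posts.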
Authors
王宗杰
彭艳兵
姚方来
WANG Zong-jie; PENG Yan-bing; YAO Fang-lai (Wuhan Institute of Posts and Telecommunications, Wuhan 430000, China; Nanjing Fenghuo World Communication Technology Co., Ltd., Nanjing 210000, China)
Source
《电子设计工程》
2020, Issue 18, pp. 30-33 (4 pages)
Electronic Design Engineering