Research on the Recognition Methods for Multi-Label Theft Crime Type Based on Machine Learning

Cited by: 1
Abstract: Automatically identifying and classifying crime types with text mining technology is an effective method of police data governance for public security organs. Taking the theft crime data of one city as the research object, the text was first segmented with the Jieba Chinese word segmentation tool, stop words were removed, and sensitive information was stripped. The TF-IDF algorithm was then applied to extract features from the sample data. Finally, the XGBoost, KNN, Naive Bayes, SVM, and GBDT algorithms were trained on the sample data and their classification performance was tested. The results showed that XGBoost had the best classification performance, with accuracy, recall, and F1 value reaching 92.3%, 91.6%, and 91.9%, respectively, all higher than those of GBDT, Naive Bayes, KNN, and SVM. The crime categories were then fine-tuned, without affecting criminal justice practice, to improve classification performance. All algorithms improved substantially, with XGBoost reaching 97.7%, 97.1%, and 97.4%, respectively. The study shows that XGBoost has the best classification performance and that the influence of data quality on classifier performance is equally important. The classified crime data can serve as high-quality data for predicting each type of crime and can greatly improve the efficiency and quality of data management in public security organs.
Authors: ZHANG Qi (张齐); LI Xuechen (李雪琛) (Beijing Police College, Beijing 102202, China; People's Public Security University of China, Beijing 102600, China)
Source: Journal of People's Public Security University of China (Science and Technology), 2023, No. 1, pp. 88-93 (6 pages)
Funding: National Social Science Fund of China project (21CSH005).
Keywords: types of crime; machine learning; TF-IDF; XGBoost
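
The abstract names TF-IDF as the feature extraction step but does not give the formula used. The following is a minimal sketch of one common variant (smoothed IDF with L2 normalization), written in plain Python so that it runs without extra packages. The token lists are invented stand-ins for segmented, stop-word-filtered case descriptions, and the names `docs`, `tfidf`, and `cosine` are hypothetical; none of this is taken from the paper.

```python
import math
from collections import Counter

# Hypothetical, already-segmented case descriptions (stand-ins for the output
# of word segmentation and stop-word removal); the tokens are invented.
docs = [
    ["burglary", "window", "night", "cash"],
    ["pickpocket", "bus", "phone"],
    ["burglary", "door", "lock", "cash"],
]


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant): ln((1 + n) / (1 + df)) + 1.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for doc in docs:
        if not doc:
            # An empty document has no terms and therefore no weights.
            vectors.append({})
            continue
        tf = Counter(doc)
        vec = {term: (count / len(doc)) * idf[term] for term, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalized sparse vectors."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


vectors = tfidf(docs)
```

In this toy corpus, a term that occurs in only one document ("window") receives a higher weight than one shared by two documents ("burglary"), and the two burglary descriptions are more similar to each other than either is to the pickpocketing description. Vectors of this kind are what a classifier such as XGBoost or KNN would take as input.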