A Method to Identify Traffic Incidents Based on Social Network Data (基于社交网络数据的交通突发事件识别方法)
Cited by: 5
Abstract: To mine traffic incidents from social network data, this paper studies a machine-learning-based text classification method. Raw texts are collected by keyword and location using the web-scraping tool "Beautiful Soup". The raw texts are then preprocessed with regular-expression matching, duplication-degree calculation (to remove duplicates), and "0-1" labeling. Based on the features of the preprocessed texts, a feature-word selection method based on feature weights is proposed. The feature weight combines a word's occurrence frequency with the proportion of texts that contain the word: the two quantities are normalized and merged by a weighted sum, which yields a feature weight for each unique word in the traffic incident texts of the training set. Feature words are selected according to this weight and used as the classifier inputs. Tests comparing different classifiers and feature-word selection methods show that the proposed selection method combined with the XGBoost classifier gives the best overall performance in traffic incident identification, with a precision of 0.6796, a recall of 0.6481, an F1 score of 0.6635, and an AUC of 0.7594.
Authors: LIU Zhao (刘昭); HE Shanglu (何赏璐); LIU Yingshun (刘英舜) (School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China)
Source: Journal of Transport Information and Safety (《交通信息与安全》), CSCD, Peking University Core Journal, 2021, No. 2, pp. 53-60 (8 pages)
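Funding: Supported by the Natural Science Foundation of Jiangsu Province (BK20180486), the China Postdoctoral Science Foundation (2018M642257), and the Fundamental Research Funds for the Central Universities (30920021140).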
Keywords: intelligent transportation; social network data; traffic incident identification; text classification; machine learning
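
The feature-weight step described in the abstract (normalize a word's frequency and the proportion of texts containing it, then combine them with a weighted sum) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the normalization scheme or the combination weights, so min-max normalization, the parameter `alpha`, and the function names `feature_weights` and `select_feature_words` are all assumptions made for illustration.

```python
from collections import Counter


def feature_weights(docs, alpha=0.5):
    """Weight each unique word in `docs` (a list of token lists).

    Assumptions (not given in the abstract): min-max normalization and
    a mixing weight `alpha` between frequency and document proportion.
    """
    tf = Counter(tok for doc in docs for tok in doc)       # total occurrences
    df = Counter(tok for doc in docs for tok in set(doc))  # texts containing word
    total = sum(tf.values())
    if total == 0:
        return {}

    freq = {w: tf[w] / total for w in tf}       # occurrence frequency
    ratio = {w: df[w] / len(docs) for w in tf}  # proportion of texts with word

    def min_max(values):
        lo, hi = min(values.values()), max(values.values())
        span = hi - lo
        # If all values are equal, map everything to 0.0 to avoid dividing by zero.
        return {w: (v - lo) / span if span else 0.0 for w, v in values.items()}

    f_norm, r_norm = min_max(freq), min_max(ratio)
    return {w: alpha * f_norm[w] + (1 - alpha) * r_norm[w] for w in tf}


def select_feature_words(docs, k, alpha=0.5):
    """Return the k highest-weighted words (ties broken alphabetically)."""
    weights = feature_weights(docs, alpha)
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]
```

In a pipeline like the one the abstract outlines, `docs` would hold the tokenized incident-class texts of the training set, and the selected words would define the input features passed to a classifier such as XGBoost.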