
Research on the Recognition and Classification of Emergency Events Based on Adaptive Decision Boundaries in Open Domain News (cited by: 2)
Abstract: [Purpose/significance] Online news is an important source of intelligence on emergency events. Improving the accuracy of emergency event recognition and classification in massive volumes of online news, while mitigating the open set recognition problem caused by non-emergency news and reducing the cost of manually labeling such news, is an important topic in current research on emergency event recognition and classification. [Method/process] A BERT pre-trained model is used to obtain feature representations of the text, and semantic information from different levels is fused to improve the quality of the text representation. An adaptive decision boundary model then learns an optimal spherical decision boundary for each emergency event category in the high-dimensional semantic representation space. Emergency news is detected, and its category determined, from the Euclidean distance between a news sample's text representation and the spherical decision boundary of each category. The model is validated on the public CEC dataset and on CEN, a Chinese news dataset crawled in real time. [Result/conclusion] The model achieves Macro-F1 scores of 98.46% and 95.80% on the CEC and CEN datasets, respectively, improvements of 5.15% and 19.69% over the baseline model (the English version of the abstract states 20.36% for the latter). An application of the model demonstrates the effectiveness of the proposed method on a practical problem. [Limitations] The possibility that an emergency news item carries multiple labels is not considered.
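The decision rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how class centres or radii are obtained, so this sketch assumes that each centre is the mean embedding of a class and that the radii are already known (in the paper they are learned). The function names `fit_centroids` and `predict_open_set`, the toy 2-D "embeddings", and the label `-1` for non-emergency news are all hypothetical.

```python
import numpy as np

def fit_centroids(embeddings, labels, num_classes):
    """Assumed choice of sphere centre: the mean embedding of each known class."""
    return np.stack([embeddings[labels == k].mean(axis=0) for k in range(num_classes)])

def predict_open_set(embeddings, centroids, radii, unknown_label=-1):
    """Assign each sample to its nearest class sphere, or to `unknown_label`
    when it lies outside that sphere."""
    # Euclidean distance from every sample to every class centre, shape (n, K).
    dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    nearest_dist = dists[np.arange(len(embeddings)), nearest]
    # A sample is accepted only if it falls inside the nearest class's sphere.
    inside = nearest_dist <= radii[nearest]
    return np.where(inside, nearest, unknown_label)

# Toy 2-D stand-ins for sentence embeddings of two emergency categories.
train = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
train_labels = np.array([0, 0, 1, 1])
centroids = fit_centroids(train, train_labels, num_classes=2)

# Hand-set radii for illustration only; the paper learns them during training.
radii = np.array([2.0, 2.0])

test = np.array([[0.0, 1.5], [10.0, 10.5], [5.0, 5.0]])
predictions = predict_open_set(test, centroids, radii)
print(predictions)
```

Under these assumptions, the first two test points fall inside the spheres of categories 0 and 1, while the third lies outside both and is labeled as non-emergency news.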
Source: Information Studies: Theory & Application (《情报理论与实践》), PKU Core Journal, 2023, No. 2, pp. 194-200 (7 pages)
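Funding: An outcome of the National Social Science Fund of China Western Project "Research on Multi-Objective Optimization of Emergency Processes from the Perspective of Intelligence Process Reconstruction" (project no. 19XTQ010).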
Keywords: emergency events; adaptive decision boundary; open set recognition; text classification
