Mining Common Quantitative Features and Cross-Linguistic Clustering of English and Russian Fake News

Abstract: [Objective] This study mines the common features of fake news in different languages to provide a reference for cross-language fake news detection. [Methods] Taking English and Russian as examples, we built a dataset and extracted common quantitative features of fake news at the word, sentence, readability, and sentiment levels, then used these features in principal component analysis, K-means clustering, hierarchical clustering, and two-step clustering (二阶聚类) experiments. [Results] The 34 common quantitative features performed well in cross-language clustering of real and fake news, and the 19 newly proposed features played a greater role. Fake news shows a tendency toward linguistic simplification and economy: it tends to use short sentences and simple collocations to convey information, is easier to understand, and contains fewer negative expressions. [Limitations] Because of limitations of the current dataset, real and fake news samples on the same topic could not be found for parallel testing. [Conclusions] Fake news in different languages does share language-independent common features that can be used for automatic clustering, which offers a reference for research on cross-language fake news detection and identification.
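The kind of pipeline the abstract describes (language-independent quantitative features followed by clustering) can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the three features (`mean_sentence_len`, `mean_word_len`, `type_token_ratio`) are assumptions chosen for simplicity and are not claimed to be among the paper's 34 features. The helper names `text_features`, `standardize`, and `kmeans` are hypothetical, and the K-means here is a plain standard-library implementation with a deterministic start rather than whatever tool the study used.

```python
import re
import statistics


def text_features(text):
    """Return three illustrative, script-agnostic features for one text."""
    # Split on sentence-final punctuation; this works for Latin and Cyrillic text.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # \w is Unicode-aware in Python 3, so Cyrillic words are matched as well.
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    mean_sentence_len = len(words) / len(sentences)           # words per sentence
    mean_word_len = sum(len(w) for w in words) / len(words)   # characters per word
    type_token_ratio = len(set(words)) / len(words)           # lexical diversity
    return [mean_sentence_len, mean_word_len, type_token_ratio]


def standardize(rows):
    """Z-score each column so that no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(rows, k=2, iters=50):
    """Lloyd's algorithm, initialised deterministically with the first k rows."""
    centers = [list(r) for r in rows[:k]]
    labels = []
    for _ in range(iters):
        # Assignment step: each row goes to its nearest center (squared Euclidean).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(row, centers[j])))
            for row in rows
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                new_centers.append([statistics.fmean(c) for c in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


if __name__ == "__main__":
    # Invented toy sentences for demonstration only; not data from the study.
    texts = [
        "Shock news. It is true. Share it now.",
        "The committee published a detailed report describing the procedure.",
        "Это правда. Все знают. Читай скорее.",
        "Министерство опубликовало подробный отчёт о результатах проверки.",
    ]
    features = standardize([text_features(t) for t in texts])
    print(kmeans(features, k=2))
```

Because the features are computed from counts rather than vocabulary, English and Russian texts land in the same feature space and can be clustered together. A real experiment would need many more features and documents, and a more robust initialisation than taking the first k rows.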

Authors: Yuan Wei; Liu Haitao (School of Foreign Languages, National University of Defense Technology, Nanjing 210039, China; School of International Studies, Zhejiang University, Hangzhou 310058, China)

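Affiliations: School of Foreign Languages, National University of Defense Technology; School of International Studies, Zhejiang University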

Source: Data Analysis and Knowledge Discovery (数据分析与知识发现; indexed in EI, CSCD, and the Peking University Core Journals list), 2024, Issue 7, pp. 1-13 (13 pages)

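Funding: This work is one of the research outcomes of the Major Program of the National Social Science Fund of China (Grant Nos. 20&ZD140, 20AZD130) and the Henan Province Philosophy and Social Science Planning Project (Grant No. 2021BYY024).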

Keywords: Fake News; Quantitative Analysis; Clustering

Classification codes: TP393 [Automation and Computer Technology: Computer Application Technology]; G250 [Culture and Science: Library Science]

