Label Augmentation in Suicidal Ideation Cause Extraction (cited by: 2)
Abstract: Suicide has become a major global public health and social concern. Automatically extracting the causes of suicidal ideation from social media texts that express such ideation can support suicide prevention. In the suicidal ideation cause extraction task, the subjectivity of manual annotation leads to ambiguous or erroneous label boundaries, and the high cost of manual annotation keeps the training set small. This paper explores data augmentation to address these problems and studies two approaches. The first is LWS, a label augmentation method based on label window scaling. By designing parameters such as the label window scaling probability, the scaling scale, and the label augmentation rate, together with the principles these parameters should follow, LWS largely resolves the problems of overly short and erroneous manual annotations in the original training set; when the overall average F1 reaches its local optimum, it is about 1.6% higher than that of the Char-BiLSTM-CRF model trained on the original training set. The second is EDA data augmentation, implemented with synonym replacement (SR), random insertion (RI), random swap (RS), and random deletion (RD). Experiments show that using SR and RD, alone or in combination, works well, improving F1 by 1.1% to 1.6% on average over the Char-BiLSTM-CRF model trained on the original training set. In addition, the improvement is most pronounced when the data are changed only slightly, that is, when the augmentation rate or change rate is small, whereas excessive augmentation degrades model performance.
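The abstract names three LWS parameters (scaling probability, scaling scale, label augmentation rate) without giving their definitions, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes each training sample is a text plus one character-level cause span `[start, end)`, that each span boundary is shifted independently with probability `scale_prob` by at most `max_scale` characters in either direction, and that a scaled copy is added for a fraction `aug_rate` of the samples. The function names, the `CAUSE` tag, and the example sentence are all hypothetical.

```python
import random


def scale_span(start, end, text_len, scale_prob, max_scale, rng):
    """Shift each boundary of the half-open span [start, end) with
    probability scale_prob by up to max_scale characters (assumed rule)."""
    if rng.random() < scale_prob:
        start += rng.randint(-max_scale, max_scale)
    if rng.random() < scale_prob:
        end += rng.randint(-max_scale, max_scale)
    # Clip to the text and keep the span non-empty.
    start = max(0, min(start, text_len - 1))
    end = max(start + 1, min(end, text_len))
    return start, end


def span_to_bio(text_len, start, end, label="CAUSE"):
    """Convert a character span to BIO tags for a character-level tagger."""
    tags = ["O"] * text_len
    tags[start] = f"B-{label}"
    for i in range(start + 1, end):
        tags[i] = f"I-{label}"
    return tags


def augment(dataset, aug_rate, scale_prob, max_scale, seed=0):
    """Return the original samples plus scaled copies for roughly
    aug_rate of them. dataset is a list of (text, (start, end))."""
    rng = random.Random(seed)
    out = list(dataset)
    for text, (start, end) in dataset:
        if rng.random() < aug_rate:
            new_span = scale_span(start, end, len(text),
                                  scale_prob, max_scale, rng)
            out.append((text, new_span))
    return out


# Hypothetical sample: the annotated cause is "failed the exam".
data = [("failed the exam and feel lost", (0, 15))]
augmented = augment(data, aug_rate=1.0, scale_prob=0.5, max_scale=2)
```

Keeping `aug_rate` and `max_scale` small matches the abstract's observation that light augmentation helps while excessive augmentation hurts; the values used here are placeholders, not settings reported in the paper.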
Authors: FU Qi (付淇); LIU De-xi (刘德喜); QIU Xiang-qing (邱祥庆); ZHAO Feng-yuan (赵凤园). Affiliations: School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013, China; School of Communication and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330013, China; Jiangxi Key Laboratory of Data and Knowledge Engineering, Jiangxi University of Finance and Economics, Nanchang 330013, China; School of Electronic Information Science, Fujian Jiangxi University, Fuzhou 350108, China
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 6, pp. 1254-1264 (11 pages)
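Funding: Supported by the National Natural Science Foundation of China (61762042, 61972184, 62076112); the Leading Talent Project of the Jiangxi Provincial Training Program for Academic and Technical Leaders of Major Disciplines (2021BCJL22041); and the Natural Science Foundation of Jiangxi Province (20212ACB202002).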
Keywords: suicidal ideation cause; label augmentation; EDA; data augmentation
