Abstract
Chinese idioms are concise in form, and most derive from centuries-old allusions; their interestingness is an important aspect of idiom research. In this paper, we use phonetic templates to generate candidate sets of idioms and extract 11 features from them in five categories: phonetics, humor, semantics, sentiment, and morphology. These features are fed into learning-to-rank algorithms to retrieve interesting idioms from the candidate sets, yielding a generation model for interesting idioms. The model maps idiom generation onto information retrieval and solves it with query and relevance-feedback techniques. Both automatic and manual evaluation show that the five feature dimensions characterize interesting idioms in fine detail, that the generated idioms are of high quality, and that the model has some practical value.
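The pipeline summarized in the abstract (phonetic-template candidate generation followed by feature-based ranking) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy `PINYIN` table, the helper names `phonetic_candidates`, `train_pairwise`, and `rank`, the two-dimensional feature vectors, and the pairwise perceptron, which merely stands in for whichever learning-to-rank algorithms the authors actually used.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy pronunciation table (toneless pinyin).
# A real system would need a full pronunciation lexicon.
PINYIN: Dict[str, str] = {
    "骑": "qi", "其": "qi", "乐": "le", "无": "wu", "穷": "qiong",
}


def phonetic_candidates(idiom: str, keyword_char: str) -> List[str]:
    """Swap in `keyword_char` wherever the idiom has a different homophone."""
    target = PINYIN.get(keyword_char)
    candidates = []
    for i, ch in enumerate(idiom):
        if target is not None and ch != keyword_char and PINYIN.get(ch) == target:
            candidates.append(idiom[:i] + keyword_char + idiom[i + 1:])
    return candidates


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear scoring function over a candidate's feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(
    pairs: List[Tuple[Sequence[float], Sequence[float]]],
    dim: int,
    epochs: int = 20,
    lr: float = 0.1,
) -> List[float]:
    """Pairwise perceptron: each pair is (preferred features, other features)."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            # Update only when the preferred item is not ranked strictly higher.
            if score(weights, better) - score(weights, worse) <= 0:
                weights = [
                    w + lr * (b - c) for w, b, c in zip(weights, better, worse)
                ]
    return weights


def rank(
    candidates: List[Tuple[str, Sequence[float]]], weights: Sequence[float]
) -> List[Tuple[str, Sequence[float]]]:
    """Sort (text, features) candidates by descending score."""
    return sorted(candidates, key=lambda c: score(weights, c[1]), reverse=True)


if __name__ == "__main__":
    # Candidate generation: substitute a homophone of the keyword into the idiom.
    print(phonetic_candidates("其乐无穷", "骑"))
    # Ranking: made-up two-dimensional features, purely for demonstration.
    w = train_pairwise([([1.0, 0.0], [0.0, 1.0])], dim=2)
    print([text for text, _ in rank([("a", [0.0, 1.0]), ("b", [1.0, 0.0])], w)])
```

In a setup like this, each candidate's feature vector would hold the phonetic, humor, semantic, sentiment, and morphological features named in the abstract, and the training pairs would come from human judgments of which candidate is more interesting.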
Authors
徐琳宏
林鸿飞
杨亮
徐博
XU Lin-hong; LIN Hong-fei; YANG Liang; XU Bo (Software School, Dalian University of Foreign Languages, Dalian 116044, China; Computer Department, Dalian University of Technology, Dalian 116044, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journals (北大核心)
2019, No. 3, pp. 520-526 (7 pages)
Journal of Chinese Computer Systems
Funding
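Supported by the Key Program of the National Natural Science Foundation of China (61632011)
Supported by the National Natural Science Foundation of China (61772103, 61702080, 61806038)
Supported by the National Social Science Fund of China, General Program (15BYY028)
Supported by the Natural Science Foundation of Liaoning Province (20170540230)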
Keywords
interesting idiom (趣味成语)
sentiment analysis (情感分析)
learning to rank (排序学习)
humor (幽默)