Abstract
Medical applications rely heavily on free-text natural language. To make this text amenable to computer analysis, key information often has to be extracted and its wording normalized to a standard form, which has made text matching a widely studied technical problem. Building on deep learning, this paper describes an embedding-based representation scheme and an iteratively evolving data expansion process, and proposes a model framework and data expansion method for text matching. The framework comprises a RoBERTa-based representation network, an evolutionary data expansion process, and a fast search engine. Multiple small rounds of incremental annotation substantially expand the set of reliable labeled data. Experimental results show that the overall solution greatly improves matching accuracy while simultaneously expanding the high-quality annotated dataset, so that the model meets application requirements.
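As a rough illustration of embedding-based text matching, the sketch below maps a free-text phrase to its closest standard term by cosine similarity. It is a minimal sketch under stated assumptions, not the authors' implementation: the character n-gram `embed` function is a toy stand-in for the RoBERTa-based representation network, the linear scan stands in for the fast search engine, and the names `embed`, `cosine`, `match`, and the entries of `STANDARD_TERMS` are hypothetical.

```python
import math
from collections import Counter


def embed(text, n=2):
    """Toy stand-in for a learned encoder: a bag of character n-grams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match(query, standard_terms):
    """Return (score, term) for the standard term most similar to the query."""
    q = embed(query)
    return max((cosine(q, embed(term)), term) for term in standard_terms)


# Hypothetical vocabulary of normalized expressions (illustration only).
STANDARD_TERMS = [
    "type 2 diabetes mellitus",
    "essential hypertension",
    "acute myocardial infarction",
]

score, term = match("diabetes type 2", STANDARD_TERMS)
print(term, round(score, 3))
```

In a setting like the one the abstract describes, the toy encoder would presumably be replaced by a trained representation network and the linear scan by an indexed nearest-neighbor search; low-scoring matches could then be candidates for the next round of incremental annotation.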
Authors
XU Sheng (徐盛)
DAI Jiajun (戴佳骏)
LI Yuchen (李昱辰)
XU Sheng; DAI Jiajun; LI Yuchen (Yijian (Shanghai) Information Technology Co., Ltd., Shanghai 200050, China)
Source
Application of IC (《集成电路应用》)
2022, Issue 5, pp. 28-31 (4 pages)
Keywords
intelligent algorithm
Chinese data
text matching training
labeled data