Abstract
Abbreviations occur in large numbers in science and technology resources, yet traditional recognition algorithms have low accuracy and run slowly. To address this, the paper proposes a fast algorithm for extracting term abbreviations that combines reverse scanning with co-occurrence analysis. The algorithm first extracts candidate abbreviations, candidate full names, and their context from science and technology resources. It then applies a heuristic fuzzy matching algorithm that scans each abbreviation and its candidate full name in reverse, from right to left, and identifies regular term abbreviations and their full names without requiring every letter of the abbreviation to match exactly. Finally, it applies co-occurrence analysis to the irregular candidate abbreviations and full names. The experimental results show that, compared with previous algorithms, the proposed algorithm achieves a clear improvement in time complexity as well as in accuracy and recall.
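The right-to-left fuzzy matching step can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `reverse_scan_match` and `_find_left`, the `max_misses` tolerance, and the rule that the first letter of the abbreviation must begin a word are all assumptions chosen for illustration.

```python
def _find_left(text: str, ch: str, start: int, word_initial: bool) -> int:
    """Scan text leftwards from index start; return the index of ch, or -1."""
    j = start
    while j >= 0:
        at_word_start = j == 0 or not text[j - 1].isalnum()
        if text[j] == ch and (not word_initial or at_word_start):
            return j
        j -= 1
    return -1


def reverse_scan_match(abbr: str, full_name: str, max_misses: int = 1) -> bool:
    """Match abbr against full_name by scanning both from right to left.

    Assumption (hypothetical rule): up to max_misses letters of the
    abbreviation may stay unmatched, so the match is fuzzy, not exact.
    """
    letters = [c.lower() for c in abbr if c.isalnum()]
    if not letters:
        return False
    text = full_name.lower()
    cursor = len(text) - 1  # current scan position in the full name
    misses = 0
    for i in range(len(letters) - 1, -1, -1):
        # Assumption: the first abbreviation letter should start a word.
        j = _find_left(text, letters[i], cursor, word_initial=(i == 0))
        if j < 0:
            misses += 1  # tolerate the unmatched letter, keep the cursor
            if misses > max_misses:
                return False
        else:
            cursor = j - 1  # continue to the left of the matched character
    return True
```

Under these assumptions, `reverse_scan_match("NSTL", "National Science and Technology Library")` matches every letter, while `reverse_scan_match("XML", "Extensible Markup Language")` is accepted only because one unmatched letter is tolerated; with `max_misses=0` it is rejected.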
Authors
Wang Jingdong (王敬东)
Zhang Zhixiong (张智雄)
Wang Jingdong; Zhang Zhixiong (Northeast Dianli University, Jilin, Jilin 132012, China; China Science Library, Chinese Academy of Sciences, Beijing 100190, China; Wuhan Library, Chinese Academy of Sciences, Wuhan 430071, China)
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2018, No. 3, pp. 700-704 (5 pages)
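Funding
Beijing Natural Science Foundation project (9174047)
National Science and Technology Library (NSTL) soft project (科1421)
Northeast Dianli University Doctoral Start-up Fund project (BSJXM-2017213)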
Keywords
abbreviation
reverse scanning
co-occurrence analysis