Abstract
[Purpose/Significance] Recognizing technical entities in patent texts at a finer granularity, and forecasting technology at that granularity, gives a more detailed picture of patent technology layout and trends. [Method/Process] First, deep learning methods were used to automatically recognize technical-term entities in patents, and several deep learning algorithms were compared experimentally. Second, new semi-supervised and custom annotation schemes were proposed to improve the efficiency of manual annotation. Finally, the best-performing trained model was applied and combined with a link prediction method to make fine-grained technology predictions for synthetic biotechnology. [Result/Conclusion] The empirical results show that, among the compared models, the RoBERTa-BiLSTM-CRF model is best suited to recognizing technical entities in semantically complex patent text, reaching an F1 score of 86.8%, and that its recognition results are more fine-grained than those of traditional IPC-based analysis. The fine-grained technology predictions further indicate that synthesis methods in synthetic biology are continuously being improved and innovated, and that research on synthesized products is moving toward synthetic fuels.
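The abstract states that the fine-grained forecast combines the recognized technical entities with link prediction, but it does not say how the network is built or which link-prediction index is used. The following is a minimal sketch under our own assumptions, not the authors' stated method: recognized terms become nodes, two terms are linked when they appear in the same patent, and pairs that are not yet linked are ranked by the Adamic-Adar index. The helper names (`build_cooccurrence_graph`, `adamic_adar`, `predict_links`) and the patent data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical helpers; Adamic-Adar is an illustrative choice of link-prediction index.


def build_cooccurrence_graph(patent_terms):
    """Undirected term graph: an edge joins two terms recognized in the same patent."""
    graph = defaultdict(set)
    for terms in patent_terms:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def adamic_adar(graph, u, v):
    """Sum of 1 / ln(degree) over the common neighbours of u and v.

    A common neighbour is linked to both u and v, so its degree is at least 2
    and ln(degree) > 0; no division-by-zero guard is needed.
    """
    return sum(1.0 / math.log(len(graph[w])) for w in graph[u] & graph[v])


def predict_links(graph, top_k=3):
    """Rank term pairs that do not yet co-occur by Adamic-Adar score."""
    candidates = []
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already linked, so not a prediction
        score = adamic_adar(graph, u, v)
        if score > 0:
            candidates.append((score, u, v))
    # Highest score first; ties broken alphabetically for a stable order.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [(u, v, round(score, 3)) for score, u, v in candidates[:top_k]]


# Hypothetical input: technical terms recognized in five patents.
patents = [
    ["CRISPR", "E. coli"],
    ["CRISPR", "yeast"],
    ["E. coli", "isobutanol"],
    ["yeast", "isobutanol"],
    ["yeast", "biofuel"],
]

graph = build_cooccurrence_graph(patents)
print(predict_links(graph))
# [('E. coli', 'yeast', 2.885), ('CRISPR', 'isobutanol', 2.353), ('CRISPR', 'biofuel', 0.91)]
```

In this toy graph, "E. coli" and "yeast" rank first because they share two low-degree neighbours. A real forecasting pipeline would typically also split patents by filing year, so that links predicted from earlier patents can be checked against later ones.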
Authors
Hu Yamin
Wu Xiaoyan
Liao Xingbin
Qian Yangge
Chen Fang
Affiliations: Chengdu Library and Information Center, Chinese Academy of Sciences, Chengdu 610299; Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190; Chengdu Information Technology of Chinese Academy of Sciences Co., Ltd., Chengdu 610299
Source
Library and Information Service (《图书情报工作》)
CSSCI
Peking University Core Journal
2022, No. 24, pp. 92-103 (12 pages)
Funding
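This paper is one of the outputs of the 2021 Innovation Fund Youth Project of the Chengdu Library and Information Center, Chinese Academy of Sciences, "Research on an Analytical Framework for Domain Innovation Paths Based on Knowledge Genes" (Project No. E1Z0000202).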
Keywords
technology term recognition
deep learning
technology prediction
synthetic biology